Linux Performance Monitoring Notes (1): CPU - bolix - ChinaUnix Blog



2012-03-19 11:00:54

Original post: Linux Performance Monitoring Notes (1): CPU, by fortara

1. CPU

What load means

If a dual-core system is executing 2 threads and 4 more are waiting in the run queue, the load is 6.
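The arithmetic above can be sketched in Python. The function names are made up for illustration, and dividing the load by the number of CPUs is a common rule of thumb rather than something stated in the original notes. Note also that the load average Linux reports counts tasks in uninterruptible sleep as well, so it can be higher than this simple sum.

```python
def load_from_counts(running: int, queued: int) -> int:
    """Load as described above: threads on a CPU plus threads waiting in the run queue."""
    return running + queued


def load_per_cpu(load: float, cpus: int) -> float:
    """Normalize a load figure by the number of CPUs (hypothetical helper)."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


# The dual-core example from the text: 2 running threads + 4 queued threads.
example_load = load_from_counts(2, 4)
example_per_cpu = load_per_cpu(example_load, 2)
```

On a live system, `os.getloadavg()` and `os.cpu_count()` from the standard library could supply the inputs to `load_per_cpu`.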

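CPU utilization

User Time - the percentage of CPU time spent executing processes in user space

System Time - the percentage of CPU time spent on kernel-space threads and interrupts

Wait IO - the percentage of time the CPU sits idle while processes are blocked waiting for an I/O request to complete

Idle - the percentage of time the CPU spends in the idle process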
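As a rough illustration of where these percentages come from, the sketch below derives them from two snapshots of the aggregate `cpu` line in `/proc/stat`. The helper names are hypothetical, and the grouping (nice counted as user time, irq and softirq counted as system time) is an assumption about how the counters map onto the four categories above, not something the original notes specify.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels expose fewer fields
    return dict(zip(FIELDS, values))


def utilization(before: dict, after: dict) -> dict:
    """Percentage of elapsed ticks per category between two snapshots."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero on identical snapshots
    return {
        "us": 100.0 * (delta["user"] + delta["nice"]) / total,
        "sy": 100.0 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
        "wa": 100.0 * delta["iowait"] / total,
        "id": 100.0 * delta["idle"] / total,
    }
```

In practice the two snapshots would be the first line of `/proc/stat` read a second or so apart.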

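CPU performance targets

Run Queues - no more than 1-3 runnable threads queued per CPU. For example, the load on a dual-core CPU should not exceed 6.

CPU Utilization - a balanced split is

65% - 70% User Time

30% - 35% System Time

0% - 5% Idle Time

Context Switches - a context switch occurs when the kernel suspends the thread running on a CPU and resumes a different one. Crossing from user space into kernel space and back (for example, in a system call) is a mode switch and is not, by itself, a context switch.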
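One way to watch context switches and interrupts without any extra tools is to read the cumulative `ctxt` and `intr` counters in `/proc/stat` twice and take the difference. The following is a minimal sketch under that assumption; the function names are invented for this example.

```python
def read_counters(stat_text: str) -> dict:
    """Extract the cumulative 'intr' and 'ctxt' totals from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # On both lines the first number after the label is the running total.
        if len(parts) >= 2 and parts[0] in ("intr", "ctxt"):
            counters[parts[0]] = int(parts[1])
    return counters


def per_second(before: dict, after: dict, seconds: float) -> dict:
    """Convert two cumulative snapshots into per-second rates."""
    return {k: (after[k] - before[k]) / seconds for k in before}
```

Calling `read_counters(open("/proc/stat").read())` one second apart and passing both results to `per_second` should give figures comparable to the `in` and `cs` columns of vmstat.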

vmstat is a low-overhead tool for observing system performance.

Table 2.1. The vmstat CPU statistics
Field	Description
r	The number of threads in the run queue. These threads are runnable, but no CPU is available to execute them.
b	The number of processes blocked and waiting for I/O requests to finish.
in	The number of interrupts being processed.
cs	The number of context switches currently happening on the system.
us	The percentage of CPU time spent in user space.
sy	The percentage of CPU time spent in the kernel and on interrupts.
wa	The percentage of time the CPU is idle because ALL runnable threads are blocked waiting on I/O.
id	The percentage of time the CPU is completely idle.
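To work with these fields programmatically, vmstat output can be turned into one dictionary per sample. The sketch below is a hypothetical helper, not part of vmstat itself. It reads the column names from the header row rather than hard-coding positions, because different vmstat versions may print the CPU columns in a different order or add columns.

```python
def parse_vmstat(text: str) -> list:
    """Turn vmstat output into a list of {column: int} dicts, one per sample."""
    samples, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name header row; remember it for the rows that follow.
            columns = fields
        elif columns and fields[0].isdigit() and len(fields) == len(columns):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples
```

The group header line (`procs memory swap ...`) and the command line itself are skipped because they start with neither `r` nor a digit.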

Case studies

In this example, the system is fully utilized:

# vmstat 1
procs                      memory     swap          io      system             cpu
 r  b   swpd   free   buff  cache  si   so    bi    bo    in    cs  us  sy  wa  id
 3  0 206564  15092  80336 176080   0    0     0     0   718    26  81  19   0   0
 2  0 206564  14772  80336 176120   0    0     0     0   758    23  96   4   0   0
 1  0 206564  14208  80336 176136   0    0     0     0   820    20  96   4   0   0
 1  0 206956  13884  79180 175964   0  412     0  2680  1008    80  93   7   0   0
 2  0 207348  14448  78800 175576   0  412     0   412   763    70  84  16   0   0
 2  0 207348  15756  78800 175424   0    0     0     0   874    25  89  11   0   0
 1  0 207348  16368  78800 175596   0    0     0     0   940    24  86  14   0   0
 1  0 207348  16600  78800 175604   0    0     0     0   929    27  95   3   0   2
 3  0 207348  16976  78548 175876   0    0     0  2508   969    35  93   7   0   0
 4  0 207348  16216  78548 175704   0    0     0     0   874    36  93   6   0   1
 4  0 207348  16424  78548 175776   0    0     0     0   850    26  77  23   0   0
 2  0 207348  17496  78556 175840   0    0     0     0   736    23  83  17   0   0
 0  0 207348  17680  78556 175868   0    0     0     0   861    21  91   8   0   1

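From these observations, we can draw the following conclusions:

There are many interrupts (in) and few context switches (cs). This suggests that a single process is making requests to hardware devices.
Further evidence of a single application: user time (us) is frequently 85% or more. Given the low number of context switches, the application is probably staying on the processor.
The run queue is within acceptable performance limits most of the time, but it exceeds the limit in two samples.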
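These conclusions can be checked against the numbers directly. The lists below are the `r`, `in`, `cs` and `us` columns copied from the vmstat sample above. The limit of 3 assumes a single CPU, which the source does not state explicitly.

```python
# Columns copied from the first vmstat sample.
r = [3, 2, 1, 1, 2, 2, 1, 1, 3, 4, 4, 2, 0]
interrupts = [718, 758, 820, 1008, 763, 874, 940, 929, 969, 874, 850, 736, 861]
switches = [26, 23, 20, 80, 70, 25, 24, 27, 35, 36, 26, 23, 21]
us = [81, 96, 96, 93, 84, 89, 86, 95, 93, 93, 77, 83, 91]

RUN_QUEUE_LIMIT = 3  # per-CPU target from the notes; assumes one CPU

# Samples where the run queue is longer than the target allows.
over_limit = sum(1 for value in r if value > RUN_QUEUE_LIMIT)

# Samples where user time is 85% or more.
high_user = sum(1 for value in us if value >= 85)

# Interrupts handled per context switch over the whole sample.
interrupts_per_switch = sum(interrupts) / sum(switches)
```

The run queue exceeds the limit in 2 of the 13 samples, user time is at or above 85% in 9 of them, and there are roughly 25 interrupts for every context switch.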
In this example, the kernel scheduler is saturated with context switches:

# vmstat 1
procs                      memory     swap          io      system             cpu
 r  b   swpd   free   buff  cache  si   so    bi    bo    in    cs  us  sy  wa  id
 2  1 207740  98476  81344 180972   0    0  2496     0   900  2883   4  12  57  27
 0  1 207740  96448  83304 180984   0    0  1968   328   810  2559   8   9  83   0
 0  1 207740  94404  85348 180984   0    0  2044     0   829  2879   9   6  78   7
 0  1 207740  92576  87176 180984   0    0  1828     0   689  2088   3   9  78  10
 2  0 207740  91300  88452 180984   0    0  1276     0   565  2182   7   6  83   4
 3  1 207740  90124  89628 180984   0    0  1176     0   551  2219   2   7  91   0
 4  2 207740  89240  90512 180984   0    0   880   520   443   907  22  10  67   0
 5  3 207740  88056  91680 180984   0    0  1168     0   628  1248  12  11  77   0
 4  2 207740  86852  92880 180984   0    0  1200     0   654  1505   6   7  87   0
 6  1 207740  85736  93996 180984   0    0  1116     0   526  1512   5  10  85   0
 0  1 207740  84844  94888 180984   0    0   892     0   438  1556   6   4  90   0

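From these observations, we can draw the following conclusions:
1. The number of context switches is higher than the number of interrupts, which suggests the kernel is spending a considerable amount of time context switching threads.
2. The large number of context switches leaves CPU utilization badly out of balance: the percentage of time spent waiting on I/O (wa) is very high, and the user time percentage (us) is very low.
3. Because the CPU is blocked on I/O requests, a considerable number of runnable threads are also waiting in the run queue.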
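The same kind of check works for this sample. The tuples below hold the `in`, `cs`, `us` and `wa` columns copied from the vmstat output above; `summarize` is a hypothetical helper written for this illustration.

```python
from statistics import mean

# (in, cs, us, wa) for each row of the second vmstat sample.
SAMPLES = [
    (900, 2883, 4, 57), (810, 2559, 8, 83), (829, 2879, 9, 78),
    (689, 2088, 3, 78), (565, 2182, 7, 83), (551, 2219, 2, 91),
    (443, 907, 22, 67), (628, 1248, 12, 77), (654, 1505, 6, 87),
    (526, 1512, 5, 85), (438, 1556, 6, 90),
]


def summarize(samples):
    """Return (rows where cs > in, mean user %, mean I/O wait %)."""
    switch_heavy = sum(1 for intr, cs, _us, _wa in samples if cs > intr)
    mean_us = mean(row[2] for row in samples)
    mean_wa = mean(row[3] for row in samples)
    return switch_heavy, mean_us, mean_wa
```

Context switches outnumber interrupts in all 11 rows, while user time averages under 8% and I/O wait averages close to 80%, which matches the conclusions listed above.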

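If your system has multiple processor cores, you can use the mpstat command to monitor each one individually. The Linux kernel treats a dual-core processor as 2 CPUs, so a dual-processor system with dual-core chips reports 4 CPUs available.

The CPU utilization statistics from mpstat are roughly the same as those from vmstat, but mpstat breaks them down per processor.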

# mpstat -P ALL 1
Linux 2.4.21-20.ELsmp (localhost.localdomain)   05/23/2006

05:17:31 PM  CPU   %user   %nice %system   %idle    intr/s
05:17:32 PM  all    0.00    0.00    3.19   96.53     13.27
05:17:32 PM    0    0.00    0.00    0.00  100.00      0.00
05:17:32 PM    1    1.12    0.00   12.73   86.15     13.27
05:17:32 PM    2    0.00    0.00    0.00  100.00      0.00
05:17:32 PM    3    0.00    0.00    0.00  100.00      0.00
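To pick out the busy processor from output like this, a short script can map each CPU to its `%idle` value. This is a minimal sketch with made-up function names. It locates the columns from the header row, so it assumes the header and data rows are laid out the same way, as they are in the sample above.

```python
MPSTAT_SAMPLE = """\
05:17:31 PM CPU %user %nice %system %idle intr/s
05:17:32 PM all 0.00 0.00 3.19 96.53 13.27
05:17:32 PM 0 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 1 1.12 0.00 12.73 86.15 13.27
05:17:32 PM 2 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 3 0.00 0.00 0.00 100.00 0.00
"""


def per_cpu_idle(text: str) -> dict:
    """Map CPU id -> %idle, using the header row to locate the columns."""
    idle, cpu_col, idle_col = {}, None, None
    for line in text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            cpu_col, idle_col = fields.index("CPU"), fields.index("%idle")
        elif cpu_col is not None and len(fields) > idle_col and fields[cpu_col].isdigit():
            # Skips the 'all' summary row, whose CPU field is not a number.
            idle[int(fields[cpu_col])] = float(fields[idle_col])
    return idle


def busiest_cpu(idle: dict) -> int:
    """Return the CPU id with the lowest idle percentage."""
    return min(idle, key=idle.get)
```

For the sample above, CPU 1 is the only processor doing any work, and nearly all of it is system time.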
# top -d 1
# mpstat -P ALL 1
# while :; do ps -eo pid,ni,pri,pcpu,psr,comm | grep 'mysqld'; sleep 1; done

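Monitoring CPU performance comes down to the following:

Check the system run queue and make sure there are no more than 3 runnable threads per processor.
Make sure CPU utilization is split roughly 70/30 between user and system time.
When the CPU spends more of its time in system mode, it is most likely overloaded and busy rescheduling priorities.
When I/O activity increases, CPU-bound applications suffer.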
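The checklist above could be turned into a small function that flags a vmstat sample. This is only a sketch: the function name is invented, the 35% bound on system time is read off the targets earlier in these notes, and the I/O wait rule (wait exceeding user time) is an assumption made for this example rather than a threshold given in the source.

```python
def cpu_health_warnings(r: int, us: float, sy: float, wa: float, cpus: int = 1) -> list:
    """Return a list of warnings for one vmstat sample (hypothetical thresholds)."""
    warnings = []
    if r > 3 * cpus:
        warnings.append("run queue exceeds 3 runnable threads per processor")
    busy = us + sy
    if busy > 0 and sy / busy > 0.35:
        warnings.append("system time is above 35% of busy CPU time")
    if wa > us:  # assumed rule of thumb, not from the original notes
        warnings.append("more time is spent waiting on I/O than running user code")
    return warnings
```

Feeding it the first row of the fully utilized example (`r=3, us=81, sy=19, wa=0`) produces no warnings, while a row from the context-switch example such as `r=5, us=12, sy=11, wa=77` trips all three checks.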
