2006-04-21 01:01:39
The processing unit is one of the fastest components of the system. It is comparatively rare for a single program to keep the CPU 100 percent busy (that is, 0 percent idle and 0 percent wait) for more than a few seconds at a time. Even in heavily loaded multiuser systems, there are occasional 10-millisecond (ms) periods that end with all threads in a wait state. If a monitor shows the CPU 100 percent busy for an extended period, there is a good chance that some program is in an infinite loop. Even if the program is "merely" expensive, rather than broken, it needs to be identified and dealt with.
The first tool to use is the vmstat command, which quickly provides compact information about various system resources and their related performance problems. The vmstat command reports statistics about kernel threads in the run and wait queue, memory, paging, disks, interrupts, system calls, context switches, and CPU activity. The reported CPU activity is a percentage breakdown of user mode, system mode, idle time, and waits for disk I/O.
Note: If the vmstat command is used without any options, or only with the interval and, optionally, the count parameter (for example, vmstat 2 10), the first line of numbers is an average since system reboot.
As a CPU monitor, the vmstat command is superior to the iostat command in that its one-line-per-report output is easier to scan as it scrolls and there is less overhead involved if there are a lot of disks attached to the system. The following example can help you identify situations in which a program has run away or is too CPU-intensive to run in a multiuser environment.
# vmstat 2
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 1  0 22478  1677   0   0   0   0    0   0 188 1380 157 57 32  0 10
 1  0 22506  1609   0   0   0   0    0   0 214 1476 186 48 37  0 16
 0  0 22498  1582   0   0   0   0    0   0 248 1470 226 55 36  0  9
 2  0 22534  1465   0   0   0   0    0   0 238  903 239 77 23  0  0
 2  0 22534  1445   0   0   0   0    0   0 209 1142 205 72 28  0  0
 2  0 22534  1426   0   0   0   0    0   0 189 1220 212 74 26  0  0
 3  0 22534  1410   0   0   0   0    0   0 255 1704 268 70 30  0  0
 2  1 22557  1365   0   0   0   0    0   0 383  977 216 72 28  0  0
 2  0 22541  1356   0   0   0   0    0   0 237 1418 209 63 33  0  4
 1  0 22524  1350   0   0   0   0    0   0 241 1348 179 52 32  0 16
 1  0 22546  1293   0   0   0   0    0   0 217 1473 180 51 35  0 14
This output shows the effect of introducing a program in a tight loop to a busy multiuser system. The first three reports (the summary has been removed) show the system balanced at 50-55 percent user, 30-35 percent system, and 10-15 percent I/O wait. When the looping program begins, all available CPU cycles are consumed. Because the looping program does no I/O, it can absorb all of the cycles previously unused because of I/O wait. Worse, it represents a process that is always ready to take over the CPU when a useful process relinquishes it. Because the looping program has a priority equal to that of all other foreground processes, it will not necessarily have to give up the CPU when another process becomes dispatchable. The program runs for about 10 seconds (five reports), and then the activity reported by the vmstat command returns to a more normal pattern.
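As a rough illustration of how the pattern in this output could be spotted automatically, the following Python sketch parses vmstat data rows and finds the longest stretch of consecutive reports in which us + sy accounts for all of the CPU time. The function names, the sy_calls key (used here only to tell the faults sy column apart from the cpu sy column), and the 100 percent cutoff are assumptions made for this example; they are not part of the vmstat command.

```python
from typing import Dict, List

# Column order taken from the vmstat header shown above. "sy_calls" is a
# made-up key for the faults "sy" column so that it does not collide with
# the cpu "sy" column.
VMSTAT_COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
                  "in", "sy_calls", "cs", "us", "sy", "id", "wa"]


def parse_vmstat_rows(text: str) -> List[Dict[str, int]]:
    """Return one dict per data row, skipping header and separator lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly one non-negative integer per column.
        if len(fields) == len(VMSTAT_COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(VMSTAT_COLUMNS, (int(f) for f in fields))))
    return rows


def longest_saturated_run(rows: List[Dict[str, int]], min_busy: int = 100) -> int:
    """Length of the longest run of consecutive reports with us + sy >= min_busy."""
    best = current = 0
    for row in rows:
        if row["us"] + row["sy"] >= min_busy:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Seven of the reports from the example above, including the header lines.
SAMPLE = """\
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 22498  1582   0   0   0   0    0   0 248 1470 226 55 36  0  9
 2  0 22534  1465   0   0   0   0    0   0 238  903 239 77 23  0  0
 2  0 22534  1445   0   0   0   0    0   0 209 1142 205 72 28  0  0
 2  0 22534  1426   0   0   0   0    0   0 189 1220 212 74 26  0  0
 3  0 22534  1410   0   0   0   0    0   0 255 1704 268 70 30  0  0
 2  1 22557  1365   0   0   0   0    0   0 383  977 216 72 28  0  0
 2  0 22541  1356   0   0   0   0    0   0 237 1418 209 63 33  0  4
"""

if __name__ == "__main__":
    print(longest_saturated_run(parse_vmstat_rows(SAMPLE)))
```

With a 2-second interval, a run of five saturated reports corresponds to roughly 10 seconds of a fully busy CPU. How long a run must be before it is worth investigating depends on the workload.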
The CPU statistics can be somewhat distorted on systems with very high device-interrupt load. This situation is due to the fact that the tool samples on timer interrupts. The timer is the lowest priority device and therefore it can easily be preempted by other interrupts. To eliminate this distortion, operating system versions later than AIX 4.3.3 use a different method to sample the timer.
Note: For SMP systems the us, sy, id and wa columns are only averages over the processors (the sar command can report per-processor statistics). But keep in mind that the I/O wait statistic per processor is not really a processor-specific statistic; it is a global statistic. An I/O wait is distinguished from idle time only by the state of a pending I/O. If there is any pending disk I/O, and the processor is not busy, then it is an I/O wait time. AIX 4.3.3 and later contains an enhancement to the method used to compute the percentage of CPU time spent waiting on disk I/O (wio time). See for more details.
Optimum use would have the CPU working 100 percent of the time. This holds true in the case of a single-user system with no need to share the CPU. Generally, if us + sy time is below 90 percent, a single-user system is not considered CPU constrained. However, if us + sy time on a multiuser system exceeds 80 percent, the processes may spend time waiting in the run queue. Response time and throughput might suffer.
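The two rules of thumb in the preceding paragraph can be written down as a small check. This is only a sketch: the function name and the multiuser parameter are invented for the example, and the thresholds are the guideline values quoted above rather than hard limits.

```python
def is_cpu_constrained(us: float, sy: float, multiuser: bool) -> bool:
    """Apply the us + sy rules of thumb from the text.

    Single-user system: not considered CPU constrained below 90 percent.
    Multiuser system: threads may queue once us + sy exceeds 80 percent.
    """
    busy = us + sy
    if multiuser:
        return busy > 80.0
    return busy >= 90.0


if __name__ == "__main__":
    # First report of the vmstat example above: us=57, sy=32.
    print(is_cpu_constrained(57, 32, multiuser=True))
    print(is_cpu_constrained(57, 32, multiuser=False))
```

The same 89 percent load is flagged on a multiuser system but not on a single-user workstation, which is the distinction the paragraph draws.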
To check if the CPU is the bottleneck, consider the four cpu columns and the two kthr (kernel threads) columns in the vmstat report. It may also be worthwhile looking at the faults column:
The cpu columns show a percentage breakdown of CPU time usage during the interval. They are as follows:
The us column shows the percent of CPU time spent in user mode. A UNIX process can execute in either user mode or system (kernel) mode. When in user mode, a process executes within its application code and does not require kernel resources to perform computations, manage memory, or set variables.
The sy column details the percentage of time the CPU was executing a process in system mode. This includes CPU resource consumed by kernel processes (kprocs) and others that need access to kernel resources. If a process needs kernel resources, it must execute a system call and is thereby switched to system mode to make that resource available. For example, reading or writing of a file requires kernel resources to open the file, seek a specific location, and read or write data, unless memory mapped files are used.
The id column shows the percentage of time during which the CPU is idle, or waiting, without pending local disk I/O. If there are no threads available for execution (the run queue is empty), the system dispatches a process called wait. On an SMP system, one wait thread per processor can be dispatched. The report generated by the ps command (with the -k or -g 0 option) identifies this as kproc or wait. On a uniprocessor system, the process ID (PID) usually is 516. SMP systems will have an idle kproc for each processor. If the ps report shows a high aggregate time for this thread, there were significant periods of time when no other thread was ready to run or waiting to be executed on the CPU. The system was therefore mostly idle and waiting for new tasks.
If there are no I/Os pending to a local disk, all time charged to wait is classified as idle time. In operating system version 4.3.2 and earlier, an access to remote disks (NFS-mounted disks) is treated as idle time (with a small amount of sy time to execute the NFS requests) because there is no pending I/O request to a local disk. With AIX 4.3.3 and later NFS goes through the buffer cache, and waits in those routines are accounted for in the wa statistics.
The wa column details the percentage of time the CPU was idle with pending local disk I/O (in AIX 4.3.3 and later this is also true for NFS-mounted disks). If there is at least one outstanding I/O to a disk when wait is running, the time is classified as waiting for I/O. Unless asynchronous I/O is being used by the process, an I/O request to disk causes the calling process to block (or sleep) until the request has been completed. Once an I/O request for a process completes, the process is placed on the run queue. If the I/Os were completing faster, more CPU time could be used.
A wa value over 25 percent could indicate that the disk subsystem might not be balanced properly, or it might be the result of a disk-intensive workload.
For information on the change made to wa, see .
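The way the four cpu columns partition time can be illustrated with a deliberately simplified model: at each timer sample the CPU is either running user code, running system code, or running the wait process, and wait time is charged to wa or id depending on whether a disk I/O is pending. The Python sketch below follows that description only. The function names and the representation of a sample as a (mode, pending_disk_io) pair are assumptions for the example, and this is not how the kernel itself is implemented.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# A sample is (mode, pending_disk_io). mode is "user", "system", or None
# when only the wait process is running on the CPU.
Sample = Tuple[Optional[str], bool]


def classify_sample(mode: Optional[str], pending_disk_io: bool) -> str:
    """Map one timer sample to the cpu column it would be charged to."""
    if mode == "user":
        return "us"
    if mode == "system":
        return "sy"
    # CPU not busy: I/O wait if any disk I/O is outstanding, otherwise idle.
    return "wa" if pending_disk_io else "id"


def cpu_breakdown(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return the percentage of samples charged to us, sy, id, and wa."""
    counts = Counter(classify_sample(mode, pending) for mode, pending in samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sample is required")
    return {col: 100.0 * counts[col] / total for col in ("us", "sy", "id", "wa")}


if __name__ == "__main__":
    samples = ([("user", False)] * 5 + [("system", True)] * 3
               + [(None, True)] + [(None, False)])
    print(cpu_breakdown(samples))
```

Note that a pending I/O only matters while the CPU has nothing else to run: the three system-mode samples above have I/O outstanding but are still charged to sy.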
The kthr columns show kernel threads placed on various queues per second over the sampling interval (state changes). They are as follows:
The r column shows the average number of kernel threads on the run queue per second, that is, the number of threads that can be run. This value should be less than five for non-SMP systems. For SMP systems, this value should be less than:
5 x (Ntotal - Nbind)
Where Ntotal stands for total number of processors and Nbind for the number of processors which have been bound to processes, for example, with the bindprocessor command.
If this number increases rapidly, examine the applications. But systems may be also running fine with 10 to 15 threads on their run queue, depending on the thread tasks and the amount of time they run.
The b column shows the average number of kernel threads in the wait queue per second. These threads are waiting for resources or I/O. Threads are also located in the wait queue when waiting for one of their thread pages to be paged in. This value is usually near zero. But if the run-queue value increases, the wait-queue value normally also increases. If threads are awakened simultaneously during a one-second interval, the run-queue value could be high while CPU utilization remains low, because the threads go right back to sleep.
If processes are suspended due to memory load control, the blocked column (b) in the vmstat report indicates the increase in the number of threads rather than the run queue.
With the vmstat -I command, the p column shows the number of threads waiting on actual physical I/O per second.
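The run-queue guideline given above, 5 x (Ntotal - Nbind), can be evaluated with a few lines of Python. This is a minimal sketch: the function and parameter names are made up for the example, and the result is a rule of thumb, since, as noted, some systems run fine with 10 to 15 threads on the run queue.

```python
def run_queue_limit(n_total: int = 1, n_bound: int = 0) -> int:
    """Guideline upper bound for the r column: 5 x (Ntotal - Nbind).

    n_total: total number of processors (1 for a non-SMP system).
    n_bound: number of processors that have been bound to processes.
    """
    if n_total < 1 or not 0 <= n_bound <= n_total:
        raise ValueError("expected n_total >= 1 and 0 <= n_bound <= n_total")
    return 5 * (n_total - n_bound)


def run_queue_within_guideline(r: float, n_total: int = 1, n_bound: int = 0) -> bool:
    """True when the observed run-queue value is below the guideline."""
    return r < run_queue_limit(n_total, n_bound)


if __name__ == "__main__":
    print(run_queue_limit())       # non-SMP system
    print(run_queue_limit(4, 1))   # 4-way SMP, one processor bound
```

For a uniprocessor the formula reduces to the "less than five" value quoted for non-SMP systems.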
The faults columns give information about process control, such as trap and interrupt rate. They are as follows:
The in column shows the number of device interrupts per second observed in the interval. Additional information can be found in .
The sy column shows the number of system calls per second observed in the interval. Resources are available to user processes through well-defined system calls. These calls instruct the kernel to perform operations for the calling process and exchange data between the kernel and the process. Because workloads and applications vary widely, and different calls perform different functions, it is impossible to define how many system calls per second are too many. But typically, when the sy column rises above 10000 calls per second on a uniprocessor, further investigation is called for (on an SMP system the number is 10000 calls per second per processor). One reason could be "polling" subroutines like the select() subroutine. For this column, it is advisable to have a baseline measurement that gives a count for a normal sy value.
The cs column shows the number of context switches per second observed in the interval. The physical CPU resource is subdivided into logical time slices of 10 milliseconds each. Assuming a thread is scheduled for execution, it will run until its time slice expires, until it is preempted, or until it voluntarily gives up control of the CPU. When another thread is given control of the CPU, the context or working environment of the previous thread must be saved and the context of the current thread must be loaded. The operating system has a very efficient context switching procedure, so each switch is inexpensive in terms of resources. Any significant increase in context switches, such as when cs is a lot higher than the disk I/O and network packet rate, should be cause for further investigation.
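The system call guideline described above (10000 calls per second per processor) scales with the number of processors, which is easy to overlook when reading vmstat output from an SMP system. The following Python sketch applies it; the function and parameter names are illustrative only, and a locally measured baseline remains a better reference than the fixed figure.

```python
SYSCALLS_PER_CPU_GUIDELINE = 10000  # calls per second per processor, from the text


def syscall_rate_needs_review(sy_per_second: float, n_cpus: int = 1) -> bool:
    """True when the faults sy value rises above 10000 calls/s per processor."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return sy_per_second > SYSCALLS_PER_CPU_GUIDELINE * n_cpus


if __name__ == "__main__":
    print(syscall_rate_needs_review(1380))            # uniprocessor example value
    print(syscall_rate_needs_review(50781, n_cpus=4)) # 4-way SMP example value
```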
The iostat command is the fastest way to get a first impression of whether the system has a disk I/O-bound performance problem (see ). The tool also reports CPU statistics.
The following example shows a part of an iostat command output. The first stanza shows the summary statistic since system startup.
# iostat -t 2 6
tty:      tin         tout   avg-cpu:  % user    % sys     % idle    % iowait
          0.0          0.8               8.4      2.6       88.5       0.5
          0.0         80.2               4.5      3.0       92.1       0.5
          0.0         40.5               7.0      4.0       89.0       0.0
          0.0         40.5               9.0      2.5       88.5       0.0
          0.0         40.5               7.5      1.0       91.5       0.0
          0.0         40.5              10.0      3.5       80.5       6.0
The CPU statistics columns (% user, % sys, % idle, and % iowait) provide a breakdown of CPU usage. This information is also reported in the vmstat command output in the columns labeled us, sy, id, and wa. For a detailed explanation for the values, see . Also note the change made to %iowait described in .
The sar command gathers statistical data about the system. Though it can be used to gather some useful data regarding system performance, the sar command can increase the system load, which can exacerbate a pre-existing performance problem if the sampling frequency is high. But compared to the accounting package, the sar command is less intrusive. The system maintains a series of system activity counters which record various activities and provide the data that the sar command reports. The sar command does not cause these counters to be updated or used; this is done automatically regardless of whether or not the sar command runs. It merely extracts the data in the counters and saves it, based on the sampling rate and number of samples specified to the sar command.
With its numerous options, the sar command provides queuing, paging, TTY, and many other statistics. One important feature of the sar command is that it reports either systemwide (global among all processors) CPU statistics (which are calculated as averages for values expressed as percentages, and as sums otherwise), or it reports statistics for each individual processor. Therefore, this command is particularly useful on SMP systems.
The sar command can be used in three ways:
To collect and display system statistic reports immediately, use the following command:
# sar -u 2 5

AIX texmex 3 4 000691854C00    01/27/00

17:58:15    %usr    %sys    %wio   %idle
17:58:17      43       9       1      46
17:58:19      35      17       3      45
17:58:21      36      22      20      23
17:58:23      21      17       0      63
17:58:25      85      12       3       0

Average       44      15       5      35
This example is from a single user workstation and shows the CPU utilization.
The -o and -f options (write to and read from user-specified data files) allow you to visualize the behavior of your machine in two independent steps. This consumes fewer resources during the problem-reproduction period. Because the collected binary file keeps all the data that the sar command needs, you can transfer the file to a separate machine and analyze the data there.
# sar -o /tmp/sar.out 2 5 > /dev/null
The above command collects system activity data at 2-second intervals for 5 intervals and stores the (unformatted) sar data in the /tmp/sar.out file. Standard output is redirected to /dev/null to suppress the screen output. As written, the command runs in the foreground; append an ampersand (&) to run it in the background.
The following command extracts CPU information from the file and outputs a formatted report to standard output:
# sar -f /tmp/sar.out

AIX texmex 3 4 000691854C00    01/27/00

18:10:18    %usr    %sys    %wio   %idle
18:10:20       9       2       0      88
18:10:22      13      10       0      76
18:10:24      37       4       0      59
18:10:26       8       2       0      90
18:10:28      20       3       0      77

Average       18       4       0      78
The captured binary data file keeps all information needed for the reports, so every possible sar report can be produced from it. This also makes it possible to display the processor-specific information of an SMP system on a single-processor system.
The sar command calls a process named sadc to access system data. Two shell scripts (/usr/lib/sa/sa1 and /usr/lib/sa/sa2) are structured to be run by the cron daemon and provide daily statistics and reports. Sample stanzas are included (but commented out) in the /var/spool/cron/crontabs/adm crontab file to specify when the cron daemon should run the shell scripts.
The following lines show a modified crontab for the adm user. Only the comment characters for the data collections were removed:
#=================================================================
#      SYSTEM ACTIVITY REPORTS
#  8am-5pm activity reports every 20 mins during weekdays.
#  activity reports every an hour on Saturday and Sunday.
#  6pm-7am activity reports every an hour during weekdays.
#  Daily summary prepared at 18:05.
#=================================================================
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
0 * * * 0,6 /usr/lib/sa/sa1 &
0 18-7 * * 1-5 /usr/lib/sa/sa1 &
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -ubcwyaqvm &
#=================================================================
Collection of data in this manner is useful to characterize system usage over a period of time and to determine peak usage hours.
The most useful CPU-related options for the sar command are:
The -P option reports per-processor statistics for the specified processors. If the ALL keyword is specified, statistics for each individual processor and an average for all processors are reported. Of the flags that specify the statistics to be reported, only the -a, -c, -m, -u, and -w flags are meaningful with the -P flag.
The following example shows the per-processor statistic while a CPU-bound program was running on processor number 0:
# sar -P ALL 2 3

AIX rugby 3 4 00058033A100    01/27/00

17:30:50 cpu    %usr    %sys    %wio   %idle
17:30:52  0        8      92       0       0
          1        0       4       0      96
          2        0       1       0      99
          3        0       0       0     100
          -        2      24       0      74
17:30:54  0       12      88       0       0
          1        0       3       0      97
          2        0       1       0      99
          3        0       0       0     100
          -        3      23       0      74
17:30:56  0       11      89       0       0
          1        0       3       0      97
          2        0       0       0     100
          3        0       0       0     100
          -        3      23       0      74

Average   0       10      90       0       0
          1        0       4       0      96
          2        0       1       0      99
          3        0       0       0     100
          -        3      24       0      74
The last line of every stanza, which starts with a dash (-) in the cpu column, is the average for all processors. An average (-) line is displayed only if the -P ALL option is used. It is omitted if individual processors are specified. The last stanza, labeled with the word Average instead of a time stamp, contains the averages of the processor-specific rows over all stanzas.
The following example shows the vmstat output during this time:
# vmstat 2 5
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ ------------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in    sy  cs us sy id wa
 0  0  5636 16054   0   0   0   0    0   0 116   266   5  0  1 99  0
 1  1  5733 15931   0   0   0   0    0   0 476 50781  35  2 27 70  0
 1  1  5733 15930   0   0   0   0    0   0 476 49437  27  2 24 74  0
 1  1  5733 15930   0   0   0   0    0   0 473 48923  31  3 23 74  0
 1  1  5733 15930   0   0   0   0    0   0 466 49383  27  3 23 74  0
The first numbered line is the summary since startup of the system. The second line reflects the start of the sar command, and with the third row, the reports are comparable. The vmstat command can only display the average CPU utilization over all processors. This is comparable with the dashed (-) rows from the CPU utilization output from the sar command.
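To make the relationship between the per-processor rows and the dashed (-) row concrete, the following Python sketch averages the four processor rows of a sar -P ALL stanza column by column. It is an approximation for illustration only: the function name is invented, and it works on the already rounded percentages that sar prints, whereas the command itself works from the underlying counters, so results can differ by one in some stanzas.

```python
from typing import List, Sequence


def average_row(per_cpu_rows: Sequence[Sequence[int]]) -> List[int]:
    """Average each percentage column over all processors.

    Each input row is (%usr, %sys, %wio, %idle) for one processor.
    Python's round() rounds exact halves to the nearest even integer,
    so an average ending in .5 may not match what sar prints.
    """
    if not per_cpu_rows:
        raise ValueError("at least one processor row is required")
    n = len(per_cpu_rows)
    return [round(sum(column) / n) for column in zip(*per_cpu_rows)]


if __name__ == "__main__":
    # Processor rows 0-3 of the 17:30:52 stanza in the sar -P ALL example.
    stanza = [
        (8, 92, 0, 0),
        (0, 4, 0, 96),
        (0, 1, 0, 99),
        (0, 0, 0, 100),
    ]
    print(average_row(stanza))
```

For this stanza the result agrees with the dashed row in the sar output and with the us, sy, and id values that vmstat reports for the same period.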
The -u option displays the CPU utilization. It is the default if no other flag is specified. It shows the same information as the CPU statistics of the vmstat or iostat commands.
During the following example, a copy command was started:
# sar -u -P ALL 1 5

AIX rugby 3 4 00058033A100    10/07/99

13:33:42 cpu    %usr    %sys    %wio   %idle
13:33:43  0        0       0       0     100
          1        0       0       0     100
          2        0       0       0     100
          3        0       0       0     100
          -        0       0       0     100
13:33:44  0        2      66       0      32
          1        0       1       0      99
          2        0       0       0     100
          3        0       1       0      99
          -        0      17       0      82
13:33:45  0        1      52      44       3
          1        0       1       0      99
          2        0       4       0      96
          3        0       0       0     100
          -        0      14      11      74
13:33:46  0        0       8      91       1
          1        0       0       0     100
          2        0       0       0     100
          3        0       1       0      99
          -        0       2      23      75
13:33:47  0        0       7      93       0
          1        0       0       0     100
          2        0       1       0      99
          3        0       0       0     100
          -        0       2      23      75

Average   0        1      27      46      27
          1        0       0       0     100
          2        0       1       0      99
          3        0       0       0     100
          -        0       7      11      81
The cp command is working on processor number 0, and the three other processors are idle. This reflects the change with operating system version 4.3.3 (see ).
The -c option shows the system call rate.
# sar -c 1 3
19:28:25 scall/s sread/s swrit/s  fork/s  exec/s rchar/s wchar/s
19:28:26     134      36       1    0.00    0.00 2691306    1517
19:28:27      46      34       1    0.00    0.00 2716922    1531
19:28:28      46      34       1    0.00    0.00 2716922    1531

Average       75      35       1    0.00    0.00 2708329    1527
While the vmstat command shows system call rates as well, the sar command can also break these system calls down into read(), write(), fork(), exec(), and others. Pay particular attention to the fork/s column. If this is high, then further investigation might be needed using the accounting utilities, the trace command, or the tprof command.
The -q option shows the run-queue size and the swap-queue size.
# sar -q 5 3
19:31:42 runq-sz %runocc swpq-sz %swpocc
19:31:47     1.0     100     1.0     100
19:31:52     2.0     100     1.0     100
19:31:57     1.0     100     1.0     100

Average      1.3      95     1.0      95
The -q option can indicate whether you have too many jobs running (runq-sz) or have a potential paging bottleneck. In a highly transactional system, for example Enterprise Resource Planning (ERP), the run queue can be in the hundreds, because each transaction uses small amounts of CPU time. If paging is the problem, run the vmstat command. High I/O wait indicates that there is significant competing disk activity or excessive paging due to insufficient memory.
The xmperf program displays CPU use as a moving skyline chart. The xmperf program is described in detail in the Performance Toolbox Version 2 and 3 for AIX: Guide and Reference.