Category: LINUX
2010-07-04 12:04:47
There are many tools to monitor disk I/O in Linux, such as 'iostat', 'sar', and 'sadc'. It is easy to use these tools to find out which device (or hard disk) has the busiest I/O activity, but it is a bit difficult to find out which program (the exact pid) is actually using a particular device. This task is easier on AIX with 'filemon' (see AIX section). The following example is one way to determine which program may be generating the most disk I/O on Linux:
# sar -d
Linux 2.4.21-27.ELsmp (pw101) 12/04/2005
12:00:00 AM DEV tps rd_sec/s wr_sec/s
....
Average: dev8-128 7.16 5.37 75.07
Average: dev8-129 7.16 5.37 75.07
Average: dev8-130 0.00 0.00 0.00
....

The above command finds the busiest device. To determine what that device is, do:

# more /proc/devices
Character devices:
  1 mem
  2 pty
  3 ttyp
  4 ttyS
  5 cua
  7 vcs
...
Block devices:
  1 ramdisk
  2 fd
  3 ide0
  7 loop
  8 sd
...
 71 sd
129 sd

The above example indicates that 'Block device' 8 is 'sd', or SCSI disk, and #129 is one of the SCSI disks in the system. So 'sar -d' tells you that one of the SCSI disks, device 'dev8-129', is busy.

The following command line is a bit busy, but you can always strip away the piped commands to see what the output of each one looks like. The example below shows which program (along with its pid) has the most open files on a given busy device. In our example, the busy device we are interested in is 'dev8-129', which is represented as '8,129' in the 'lsof' output.

# lsof | grep "8,129" | awk '{print $1" "$2}' | uniq -c | sort -n -r
[count program pid]
307 java 31916
307 java 31915
307 java 31914
307 java 31913
307 java 31912
307 java 31911
307 java 31910
307 java 31909
307 java 31908
307 java 31907
307 java 31906
307 java 31905
307 java 31904
307 java 31903
307 java 27645
 55 db2sysc 32011
 51 db2sysc 32012
 46 httpd 32009
 46 httpd 32008
 46 httpd 32006
 46 httpd 31901
 46 httpd 31900
 46 httpd 31899
 46 httpd 31898
 46 httpd 31897
 46 httpd 31874
 45 db2sysc 32019
 40 db2fmp 31996
 31 sshd 26713
 29 db2sysc 32025
 29 db2sysc 32024
...

The above output gives a clue as to which process(es) to look into for disk I/O problems (i.e., 'java'). The 'java' processes have the largest number of open files and, chances are, they are doing the most disk I/O. Of course, some files may be memory-cached, so this method may not always work. I'd like to hear from you if you have a better solution. Please contact me (see footer for contact info).
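If digging through /proc/devices feels indirect, the same major and minor numbers also show up in 'ls -l' output for the device nodes (e.g., 'ls -l /dev/sd*'), so you can map the 'devM-N' name that 'sar' reports straight to a /dev entry. Below is a minimal sketch that wraps the lsof pipeline above into a small reusable script; the script name 'busy_procs.sh' is my own, not part of any standard tool:

#!/bin/sh
# busy_procs.sh (hypothetical helper): list which programs hold the most
# open files on the device whose "major,minor" pair is given as $1
lsof | grep "$1" | awk '{print $1" "$2}' | uniq -c | sort -n -r | head -20

Usage, with the busy device from the example above:

# ./busy_procs.sh "8,129"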
'Perfmon' is a Windows tool that can be configured to monitor (and log to file) system resources. Physical disk counters are one area that can be added to the overall resources to be monitored. You can start 'perfmon' simply by typing 'perfmon' in the "Start->Run..." window:
The above example uses 'perfmon' to show resource usage from a perf log file. The 'Counter' column shows the disk I/O related counters for disk drives C: and E:. You can use these counters to see clearly which disk is busy. To see which program is causing the disk I/O, a freeware tool called 'Filemon' can help. Here is a brief description of it: "FileMon monitors and displays file system activity on a system in real-time. Its advanced capabilities make it a powerful tool for exploring the way Windows works, seeing how applications use the files and DLLs, or tracking down problems in system or application file configurations. Filemon's timestamping feature will show you precisely when every open, read, write or delete happens, and its status column tells you the outcome."
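If you prefer the command line over the GUI, the same physical disk counters can also be sampled with Windows' built-in 'typeperf' utility. A minimal sketch (exact counter paths can vary by Windows version and locale):

REM Sample read/write rates for all physical disks once a second, 10 samples
typeperf "\PhysicalDisk(*)\Disk Reads/sec" "\PhysicalDisk(*)\Disk Writes/sec" -si 1 -sc 10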
Once you find which device is busy using the 'iostat' command, you can use 'lspv -L' to find out which volume group is associated with that disk. This at least gives you a clue as to which group of programs may be causing the disk I/O (assuming the volume groups are set up for a particular purpose, e.g., databases):
# iostat aux 1
System configuration: lcpu=8 drives=22

18:34:05  device     %busy  avque  r+w/s  Kbs/s  avwait  avserv
18:34:06  hdisk6       0     0.0     0      0     0.0     0.0
          hdisk7       0     0.0     0      0     0.0     0.0
          hdisk4       0     0.0     0      0     0.0     0.0
          hdisk5       0     0.0     0      0     0.0     0.0
          hdisk8       0     0.0     0      0     0.0     0.0
          hdisk11      0     0.0     0      0     0.0     0.0
          hdisk9       0     0.0     0      0     0.0     0.0
          hdisk10      0     0.0     0      0     0.0     0.0
          hdisk12      0     0.0     0      0     0.0     0.0
          hdisk13      0     0.0     0      0     0.0     0.0
          hdisk14      0     0.0     0      0     0.0     0.0
          hdisk0       0     0.0     0      0     0.0     0.0
          hdisk15      0     0.0     0      0     0.0     0.0
          hdisk2       0     0.0     0      0     0.0     0.0
          hdisk3       0     0.0     0      0     0.0     0.0
          hdisk1       0     0.0     0      0     0.0     0.0
          dac0         0     0.0     0      0     0.0     0.0
          dac0-utm     0     0.0     0      0     0.0     0.0
          dac1         0     0.0     0      0     0.0     0.0
          dac1-utm     0     0.0     0      0     0.0     0.0
          hdisk16      0     0.0     0      0     0.0     0.0
          hdisk17      0     0.0     0      0     0.0     0.0

# lspv -L
hdisk0   002fbf7d85488b31  rootvg    active
hdisk1   002fbf7db8392e38  pagingvg  active
hdisk2   002fbf7db8392f17  pagingvg  active
hdisk3   002fbf7db8392ffb  pagingvg  active
hdisk4   002fbf7db83930ea  pagingvg  active
hdisk5   002fbf7db83931d4  pagingvg  active
hdisk6   002fbf7db83932c2  appservg  active
hdisk7   002fbf7db83933a6  appservg  active
hdisk8   002fbf7db839348f  appservg  active
hdisk9   002fbf7db8393579  appservg  active
hdisk10  002fbf7db839365e  appservg  active
hdisk11  002fbf7db839374e  appservg  active
hdisk12  002fbf7db8393835  appservg  active
hdisk13  002fbf7db8393921  appservg  active
hdisk14  002fbf7db8393a18  appservg  active
hdisk15  002fbf7db8393b04  appservg  active
hdisk16  002fbf7de1362e64  dbvg      active
hdisk17  002fbf7de1362f78  dbvg      active
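To narrow things down one more level, you can list the logical volumes (and their mount points) that actually live on the busy disk, or in the busy volume group. A minimal sketch, using 'hdisk16' and 'dbvg' from the example above:

# lspv -l hdisk16      # logical volumes placed on hdisk16
# lsvg -l dbvg         # all logical volumes in the dbvg volume group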
The best way, though, is to use AIX's 'filemon' command. It can report the busiest file systems, logical volumes, and physical volumes, AND associate them with processes. The following example shows a snapshot of system disk I/O over 10 seconds:
# filemon -O all -o fmon.out; sleep 10; trcstop
Enter the "trcstop" command to complete filemon processing
[filemon: Reporting started]
#
[filemon: Reporting completed]
[filemon: 10.945 secs in measured interval]

# more fmon.out
Sun Dec  4 18:40:14 2005
System: AIX pw101 Node: 5 Machine: 002FBF7D4C00

Cpu utilization:   1.0%
Cpu allocation:  100.0%

Most Active Files
------------------------------------------------------------------------
  #MBs  #opns  #rds  #wrs  file           volume:inode
------------------------------------------------------------------------
   1.2      4   296     0  unix           /dev/hd2:63102
   0.7      2   170     0  services       /dev/hd4:1173
   0.5    242   124     0  group          /dev/hd4:600
   0.1      8    16     0  ksh.cat        /dev/hd2:79522
   0.0      6    12     0  default.env    /dev/hd9var:8199
....

Most Active Segments
------------------------------------------------------------------------
  #MBs  #rpgs  #wpgs  segid    segtype    volume:inode
------------------------------------------------------------------------
   0.0      0      1  1460347  client

Most Active Logical Volumes
------------------------------------------------------------------------
  util  #rblk  #wblk  KB/s  volume        description
------------------------------------------------------------------------
  0.00      0     24   1.1  /dev/hd8      jfs2log
  0.00      0      8   0.4  /dev/hd9var   /var

Most Active Physical Volumes
------------------------------------------------------------------------
  util  #rblk  #wblk  KB/s  volume        description
------------------------------------------------------------------------
  0.00      0     32   1.5  /dev/hdisk0   N/A
...

Detailed Physical Volume Stats   (512 byte blocks)
------------------------------------------------------------------------
VOLUME: /dev/hdisk0  description: N/A
writes:                  4        (0 errs)
  write sizes (blks):    avg     8.0 min       8 max       8 sdev     0.0
  write times (msec):    avg   5.321 min   2.983 max   7.837 sdev   1.778
  write sequences:       4
  write seq. lengths:    avg     8.0 min       8 max       8 sdev     0.0
seeks:                   4        (100.0%)
  seek dist (blks):      init 59548744, avg 4780893.3 min 8 max 7171344 sdev 3380596.4
  seek dist (%tot blks): init 41.53385, avg 3.33456 min 0.00001 max 5.00184 sdev 2.35789
  time to next req(msec): avg 1471.627 min 6.890 max 4202.723 sdev 1716.529
throughput:              1.5 KB/sec
utilization:             0.00

Two things are shown here:
- 'jfs2log' activity is what makes '/dev/hd8' the most active logical volume
- the number of 'seeks' equals the number of 'writes' (i.e., 'seeks' is 100%), a possible indication that the file system is highly fragmented. You may need to run 'smit dejfs' to defragment the fs.
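'smit dejfs' is just a front end; the defragmentation can also be run directly with 'defragfs'. A minimal sketch, assuming the fragmented file system is /var (substitute whatever file system filemon pointed at):

# defragfs -q /var     # query the current fragmentation state first
# defragfs /var        # then defragment the file system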
The 'iostat -xc' command can be used to view disk activities on a Solaris machine. The -x argument allows extended statistics to be reported, and the -c option reports the CPU utilization for the interval.
Looks like disk 'sd0' is really busy (100% busy!). The next step is to find out what is using it. The fields have the following meanings:

disk   name of the disk
r/s    reads per second
w/s    writes per second
Kr/s   kilobytes read per second
Kw/s   kilobytes written per second
wait   average number of transactions waiting for service (queue length)
actv   average number of transactions actively being serviced (removed from the queue but not yet completed)
%w     percent of time there are transactions waiting for service (queue non-empty)
%b     percent of time the disk is busy (transactions in progress)

As on Linux, 'lsof' can give you open file information. If you have better info on Solaris disk I/O analysis, please contact us below.
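To tie an 'sd0'-style name in the iostat output back to a device and then to processes, the same major/minor trick from the Linux section applies. A minimal sketch (the 'c0t0d0s0' name is only an illustration, and the exact lsof DEVICE column format may differ on Solaris):

# iostat -xnc 5              # -n reports descriptive device names (cXtYdZ style)
# ls -lL /dev/dsk/c0t0d0s0   # -L follows the link; the major, minor pair appears before the date
# lsof | grep "MAJ,MIN"      # substitute the major,minor pair from the previous command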