There isn't a computer professional who, at some point, hasn't wondered
whether their systems are slow because of legitimate load or because of
inefficiency. The beauty is that there's no real reason to sit and wonder. In
the case of Linux (and many other operating systems), all of the
information you need is at your fingertips. You just have to know how to
find it.
Computing bottlenecks occur in four basic areas: CPU, RAM, network, and
disk I/O. Linux offers a huge collection of tools for collecting and
viewing information about each. Let's take a look at some useful
techniques, and at some of the easier fixes for each area if you find
problems.
CPU Performance Inspection
Most new computers today come with
multiple CPUs, or some approximation thereof. Some tools allow you to
view the individual performance of each of these. However, since the
goal here is to measure overall performance, this article focuses on
working with a single CPU value. See the man page for each command to
find out whether it offers flags to go further.
One excellent tool for monitoring CPU performance is sar. This program
may not be installed by default on your system. If it isn't, look for the package that provides it in your distribution (it is usually part of sysstat). Typing sar without any arguments gives you something similar to what you'll see in Figure 1.
Figure 1: An example of default sar output.
From left to right, sar gives you the time the measurement
was taken, which CPU it's reporting on (or in our case, all as a
collective whole), and then the percentage of CPU in use at that time
for:
- %user - User space (non-kernel programs)
- %nice - Programs whose priority had been altered with the nice or renice commands
- %system - Kernel space (the kernel itself plus modules)
- %iowait - Waiting to fulfill a disk I/O request
- %steal - Forced to wait for the hypervisor to finish servicing another virtual CPU, in the case of virtual machines
- %idle - Waiting for new instructions
While all of these columns are interesting, the one that quickly lets you determine if you're CPU-bound is %idle.
In the case of Figure 1, this CPU (or bank of CPUs) is practically at
the beach on vacation. If %idle were consistently close to zero instead
(meaning the other columns were significantly higher), you
would need to consider upgrading the CPU, stopping unnecessary
processes, or moving some of the services off of this computer and onto
another to improve CPU utilization.
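If you'd rather not eyeball the %idle column, a few lines of Python can do the reading for you. The following is only a sketch: the sample text is made up for illustration (it is not the data from Figure 1), the function names are hypothetical, and the 10 percent threshold is an arbitrary assumption you should tune for your own machines.

```python
# Made-up sample in the layout described above; not real measurements.
SAMPLE_SAR = """\
12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      0.50      0.00      0.20      0.10      0.00     99.20
12:20:01 AM     all      0.70      0.00      0.30      0.10      0.00     98.90
Average:        all      0.60      0.00      0.25      0.10      0.00     99.05
"""


def idle_samples(sar_text):
    """Return the %idle value of every 'all' sample, skipping the Average line."""
    idle_pos = None
    samples = []
    for line in sar_text.splitlines():
        fields = line.split()
        if "%idle" in fields:
            # Count from the right so 12-hour ("12:10:01 AM") and
            # 24-hour ("00:10:01") timestamps are both handled.
            idle_pos = fields.index("%idle") - len(fields)
            continue
        if idle_pos is None or "all" not in fields:
            continue
        if fields[0].startswith("Average"):
            continue
        samples.append(float(fields[idle_pos]))
    return samples


def looks_cpu_bound(samples, idle_threshold=10.0):
    """True when the mean %idle is below the (assumed) threshold."""
    return bool(samples) and sum(samples) / len(samples) < idle_threshold


print(idle_samples(SAMPLE_SAR))
print(looks_cpu_bound(idle_samples(SAMPLE_SAR)))
```

To try it on live data, you could pass in the text captured from running sar yourself instead of SAMPLE_SAR. Note that locales which print decimal commas would need extra handling.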
RAM Performance Inspection
The nice thing about sar is that you can also use it to look at your memory. When invoked as sar -r, you see something similar to Figure 2.
Figure 2: An example of sar memory output, invoked with sar -r.
From left to right, this output tells us the time the sample was taken, and then:
- kbmemfree - Unused memory in KB
- kbmemused - Amount of memory utilized by user space applications in KB
- %memused - The percentage of your RAM currently in use
- kbbuffers - Amount of memory in KB that your kernel is using to buffer data
- kbcached - Amount of memory in KB that the kernel is using to cache data
- kbswpfree - Unused swap space in KB
- kbswpused - Used swap space in KB
- %swpused - The percentage of your swap space currently in use
- kbswpcad - Amount of cached swap in KB
Again, while all of these columns are useful, two give you a quick picture of whether your problem is with memory: %memused and %swpused. While Figure 1 showed a CPU that was sunning itself in Aruba, %memused shows that this computer is consistently operating at the edge of its RAM capacity. The %swpused
column, on the other hand, tells us that the machine isn't being pushed
so hard that it's having to move much from RAM into swap space on the
hard drive. For the timespan shown in the measurements, then, you
aren't experiencing poor performance.
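A high %memused is easier to judge once you see how much of it is buffers and cache, which the kernel can give back when applications need the room. The arithmetic below is a sketch that assumes kbmemused includes the kbbuffers and kbcached figures, as it does in the older sar output described here. Check the man page for your version before relying on it. The function name and the numbers in the example call are made up.

```python
def memory_usage(kbmemfree, kbmemused, kbbuffers, kbcached):
    """Return (%memused, %memused with buffers and cache subtracted).

    Assumes kbmemused already counts buffers and cache; that holds for
    older sysstat releases and is an assumption for anything newer.
    """
    total = kbmemfree + kbmemused
    pct_used = 100.0 * kbmemused / total
    pct_without_cache = 100.0 * (kbmemused - kbbuffers - kbcached) / total
    return round(pct_used, 2), round(pct_without_cache, 2)


# Illustrative values only: 1,000 KB total, 900 KB "used",
# of which 600 KB is really buffers and cache.
print(memory_usage(kbmemfree=100, kbmemused=900, kbbuffers=100, kbcached=500))
```

If the second number is far lower than the first, the machine probably has more breathing room than %memused alone suggests.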
However, don't be alarmed by the fact that this machine looks like it's
one step from having to push things into swap. The kernel's memory
manager tries to keep the most active applications in physical RAM and, when memory gets tight, moves memory that hasn't been touched recently (typically belonging to idle applications) out to swap. In ps's STAT column or top's S column, R marks a process that is running or ready to run, and S marks one that is sleeping while it waits for something to do. A sleeping process is a likely candidate for swapping, but an S by itself doesn't mean the process has been swapped out, so just the raw percentages of how much RAM and swap you're using don't show the whole picture. Typing ps aux
will let you see how many processes at a particular time are sleeping,
and what percentage of memory (and CPU) each is using. Knowing how
much RAM, how much swap, and how many processes are sleeping, along
with how much RAM these processes are using, will help you better
understand if you're having RAM bottlenecks. Factors such as shared
memory can also make it look like you're using more RAM than you really
are.
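Counting sleeping processes by hand gets old quickly, so here is a rough sketch that tallies ps aux output by process state. It assumes the usual eleven-column ps aux layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). The sample text and the function name are invented for illustration.

```python
from collections import defaultdict

# Invented sample in the standard "ps aux" layout.
SAMPLE_PS = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  10348   700 ?        Ss   Jan01   0:01 init [3]
apache    2210  1.5  4.0 250000 41000 ?        S    09:12   0:20 /usr/sbin/httpd
dee       3301 12.0  2.5 120000 25600 pts/0    R+   10:01   0:03 ps aux
"""


def tally_by_state(ps_text):
    """Group processes by the first letter of STAT (R, S, D, ...).

    Returns {state: {"count": n, "cpu": total %CPU, "mem": total %MEM}}.
    """
    totals = defaultdict(lambda: {"count": 0, "cpu": 0.0, "mem": 0.0})
    for line in ps_text.strip().splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields so spaces in COMMAND stay together.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        state = fields[7][0]
        totals[state]["count"] += 1
        totals[state]["cpu"] += float(fields[2])
        totals[state]["mem"] += float(fields[3])
    return dict(totals)


for state, info in sorted(tally_by_state(SAMPLE_PS).items()):
    print(state, info)
```

Because shared memory is counted once per process, the %MEM totals can add up to more than the machine really uses, so treat them as a rough guide.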
The solutions for improving RAM performance are similar to those for
CPU: add more RAM, stop unnecessary programs, or move some of your
services off onto another machine. It's also possible that you're
suffering memory leaks or that something you're running is very
RAM-inefficient. These topics bear further discussion in another
article.
Disk I/O Performance Inspection
Yet another reason to use sar
is that this Swiss army knife of performance information tools can also
tell you how your drives are doing. Type sar -dp and you'll see something like what's shown in Figure 3.
Figure 3: The beginning of sar I/O output, invoked with sar -dp.
This combination of flags shows you information per device, as seeing just the summary information (sar -b)
doesn't give you any real reference points at a glance. From left to
right, this output gives you the time the measurement was taken, as well
as:
- DEV - The physical device in question
- tps - Number of transfers (I/O requests) per second issued to the device
- rd_sec/s - Number of sectors (1 sector = 512 bytes) read per second
- wr_sec/s - Number of sectors written per second
- avgrq-sz - Average size, in sectors, of the requests issued to the device
- avgqu-sz - Average queue length of requests issued to the device
- await - Average number of milliseconds I/O requests for this device had to wait before being handled, including how long it took to handle them
- svctm - Average number of milliseconds it took to service I/O requests for this device, not counting the time they spent waiting in the queue
- %util - Percentage of CPU time during which I/O requests were being issued to the device
Notice in this case that the percentage is not the most interesting value here. The queue length (avgqu-sz) and the two timing columns (await and svctm)
are the most useful values for determining if you have an
I/O-bound machine. The longer the queue, the more requests are piling
up before they're being serviced. The longer they have to wait before
being serviced, the slower everything gets.
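The same check can be sketched in Python: read the per-device rows and flag any device whose queue or wait time looks long. The sample rows are invented, the function name is hypothetical, and the two thresholds are arbitrary assumptions rather than recommended limits.

```python
# Invented sample in the per-device layout described above.
SAMPLE_DISK = """\
12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:10:01 AM       sda      1.50      0.00     25.00     16.67      0.01      4.00      1.20      0.18
12:10:01 AM       sdb     95.00    800.00   4000.00     50.53      6.40     67.00      9.80     93.10
"""


def busy_devices(sar_text, max_queue=1.0, max_await_ms=20.0):
    """Return {device: (avgqu-sz, await)} for rows above either threshold.

    If a device appears more than once, its latest offending sample wins.
    """
    columns = None
    flagged = {}
    for line in sar_text.splitlines():
        fields = line.split()
        if "DEV" in fields:
            # Remember the column names from DEV onward; the timestamp
            # in front may take one or two fields.
            columns = fields[fields.index("DEV"):]
            continue
        if columns is None or len(fields) < len(columns):
            continue
        if fields[0].startswith("Average"):
            continue
        row = dict(zip(columns, fields[-len(columns):]))
        queue = float(row["avgqu-sz"])
        wait = float(row["await"])
        if queue > max_queue or wait > max_await_ms:
            flagged[row["DEV"]] = (queue, wait)
    return flagged


print(busy_devices(SAMPLE_DISK))
```

Newer versions of sar may name these columns differently, in which case the two row[...] lookups would need adjusting.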
On an I/O-bound machine, solutions include faster drives (including
RAID arrays and remote storage), organizing your partitions so
that I/O-heavy programs aren't all trying to write to the same physical
drive, and of course splitting off services onto other machines to
spread the load. Very high disk I/O values could in fact mean that
you're using a lot of swap, so check your memory numbers as well.
Network Performance Inspection
While sar (as sar -n ALL) can also show you network performance data, in this case it's a bit of overkill. A quick ifconfig (you may need to include the path) can give you some basic information for a quick visual inspection, as shown in Figure 4.
Figure 4: Network information displayed with /sbin/ifconfig.
The key to understanding this output for performance monitoring purposes is to know that TX stands for Transmit and RX
stands for Receive. If you see values greater than zero for errors,
dropped, overruns, and collisions, then you may very well have a network
bottleneck problem. The first thing to do is check all of your
connections, and equipment such as switches and hubs. Also, check at a
few different times and see if the problem is persistent. If it
continues, it bears further investigation.
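To make the repeated checks less tedious, the following sketch pulls the non-zero errors, dropped, overruns, and collisions counters out of ifconfig output. The regular expression is written to accept both the older "errors:0" style and the newer "errors 0" style, which is an assumption about the formats you are likely to meet. The sample text and the function name are made up.

```python
import re

# Made-up sample in the older ifconfig layout.
SAMPLE_IFCONFIG = """\
eth0      Link encap:Ethernet
          RX packets:1000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:900 errors:3 dropped:0 overruns:0 carrier:0
          collisions:12 txqueuelen:1000
"""

COUNTER = re.compile(r"\b(errors|dropped|overruns|collisions)[:\s]+(\d+)")


def problem_counters(ifconfig_text):
    """Return {interface: {counter name: value}} for counters above zero."""
    problems = {}
    iface = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new interface block.
            iface = line.split()[0].rstrip(":")
        stripped = line.strip()
        # Prefix counters with RX or TX when the line starts with one.
        direction = stripped[:2] if stripped[:2] in ("RX", "TX") else ""
        for name, value in COUNTER.findall(stripped):
            if iface is not None and int(value) > 0:
                key = f"{direction} {name}".strip()
                problems.setdefault(iface, {})[key] = int(value)
    return problems


print(problem_counters(SAMPLE_IFCONFIG))
```

These counters are cumulative since the interface came up, so what matters when you check at different times is whether they keep climbing.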
In the case of all four of these issues, this article just skims the
surface of both investigation techniques and solutions. In general,
you'll want to take these measurements multiple times to see if the
problems are persistent or come and go. You might even want to set up
cron jobs to take these measurements on an automatic basis.
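One simple way to do that is a small script that cron runs on a schedule, appending each measurement to a log with a timestamp. This is a minimal sketch, not a replacement for the collection tools your sar package may already provide. The function name, the log path, and the crontab line in the comment are all hypothetical.

```python
import subprocess
from datetime import datetime


def record(command, logfile):
    """Run command, append its timestamped output to logfile, return exit code.

    command is a list such as ["sar", "-r", "1", "1"]. A missing program
    raises FileNotFoundError, which is deliberately left unhandled here.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(logfile, "a") as log:
        log.write(f"=== {stamp} {' '.join(command)} (exit {result.returncode})\n")
        log.write(result.stdout)
    return result.returncode


# Hypothetical usage, saved as /usr/local/bin/perfsnap.py with a final line
# such as:  record(["sar", "-r", "1", "1"], "/var/tmp/perfsnap.log")
# and a crontab entry that runs it every ten minutes:
#   */10 * * * * /usr/bin/python3 /usr/local/bin/perfsnap.py
```

A plain text log like this is easy to search later when you want to know whether a problem comes and goes.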
Further installments will address the larger issues of monitoring
performance over time, making tweaks that don't involve having to
upgrade hardware, and things developers can do to address performance
issues with their own software.
is a freelance writer, editor, trainer, course developer, and
journalist essentially specializing in helping people better understand
Linux and open source.