Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1132644
  • 博文数量: 254
  • 博客积分: 1242
  • 博客等级: 少尉
  • 技术积分: 1581
  • 用 户 组: 普通用户
  • 注册时间: 2012-05-03 21:49
文章分类

全部博文(254)

文章存档

2017年(16)

2016年(4)

2013年(94)

2012年(140)

分类:

2013-06-27 23:56:34

文件: Linux Performance and Tuning Guidelines.pdf
大小: 4224KB
下载: 下载
 
    Some of the default kernel parameters for system performance are geared more towards workstation performance that file server/large disk I/O type of operations.  For example, for pure file server applications like web and samba servers, you probably want to disable the "atime" option on the most used filesystem. This disabled updating the "atime" value for the file, which indicates that the last time a file was accessed. Since this info isnt very useful in this situation, and causes extra disk hits, its typically disabled. To do this, just edit /etc/fstab and add "notime" as a mount option for the filesystem.

    For example:

      /dev/rd/c0d0p3          /test                    ext2    noatime        1 2

    The is another kernel tuneable that can be tweaked for improved disk i/o in some cases.

    1. For fast disk subsystems, it is desirable to use large flushes of dirty memory pages.  

      The value stored in /proc/sys/vm/dirty_background_ratio defines at what percentage of main memory the pdflush daemon should write data out to the disk.

      If larger flushes are desired then increasing the default value of 10% to a larger value will cause less frequent flushes.

      As in the example above the value can be changed to 25 as shown in

       # sysctl -w vm.dirty_background_ratio=25
       

    2. Another related setting in the virtual memory subsystem is the ratio at which dirty pages created by application disk writes will be flushed out to disk.

      The default value 10 means that data will be written into system memory until the file system cache has a size of 10% of the server’s RAM.

      The ratio at which dirty pages are written to disk can be altered as follows to a setting of 20% of the system memory

      # sysctl -w vm.dirty_ratio=20

    SCSI tuning is highly dependent on the particular scsi cards and drives in questions. The most effective variable when it comes to SCSI card performance is tagged command queuing.

    For the Adaptec aic7xxx series cards (2940's, 7890's, *160's, etc) this can be enabled with a module option like:

    	aic7xx=tag_info:{{0,0,0,0,}}

    This enabled the default tagged command queing on the first device, on the first 4 scsi ids.

      	options aic7xxxaic7xxx=tag_info:{{24.24.24.24.24.24}}
      in /etc/modules.conf will set the TCQ depth to 24

    You probably want to check the driver documentation for your particular scsi modules for more info.

    Most benchmarks benefit heavily from making sure the NIC's in use are well supported, with a well written driver. Examples include eepro100, tulip's, newish 3com cards, and acenic and sysconect gigabit cards.

    Making sure the cards are running in full duplex mode is also very often critical to benchmark performance. Depending on the networking hardware used, some of the cards may not autosense properly and may not run full duplex by default.

    Many cards include module options that can be used to force the cards into full duplex mode. Some examples for common cards include

    alias eth0 eepro100
    options eepro100 full_duplex=1
    alias eth1 tulip
    options tulip full_duplex=1

    Though full duplex gives the best overall performance, I've seen some circumstances where setting the cards to half duplex will actually increase thoughput, particulary in cases where the data flow is heavily one sided.

    If you think your in a situation where that may help, I would suggest trying it and benchmarking it.

    For servers that are serving up huge numbers of concurrent sessions, there are some tcp options that should probably be enabled. With a large # of clients doing their best to kill the server, its probably not uncommon for the server to have 20000 or more open sockets.

    In order to optimize TCP performance for this situation, I would suggest tuning the following parameters.

    echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range
    Allows more local ports to be available. Generally not a issue, but in a benchmarking scenario you often need more ports available. A common example is clients running `ab` or `http_load` or similar software.

    In the case of firewalls, or other servers doing NAT or masquerading, you may not be able to use the full port range this way, because of the need for high ports for use in NAT.

    Increasing the amount of memory associated with socket buffers can often improve performance. Things like NFS in particular, or apache setups with large buffer configured can benefit from this.

    echo 262143 > /proc/sys/net/core/rmem_max
    echo 262143 > /proc/sys/net/core/rmem_default
    This will increase the amount of memory available for socket input queues. The "wmem_*" values do the same for output queues.

    Note: With 2.4.x kernels, these values are supposed to "autotune" fairly well, and some people suggest just instead changing the values in:

    /proc/sys/net/ipv4/tcp_rmem
    /proc/sys/net/ipv4/tcp_wmem
    There are three values here, "min default max".

    These reduce the amount of work the TCP stack has to do, so is often helpful in this situation.

    echo 0 > /proc/sys/net/ipv4/tcp_sack
    echo 0 > /proc/sys/net/ipv4/tcp_timestamps 
See also 

But the basic tuning steps include:

Try using NFSv3 if you are currently using NFSv2. There can be very significant performance increases with this change.

Increasing the read write block size. This is done with the rsize and wsize mount options. They need to the mount options used by the NFS clients. Values of 4096 and 8192 reportedly increase performance alot. But see the notes in the HOWTO about experimenting and measuring the performance implications. The limits on these are 8192 for NFSv2 and 32768 for NFSv3

Another approach is to increase the number of nfsd threads running. This is normally controlled by the nfsd init script. On Red Hat Linux machines, the value "RPCNFSDCOUNT" in the nfs init script controls this value. The best way to determine if you need this is to experiment. The HOWTO mentions a way to determin thread usage, but that doesnt seem supported in all kernels.

Another good tool for getting some handle on NFS server performance is `nfsstat`. This util reads the info in /proc/net/rpc/nfs[d] and displays it in a somewhat readable format. Some info intended for tuning Solaris, but useful for it's description of the

See also the

documentation.

Bumping the number of available httpd processes

    Apache sets a maximum number of possible processes at compile time. It is set to 256 by default, but in this kind of scenario, can often be exceeded.

    To change this, you will need to chage the hardcoded limit in the apache source code, and recompile it. An example of the change is below:

    --- apache_1.3.6/src/include/httpd.h.prezab     Fri Aug  6 20:11:14 1999
    +++ apache_1.3.6/src/include/httpd.h    Fri Aug  6 20:12:50 1999
    @@ -306,7 +306,7 @@
      * the overhead.
      */
     #ifndef HARD_SERVER_LIMIT
    -#define HARD_SERVER_LIMIT 256
    +#define HARD_SERVER_LIMIT 4000
     #endif
    
     /*
    

    To make useage of this many apache's however, you will also need to boost the number of processes support, at least for 2.2 kernels. See the for info on increasing this.

The biggest scalability problem with apache, 1.3.x versions at least, is it's model of using one process per connection. In cases where there large amounts of concurent connections, this can require a large amount resources. These resources can include RAM, schedular slots, ability to grab locks, database connections, file descriptors, and others.

In cases where each connection takes a long time to complete, this is only compunded. Connections can be slow to complete because of large amounts of cpu or i/o usage in dynamic apps, large files being transfered, or just talking to clients on slow links.

There are several strategies to mitigate this. The basic idea being to free up heavyweight apache processes from having to handle slow to complete connections.

Static Content Servers

    If the servers are serving lots of static files (images, videos, pdf's, etc), a common approach is to serve these files off a dedicated server. This could be a very light apache setup, or any many cases, something like thttpd, boa, khttpd, or TUX. In some cases it is possible to run the static server on the same server, addressed via a different hostname.

    For purely static content, some of the other smaller more lightweight web servers can offer very good performance. They arent nearly as powerful or as flexible as apache, but for very specific performance crucial tasks, they can be a big win.

      Boa:
      thttpd:
      mathopd:
       

    If you need even more ExtremeWebServerPerformance, you probabaly want to take a look at TUX, written by . This is the current world record holder for . It probabaly owns the right to be called the worlds fastest web server.

Proxy Usage
    For servers that are serving dynamic content, or ssl content, a better approach is to employ a reverse-proxy. Typically, this would done with either apache's mod_proxy, or Squid. There can be several advantages from this type of configuration, including content caching, load balancing, and the prospect of moving slow connections to lighter weight servers.

    The easiest approache is probabaly to use mod_proxy and the "ProxyPass" directive to pass content to another server. mod_proxy supports a degree of caching that can offer a significant performance boost. But another advantage is that since the proxy server and the web server are likely to have a very fast interconnect, the web server can quickly serve up large content, freeing up a apache process, why the proxy slowly feeds out the content to clients. This can be further enhanced by increasing the amount of socket buffer memory thats for the kernel. See the for info on this.

    proxy links


    •  

    •  

    •  

    •  

ListenBacklog

    One of the most frustrating thing for a user of a website, is to get "connection refused" error messages. With apache, the common cause of this is for the number of concurent connections to exceed the number of available httpd processes that are available to handle connections.

    The apache ListenBacklog paramater lets you specify what backlog paramater is set to listen(). By default on linux, this can be as high as 128.

    Increasing this allows a limited number of httpd's to handle a burst of attempted connections.

There are some experimental patches from SGI that accelerate apache. More info at:

I havent really had a chance to test the SGI patches yet, but I've been told they are pretty effective.

Samba Tuning

    Some applications, databases in particular, sometimes need large amounts of SHM segments and semaphores. This tuning is well explained in Oracle documentation.
    Lies, damn lies, and statistics.

    But aside from that, a good set of benchmarking utilities are often very helpful in doing system tuning work. It is impossible to duplicate "real world" situations, but that isnt really the goal of a good benchmark. A good benchmark typically tries to measure the performance of one particular thing very accurately. If you understand what the benchmarks are doing, they can be very useful tools.

    Some of the common and useful benchmarks include:

      Bonnie
        has been around forever, and the numbers it produces are meaningful to many people. If nothing else, it's good tool for producing info to share with others. This is a pretty common utility for testing driver performance. It's only drawback is it sometimes requires the use of huge datasets on large memory machines to get useful results, but I suppose that goes with the territory.

        Check for more info on Bonnie. There is also a somwhat newer version of Bonnie called that fixes a few bugs, and includes a couple of extra tests.

      Dbench
        My personal favorite disk io benchmarking utility is `dbench`. It is designed to simulate the disk io load of a system when running the NetBench benchmark suite. It seems to do an excellent job at making all the drive lights blink like mad. Always a good sign.

        Dbench is available at

      http_load
        A nice simple http benchmarking app, that does integrity checking, parallel requests, and simple statistics. Generates load based off a test file of urls to hit, so it is flexible.

        http_load is available from

      dkftpbench
        A (the?) ftp benchmarking utility. Designed to simulate real world ftp usage (large number of clients, throttles connections to modem speeds, etc). Handy. Also includes the useful dklimits utility .

        dkftpbench is available from

         

      tiobench
        A multithread disk io benchmarking utility. Seems to do an a good job at pounding on the disks. Comes with some useful scripts for generating reports and graphs.

        The .

      dt

        dt does a lot. disk io, process creation, async io, etc.

        dt is available at

         

      ttcp
        A tcp/udp benchmarking app. Useful for getting an idea of max network bandwidth of a device. Tends to be more accurate than trying to guestimate with ftp or other protocols.
      netperf
        Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirecitonal throughput, and end-to-end latency. The environments currently measureable by netperf include: TCP and UDP via BSD Sockets, DLPI, Unix Domain Sockets, Fore ATM API, HiPPI.

        Info:
        Download:

        Info provided by Bill Hilf.

      httperf
        httperf is a popular web server benchmark tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of httperf is not on implementing one particular benchmark but on providing a robust, high-performance tool that facilitates the construction of both micro- and macro-level benchmarks. The three distinguishing characteristics of httperf are its robustness, which includes the ability to generate and sustain server overload, support for the HTTP/1.1 protocol, and its extensibility to new workload generators and performance measurements.

        Info:
        Download:

        Info provided by Bill Hilf.

      Autobench
        Autobench is a simple Perl script for automating the process of benchmarking a web server (or for conducting a comparative test of two different web servers). The script is a wrapper around httperf. Autobench runs httperf a number of times against each host, increasing the number of requested connections per second on each iteration, and extracts the significant data from the httperf output, delivering a CSV or TSV format file which can be imported directly into a spreadsheet for analysis/graphing.

        Info:
        Download:

        Info provided by Bill Hilf.

    General benchmark Sites
    Standard, and not so standard system monitoring tools that can be useful when trying to tune a system.
      vmstat
        This util is part of the procps package, and can provide lots of useful info when diagnosing performance problems.

        Heres a sample vmstat output on a lightly used desktop:

           procs                      memory    swap          io     system  cpu
         r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy id
         1  0  0   5416   2200   1856  34612   0   1     2     1  140   194   2   1 97
        

        And heres some sample output on a heavily used server:

           procs                      memory    swap          io     system  cpu
         r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy id
        16  0  0   2360 264400  96672   9400   0   0     0     1   53    24   3   1 96
        24  0  0   2360 257284  96672   9400   0   0     0     6 3063 17713  64  36 0
        15  0  0   2360 250024  96672   9400   0   0     0     3 3039 16811  66  34 0
        

        The interesting numbers here are the first one, this is the number of the process that are on the run queue. This value shows how many process are ready to be executed, but can not be ran at the moment because other process need to finish. For lightly loaded systems, this is almost never above 1-3, and numbers consistently higher than 10 indicate the machine is getting pounded.

        Other interseting values include the "system" numbers for in and cs. The in value is the number of interupts per second a system is getting. A system doing a lot of network or disk I/o will have high values here, as interupts are generated everytime something is read or written to the disk or network.

        The cs value is the number of context switches per second. A context switch is when the kernel has to take off of the executable code for a program out of memory, and switch in another. It's actually _way_ more complicated than that, but thats the basic idea. Lots of context swithes are bad, since it takes some fairly large number of cycles to performa a context swithch, so if you are doing lots of them, you are spending all your time chaining jobs and not actually doing any work. I think we can all understand that concept.

      netstat

        Since this document is primarily concerned with network servers, the `netstat` command can often be very useful. It can show status of all incoming and outgoing sockets, which can give very handy info about the status of a network server.

        One of the more useful options is:

                netstat -pa

        The `-p` options tells it to try to determine what program has the socket open, which is often very useful info. For example, someone nmap's their system and wants to know what is using port 666 for example. Running netstat -pa will show you its satand running on that tcp port.

        One of the most twisted, but useful invocations is:

        netstat -a -n|grep -E "^(tcp)"| cut -c 68-|sort|uniq -c|sort -n
        

        This will show you a sorted list of how many sockets are in each connection state. For example:

              9  LISTEN      
             21  ESTABLISHED 

      ps

        Okay, so everyone knows about ps. But I'll just highlight one of my favorite options:
        ps -eo pid,%cpu,vsz,args,wchan
        

        Shows every process, their pid, % of cpu, memory size, name, and what syscall they are currently executing. Nifty.

 
 
 
 
 
 
 
 
 

Pmap - - Process Memory Usage

The command pmap report memory map of a process. Use this command to find out causes of memory bottlenecks.

# pmap -d PID

Iptraf - Real-time Network Statistics

Features of Iptraf are

Network traffic statistics by TCP connection
IP traffic statistics by network interface
Network traffic statistics by protocol
Network traffic statistics by TCP/UDP port and by packet size
Network traffic statistics by Layer2 address

 

 

 

 

You've just had your first cup of coffee and have received that dreaded phone call. The system is slow. What are you going to do? This article will discuss performance bottlenecks and optimization in Red Hat Enterprise Linux (RHEL5).

Before getting into any monitoring or tuning specifics, you should always use some kind of tuning methodology. This is one which I've used successfully through the years:

1. Baseline – The first thing you must do is establish a baseline, which is a snapshot of how the system appears when it's performing well. This baseline should not only compile data, but also document your system's configuration (RAM, CPU and I/O). This is necessary because you need to know what a well-performing system looks like prior to fixing it.

2. Stress testing and monitoring – This is the part where you monitor and stress your systems at peak workloads. It's the monitoring which is key here – as you cannot effectively tune anything without some historic trending data.
 

3. Bottleneck identification – This is where you come up with the diagnosis for what is ailing your system. The primary objective of section 2 is to determine the bottleneck. I like to use several monitoring tools here. This allows me to cross-reference my data for accuracy.
 

4. Tune – Only after you've identified the bottleneck can you tune it.
 

5. Repeat – Once you've tuned it, you can start the cycle again – but this time start from step 2 (monitoring) – as you already have your baseline.

It's important to note that you should only make one change at a time. Otherwise, you'll never know exactly what impacted any changes which might have occurred. It is only by repeating your tests and consistently monitoring your systems that you can determine if your tuning is making an impact.

RHEL monitoring tools

Before we can begin to improve the performance of our system, we need to use the monitoring tools available to us to baseline. Here are some monitoring tools you should consider using:

This tool (made available in RHEL5) utilizes the processor to retrieve kernel system information about system executables. It allows one to collect samples of performance data every time a counter detects an interrupt. I like the tool also because it carries little overhead – which is very important because you don't want monitoring tools to be causing system bottlenecks. One important limitation is that the tool is very much geared towards finding problems with CPU limited processes. It does not identify processes which are sleeping or waiting on I/O.

The steps used to start up Oprofile include setting up the profiler, starting it and then dumping the data.

First we'll set up the profile. This option assumes that one wants to monitor the kernel.

# opcontrol --setup –vmlinux=/usr/lib/debug/lib/modules/'uname -r'/vmlinux

Then we can start it up.

# opcontrol --start

Finally, we'll dump the data.

# opcontrol --stop/--shutdown/--dump

This tool (introduced in RHEL5) collects data by analyzing the running kernel. It really helps one come up with a correct diagnosis of a performance problem and is tailor-made for developers. SystemTap eliminates the need for the developer to go through the recompile and reinstallation process to collect data.

This is another tool which was introduced by Red Hat in RHEL5. What does it do for you? It allows both developers and system administrators to monitor running processes and threads. Frysk differs from Oprofile in that it uses 100% reliable information (similar to SystemTap) - not just a sampling of data. It also runs in user mode and does not require kernel modules or elevated privileges. Allowing one to stop or start running threads or processes is also a very useful feature.

Some more general Linux tools include and . While these are considered more basic, often I find them much more useful than more complex tools. Certainly they are easier to use and can help provide information in a much quicker fashion.

Top provides a quick snapshot of what is going on in your system – in a friendly character-based display.

It also provides information on CPU, Memory and Swap Space.

Let's look at vmstat – one of the oldest but more important Unix/Linux tools ever created. Vmstat allows one to get a valuable snapshot of process, memory, sway I/O and overall CPU utilization. 

Now let's define some of the fields:

Memory
swpd – The amount of virtual memory
free – The amount of free memory
buff – Amount of memory used for buffers
cache – Amount of memory used as page cache

Process
r – number of run-able processes
b – number or processes sleeping.
Make sure this number does not exceed the amount of run-able processes, because when this condition occurs it usually signifies that there are performance problems.

Swap
si – the amount of memory swapped in from disk
so – the amount of memory swapped out.

This is another important field you should be monitoring – if you are swapping out data, you will likely be having performance problems with virtual memory.

CPU
us – The % of time spent in user-level code.
It is preferable for you to have processes which spend more time in user code rather than system code. Time spent in system level code usually means that the process is tied up in the kernel rather than processing real data.
sy – the time spent in system level code
id – the amount of time the CPU is idle wa – The amount of time the system is spending waiting for I/O.

 

If your system is waiting on I/O – everything tends to come to a halt. I start to get worried when this is > 10.

 

There is also:

Free – This tool provides memory information, giving you data around the total amount of free and used physical and swap memory.
 

Now that we've analyzed our systems – lets look at what we can do to optimize and tune our systems.

CPU Overhead – Shutting Running Processes
Linux starts up all sorts of processes which are usually not required. This includes processes such as autofs, cups, xfs, nfslock and sendmail. As a general rule, shut down anything that isn't explicitly required. How do you do this? The best method is to use the chkconfig command.

Here's how we can shut these processes down.
[root ((Content component not found.)) _29_140_234 ~]# chkconfig --del xfs

You can also use the GUI - /usr/bin/system-config-services to shut down daemon process.

Tuning the kernel
To tune your kernel for optimal performance, start with:

sysctl – This is the command we use for changing kernel parameters. The parameters themselves are found in /proc/sys/kernel

Let's change some of the parameters. We'll start with the msgmax parameter. This parameter specifies the maximum allowable size of a single message in an IPC message queue. Let's view how it currently looks.

[root ((Content component not found.)) _29_139_52 ~]# sysctl kernel.msgmax
kernel.msgmax = 65536
[root ((Content component not found.)) _29_139_52 ~]#

There are three ways to make these kinds of kernel changes. One way is to change this using the echo command.

[root ((Content component not found.)) _29_139_52 ~]# echo 131072 >/proc/sys/kernel/msgmax
[root ((Content component not found.)) _29_139_52 ~]# sysctl kernel.msgmax
kernel.msgmax = 131072
[root ((Content component not found.)) _29_139_52 ~]#

Another parameter that is changed quite frequently is SHMMAX, which is used to define the maximum size (in bytes) for a shared memory segment. In Oracle this should be set large enough for the largest SGA size. Let's look at the default parameter:

# sysctl kernel.shmmax
kernel.shmmax = 268435456

This is in bytes – which translates to 256 MG. Let's change this to 512 MG, using the -w flag.

[root ((Content component not found.)) _29_139_52 ~]# sysctl -w kernel.shmmax=5368709132
kernel.shmmax = 5368709132
[root ((Content component not found.)) _29_139_52 ~]#

The final method for making changes is to use a text editor such as vi – directly editing the /etc/sysctl.conf file to manually make our changes.

To allow the parameter to take affect dynamically without a reboot, issue the sysctl command with the -p parameter.

Obviously, there is more to performance tuning and optimization than we can discuss in the context of this small article – entire books have been written on Linux performance tuning. For those of you first getting your hands dirty with tuning, I suggest you tread lightly and spend time working on development, test and/or sandbox environments prior to deploying any changes into production. Ensure that you monitor the effects of any changes that you make immediately; it's imperative to know the effect of your change. Be prepared for the possibility that fixing your bottleneck has created another one. This is actually not a bad thing in itself, as long as your overall performance has improved and you understand fully what is happening.

Performance monitoring and tuning is a dynamic process which does not stop after you have fixed a problem. All you've done is established a new baseline. Don't rest on your laurels, and understand that performance monitoring must be a routine part of your role as a systems administrator.

About the author: Ken Milberg is a systems consultant with two decades of experience working with Unix and Linux systems. He is a SearchEnterpriseLinux.com Ask the Experts advisor and columnist.

 
 
 
 
 
 
 
 
 
 
 
 

    Check out the "c10k problem" page in particular, but the entire site has _lots_ of useful tuning info.

    Site organized by Rik Van Riel and a few other folks. Probabaly the best linux specific system tuning page.

    Linux Scalibity Project at Umich.

    Info on tuning linux kernel NFS in particular, and linux network and disk io in general

    Linux Performance Tuning Checklist. Some useful content.

    Miscelaneous performace tuning tips at linux.com

    Summary of tcp tuning info
阅读(1625) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~