Kernel Tuning
(This page is derived from other linuxperf pages and external sources - external references are noted where used)
How to set kernel tunables
The easiest way to set kernel parameters is to write to the /proc filesystem (and hence to the kernel directly) with echo "value" > /proc/sys/kernel/parameter. The changes take effect immediately but are lost at reboot, so they must be reapplied at every boot. Kernel tuning can be automated at boot time by putting the echo commands in /etc/rc.d/rc.local on non-Red Hat distributions; on Red Hat derived distributions, modify the /etc/sysctl.conf configuration file instead.
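As a rough illustration of how the two interfaces relate: each file under /proc/sys corresponds to a sysctl key formed by dropping the /proc/sys/ prefix and turning the slashes into dots. The helper name proc_to_sysctl is made up for this sketch; it only prints the key and changes nothing on the system.

```shell
#!/bin/sh
# proc_to_sysctl (hypothetical helper): map a /proc/sys path to the
# key name that would be used in /etc/sysctl.conf. Prints only.
proc_to_sysctl() {
    key=${1#/proc/sys/}                  # strip the leading /proc/sys/
    printf '%s\n' "$key" | tr '/' '.'    # slashes become dots
}

proc_to_sysctl /proc/sys/fs/file-max
proc_to_sysctl /proc/sys/net/ipv4/tcp_sack
```

The printed keys are what would go on the left-hand side of a "key = value" line in /etc/sysctl.conf.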
Increasing System Limits
Increasing the Maximum number of file handles and the inode cache
NOTE:
On all current versions of Linux up to and including 2.2.10 and 2.3.9, inode caches DO NOT SHRINK like the file and dentry caches do when your applications need lots of RAM.
This means that if you set a really large inode cache, you can lose a significant amount of RAM over time. On a server machine, this is expected, normal and desired. On a workstation machine, doing a kernel compile when your inode-max is set to a very large number will probably give far too much memory to the inode cache.
Empirical evidence suggests that 40960 entries in the inode cache will use up to 10 megabytes of RAM. Your mileage may vary, and more data is necessary to confirm this number.
Linux 2.0.x - file-max defaults to 1024, so increase the value of /proc/sys/kernel/file-max to something reasonable, like 256 for every 4 MB of RAM you have; e.g., for a 64 MB machine, set it to 4096.
The canonical command to change anything in the /proc hierarchy is (as root) echo "newvalue" >/proc/file/that/you/want/to/change, so for this item the command line is:
echo "4096" >/proc/sys/kernel/file-max
Also increase /proc/sys/kernel/inode-max to a value roughly 3 to 4 times the number of open files. This is because the number of inodes open is at least one per open file, and often much larger than that for large files.
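The rule of thumb above is simple arithmetic, sketched here for convenience. The function names are invented for this example, and the factor of 3 for inode-max is just the low end of the "3 to 4 times" range given in the text. Nothing is written to /proc.

```shell
#!/bin/sh
# 256 file handles for every 4 MB of RAM (rule of thumb from the text).
file_max_for_ram_mb() {
    echo $(( $1 / 4 * 256 ))
}

# inode-max at roughly 3 times file-max (low end of the suggested range).
inode_max_for_file_max() {
    echo $(( $1 * 3 ))
}

ram_mb=64    # assumed machine size, matching the example in the text
fmax=$(file_max_for_ram_mb "$ram_mb")
imax=$(inode_max_for_file_max "$fmax")
echo "file-max:  $fmax"
echo "inode-max: $imax"
```

For a 64 MB machine this gives a file-max of 4096, as in the text, and an inode-max of 12288.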
(the following was written by Tani Hosokawa)
Note: if you increase this beyond 1024, you may also have to edit include/linux/posix_types.h and increase this line:
#define __FD_SETSIZE 1024
That allows for a select to handle 1024 file descriptors. More than that, and stuff may break.
Linux 2.2.x/2.3.x - increase the value of /proc/sys/fs/file-max to something reasonable, like 256 for every 4 MB of RAM you have; e.g., for a 64 MB machine, set it to 4096. As above, also increase /proc/sys/fs/inode-max.
Long Answer:
You can use the above technique or modify the constants in the kernel sources. Modifying the sources is usually not the right answer, because the change will not survive a new kernel source tree. One of the best techniques is to add the above commands to /etc/rc.d/rc.local.
The exact number will vary from the above formula based on what you are actually doing with the machine. A file server or web server needs a lot of open files, for instance, but a compute server does not.
Very large memory systems, especially 512 Megabytes or larger, probably should not have more than 50,000 open files and 150,000 open inodes. Of course if you are Mindcraft, this is a cheap and effective way to waste kernel memory.
Linux 2.4.x - ?
Here is another method of increasing these limits from
Aim: Increase the number of files that may be open simultaneously
Changes to include/linux/fs.h:
increase NR_FILE from 4096 to 65536
increase NR_RESERVED_FILES from 10 to 128
Changes to fs/inode.c:
increase MAX_INODE from 16384 to 262144
Note: MAX_INODE must be at least three times larger than NR_FILE.
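A quick sanity check of the ratio in the note above, as a sketch. The function name check_inode_ratio is invented here; it just compares the two numbers you intend to put in the sources.

```shell
#!/bin/sh
# check_inode_ratio NR_FILE MAX_INODE
# Prints "ok" when MAX_INODE is at least three times NR_FILE.
check_inode_ratio() {
    if [ "$2" -ge $(( $1 * 3 )) ]; then
        echo "ok"
    else
        echo "too small"
    fi
}

check_inode_ratio 4096 16384      # the old values
check_inode_ratio 65536 262144    # the new values suggested above
```

Both the old and the new pairs satisfy the rule; raising NR_FILE alone to 65536 would not.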
Increasing the number of processes/tasks allowed
Linux 2.0.x - The default maximum is 512 tasks, half of which can be used by any single
user. Here's an excerpt from /usr/src/linux/include/linux/tasks.h
#define NR_TASKS 512 /* On x86 Max 4092, or 4090 w/APM configured. */
#define MAX_TASKS_PER_USER (NR_TASKS/2)
#define MIN_TASKS_LEFT_FOR_ROOT 4
Just change the 512 to something higher. You can change MAX_TASKS_PER_USER to
something else as well, although it's a nice precaution against simple process
table attacks. Properly managed systems shouldn't be vulnerable to that
though (you do set your MaxClients and whatnot, don't you?). Don't try to go
above the maximums. Your machine will just keep rebooting and rebooting.
(the preceding was written by Tani Hosokawa)
Linux 2.2.x/2.3.x - Edit /usr/src/linux/include/linux/tasks.h, modify the "NR_TASKS" value and then rebuild and install the kernel. (One person recommended changing NR_TASKS from 512 to 2048, and changing MIN_TASKS_LEFT_FOR_ROOT to 24.)
Linux 2.4.x - ?
Decrease the time before disposing of unused TCP keepalive requests (from linuxraid.org)
Changes to include/net/tcp.h:
decrease TCP_KEEPALIVE_TIME from 2 hours to 5 minutes
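For reference, a small sketch of the arithmetic involved, assuming the constant is expressed in jiffies as minutes * 60 * HZ and that HZ is 100 (the usual x86 tick rate on these kernels). Both assumptions should be checked against your own include/net/tcp.h; the function name is made up.

```shell
#!/bin/sh
HZ=100    # assumed tick rate; check your architecture

# keepalive_jiffies MINUTES: convert a keepalive time to jiffies.
keepalive_jiffies() {
    echo $(( $1 * 60 * HZ ))
}

echo "2 hours:   $(keepalive_jiffies 120) jiffies"
echo "5 minutes: $(keepalive_jiffies 5) jiffies"
```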
Increase the number of TCP/UDP ports that may be used simultaneously (from linuxraid.org)
On 2.2 and 2.4 kernels, the local port range can be changed via sysctl:
echo 1024 25000 > /proc/sys/net/ipv4/ip_local_port_range
This makes more local ports available. It is generally not an issue, but in a benchmarking scenario you often need more ports available. A common example is clients running `ab` or `http_load` or similar software.
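To see how many ports a given range actually provides, something like the following sketch can help. The port_count name is invented for this example; the script only reads the current range (if the file exists) and never writes to it.

```shell
#!/bin/sh
# port_count LOW HIGH: number of local ports in an inclusive range.
port_count() {
    echo $(( $2 - $1 + 1 ))
}

echo "1024-25000 gives $(port_count 1024 25000) ports"

# Optionally report the range currently in effect (read-only).
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    echo "current range $low-$high gives $(port_count "$low" "$high") ports"
fi
```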
Increasing the amount of memory associated with socket buffers
Increasing the amount of memory associated with socket buffers can often improve performance. NFS in particular, or Apache setups with large buffers configured, can benefit from this.
echo 262143 > /proc/sys/net/core/rmem_max
echo 262143 > /proc/sys/net/core/rmem_default
This will increase the amount of memory available for socket input queues. The "wmem_*" values do the same for output queues.
Note: With 2.4.x kernels, these values are supposed to "autotune" fairly well, and some people suggest instead just changing the values in:
/proc/sys/net/ipv4/tcp_rmem
/proc/sys/net/ipv4/tcp_wmem
There are three values here, "min default max".
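A minimal sketch of how to read such a triple. The function name describe_triple is made up, and the numbers passed in are sample values for illustration, not a recommendation.

```shell
#!/bin/sh
# describe_triple "MIN DEFAULT MAX": label the three fields of a
# tcp_rmem / tcp_wmem style setting.
describe_triple() {
    set -- $1    # deliberately unquoted: split on whitespace
    echo "min=$1 default=$2 max=$3"
}

describe_triple "4096 87380 174760"
```

On a live system the argument could come from "$(cat /proc/sys/net/ipv4/tcp_rmem)".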
Turning off tcp_sack and tcp_timestamps
These reduce the amount of work the TCP stack has to do:
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
This disables "RFC 2018 TCP Selective Acknowledgements" and "RFC 1323 TCP timestamps".
Increasing shared memory and ipc limits
Some applications, databases in particular, sometimes need large numbers of SHM segments and semaphores. The default limit for the number of shm segments is 128 for 2.2.
These limits are set in a couple of places in the kernel, and increasing them requires modifying the kernel source and recompiling.
A sample diff to bump them up:
--- linux/include/linux/sem.h.save Wed Apr 12 20:28:37 2000
+++ linux/include/linux/sem.h Wed Apr 12 20:29:03 2000
@@ -60,7 +60,7 @@
int semaem;
};
-#define SEMMNI 128 /* ? max # of semaphore identifiers */
+#define SEMMNI 512 /* ? max # of semaphore identifiers */
#define SEMMSL 250 /* <= 512 max num of semaphores per id */
#define SEMMNS (SEMMNI*SEMMSL) /* ? max # of semaphores in system */
#define SEMOPM 32 /* ~ 100 max num of ops per semop call */
--- linux/include/asm-i386/shmparam.h.save Wed Apr 12 20:18:34 2000
+++ linux/include/asm-i386/shmparam.h Wed Apr 12 20:28:11 2000
@@ -21,7 +21,7 @@
* Keep _SHM_ID_BITS as low as possible since SHMMNI depends on it and
* there is a static array of size SHMMNI.
*/
-#define _SHM_ID_BITS 7
+#define _SHM_ID_BITS 10
#define SHM_ID_MASK ((1<<_SHM_ID_BITS)-1)
#define SHM_IDX_SHIFT (_SHM_ID_BITS)
Theoretically, the _SHM_ID_BITS can go as high as 11. The rule is that _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on x86.
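The effect of the diff above can be worked out by hand. This sketch assumes SHMMNI is defined as 1 << _SHM_ID_BITS, which is consistent with the default of 128 segments for 7 bits, and uses the SEMMNS formula shown in the diff. The function names are invented for illustration.

```shell
#!/bin/sh
# Number of shm segments for a given _SHM_ID_BITS
# (assumes SHMMNI = 1 << _SHM_ID_BITS).
shmmni_for_bits() {
    echo $(( 1 << $1 ))
}

# System-wide semaphore count, SEMMNS = SEMMNI * SEMMSL.
semmns() {
    echo $(( $1 * $2 ))
}

echo "_SHM_ID_BITS 7  -> $(shmmni_for_bits 7) segments"
echo "_SHM_ID_BITS 10 -> $(shmmni_for_bits 10) segments"
echo "SEMMNI 128, SEMMSL 250 -> $(semmns 128 250) semaphores"
echo "SEMMNI 512, SEMMSL 250 -> $(semmns 512 250) semaphores"
```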
In addition to the number of shared memory segments, you can control the maximum size of a single shared memory segment at run time via the /proc interface. /proc/sys/kernel/shmmax shows the current limit in bytes. Echo a new value to it to increase it.
echo "67108864" > /proc/sys/kernel/shmmax
This doubles the default value.
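Since shmmax is given in bytes, it can be handy to generate the command from a size in megabytes. A small sketch follows; shmmax_cmd_for_mb is a hypothetical helper that only prints the command for you to review and run as root.

```shell
#!/bin/sh
# shmmax_cmd_for_mb MB: print (not run) the echo command that would
# set shmmax to the given number of megabytes.
shmmax_cmd_for_mb() {
    bytes=$(( $1 * 1024 * 1024 ))
    printf 'echo "%s" > /proc/sys/kernel/shmmax\n' "$bytes"
}

shmmax_cmd_for_mb 64
```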
A good resource on this is Tunings The Linux Kernel's Memory. Linux Maximus: How to Get Maximum Performance from Linux and Oracle also includes some useful information about tuning shm for Oracle, amongst other things.
The best way to see the current values is to issue the command:
ipcs -l
Ptys and ttys
The number of ptys and ttys on a box can sometimes be a limiting factor for things like login servers and database servers.
On Red Hat Linux 7.x, the default limit on ptys is set to 2048 for i686 and athlon kernels. Standard i386 and similar kernels default to 256 ptys.
The config directive CONFIG_UNIX98_PTY_COUNT defaults to 256, but can be set as high as 2048. For 2048 ptys to be supported, the value of UNIX98_PTY_MAJOR_COUNT needs to be set to 8 in include/linux/major.h.
With the current device number scheme and allocations, the maximum number of ptys is 2048.
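The 2048 figure follows from the device numbering: each major number carries 256 minors, so 8 majors give 2048 ptys. A tiny sketch of that arithmetic, with an invented function name:

```shell
#!/bin/sh
MINORS_PER_MAJOR=256    # 8-bit minor numbers in this device scheme

# pty_count MAJORS: ptys available for a given UNIX98_PTY_MAJOR_COUNT.
pty_count() {
    echo $(( $1 * MINORS_PER_MAJOR ))
}

echo "1 major  -> $(pty_count 1) ptys"
echo "8 majors -> $(pty_count 8) ptys"
```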
Increasing Thread Limits
Limits on threads are tightly tied to both file descriptor limits and process limits.
Under Linux, threads are counted as processes, so any limits on the number of processes also apply to threads. In a heavily threaded app like a threaded TCP engine or a Java server, you can quickly run out of threads.
The first step to increasing the possible number of threads is to make sure you have boosted any process limits as mentioned before.
There are a few things that can limit the number of threads, including process limits, memory limits, mutex/semaphore/shm/ipc limits, and compiled-in thread limits. In most cases, the process limit is the first one you run into, then the compiled-in thread limits, then the memory limits.
To increase the compiled-in limits, you have to recompile glibc. Oh fun! And the patch is essentially two lines! Woohoo!
--- ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h.akl Mon Sep 4 19:37:42 2000
+++ ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h Mon Sep 4 19:37:56 2000
@@ -64,7 +64,7 @@
/* The number of threads per process. */
#define _POSIX_THREAD_THREADS_MAX 64
/* This is the value this implementation supports. */
-#define PTHREAD_THREADS_MAX 1024
+#define PTHREAD_THREADS_MAX 8192
/* Maximum amount by which a process can descrease its asynchronous I/O
priority level. */
--- ./linuxthreads/internals.h.akl Mon Sep 4 19:36:58 2000
+++ ./linuxthreads/internals.h Mon Sep 4 19:37:23 2000
@@ -330,7 +330,7 @@
THREAD_SELF implementation is used, this must be a power of two and
a multiple of PAGE_SIZE. */
#ifndef STACK_SIZE
-#define STACK_SIZE (2 * 1024 * 1024)
+#define STACK_SIZE (64 * PAGE_SIZE)
#endif
/* The initial size of the thread stack. Must be a multiple of PAGE_SIZE.
* */
Now just patch glibc, rebuild, and install it. ;-> If you have a package based system, I seriously suggest making a new package and using it.
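Why shrink STACK_SIZE at the same time as raising PTHREAD_THREADS_MAX? A rough sketch of the address-space arithmetic, assuming a 4 KB PAGE_SIZE (as on x86) and worst-case reservation of one full stack per thread. The function name is made up.

```shell
#!/bin/sh
PAGE_KB=4                           # assumed page size: 4 KB (x86)
old_stack_kb=$(( 2 * 1024 ))        # STACK_SIZE (2 * 1024 * 1024)
new_stack_kb=$(( 64 * PAGE_KB ))    # STACK_SIZE (64 * PAGE_SIZE)

# stack_space_mb THREADS STACK_KB: address space reserved for stacks
# if every thread slot is in use.
stack_space_mb() {
    echo $(( $1 * $2 / 1024 ))
}

echo "1024 threads x ${old_stack_kb} KB = $(stack_space_mb 1024 "$old_stack_kb") MB"
echo "8192 threads x ${new_stack_kb} KB = $(stack_space_mb 8192 "$new_stack_kb") MB"
```

Under these assumptions both configurations reserve the same 2048 MB for stacks, which is why the smaller stack lets eight times as many threads fit.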
Two references on how to do this are Jlinux.org and Volano. Both describe how to increase the number of threads so Java apps can use them.
Increasing ulimits or shell limits
OK, so this isn't kernel tuning, but it may be an issue that you have to deal with. Here is how you set up the shell security limits for your application:
"In bash and similar shells you can use these three
commands:
ulimit -a
ulimit -Ha
ulimit -s unlimited
that will respectively print soft limits, hard limits
and remove the stack limit."
The source for this information came from this Usenet article.
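A small sketch for comparing the soft and hard value of a single limit, for instance the open file limit that matters for the file-max tuning earlier. The show_limit helper is invented here; it assumes a shell whose ulimit builtin accepts -S, -H and -n (bash does), and it only reads the limits.

```shell
#!/bin/sh
# show_limit LETTER: print the soft and hard value of one resource,
# e.g. n for open files or s for stack size.
show_limit() {
    echo "soft=$(ulimit -S"$1") hard=$(ulimit -H"$1")"
}

echo "open files: $(show_limit n)"
echo "stack size: $(show_limit s)"
```

A non-root user can raise a soft limit only up to the hard limit shown.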
Large File Support
Large file support - support for files greater than 2 GB - is a kernel AND user space issue, meaning that not just the kernel has to be able to support files larger than 2 GB, but the C library (libc, or for GNU/Linux, glibc) and all file accessing utilities have to support them as well. See the LFS section in the links page for links with detailed information.
Improving System Performance
Tuning (delaying) filesystem cache synchronization (flushing)
Increasing the interval between the kernel's writes of cached data to disk minimizes the amount of I/O done, at the cost of losing more data if the system crashes. See the following link from the linuxdoc.org site, taken from Securing and Optimizing Linux: Red Hat Edition.
Tuning virtual memory system to use less memory on servers with *lots* of memory on 2.0.x or 2.2.x (from Tani Hosokawa)
Memory shortages (even though you've got tons) - for Linux 2.0.x/2.2.x
Sometimes, you'll end up with a situation where the kernel can't seem to find
enough memory to load a program for you, even though you've got tons of
memory. This may be caused by the filesystem buffers using up the extra, and
not having enough memory immediately available. You can often fix this by
modifying the contents of /proc/sys/vm/freepages (the three values are
min_free_pages, free_pages_low, and free_pages_high in case you care -- check
the source for more details). "256 512 768" is common, but often not enough.
I use "1024 2048 3072" usually. That's almost definitely enough memory to load
anything, and with 384 megs of RAM, it's not going to hurt performance by
reducing the amount available for caching.
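To put those page counts in perspective, here is a sketch that converts them to kilobytes, assuming 4 KB pages (x86). The function name freepages_kb is made up for this example.

```shell
#!/bin/sh
PAGE_KB=4    # assumed page size: 4 KB (x86)

# freepages_kb PAGES...: convert each page count to kilobytes.
freepages_kb() {
    out=
    for pages in "$@"; do
        out="$out${out:+ }$(( pages * PAGE_KB ))"
    done
    echo "$out"
}

echo "256 512 768    -> $(freepages_kb 256 512 768) KB"
echo "1024 2048 3072 -> $(freepages_kb 1024 2048 3072) KB"
```

So "1024 2048 3072" keeps between 4 MB and 12 MB free, a small slice of a 384 MB machine.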
Heh, I spent the whole morning reading this. Thanks to 弱智兄 for a good article on Linux optimizing.
Addendum:
Securing and Optimizing Linux: RedHat Edition - A Hands on Guide
Chapter 6. Linux General Optimization
6.5. The bdflush parameters
The bdflush file is closely related to the operation of the virtual memory (VM) subsystem of the Linux kernel and has a little influence on disk usage. This file, /proc/sys/vm/bdflush, controls the operation of the bdflush kernel daemon. We generally tune this file to improve file system performance. By changing some values from the default as shown below, the system seems more responsive; e.g. it waits a little more to write to disk and thus avoids some disk access contention.
The default setup for the bdflush parameters under Red Hat Linux is: "40 500 64 256 500 3000 500 1884 2" To change the values of bdflush, type the following command on your terminal:
[root@deep] /# echo "100 1200 128 512 15 5000 500 1884 2">/proc/sys/vm/bdflush
You may add the above commands to the /etc/rc.d/rc.local script file and you'll not have to type it again the next time you reboot your system.
Edit the /etc/sysctl.conf file and add the following lines:
# Improve file system performance
vm.bdflush = 100 1200 128 512 15 5000 500 1884 2
You must restart your network for the change to take effect. The command to manually restart the network is the following:
[root@deep] /# /etc/rc.d/init.d/network restart
Setting network parameters [ OK ]
Bringing up interface lo [ OK ]
Bringing up interface eth0 [ OK ]
Bringing up interface eth1 [ OK ]
In our example above, according to the /usr/src/linux/Documentation/sysctl/vm.txt file:
The first parameter 100 (nfract, a percentage)
This governs the maximum number of dirty buffers in the buffer cache. Dirty means that the contents of the buffer still have to be written to disk, as opposed to a clean buffer, which can just be forgotten about. Setting this to a high value means that Linux can delay disk writes for a long time, but it also means that it will have to do a lot of I/O at once when memory becomes short. A low value will spread out disk I/O more evenly.
The second parameter 1200 (ndirty)
This gives the maximum number of dirty buffers that bdflush can write to the disk at one time. A high value will mean delayed, bursty I/O, while a small value can lead to memory shortage when bdflush isn't woken up often enough.
The third parameter 128 (nrefill)
This is the number of buffers that bdflush will add to the list of free buffers when refill_freelist() is called. It is necessary to allocate free buffers beforehand, since the buffers often are of a different size than memory pages and some bookkeeping needs to be done beforehand. The higher the number, the more memory will be wasted and the less often refill_freelist() will need to run.
The fourth parameter 512 (nref_dirt)
When refill_freelist() comes across more than nref_dirt dirty buffers, it will wake up bdflush.
The sixth and seventh parameters 5000 (age_buffer, 50*HZ) and 500 (age_super, 5*HZ)
These govern the maximum time Linux waits before writing out a dirty buffer to disk. The value is expressed in jiffies (clock ticks); the number of jiffies per second is 100. age_buffer is the maximum age for data blocks, while age_super is for file system metadata.
The fifth parameter 15 and the last two parameters 1884 and 2
These are unused by the system, so we don't need to change the default ones.
Note: Look at /usr/src/linux/Documentation/sysctl/vm.txt for more information on how to improve kernel parameters related to virtual memory.
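To keep the nine fields straight, a sketch like the following can label a bdflush line and convert the two age fields from jiffies to seconds. The describe_bdflush name is invented, the field labels follow the descriptions above, and HZ is assumed to be 100 as stated in the text.

```shell
#!/bin/sh
HZ=100    # jiffies per second, as stated in the text

# describe_bdflush "N1 N2 ... N9": label the fields of a bdflush line.
describe_bdflush() {
    set -- $1    # deliberately unquoted: split on whitespace
    echo "nfract=$1"
    echo "ndirty=$2"
    echo "nrefill=$3"
    echo "nref_dirt=$4"
    echo "age_buffer=$6 jiffies ($(( $6 / HZ )) s)"
    echo "age_super=$7 jiffies ($(( $7 / HZ )) s)"
    echo "unused=$5 $8 $9"
}

describe_bdflush "100 1200 128 512 15 5000 500 1884 2"
```

With the tuned values, data blocks are flushed after at most 50 seconds and metadata after at most 5 seconds.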