/proc/sys/vm: The Definitive Edition - Aquester - ChinaUnix Blog




1 block_dump

block_dump enables block I/O debugging when set to a nonzero value. If you want to find out which process caused the disk to spin up (see /proc/sys/vm/laptop_mode), you can gather information by setting the flag. The output goes to the kernel log and can be read with dmesg.

2 dirty_background_ratio

Contains, as a percentage of total system memory, the number of pages at which the pdflush background writeback daemon will start writing out dirty data.

3 dirty_expire_centisecs

This tunable defines when dirty data is old enough to be eligible for writeout by the pdflush daemons. It is expressed in hundredths of a second. Data that has been dirty in memory for longer than this interval will be written out the next time a pdflush daemon wakes up.

4 dirty_ratio

Contains, as a percentage of total system memory, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
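To get a feel for what the two percentages mean in absolute terms, the sketch below converts dirty_background_ratio and dirty_ratio into kilobyte thresholds. It assumes the percentage applies to MemTotal from /proc/meminfo, which is a simplification of what the kernel actually measures, and the helper name pct_of_kb is hypothetical.

```shell
#!/bin/sh
# pct_of_kb TOTAL_KB PERCENT: integer PERCENT% of TOTAL_KB (hypothetical helper).
pct_of_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Print approximate thresholds for the running system, if the files are readable.
# Assumption: the ratios are applied to MemTotal; the kernel's own base may differ.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/dirty_ratio ] \
        && [ -r /proc/sys/vm/dirty_background_ratio ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    bg=$(cat /proc/sys/vm/dirty_background_ratio)
    fg=$(cat /proc/sys/vm/dirty_ratio)
    echo "background writeback starts near $(pct_of_kb "$total_kb" "$bg") kB of dirty data"
    echo "writing processes start flushing near $(pct_of_kb "$total_kb" "$fg") kB of dirty data"
fi
```

The numbers are only estimates, but they make it easier to judge whether a given percentage is large or small on a particular machine.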

5 dirty_writeback_centisecs

The pdflush writeback daemons periodically wake up and write "old" data out to disk. This tunable expresses the interval between those wakeups, in hundredths of a second.

Setting this to zero disables periodic writeback altogether.
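Because both dirty_expire_centisecs and dirty_writeback_centisecs are given in hundredths of a second, a small conversion helper can make the values easier to read. This is a minimal sketch; the function name centisecs_to_secs is hypothetical.

```shell
#!/bin/sh
# centisecs_to_secs N: print N hundredths of a second as seconds with two decimals.
centisecs_to_secs() {
    printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

# Show the current settings in seconds, if the files are readable.
for name in dirty_expire_centisecs dirty_writeback_centisecs; do
    path=/proc/sys/vm/$name
    if [ -r "$path" ]; then
        echo "$name = $(centisecs_to_secs "$(cat "$path")") s"
    fi
done
```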

6 drop_caches

Writing to this will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.

To free pagecache:

echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation, and dirty objects are not freeable, the user should run "sync" first in order to make sure all cached objects are freed.

This tunable was added in 2.6.16.
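The sync-then-write sequence described above can be wrapped in a small helper. This is a sketch, not a recommended tool: the function names and the labels pagecache, slab and all are hypothetical, and the optional second argument exists only so the write can be tried against an ordinary file. Writing to the real file requires root.

```shell
#!/bin/sh
# drop_caches_value WHAT: map a label to the value documented for drop_caches.
drop_caches_value() {
    case "$1" in
        pagecache) echo 1 ;;
        slab)      echo 2 ;;   # dentries and inodes
        all)       echo 3 ;;   # pagecache, dentries and inodes
        *)         echo "usage: pagecache|slab|all" >&2; return 1 ;;
    esac
}

# drop_caches WHAT [TARGET]: run sync, then write the value to TARGET.
# TARGET defaults to the real sysctl file.
drop_caches() {
    value=$(drop_caches_value "$1") || return 1
    target=${2:-/proc/sys/vm/drop_caches}
    sync                       # write out dirty data first so more objects are freeable
    echo "$value" > "$target"
}
```

As root, `drop_caches all` would then be equivalent to running sync followed by `echo 3 > /proc/sys/vm/drop_caches`.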

7 hugepages_treat_as_movable

When a non-zero value is written to this tunable, future allocations for the huge page pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not introduce additional external fragmentation of note as huge pages are always the largest contiguous block we care about.

8 hugetlb_shm_group

hugetlb_shm_group contains the group ID that is allowed to create SysV shared memory segments using hugetlb pages.

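9 laptop_mode

laptop_mode is a knob that controls "laptop mode". The settings it governs are described in the kernel's laptop-mode documentation.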

10 legacy_va_layout

If non-zero, this sysctl disables the new 32-bit mmap layout: the kernel will use the legacy (2.4) layout for all processes.

11 lowmem_reserve_ratio

Ratio of total pages to free pages for each memory zone.

12 max_map_count

This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.

While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation.

The default value is 65536.
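To see how close a process is to this limit, the lines of its /proc/<pid>/maps file can be counted, since each line describes roughly one memory map area. The following is a minimal sketch; the function name count_maps is hypothetical.

```shell
#!/bin/sh
# count_maps FILE: number of lines in a /proc/<pid>/maps-style file.
count_maps() {
    wc -l < "$1" | tr -d ' '
}

# Compare this shell's mappings with the configured limit, if both are readable.
if [ -r "/proc/$$/maps" ] && [ -r /proc/sys/vm/max_map_count ]; then
    echo "maps in use: $(count_maps "/proc/$$/maps") of $(cat /proc/sys/vm/max_map_count)"
fi
```

Replacing `$$` with another PID inspects a different process, subject to the usual permission checks on /proc.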

13 min_free_kbytes

This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a pages_min value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.

14 mmap_min_addr

This file indicates the amount of address space which a user process will be restricted from mmapping. Since kernel null dereference bugs could accidentally operate based on the information in the first couple of pages of memory, userspace processes should not be allowed to write to them.

By default this value is set to 0 and no protections will be enforced by the security module. Setting this value to something like 64k will allow the vast majority of applications to work correctly and provide defense in depth against future potential kernel bugs.

15 nr_hugepages

nr_hugepages configures the number of hugetlb pages reserved for the system.

16 nr_pdflush_threads

The count of currently-running pdflush threads. This is a read-only value.

17 numa_zonelist_order

This sysctl is only for NUMA. 'Where the memory is allocated from' is controlled by zonelists.

In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows: ZONE_NORMAL -> ZONE_DMA. This means that a memory allocation request for GFP_KERNEL will get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In the NUMA case, there are two possible orderings. Assume a 2-node NUMA system; below is the zonelist for Node(0)'s GFP_KERNEL:

(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.

Type (A) offers the best locality for processes on Node(0), but ZONE_DMA will be used before ZONE_NORMAL is exhausted. This increases the possibility of out-of-memory (OOM) in ZONE_DMA because ZONE_DMA tends to be small.

Type (B) cannot offer the best locality but is more robust against OOM of the DMA zone.

Type (A) is called "Node" order. Type (B) is "Zone" order.

"Node order" orders the zonelists by node, then by zone within each node. Specify "[Nn]ode" for node order.

"Zone Order" orders the zonelists by zone type, then by node within each zone. Specify "[Zz]one" for zone order.

Specify "[Dd]efault" to request automatic configuration. Autoconfiguration will select "node" order in the following cases:

(1) if the DMA zone does not exist or
(2) if the DMA zone comprises greater than 50% of the available memory or
(3) if any node's DMA zone comprises greater than 60% of its local memory and the amount of local memory is big enough.

Otherwise, "zone" order will be selected. Default order is recommended unless this is causing problems for your system/application.

18 overcommit_memory

Controls overcommit of system memory, possibly allowing processes to allocate (but not use) more memory than is actually available.

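0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.

1 - Always overcommit. Appropriate for some scientific applications.

2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while attempting to use already-allocated memory but will receive errors on memory allocation as appropriate.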

19 overcommit_ratio

Percentage of physical memory size to include in overcommit calculations.

Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)

swapspace = total size of all swap areas
physmem = size of physical memory in system
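The formula above can be evaluated directly with shell arithmetic. The sketch below is an illustration under assumptions: it takes swapspace from SwapTotal and physmem from MemTotal in /proc/meminfo, and the function name commit_limit_kb is hypothetical. The kernel's own figure, shown as CommitLimit in /proc/meminfo, may differ somewhat from this estimate.

```shell
#!/bin/sh
# commit_limit_kb SWAP_KB PHYS_KB RATIO:
# swapspace + physmem * (overcommit_ratio / 100), in kB, using integer arithmetic.
commit_limit_kb() {
    echo $(( $1 + $2 * $3 / 100 ))
}

# Estimate the limit for the running system and show the kernel's value next to it.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/overcommit_ratio ]; then
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    phys_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/overcommit_ratio)
    echo "estimated limit: $(commit_limit_kb "${swap_kb:-0}" "${phys_kb:-0}" "$ratio") kB"
    grep '^CommitLimit:' /proc/meminfo || true
fi
```

For example, 2 GB of swap, 8 GB of RAM and a ratio of 50 give 2097152 + 8388608 * 50 / 100 = 6291456 kB, or 6 GB.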

20 page-cluster

page-cluster controls the number of pages which are written to swap in a single attempt, that is, the swap I/O size.

It is a logarithmic value - setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", etc.

The default value is three (eight pages at a time). There may be some small benefits in tuning this to a different value if your workload is swap-intensive.
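Since the value is a base-2 logarithm, the page count is simply 1 shifted left by the setting. A minimal sketch, with the hypothetical function name pages_per_swap_io:

```shell
#!/bin/sh
# pages_per_swap_io N: number of pages per swap I/O for page-cluster value N (2^N).
pages_per_swap_io() {
    echo $(( 1 << $1 ))
}

# Print the mapping for the values mentioned in the text.
for n in 0 1 2 3; do
    echo "page-cluster=$n -> $(pages_per_swap_io "$n") page(s)"
done
```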

21 panic_on_oom

This enables or disables the panic-on-out-of-memory feature. If this is set to 1, the kernel panics when out-of-memory happens. If this is set to 0, the kernel will kill some rogue process by calling oom_kill().

Usually, the OOM killer can kill rogue processes and the system will survive. If you want to panic the system rather than kill rogue processes, set this to 1.

The default value is 0.

22 percpu_pagelist_fraction

This is the largest fraction of the pages in each zone that may be allocated to any single per-CPU page list (the high mark, pcp->high). The batch value of each per-CPU page list is also updated as a result. It is set to pcp->high / 4. The upper limit of batch is (PAGE_SHIFT * 8).

The initial value is zero. The kernel does not use this value at boot time to set the high water marks for each per-CPU page list.

23 stat_interval

With this tunable you can configure the VM statistics update interval. The default value is 1 second. This tunable first appeared in the 2.6.22 kernel.

24 swap_token_timeout

This file contains the valid hold time of the swap-out protection token. The Linux VM has a token-based thrashing control mechanism and uses the token to prevent unnecessary page faults in thrashing situations. The value is in seconds and can be useful for tuning thrashing behavior.

This tunable was removed in 2.6.20 when the algorithm got improved.
