When Linux Runs Out of Memory - sjhf - ChinaUnix Blog

Exploring OOM

To begin exploring OOM, first type and run this code snippet that allocates huge blocks of memory:

#include <stdio.h>
#include <stdlib.h>

#define MEGABYTE (1024*1024)

int main(int argc, char *argv[])
{
        void *myblock = NULL;
        int count = 0;

        /* Keep requesting 1MB blocks without ever touching them. */
        while (1)
        {
                myblock = (void *) malloc(MEGABYTE);
                if (!myblock) break;    /* malloc() failed, so stop */
                printf("Currently allocating %d MB\n", ++count);
        }

        exit(0);
}

Compile the program, run it, and wait for a moment. Sooner or later it will go OOM. Now compile the next program, which allocates huge blocks and fills every byte of them with the value 1:

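#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEGABYTE (1024*1024)

int main(int argc, char *argv[])
{
        void *myblock = NULL;
        int count = 0;

        /* Request 1MB blocks and write to every byte of each one. */
        while (1)
        {
                myblock = (void *) malloc(MEGABYTE);
                if (!myblock) break;    /* malloc() failed, so stop */
                memset(myblock, 1, MEGABYTE);
                printf("Currently allocating %d MB\n", ++count);
        }

        exit(0);
}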

Notice the difference? Call the first program A and the second program B. Program A most likely allocates far more memory blocks than program B does, and not long after starting program B you will see the word "Killed". Both programs end for the same underlying reason: there is no more space available. More specifically, program A ends gracefully because malloc() fails, while program B is terminated by the Linux kernel's so-called OOM killer.

The first fact to observe is the number of allocated blocks. Assume that you have 256MB of RAM and 888MB of swap (my current Linux settings). Program B ended at:

Currently allocating 1081 MB

On the other hand, program A ended at:

Currently allocating 3056 MB

Where did A get that extra 1975MB? Did I cheat? Of course not! If you look closely at both listings, you will find that program B fills the allocated memory with 1s, while A merely allocates it without doing anything. This happens because Linux employs deferred page allocation. In other words, a physical page isn't actually allocated until the moment you really use it, for example by writing data to the block. So, unless you touch the block, you can keep asking for more. The technical term for this is optimistic memory allocation.

Checking /proc/<pid>/status for both programs confirms this. Here's program A:

$ cat /proc/<pid>/status
VmPeak:  3141876 kB
VmSize:  3141876 kB
VmLck:         0 kB
VmHWM:     12556 kB
VmRSS:     12556 kB
VmData:  3140564 kB
VmStk:        88 kB
VmExe:         4 kB
VmLib:      1204 kB
VmPTE:      3072 kB

Here's program B, shortly before the OOM killer struck:

$ cat /proc/<pid>/status
VmPeak:  1072512 kB
VmSize:  1072512 kB
VmLck:         0 kB
VmHWM:    234636 kB
VmRSS:    204692 kB
VmData:  1071200 kB
VmStk:        88 kB
VmExe:         4 kB
VmLib:      1204 kB
VmPTE:      1064 kB

VmRSS deserves further explanation. RSS stands for "Resident Set Size." It tells you how much of the memory owned by the task currently resides in RAM. Also note that before B reaches OOM, swap usage is almost 100 percent (most of the 888MB), while A uses no swap at all. Clearly, malloc() itself did nothing more than reserve a memory area.

Another question also arises: "Even without touching the pages, why is the allocation limit 3056MB?" This exposes an unseen limit. Every application on a 32-bit system has 4GB of address space available. The Linux kernel usually splits this linear address space to provide 0 to 3GB for user space and 3GB to 4GB for kernel space. User space is a room where a task can do anything it wants, while kernel space is solely for the kernel. If you try to cross this 3GB border, you will get a segmentation fault.

(Side note: There is a kernel patch that gives the whole 4GB to user space, at the cost of some extra context-switching overhead.)

The conclusion is that OOM happens for two technical reasons:

1. No more pages are available in the VM.
2. No more user address space is available.
3. Both #1 and #2.

Thus the strategies to prevent those circumstances are:

1. Know how large the user address space is.
2. Know how many pages are available.

When you ask for a memory block, usually by using malloc(), you're asking the runtime C library whether a preallocated block is available. The block must be at least as large as the request. If such a block is already available, malloc() will assign it to the user and mark it as "used." Otherwise, malloc() must allocate more memory by extending the heap. All requested blocks go in an area called the heap. Do not confuse it with the stack, which stores local variables and function return addresses. These two sections have different jobs.

Where is the heap located in the address space? The process address map can tell you exactly where:

$ cat /proc/self/maps
0039d000-003b2000 r-xp 00000000 16:41 1080084    /lib/ld-2.3.3.so
003b2000-003b3000 r-xp 00014000 16:41 1080084    /lib/ld-2.3.3.so
003b3000-003b4000 rwxp 00015000 16:41 1080084    /lib/ld-2.3.3.so
003b6000-004cb000 r-xp 00000000 16:41 1080085    /lib/tls/libc-2.3.3.so
004cb000-004cd000 r-xp 00115000 16:41 1080085    /lib/tls/libc-2.3.3.so
004cd000-004cf000 rwxp 00117000 16:41 1080085    /lib/tls/libc-2.3.3.so
004cf000-004d1000 rwxp 004cf000 00:00 0
08048000-0804c000 r-xp 00000000 16:41 130592     /bin/cat
0804c000-0804d000 rwxp 00003000 16:41 130592     /bin/cat
0804d000-0806e000 rwxp 0804d000 00:00 0          [heap]
b7d95000-b7f95000 r-xp 00000000 16:41 2239455    /usr/lib/locale/locale-archive
b7f95000-b7f96000 rwxp b7f95000 00:00 0
b7fa9000-b7faa000 r-xp b7fa9000 00:00 0          [vdso]
bfe96000-bfeab000 rw-p bfe96000 00:00 0          [stack]

This is an actual address space layout shown for cat, but you may get different results. It is up to the Linux kernel and the runtime C library to arrange the areas. Notice that recent Linux kernel versions (2.6.x) kindly label the memory areas, but don't rely on those labels completely.

The heap is basically whatever address space has not already been taken by program mappings and the stack, so those other mappings narrow down the space available to it. It's not a full 3GB, but 3GB minus everything else that's mapped. The bigger your program's code segment is, the less space you have for the heap. The more dynamic libraries you link into your program, the less space you get for the heap. This is important to remember.

How does the map for program A look when it can't allocate more memory blocks? With a trivial change to pause the program (see
