Memory allocation failure - dixiaobing - ChinaUnix blog
Category: LINUX

2015-03-02 18:08:25

Original article: "Memory allocation failure", by humjb_1983

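Allocation failures like the one shown below appeared in the messages log, even though the system still had plenty of free memory:

    free:202655 pages * 4 kB = 810620 kB, roughly 800 MB

Moreover, the buddy system still had free blocks of the requested size (order 3, i.e. 32 kB).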

Mar 1 14:52:30 localhost kernel: <program name>: page allocation failure. order:3, mode:0x20
....
Mar 1 14:52:30 localhost kernel: [<ffffffff81129cae>] ? __alloc_pages_nodemask+0x7ce/0x990
Mar 1 14:52:30 localhost kernel: [<ffffffff811646a2>] ? kmem_getpages+0x62/0x170
Mar 1 14:52:30 localhost kernel: [<ffffffff8116524e>] ? fallback_alloc+0x21e/0x230
Mar 1 14:52:30 localhost kernel: [<ffffffff81164f59>] ? ____cache_alloc_node+0x99/0x170
Mar 1 14:52:30 localhost kernel: [<ffffffff8142c93a>] ? __alloc_skb+0x7a/0x180
Mar 1 14:52:30 localhost kernel: [<ffffffff81165eaf>] ? kmem_cache_alloc_node_notrace+0x6f/0x130
Mar 1 14:52:30 localhost kernel: [<ffffffff811660eb>] ? __kmalloc_node+0x7b/0x100
Mar 1 14:52:30 localhost kernel: [<ffffffff8142c93a>] ? __alloc_skb+0x7a/0x180
....
Mar 1 14:51:30 localhost kernel: active_anon:4400563 inactive_anon:1842083 isolated_anon:0
Mar 1 14:51:30 localhost kernel: active_file:616533 inactive_file:12564236 isolated_file:0
Mar 1 14:51:30 localhost kernel: unevictable:4059347 dirty:597 writeback:0 unstable:0
Mar 1 14:51:30 localhost kernel: free:202655 slab_reclaimable:146247 slab_unreclaimable:454915
....
Mar 1 14:51:30 localhost kernel: Node 0 Normal free:190996kB min:95488kB low:119360kB high:143232kB active_anon:8517676kB inactive_anon:3493456kB active_file:1048064kB inactive_file:22949524kB unevictable:8590824kB isolated(anon):0kB isolated(file):0kB present:46282240kB mlocked:1150664kB dirty:832kB writeback:0kB mapped:907972kB shmem:4338548kB slab_reclaimable:225724kB slab_unreclaimable:899148kB kernel_stack:16752kB pagetables:50592kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
....
Mar 1 14:51:30 localhost kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 3*4096kB = 15240kB
Mar 1 14:51:30 localhost kernel: Node 0 DMA32: 15833*4kB 10008*8kB 3164*16kB 2*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 194084kB
Mar 1 14:51:30 localhost kernel: Node 0 Normal: 40558*4kB 1403*8kB 1198*16kB 5*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 192784kB
Mar 1 14:51:30 localhost kernel: Node 1 Normal: 66667*4kB 13617*8kB 2265*16kB 22*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 412612kB
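
As a quick cross-check of the buddy lists above, the per-order counts can be summed in a few lines of C. This is only a minimal sketch: the helper names (`free_kb_from_order`, `print_zone`) are made up for illustration, and a 4 kB page size is assumed, as in the log.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ORDER 11   /* orders 0..10, i.e. 4 kB .. 4096 kB blocks */
#define PAGE_KB   4    /* assumed page size, matching the log */

/* Free memory in kB held in blocks of order min_order or higher. */
unsigned long free_kb_from_order(const unsigned long nr_free[MAX_ORDER],
                                 int min_order)
{
    unsigned long kb = 0;

    assert(min_order >= 0 && min_order <= MAX_ORDER);
    for (int o = min_order; o < MAX_ORDER; o++)
        kb += (nr_free[o] << o) * PAGE_KB;   /* blocks * pages per block * kB */
    return kb;
}

/* Per-order free block counts copied from the log lines above. */
const unsigned long node0_dma32[MAX_ORDER]  = {15833, 10008, 3164, 2};
const unsigned long node0_normal[MAX_ORDER] = {40558, 1403, 1198, 5};
const unsigned long node1_normal[MAX_ORDER] = {66667, 13617, 2265, 22, 1};

/* Print the zone total and the part that can serve an order-3 request. */
void print_zone(const char *name, const unsigned long nr_free[MAX_ORDER])
{
    printf("%s: total %lukB, order>=3 %lukB\n", name,
           free_kb_from_order(nr_free, 0), free_kb_from_order(nr_free, 3));
}
```

For Node 0 Normal the sum is 192784 kB, which matches the log, but only 160 kB of that (five 32 kB blocks) sits at order 3 or higher. Node 0 DMA32 has 64 kB at order 3 or higher, and Node 1 Normal has 768 kB. Nearly all of the free memory is fragmented into 4 kB, 8 kB and 16 kB blocks.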

The code that prints this warning:

    __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
        struct zonelist *zonelist, enum zone_type high_zoneidx,
        nodemask_t *nodemask, struct zone *preferred_zone,
        int migratetype)
    {
        const gfp_t wait = gfp_mask & __GFP_WAIT;
        ...
    nopage:
        if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
            unsigned int filter = SHOW_MEM_FILTER_NODES;
            /*
             * This documents exceptions given to allocations in certain
             * contexts that are allowed to allocate outside current's set
             * of allowed nodes.
             */
            if (!(gfp_mask & __GFP_NOMEMALLOC))
                if (test_thread_flag(TIF_MEMDIE) ||
                    (current->flags & (PF_MEMALLOC | PF_EXITING)))
                    filter &= ~SHOW_MEM_FILTER_NODES;
            if (in_interrupt() || !wait)
                filter &= ~SHOW_MEM_FILTER_NODES;
            pr_warning("%s: page allocation failure. order:%d, mode:0x%x\n",
                p->comm, order, gfp_mask);
            dump_stack();
            if (!should_suppress_show_mem())
                show_mem(filter);
            ...

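In addition, the system holds a large amount of cache. In principle, cache should be reclaimed when memory runs short, so why does the allocation still fail?
The reasons are as follows:
1. Memory watermarks. The min watermarks in this system add up to more than 100 MB. The kernel keeps this memory in reserve and normally does not hand it out.
But free memory in this environment is about 800 MB, well above the watermark, so why does the allocation fail? That takes us to reason 2.
2. When the allocator checks whether free memory has dropped to the watermark, it first deducts the free blocks that are smaller than the requested block size. The code is:

    for (o = 0; o < order; o++) {
        /* At the next order, this order's pages become unavailable */
        free_pages -= z->free_area[o].nr_free << o; /* deduct blocks smaller than the requested order */
        /* Require fewer higher order pages to be free */
        min >>= 1; /* halve min after each deduction, spreading the watermark across the orders */
        if (free_pages <= min)
            return 0;
    }

The request here is for 32 kB (order 3), so all free blocks smaller than 32 kB are deducted. After each deduction the watermark (min) is halved, which spreads the watermark across the orders. In the end, the total memory held in blocks of 32 kB or larger must exceed min/8 for the allocation to proceed. In this environment, once the blocks listed below are deducted, less than 100 MB / 8 remains, so the allocation cannot go ahead.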

Mar 1 14:51:30 localhost kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB
Mar 1 14:51:30 localhost kernel: Node 0 DMA32: 15833*4kB 10008*8kB 3164*16kB
Mar 1 14:51:30 localhost kernel: Node 0 Normal: 40558*4kB 1403*8kB 1198*16kB
Mar 1 14:51:30 localhost kernel: Node 1 Normal: 66667*4kB 13617*8kB 2265*16kB
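
The deduction loop can be replayed in user space with the Node 0 Normal numbers from the log. The following is a rough model, not the kernel's code: `struct zone_model` and `watermark_ok` are hypothetical names, `lowmem_reserve` is not visible in the log and is assumed to be 0, and the `ALLOC_HIGH` / `ALLOC_HARDER` reductions (min cut by one half, then by a further quarter) are written from memory of kernels of that era, so treat them as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER    11
#define ALLOC_HARDER 0x10   /* illustrative flag values for this sketch */
#define ALLOC_HIGH   0x20

struct zone_model {              /* hypothetical stand-in for struct zone */
    long nr_free_pages;          /* total free pages in the zone */
    long nr_free[MAX_ORDER];     /* free blocks per order */
    long lowmem_reserve;         /* not shown in the log; assumed 0 */
};

/* Returns true if an allocation of the given order passes the watermark. */
bool watermark_ok(const struct zone_model *z, int order, long mark,
                  int alloc_flags)
{
    long min = mark;
    long free_pages = z->nr_free_pages - (1L << order) + 1;

    assert(order >= 0 && order < MAX_ORDER);
    if (alloc_flags & ALLOC_HIGH)
        min -= min / 2;          /* high-priority requests may dip lower */
    if (alloc_flags & ALLOC_HARDER)
        min -= min / 4;          /* atomic requests may dip lower still */
    if (free_pages <= min + z->lowmem_reserve)
        return false;
    for (int o = 0; o < order; o++) {
        free_pages -= z->nr_free[o] << o;   /* blocks too small to help */
        min >>= 1;                          /* require less at each order */
        if (free_pages <= min)
            return false;
    }
    return true;
}

/* Node 0 Normal: 192784kB / 4kB = 48196 free pages (sum of the buddy list). */
const struct zone_model node0_normal = {
    .nr_free_pages  = 48196,
    .nr_free        = {40558, 1403, 1198, 5},
    .lowmem_reserve = 0,
};
```

With min = 95488 kB / 4 kB = 23872 pages, `watermark_ok(&node0_normal, 3, 23872, 0)` fails on the first pass through the loop: deducting the 4 kB blocks leaves 7631 pages against a halved min of 11936. With `ALLOC_HIGH | ALLOC_HARDER` set, the check survives the first two passes but fails on the third, where only 33 pages remain against a min of 1119. Under these assumptions, even the relaxed atomic watermark is not met for an order-3 request in this zone.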

3. Why is the cache not reclaimed? Because the allocation mode here is 0x20, which is GFP_ATOMIC. An atomic allocation cannot wait (it is typically used in interrupt context and other places that need memory quickly). In this mode, when memory is short the allocator returns failure right away and does not reclaim cache.

    __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
        struct zonelist *zonelist, enum zone_type high_zoneidx,
        nodemask_t *nodemask, struct zone *preferred_zone,
        int migratetype)
    {
        /* wait depends on __GFP_WAIT: GFP_ATOMIC does not set it, GFP_KERNEL and GFP_USER both do */
        const gfp_t wait = gfp_mask & __GFP_WAIT;
        ...
        page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
                high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
                preferred_zone, migratetype);
        if (page)
            goto got_pg;
        ...
        /* Atomic allocations - we can't balance anything */
        if (!wait)
            goto nopage;
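
To make the `mode:0x20` reading concrete, here is a small decoder for the low GFP bits. It is a sketch under stated assumptions: the bit values are the ones I recall from 2.6.32-era `include/linux/gfp.h` (they differ in newer kernels, so check the header of the kernel in question), and `GFP_BIT_*`, `gfp_can_wait` and `gfp_describe` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed bit values (2.6.32-era); verify against the target kernel. */
#define GFP_BIT_WAIT 0x10u   /* __GFP_WAIT: caller may sleep and reclaim */
#define GFP_BIT_HIGH 0x20u   /* __GFP_HIGH: may use emergency reserves */
#define GFP_BIT_IO   0x40u   /* __GFP_IO */
#define GFP_BIT_FS   0x80u   /* __GFP_FS */

#define MODE_GFP_ATOMIC (GFP_BIT_HIGH)
#define MODE_GFP_KERNEL (GFP_BIT_WAIT | GFP_BIT_IO | GFP_BIT_FS)

static const struct { unsigned int bit; const char *name; } gfp_names[] = {
    { GFP_BIT_WAIT, "__GFP_WAIT" },
    { GFP_BIT_HIGH, "__GFP_HIGH" },
    { GFP_BIT_IO,   "__GFP_IO"   },
    { GFP_BIT_FS,   "__GFP_FS"   },
};

/* True if the allocation is allowed to sleep, i.e. can do reclaim itself. */
bool gfp_can_wait(unsigned int mode)
{
    return (mode & GFP_BIT_WAIT) != 0;
}

/* Write the known flags in mode into buf, separated by '|'. */
void gfp_describe(unsigned int mode, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof gfp_names / sizeof gfp_names[0]; i++) {
        if (!(mode & gfp_names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", gfp_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;               /* buffer full: stop, output is truncated */
        used += (size_t)n;
    }
}
```

Under these values, 0x20 decodes to `__GFP_HIGH` alone, which is what GFP_ATOMIC expands to, and `gfp_can_wait(0x20)` is false. GFP_KERNEL would show up as mode 0xd0 and can wait.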

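4. A GFP_ATOMIC allocation is supposed to be able to dip below the watermark, so why does it still fail here?
It can indeed dip below the watermark by a certain proportion, and in this scenario it most likely already has. Memory is still insufficient, though, and no other mechanism triggers cache reclaim, so the allocation fails.
5. Why is there no OOM? Again because of GFP_ATOMIC: the request has to be handled quickly, so the OOM path is never entered. By contrast, if __GFP_WAIT is set (GFP_KERNEL includes it), memory (including cache) is reclaimed, and if the request still cannot be satisfied after reclaim, the allocator eventually enters the OOM path.

    /*
     * If we failed to make any progress reclaiming, then we are
     * running out of options and have to consider going OOM
     */
    if (!did_some_progress) {
        if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
            if (oom_killer_disabled)
                goto nopage;
            page = __alloc_pages_may_oom(gfp_mask, order,
                    zonelist, high_zoneidx,
                    nodemask, preferred_zone,
                    migratetype);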
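
Points 3 to 5 can be summarized as a small decision function. This is a simplified model of the control flow quoted in the excerpts, not the kernel's implementation: the retry heuristics that the real slow path applies are left out, the enum and function names are hypothetical, and the bit values are assumed 2.6.32-era ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values (2.6.32-era); verify against the target kernel. */
#define GFP_BIT_WAIT    0x10u
#define GFP_BIT_FS      0x80u
#define GFP_BIT_NORETRY 0x1000u

enum slowpath_step {
    STEP_GOT_PAGE,      /* the freelists had a suitable block */
    STEP_FAIL_ATOMIC,   /* no __GFP_WAIT: goto nopage, warning printed */
    STEP_RETRY,         /* reclaim freed something: try the freelists again */
    STEP_OOM_KILLER,    /* reclaim freed nothing: __alloc_pages_may_oom() */
    STEP_NO_OOM         /* OOM killer not used; fail or retry (logic omitted) */
};

/* What the slow path does next, given the request flags and what happened. */
enum slowpath_step next_step(unsigned int gfp_mask, bool got_page,
                             bool did_some_progress, bool oom_killer_disabled)
{
    if (got_page)
        return STEP_GOT_PAGE;
    if (!(gfp_mask & GFP_BIT_WAIT))
        return STEP_FAIL_ATOMIC;        /* atomic: no reclaim, no OOM */
    if (did_some_progress)
        return STEP_RETRY;
    if ((gfp_mask & GFP_BIT_FS) && !(gfp_mask & GFP_BIT_NORETRY))
        return oom_killer_disabled ? STEP_NO_OOM : STEP_OOM_KILLER;
    return STEP_NO_OOM;
}

/* Human-readable label for a step, for logging or debugging. */
const char *step_name(enum slowpath_step s)
{
    static const char *const names[] = {
        "got page", "fail (atomic)", "retry after reclaim",
        "OOM killer", "no OOM",
    };

    assert((unsigned int)s < sizeof names / sizeof names[0]);
    return names[s];
}
```

A mode of 0x20 with no page available always lands on `STEP_FAIL_ATOMIC`, whatever the reclaim and OOM inputs are, which is the behaviour seen in the log. A GFP_KERNEL request (0xd0 under the assumed values) whose reclaim makes no progress reaches `STEP_OOM_KILLER` instead.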

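Workaround
Raise the memory watermark to limit how much memory the cache can consume (newer kernels already provide a ready-made /proc parameter for controlling this, but older kernels do not, and the kernel here is one of the older ones), so that a GFP_ATOMIC allocation that dips below the watermark still finds enough memory.
/proc/sys/vm/min_free_kbytes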
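
Before changing the setting it helps to know its current value. The snippet below is a minimal sketch that reads it from the path given above; `parse_kbytes` and `read_min_free_kbytes` are hypothetical helper names, and the file is assumed to contain a single decimal number.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MIN_FREE_PATH "/proc/sys/vm/min_free_kbytes"

/* Parse a non-negative decimal kB value; returns -1 if text is not a number. */
long parse_kbytes(const char *text)
{
    char *end;
    long kb;

    assert(text != NULL);
    errno = 0;
    kb = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || kb < 0)
        return -1;
    return kb;
}

/* Read the current setting; returns -1 if the file is missing or unreadable. */
long read_min_free_kbytes(void)
{
    char line[64];
    long kb = -1;
    FILE *fp = fopen(MIN_FREE_PATH, "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) != NULL)
        kb = parse_kbytes(line);
    fclose(fp);
    return kb;
}
```

Raising the value means writing a larger number to the same file as root. The original post does not say which value was used, and a suitable one depends on how much memory the atomic allocations need in bursts.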
