alloc_page - printk1986 - ChinaUnix Blog

alloc_page

Category: LINUX

2011-12-21 19:53:40

__alloc_pages sits at the very bottom of kernel memory management: slab, vmalloc, kmalloc, mmap, brk,
the page cache, and the buffer cache all obtain their basic physical pages through __alloc_pages.
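Linux follows this memory-management strategy:
a) Make full use of physical memory by building caches of all kinds, which improves program
performance and reduces disk operations. This differs from Windows, which tends to keep a lot
of memory free even after heavy disk activity, whereas on Linux truly free physical memory is
almost never seen.
b) Guarantee that enough potential physical memory (pages) can be reclaimed immediately; call
these potentially allocatable pages. The kernel daemons kswapd, bdflush, and kreclaimd run
periodically, and every allocation also adjusts the system: the allocation pressure that
__alloc_pages encounters continually steers the work of the daemons, so that the system
always keeps enough potentially reclaimable memory.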
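First, what reserves of pages does the system require?
1) Reserve of allocatable pages: inactive_clean + free pages (pages in the buddy system).
The target is freepages.high + inactive_target / 3, where inactive_target is
min((memory_pressure >> INACTIVE_SHIFT), (num_physpages / 4)). The target therefore has a
dynamic component.
The current reserve is nr_free_pages() + nr_inactive_clean_pages().
The function free_shortage in mm/vmscan.c computes the gap between the target and the actual
number of allocatable pages. If the global reserve is adequate, it goes on to check whether
the in-buddy free pages of any zone fall below what is expected. If either check fails, the
system must adjust immediately. Read free_shortage yourself for the details.
2) Reserve of potentially allocatable pages: buddy free + inactive_clean + inactive_dirty.
Target: freepages.high + inactive_target
Current:
nr_free_pages() + nr_inactive_clean_pages() + nr_inactive_dirty_pages.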
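The two reserve requirements above can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct vm_snapshot and the function names are hypothetical, the counters are passed in rather than read from the kernel, and the value given to INACTIVE_SHIFT is chosen for illustration. The per-zone check that free_shortage also performs is left out.

```c
#include <assert.h>

/* Assumption: illustrative value; the text only names INACTIVE_SHIFT. */
#define INACTIVE_SHIFT 6

/* Hypothetical snapshot of the global counters mentioned in the text. */
struct vm_snapshot {
    long memory_pressure;
    long num_physpages;
    long freepages_high;     /* stands in for freepages.high */
    long nr_free;            /* stands in for nr_free_pages() */
    long nr_inactive_clean;  /* stands in for nr_inactive_clean_pages() */
    long nr_inactive_dirty;  /* stands in for nr_inactive_dirty_pages */
};

/* inactive_target = min(memory_pressure >> INACTIVE_SHIFT, num_physpages / 4) */
long inactive_target_of(const struct vm_snapshot *s)
{
    assert(s != 0);
    long by_pressure = s->memory_pressure >> INACTIVE_SHIFT;
    long cap = s->num_physpages / 4;
    return by_pressure < cap ? by_pressure : cap;
}

/* Requirement 1: shortfall of allocatable pages (free + inactive_clean). */
long allocatable_shortfall(const struct vm_snapshot *s)
{
    long target = s->freepages_high + inactive_target_of(s) / 3;
    long have = s->nr_free + s->nr_inactive_clean;
    return target > have ? target - have : 0;
}

/* Requirement 2: shortfall of potentially allocatable pages
 * (free + inactive_clean + inactive_dirty). */
long potential_shortfall(const struct vm_snapshot *s)
{
    long target = s->freepages_high + inactive_target_of(s);
    long have = s->nr_free + s->nr_inactive_clean + s->nr_inactive_dirty;
    return target > have ? target - have : 0;
}
```

For example, with memory_pressure = 6400, num_physpages = 32768, freepages_high = 256, and 100 free, 50 inactive_clean, and 30 inactive_dirty pages, the model gives inactive_target = 100, an allocatable shortfall of 139 pages, and a potential shortfall of 176 pages.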
The analysis is embedded in the code below as comments:
/*
 * This is the 'heart' of the zoned buddy allocator:
 * the core policy of the zone-based buddy system.
 */
struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order)
{
    zone_t **zone;
    int direct_reclaim = 0;
    unsigned int gfp_mask = zonelist->gfp_mask;
    struct page * page;
    /*
     * Allocations put pressure on the VM subsystem.
     */
    memory_pressure++;

    /*
     * (If anyone calls gfp from interrupts nonatomically then it
     * will sooner or later be tripped up by a schedule().)
     *
     * We are falling back to lower-level zones if allocation
     * in a higher zone fails.
     */

    /*
     * Can we take pages directly from the inactive_clean
     * list?
     */
    /* PF_MEMALLOC means the pages are requested for memory-management purposes */
    if (order == 0 && (gfp_mask & __GFP_WAIT) &&
            !(current->flags & PF_MEMALLOC))
        direct_reclaim = 1;

    /*
     * If we are about to get low on free pages and we also have
     * an inactive page shortage, wake up kswapd.
     */
    if (inactive_shortage() > inactive_target / 2 && free_shortage())
        wakeup_kswapd(0); /* use every means to keep up the number of potentially allocatable pages */
    /*
     * If we are about to get low on free pages and cleaning
     * the inactive_dirty pages would fix the situation,
     * wake up bdflush.
     */
    else if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
            && nr_inactive_dirty_pages >= freepages.high)
        wakeup_bdflush(0); /* speed up writing buffered data out to disk */
try_again:
    /*
     * First, pick the zones that have plenty of free memory.
     * We allocate free memory first because it doesn't contain
     * any data ...
     */
    /* This pass looks only at the watermark of truly free pages */
    zone = zonelist->zones;
    for (;;) {
        zone_t *z = *(zone++);
        if (!z)
            break;
        if (!z->size)
            BUG();

        if (z->free_pages >= z->pages_low) { /* free-page reserve is adequate */
            page = rmqueue(z, order);
            if (page)
                return page;
        } else if (z->free_pages < z->pages_min &&
                waitqueue_active(&kreclaimd_wait)) {
            /* kreclaimd reclaims pages from the zone_t->inactive_clean_list queue */
            wake_up_interruptible(&kreclaimd_wait);
        }
    }
    /*
     * No zone has plenty of free pages, so try the zones that are
     * rich in inactive_clean pages.
     *
     * If there is a lot of activity, inactive_target
     * will be high and we'll have a good chance of
     * finding a page using the HIGH limit.
     */
    page = __alloc_pages_limit(zonelist, order, PAGES_HIGH, direct_reclaim);
    if (page)
        return page;

    /*
     * Still nothing: accept zones whose inactive_clean supply is
     * merely adequate, that is
     * zone->pages_low < free + inactive_clean
     *
     * When the working set is very large and VM activity
     * is low, we're most likely to have our allocation
     * succeed here.
     */
    page = __alloc_pages_limit(zonelist, order, PAGES_LOW, direct_reclaim);
    if (page)
        return page;
    /*
     * No zone's free pages (buddy + inactive_clean) can satisfy
     * the request any more.
     *
     * We wake up kswapd, in the hope that kswapd will
     * resolve this situation before memory gets tight.
     *
     * We also yield the CPU, because that:
     * - gives kswapd a chance to do something
     * - slows down allocations, in particular the
     *   allocations from the fast allocator that's
     *   causing the problems ...
     * - ... which minimises the impact the "bad guys"
     *   have on the rest of the system
     * - if we don't have __GFP_IO set, kswapd may be
     *   able to free some memory we can't free ourselves
     */
    wakeup_kswapd(0); /* argument 0: do not sleep */
    /* kswapd works to maintain the reserve of potentially allocatable pages */
    if (gfp_mask & __GFP_WAIT) {
        __set_current_state(TASK_RUNNING);
        current->policy |= SCHED_YIELD;
        schedule();
    }

    /*
     * After waking up kswapd, we try to allocate a page
     * from any zone which isn't critical yet.
     *
     * We may not be able to wait for kswapd to finish its work,
     * so first retry with an even lower watermark.
     */
    page = __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim);
    if (page)
        return page;

    /*
     * Damn, we didn't succeed.
     */
    /* For ordinary processes there are still a few cases we can handle */
    if (!(current->flags & PF_MEMALLOC)) {
        if (order > 0 && (gfp_mask & __GFP_WAIT)) {
            /* A higher-order allocation that is allowed to wait */
            zone = zonelist->zones;
            /*
             * Write dirty pages out to disk. page_launder may itself
             * allocate pages, and the current process is its calling
             * context, so the process is temporarily given PF_MEMALLOC
             * status to avoid recursing back into this path.
             */
            current->flags |= PF_MEMALLOC;
            page_launder(gfp_mask, 1);
            current->flags &= ~PF_MEMALLOC;
            for (;;) {
                zone_t *z = *(zone++);
                if (!z)
                    break;
                if (!z->size)
                    continue;
                while (z->inactive_clean_pages) {
                    /* Refill the buddy free lists */
                    struct page * page;
                    /* Move one page to the free list. */
                    page = reclaim_page(z);
                    if (!page)
                        break;
                    __free_page(page); /* release it to the buddy allocator */
                    /* A contiguous run of pages may now exist. */
                    /* Try if the allocation succeeds. */
                    page = rmqueue(z, order); /* retry the higher-order allocation */
                    if (page)
                        return page;
                }
            }
        }
        /*
         * We have to do this because something else might eat
         * the memory kswapd frees for us and we need to be
         * reliable.
         */
        if ((gfp_mask & (__GFP_WAIT|__GFP_IO)) == (__GFP_WAIT|__GFP_IO)) {
            /*
             * IO is allowed and we may wait: wake kswapd and
             * wait for it to restore the memory balance.
             */
            wakeup_kswapd(1); /* argument 1: may block */
            memory_pressure++;
            /*
             * Note: we do not 'try again' for higher orders,
             * because kswapd may never (*ever*) be able to free
             * a large contiguous region for us.
             */
            if (!order)
                goto try_again;
        /*
         * If __GFP_IO isn't set, we can't wait on kswapd because
         * kswapd just might need some IO locks /we/ are holding ...
         *
         * SUBTLE: The scheduling point above makes sure that
         * kswapd does get the chance to free memory we can't
         * free ourselves...
         */
        } else if (gfp_mask & __GFP_WAIT) {
            /*
             * IO is not allowed: do some of kswapd's work
             * ourselves, limited to what needs no IO.
             */
            try_to_free_pages(gfp_mask);
            memory_pressure++;
            if (!order)
                goto try_again;
        }
    }
    /*
     * Final phase: allocate anything we can!
     *
     * Higher order allocations, GFP_ATOMIC allocations and
     * recursive allocations (PF_MEMALLOC) end up here.
     *
     * Only recursive allocations can use the very last pages
     * in the system, otherwise it would be just too easy to
     * deadlock the system...
     */
    zone = zonelist->zones;
    for (;;) {
        zone_t *z = *(zone++);
        struct page * page = NULL;
        if (!z)
            break;
        if (!z->size)
            BUG();

        /*
         * SUBTLE: direct_reclaim is only possible if the task
         * becomes PF_MEMALLOC while looping above. This will
         * happen when the OOM killer selects this task for
         * instant execution...
         */
        if (direct_reclaim) {
            page = reclaim_page(z);
            if (page)
                return page;
        }

        /* XXX: is pages_min/4 a good amount to reserve for this? */
        if (z->free_pages < z->pages_min / 4 &&
                !(current->flags & PF_MEMALLOC))
            continue;
        page = rmqueue(z, order);
        if (page)
            return page;
    }

    /* No luck.. */
    printk(KERN_ERR "__alloc_pages: %lu-order allocation failed.\n", order);
    return NULL;
}
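The three calls to __alloc_pages_limit in the listing relax the watermark step by step (PAGES_HIGH, then PAGES_LOW, then PAGES_MIN). The user-space sketch below models only that fallback order. It assumes, following the comment "zone->pages_low < free + inactive_clean" in the listing, that a zone qualifies when free + inactive_clean exceeds the chosen watermark. The struct, enum, and function names are hypothetical, and the sketch omits the first free-pages pass, the kswapd wakeup and CPU yield that happen before the PAGES_MIN attempt, and the actual page removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names modelling PAGES_MIN / PAGES_LOW / PAGES_HIGH. */
enum wm_limit { WM_MIN, WM_LOW, WM_HIGH };

/* Hypothetical, simplified stand-in for the zone_t fields used in the listing. */
struct zone_model {
    unsigned long free_pages;
    unsigned long inactive_clean_pages;
    unsigned long pages_min;
    unsigned long pages_low;
    unsigned long pages_high;
};

/* Map a limit to the corresponding per-zone watermark. */
unsigned long watermark_of(const struct zone_model *z, enum wm_limit limit)
{
    switch (limit) {
    case WM_HIGH: return z->pages_high;
    case WM_LOW:  return z->pages_low;
    default:      return z->pages_min;
    }
}

/* Index of the first zone whose free + inactive_clean pages exceed
 * the watermark, or -1 if no zone qualifies. */
int first_zone_above(const struct zone_model *zones, size_t n, enum wm_limit limit)
{
    assert(zones != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct zone_model *z = &zones[i];
        if (z->free_pages + z->inactive_clean_pages > watermark_of(z, limit))
            return (int)i;
    }
    return -1;
}

/* Try the limits from strictest to most relaxed and report, through
 * *used, which limit finally produced a zone. */
int pick_zone_with_fallback(const struct zone_model *zones, size_t n,
                            enum wm_limit *used)
{
    static const enum wm_limit order[] = { WM_HIGH, WM_LOW, WM_MIN };
    for (size_t k = 0; k < sizeof order / sizeof order[0]; k++) {
        int idx = first_zone_above(zones, n, order[k]);
        if (idx >= 0) {
            if (used)
                *used = order[k];
            return idx;
        }
    }
    return -1;
}
```

Walking the whole zone list once per limit, rather than trying all limits per zone, mirrors the listing: a zone further down the list that is comfortably stocked is preferred over draining an earlier zone towards its minimum.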
Other functions related to memory allocation:
unsigned long get_zeroed_page(int gfp_mask)
void __free_pages(struct page *page, unsigned long order)
void free_pages(unsigned long addr, unsigned long order)
There are also several functions for gathering memory-pressure statistics:
unsigned int nr_free_pages (void)
unsigned int nr_inactive_clean_pages (void)
unsigned int nr_free_buffer_pages (void)
unsigned int nr_free_highpages (void)
These functions are fairly simple and are not analyzed further.
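Although the counters are not analyzed here, a rough idea of what nr_free_pages and nr_inactive_clean_pages compute may still help. The sketch below assumes that each one simply sums a per-zone field over every zone of every memory node. The types, the MODEL_NR_ZONES constant, and the model_ function names are hypothetical user-space stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: three zones per node, for illustration only. */
#define MODEL_NR_ZONES 3

/* Hypothetical per-zone counters. */
struct zone_counts {
    unsigned int free_pages;
    unsigned int inactive_clean_pages;
};

/* Hypothetical memory node holding a fixed set of zones. */
struct node_model {
    struct zone_counts zones[MODEL_NR_ZONES];
};

/* Sum of free_pages over all zones of all nodes. */
unsigned int model_nr_free_pages(const struct node_model *nodes, size_t n_nodes)
{
    assert(nodes != NULL || n_nodes == 0);
    unsigned int sum = 0;
    for (size_t n = 0; n < n_nodes; n++)
        for (size_t i = 0; i < MODEL_NR_ZONES; i++)
            sum += nodes[n].zones[i].free_pages;
    return sum;
}

/* Sum of inactive_clean_pages over all zones of all nodes. */
unsigned int model_nr_inactive_clean_pages(const struct node_model *nodes,
                                           size_t n_nodes)
{
    assert(nodes != NULL || n_nodes == 0);
    unsigned int sum = 0;
    for (size_t n = 0; n < n_nodes; n++)
        for (size_t i = 0; i < MODEL_NR_ZONES; i++)
            sum += nodes[n].zones[i].inactive_clean_pages;
    return sum;
}
```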
