last buddy and next buddy in the kernel's cfs_rq - djjsindy - ChinaUnix Blog

last buddy and next buddy in the kernel's cfs_rq

Category: C/C++

2013-03-27 12:48:14

sched_features.h defines a number of scheduler features, including these two:

/*
 * Prefer to schedule the task we woke last (assuming it failed
 * wakeup-preemption), since its likely going to consume data we
 * touched, increases cache locality.
 */
SCHED_FEAT(NEXT_BUDDY, 0)

/*
 * Prefer to schedule the task that ran last (when we did
 * wake-preempt) as that likely will touch the same data, increases
 * cache locality.
 */
SCHED_FEAT(LAST_BUDDY, 1)

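As the comments indicate, with NEXT_BUDDY enabled CFS prefers the most recently woken sched_entity when it picks the next sched_entity, while with LAST_BUDDY it prefers the sched_entity that was running when a wakeup preempted it. Both policies aim to improve the CPU cache hit rate: the more often the CPU switches between different tasks, the more often cached data is evicted and has to be reloaded. Staying with a task whose data is probably still cached slows that invalidation down and helps system performance. Note the defaults: NEXT_BUDDY is disabled (0) and LAST_BUDDY is enabled (1).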
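
To make it a little more concrete how a SCHED_FEAT(name, default) line can end up as a flag that code tests by name, here is a small self-contained sketch of the X-macro idea. The kernel gets a similar effect by including sched_features.h more than once under different definitions of SCHED_FEAT; in this sketch a local list macro stands in for the header. FEATURE_LIST, AS_ENUM, AS_MASK, feat_mask, feat_enabled and feat_set are names invented for the illustration, not kernel identifiers.

```c
#include <assert.h>

/* Stand-in for sched_features.h: one entry per feature (name, default). */
#define FEATURE_LIST(F) \
	F(NEXT_BUDDY, 0)    \
	F(LAST_BUDDY, 1)

/* First expansion: give every feature a bit index. */
#define AS_ENUM(name, enabled) FEAT_##name,
enum { FEATURE_LIST(AS_ENUM) FEAT_COUNT };

/* Second expansion: OR the default-enabled bits into one mask. */
#define AS_MASK(name, enabled) ((unsigned int)(enabled) << FEAT_##name) |
static unsigned int feat_mask = FEATURE_LIST(AS_MASK) 0u;

/* Test a feature by name, in the spirit of the kernel's sched_feat(). */
#define feat_enabled(name) ((feat_mask >> FEAT_##name) & 1u)

/* Flip a feature at run time (hypothetical helper for this sketch). */
static void feat_set(int bit, int on)
{
	if (on)
		feat_mask |= 1u << bit;
	else
		feat_mask &= ~(1u << bit);
}

/* Example: the defaults match the (name, default) pairs in the list. */
void feat_example(void)
{
	assert(!feat_enabled(NEXT_BUDDY));
	assert(feat_enabled(LAST_BUDDY));

	feat_set(FEAT_NEXT_BUDDY, 1);   /* switch NEXT_BUDDY on */
	assert(feat_enabled(NEXT_BUDDY));
	feat_set(FEAT_NEXT_BUDDY, 0);   /* restore the default */
}
```

Calling feat_example() from a test program exercises both defaults and the run-time toggle. Keeping each feature's name and default in a single list means the two cannot drift apart.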

In the kernel code, next buddy and last buddy correspond to the next and last pointers in cfs_rq.

Let's look at where these pointers are assigned. check_preempt_wakeup is called as the last step of a wakeup; waking a newly forked task also ends by calling this function.

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
	struct task_struct *curr = rq->curr;
	/* pse is the sched_entity being woken; se is the currently running sched_entity */
	struct sched_entity *se = &curr->se, *pse = &p->se;
	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
	/*
	 * The buddy logic only applies when the number of running entities is
	 * at least sched_nr_latency (3 by default in 2.6.35.13).
	 */
	int scale = cfs_rq->nr_running >= sched_nr_latency;

	if (unlikely(rt_prio(p->prio)))
		goto preempt;

	if (unlikely(p->sched_class != &fair_sched_class))
		return;

	if (unlikely(se == pse))
		return;

	/* For the WF_FORK flag, see https://patchwork.kernel.org/patch/47930/ */
	if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK))
		set_next_buddy(pse); /* enough running entities: record the most recently woken sched_entity in next */

	/*
	 * We can come here with TIF_NEED_RESCHED already set from new task
	 * wake up path.
	 */
	if (test_tsk_need_resched(curr))
		return;

	/*
	 * Batch and idle tasks do not preempt (their preemption is driven by
	 * the tick):
	 */
	if (unlikely(p->policy != SCHED_NORMAL))
		return;

	/* Idle tasks are by definition preempted by everybody. */
	if (unlikely(curr->policy == SCHED_IDLE))
		goto preempt;

	if (!sched_feat(WAKEUP_PREEMPT))
		return;

	update_curr(cfs_rq);
	find_matching_se(&se, &pse);
	BUG_ON(!pse);
	if (wakeup_preempt_entity(se, pse) == 1)
		goto preempt;

	return;

preempt: /* preemption is allowed: mark curr for rescheduling */
	resched_task(curr);
	/*
	 * Only set the backward buddy when the current task is still
	 * on the rq. This can happen when a wakeup gets interleaved
	 * with schedule on the ->pre_schedule() or idle_balance()
	 * point, either of which can drop the rq lock.
	 *
	 * Also, during early boot the idle thread is in the fair class,
	 * for obvious reasons its a bad idea to schedule back to it.
	 */
	if (unlikely(!se->on_rq || curr == rq->idle))
		return;

	/* record the preempted current entity as the last buddy (cfs_rq's last pointer) */
	if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
		set_last_buddy(se);
}

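As the code shows, waking a task triggers a check of whether the current task (curr) should be rescheduled, and this is where the cfs_rq last and next pointers are set. next records the most recently woken entity (only when NEXT_BUDDY is enabled), and last records the entity that was running and got preempted by the wakeup. Both pointers are consulted when the next sched_entity is picked, and the entities they point to are preferred.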
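
The bookkeeping can be condensed into a small stand-alone model. This is only a sketch under simplifying assumptions: the struct and function names (entity, cfs_rq_model, buddy_config, model_wakeup) are invented for the illustration, the result of the real preemption test is passed in as the should_preempt argument, and group scheduling and the early-return checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entity {
	const char *name;
	bool on_rq;               /* still queued on the run queue? */
};

struct cfs_rq_model {
	struct entity *next;      /* most recently woken entity */
	struct entity *last;      /* entity preempted by the most recent wakeup */
	unsigned int nr_running;
};

struct buddy_config {
	bool next_buddy;          /* models the NEXT_BUDDY feature */
	bool last_buddy;          /* models the LAST_BUDDY feature */
	unsigned int nr_latency;  /* stands in for sched_nr_latency */
};

/* Simplified model of the buddy bookkeeping done on a wakeup. */
static void model_wakeup(struct cfs_rq_model *rq, const struct buddy_config *cfg,
			 struct entity *curr, struct entity *woken,
			 bool is_fork, bool should_preempt)
{
	bool scale = rq->nr_running >= cfg->nr_latency;

	/* next is recorded whether or not the wakeup ends up preempting. */
	if (cfg->next_buddy && scale && !is_fork)
		rq->next = woken;

	if (!should_preempt)
		return;

	/* Only remember curr if it is still queued. */
	if (!curr->on_rq)
		return;

	if (cfg->last_buddy && scale)
		rq->last = curr;
}

void model_wakeup_example(void)
{
	struct buddy_config cfg = { .next_buddy = true, .last_buddy = true, .nr_latency = 3 };
	struct entity a = { "A", true }, b = { "B", true };
	struct cfs_rq_model rq = { NULL, NULL, 2 };

	/* Too few runnable entities: no buddies are recorded. */
	model_wakeup(&rq, &cfg, &a, &b, false, true);
	assert(rq.next == NULL && rq.last == NULL);

	/* Busy queue: woken B becomes next, preempted A becomes last. */
	rq.nr_running = 4;
	model_wakeup(&rq, &cfg, &a, &b, false, true);
	assert(rq.next == &b && rq.last == &a);
}
```

In the model, as in the listing, next can be set even when no preemption follows, whereas last is only set on the preempt path.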

Now look at the code that picks the next sched_entity:

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	/* this really does pick the sched_entity with the smallest vruntime */
	struct sched_entity *se = __pick_next_entity(cfs_rq);
	struct sched_entity *left = se;

	/* use next if its vruntime is ahead of left's by no more than sysctl_sched_wakeup_granularity */
	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
		se = cfs_rq->next;

	/*
	 * Prefer last buddy, try to return the CPU to a preempted task.
	 */
	/* use last if its vruntime is ahead of left's by no more than sysctl_sched_wakeup_granularity */
	if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
		se = cfs_rq->last;

	/* once next or last has been used, clear it so it does not affect the following pick */
	clear_buddies(cfs_rq, se);

	return se;
}

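The wakeup_preempt_entity function used here was explained in detail at the end of the previous post, "fork进程在CFS的处理过程" (how CFS handles a forked process). The diagram from that post illustrates the point:

[Figure: relative vruntime positions of curr, S1, S2 and S3, from the previous post]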

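Suppose left is at S3 and the next or last buddy is at curr. Then wakeup_preempt_entity returns 1, which means the buddy's vruntime is too far ahead of left's, so there is no reason to choose the buddy. If instead left is at S2 or S1 relative to the buddy, the gap between the two vruntimes does not exceed sysctl_sched_wakeup_granularity, so the buddy is preferred and replaces left.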
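
The three cases can be checked with a small model of the comparison. This is a sketch, not the kernel function: model_wakeup_preempt and buddy_wins are names invented here, the granularity is passed in as a plain number (the kernel derives it from sysctl_sched_wakeup_granularity and may scale it by the entity's weight), and the layout in the comment is an assumption about what the missing diagram showed, following the comment that accompanies wakeup_preempt_entity in the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Relative positions on the vruntime axis (g = granularity, c = curr):
 *
 *             |s1
 *        |s2
 *   |s3
 *         g
 *      |<--->|c
 *
 *  w(c, s1) = -1,  w(c, s2) = 0,  w(c, s3) = 1
 */
static int model_wakeup_preempt(int64_t curr_vruntime, int64_t se_vruntime, int64_t gran)
{
	int64_t vdiff = curr_vruntime - se_vruntime;

	if (vdiff <= 0)
		return -1;  /* se is not behind curr at all */
	if (vdiff > gran)
		return 1;   /* se is behind curr by more than the granularity */
	return 0;       /* se is behind curr, but within the granularity */
}

/* A buddy replaces the leftmost entity unless the comparison returns 1. */
static int buddy_wins(int64_t buddy_vruntime, int64_t left_vruntime, int64_t gran)
{
	return model_wakeup_preempt(buddy_vruntime, left_vruntime, gran) < 1;
}

void preempt_example(void)
{
	const int64_t gran = 10, c = 100;
	const int64_t s1 = 105, s2 = 95, s3 = 80;

	assert(model_wakeup_preempt(c, s1, gran) == -1);
	assert(model_wakeup_preempt(c, s2, gran) == 0);
	assert(model_wakeup_preempt(c, s3, gran) == 1);

	/* Buddy at c, leftmost at s3: the gap is too large, keep the leftmost. */
	assert(!buddy_wins(c, s3, gran));
	/* Buddy at c, leftmost at s2: within the granularity, take the buddy. */
	assert(buddy_wins(c, s2, gran));
}
```

The numbers are arbitrary; only their distances relative to the granularity matter.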

The two pointers are cleared at the following points:

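On the scheduler tick: if the running entity has run longer than its ideal time (obtained by dividing sysctl_sched_latency according to the entity's load relative to the cfs_rq's total load) and next or last points to it, that pointer is cleared so the next pick does not select this sched_entity again just because it is a buddy.
On dequeue: when a sched_entity is removed from the run queue, any next or last pointer that refers to it is cleared.
After use, as described above: once next or last has been used in a pick, that pointer is cleared so that an already-used buddy does not influence the following pick.
On yield: a task that yields gives up its turn voluntarily, so any next or last pointer that refers to its sched_entity is cleared.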
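
The tick case from the list above can be sketched together with the clearing itself. This is a simplified model under stated assumptions: all names (entity, cfs_rq_model, model_clear_buddies, model_ideal_runtime, model_tick) are invented for the illustration, the latency period is treated as a fixed value, and group scheduling is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct entity {
	uint64_t weight;         /* load weight of this entity */
};

struct cfs_rq_model {
	struct entity *next;
	struct entity *last;
	uint64_t total_weight;   /* sum of the weights of all runnable entities */
};

/* Forget any buddy pointer that refers to se. */
static void model_clear_buddies(struct cfs_rq_model *rq, struct entity *se)
{
	if (rq->next == se)
		rq->next = NULL;
	if (rq->last == se)
		rq->last = NULL;
}

/* Share of one latency period that se is entitled to, by weight. */
static uint64_t model_ideal_runtime(const struct cfs_rq_model *rq,
				    const struct entity *se, uint64_t latency_ns)
{
	return latency_ns * se->weight / rq->total_weight;
}

/* Tick case: returns 1 when curr has used up its share and should be rescheduled. */
static int model_tick(struct cfs_rq_model *rq, struct entity *curr,
		      uint64_t ran_ns, uint64_t latency_ns)
{
	if (ran_ns <= model_ideal_runtime(rq, curr, latency_ns))
		return 0;
	/* curr ran long enough: make sure buddy preference does not re-elect it. */
	model_clear_buddies(rq, curr);
	return 1;
}

void clear_example(void)
{
	struct entity a = { 1024 }, b = { 1024 };
	/* Three equal-weight entities are runnable; a is last, b is next. */
	struct cfs_rq_model rq = { .next = &b, .last = &a, .total_weight = 3072 };
	const uint64_t latency_ns = 6000000;   /* illustrative 6 ms period */

	assert(model_ideal_runtime(&rq, &a, latency_ns) == 2000000);
	assert(model_tick(&rq, &a, 1500000, latency_ns) == 0 && rq.last == &a);
	assert(model_tick(&rq, &a, 2500000, latency_ns) == 1 && rq.last == NULL);
	assert(rq.next == &b);   /* pointers to other entities are untouched */
}
```

The dequeue, after-use and yield cases reduce to the same model_clear_buddies call on the entity concerned.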

Summary:

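Of the cfs_rq last and next pointers, last refers to the sched_entity that was running and was preempted by a wakeup, and next refers to the most recently woken sched_entity. Both are assigned during task wakeup. When a new sched_entity is picked, the entities they point to are preferred as long as their vruntime is not too far ahead of the leftmost entity's, which helps the cache hit rate.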
