The last buddy and next buddy in the kernel's cfs_rq - cengku - ChinaUnix Blog


Category: C/C++

2013-03-28 11:57:32

Original post: "The last buddy and next buddy in the kernel's cfs_rq", by djjsindy

sched_features.h defines a number of scheduler features:

/*
* Prefer to schedule the task we woke last (assuming it failed
* wakeup-preemption), since its likely going to consume data we
* touched, increases cache locality.
*/
SCHED_FEAT(NEXT_BUDDY, 0)
/*
* Prefer to schedule the task that ran last (when we did
* wake-preempt) as that likely will touch the same data, increases
* cache locality.
*/
SCHED_FEAT(LAST_BUDDY, 1)

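The comments show that with NEXT_BUDDY, when CFS picks the next sched_entity it prefers the sched_entity that was woken most recently, whereas with LAST_BUDDY it prefers the sched_entity that performed the most recent preempting wakeup, i.e. the task that was running when that wakeup preempted it. Both policies aim to raise the CPU cache hit rate: the more often the CPU switches between different tasks, the more often cached data becomes useless because the running task changes. Avoiding switches to unrelated tasks therefore keeps the cache warm longer and helps overall performance. Note also from the definitions that NEXT_BUDDY is disabled by default (0), while LAST_BUDDY is enabled (1).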
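To make the sched_feat() tests in the code below concrete, here is a minimal user-space sketch of how a features header can become a bitmask. It assumes the X-macro pattern that kernel/sched.c of this era roughly applies to sched_features.h (the header is expanded twice under different SCHED_FEAT definitions); the list macro DEMO_SCHED_FEATURES and every demo_/FEAT_ name are hypothetical stand-ins, not kernel identifiers.

```c
/* Hypothetical stand-in for sched_features.h: one F(name, default) entry per feature. */
#define DEMO_SCHED_FEATURES(F) \
	F(NEXT_BUDDY, 0)       \
	F(LAST_BUDDY, 1)

/* First expansion: one enum constant (the bit index) per feature. */
#define FEAT_INDEX(name, enabled) FEAT_IDX_##name,
enum { DEMO_SCHED_FEATURES(FEAT_INDEX) FEAT_IDX_NR };
#undef FEAT_INDEX

/* Second expansion: OR in the bit of every feature whose default is 1. */
#define FEAT_MASK(name, enabled) ((1UL << FEAT_IDX_##name) * (enabled)) |
static unsigned long demo_sched_features = DEMO_SCHED_FEATURES(FEAT_MASK) 0UL;
#undef FEAT_MASK

/* Counterpart of sched_feat(x): 1 when the feature bit is set, else 0. */
#define demo_sched_feat(x) ((demo_sched_features & (1UL << FEAT_IDX_##x)) != 0)
```

With these defaults demo_sched_feat(LAST_BUDDY) evaluates to 1 and demo_sched_feat(NEXT_BUDDY) to 0. Because the mask is an ordinary variable, a debug interface can flip individual bits at runtime; kernels built with CONFIG_SCHED_DEBUG expose such a knob through debugfs.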

In the kernel source, the next buddy and the last buddy correspond to the next and last pointers in struct cfs_rq.
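The pointers themselves are plain entity pointers. Below is a minimal sketch of how they might be filled in, under a simplified model: with group scheduling each sched_entity has a parent group entity and is queued on its own cfs_rq, and set_next_buddy()/set_last_buddy() in kernels of roughly this era walk up that chain (for_each_sched_entity()) so every level's cfs_rq gets a buddy, skipping SCHED_IDLE tasks. The types and functions (demo_se, demo_cfs_rq, demo_set_next_buddy(), demo_set_last_buddy()) and the idle_policy flag are hypothetical.

```c
#include <stddef.h>

struct demo_cfs_rq;

/* Simplified sched_entity: only the fields needed for buddy bookkeeping. */
struct demo_se {
	struct demo_se *parent;      /* group entity one level up, NULL at the top */
	struct demo_cfs_rq *cfs_rq;  /* run queue this entity is queued on */
	int idle_policy;             /* hypothetical: nonzero for a SCHED_IDLE task */
};

/* Simplified cfs_rq: only the two buddy pointers discussed in this post. */
struct demo_cfs_rq {
	struct demo_se *next;        /* next buddy: most recently woken entity */
	struct demo_se *last;        /* last buddy: entity preempted by a wakeup */
};

/* Record se, and each ancestor group entity, as next buddy on its own cfs_rq. */
void demo_set_next_buddy(struct demo_se *se)
{
	if (se->idle_policy)
		return;
	for (; se != NULL; se = se->parent)
		se->cfs_rq->next = se;
}

/* Same walk up the hierarchy, recording the last buddy instead. */
void demo_set_last_buddy(struct demo_se *se)
{
	if (se->idle_policy)
		return;
	for (; se != NULL; se = se->parent)
		se->cfs_rq->last = se;
}
```

For a task inside a group, one call sets the buddy on the group's cfs_rq (pointing at the task) and on the parent cfs_rq (pointing at the group entity), so the preference survives the level-by-level pick.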

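Let's look at where these pointers are assigned. check_preempt_wakeup() is called as the last step of a wakeup; the final step of fork, which wakes up the new task, also goes through this function.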

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
	struct task_struct *curr = rq->curr;
	struct sched_entity *se = &curr->se, *pse = &p->se; // pse: the sched_entity being woken; se: the currently running sched_entity
	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
	int scale = cfs_rq->nr_running >= sched_nr_latency; // buddies only come into play when nr_running >= sched_nr_latency; in 2.6.35.13 sched_nr_latency defaults to 3
	if (unlikely(rt_prio(p->prio)))
		goto preempt;
	if (unlikely(p->sched_class != &fair_sched_class))
		return;
	if (unlikely(se == pse))
		return;
	// For the WF_FORK flag, see https://patchwork.kernel.org/patch/47930/
	if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK))
		set_next_buddy(pse); // nr_running >= sched_nr_latency: point next at the most recently woken sched_entity
	/*
	 * We can come here with TIF_NEED_RESCHED already set from new task
	 * wake up path.
	 */
	if (test_tsk_need_resched(curr))
		return;
	/*
	 * Batch and idle tasks do not preempt (their preemption is driven by
	 * the tick):
	 */
	if (unlikely(p->policy != SCHED_NORMAL))
		return;
	/* Idle tasks are by definition preempted by everybody. */
	if (unlikely(curr->policy == SCHED_IDLE))
		goto preempt;
	if (!sched_feat(WAKEUP_PREEMPT))
		return;
	update_curr(cfs_rq);
	find_matching_se(&se, &pse);
	BUG_ON(!pse);
	if (wakeup_preempt_entity(se, pse) == 1)
		goto preempt;
	return;
preempt: // preemption is allowed: mark curr for rescheduling
	resched_task(curr);
	/*
	 * Only set the backward buddy when the current task is still
	 * on the rq. This can happen when a wakeup gets interleaved
	 * with schedule on the ->pre_schedule() or idle_balance()
	 * point, either of which can drop the rq lock.
	 *
	 * Also, during early boot the idle thread is in the fair class,
	 * for obvious reasons its a bad idea to schedule back to it.
	 */
	if (unlikely(!se->on_rq || curr == rq->idle))
		return;
	if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se)) // mark the current (preempted) entity as last buddy: set cfs_rq->last
		set_last_buddy(se);
}

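As the code shows, when a task is woken up the kernel checks whether the current task curr should be rescheduled, and along the way it sets the cfs_rq's next and last pointers. next records the most recently woken entity and is set (if NEXT_BUDDY is enabled) before the preemption decision is made; last records the entity that was running, normally the waker, and is set only when the wakeup actually preempts curr. Both pointers are consulted when the next sched_entity is picked, and the entities they point to are preferred.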
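The buddy-related decisions in check_preempt_wakeup() can be condensed into a small model. This is a sketch under simplifying assumptions: a single flat run queue, the preemption outcome passed in as a boolean rather than computed by the RT, policy and wakeup_preempt_entity() checks, and sched_nr_latency fixed at the value of 3 quoted above for 2.6.35.13. All demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal task: an id for readability and whether it is still queued. */
struct demo_task {
	int id;
	bool on_rq;
};

/* Flat run queue holding only what the buddy logic reads or writes. */
struct demo_cfs_rq {
	unsigned int nr_running;
	struct demo_task *curr;  /* currently running task */
	struct demo_task *next;  /* next buddy */
	struct demo_task *last;  /* last buddy */
};

/* Feature switches with the sched_features.h defaults shown earlier. */
static bool demo_feat_next_buddy = false;
static bool demo_feat_last_buddy = true;
/* Value quoted in the post for 2.6.35.13. */
static const unsigned int demo_sched_nr_latency = 3;

/*
 * Buddy bookkeeping of check_preempt_wakeup(). 'preempt' stands in for the
 * outcome of the real checks (RT wakee, policies, wakeup_preempt_entity()).
 */
void demo_check_preempt_wakeup(struct demo_cfs_rq *cfs_rq, struct demo_task *p,
			       bool is_fork, bool preempt)
{
	bool scale = cfs_rq->nr_running >= demo_sched_nr_latency;

	if (p == cfs_rq->curr)
		return;
	/* next is recorded before, and independently of, the preemption decision */
	if (demo_feat_next_buddy && scale && !is_fork)
		cfs_rq->next = p;
	if (!preempt)
		return;
	/* the kernel calls resched_task(curr) at this point */
	if (!cfs_rq->curr->on_rq)
		return;
	/* remember the task being preempted so the CPU can be handed back to it */
	if (demo_feat_last_buddy && scale)
		cfs_rq->last = cfs_rq->curr;
}
```

The asymmetry is the point of the model: next is recorded for every non-fork wakeup on a sufficiently busy queue, while last is recorded only when the wakeup really preempts curr.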

Now look at the code that picks the next sched_entity:

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se = __pick_next_entity(cfs_rq); // the sched_entity with the smallest vruntime (leftmost in the rbtree)
	struct sched_entity *left = se;
	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1) // is next's vruntime at most the wakeup granularity (sysctl_sched_wakeup_granularity) ahead of left's?
		se = cfs_rq->next;
	/*
	 * Prefer last buddy, try to return the CPU to a preempted task.
	 */
	if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1) // same test for last
		se = cfs_rq->last;
	clear_buddies(cfs_rq, se); // a buddy that was just picked is cleared, so it is used once and does not affect the next pick
	return se;
}
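The selection order can be reproduced in user space. A minimal sketch, assuming a fixed wakeup granularity in place of the kernel's load-scaled value and that the leftmost entity is already known; demo_se, demo_cfs_rq, demo_buddy_ok() and demo_pick_next_entity() are hypothetical names.

```c
#include <stddef.h>

/* Minimal entity: only vruntime (in ns) matters here. */
struct demo_se {
	long long vruntime;
};

struct demo_cfs_rq {
	struct demo_se *leftmost;  /* entity with the smallest vruntime */
	struct demo_se *next;      /* next buddy, may be NULL */
	struct demo_se *last;      /* last buddy, may be NULL */
	long long wakeup_gran;     /* assumed fixed; the kernel scales it by load */
};

/* Equivalent of "buddy && wakeup_preempt_entity(buddy, left) < 1". */
static int demo_buddy_ok(const struct demo_cfs_rq *cfs_rq, const struct demo_se *buddy)
{
	return buddy != NULL &&
	       buddy->vruntime - cfs_rq->leftmost->vruntime <= cfs_rq->wakeup_gran;
}

struct demo_se *demo_pick_next_entity(struct demo_cfs_rq *cfs_rq)
{
	struct demo_se *se = cfs_rq->leftmost;

	if (demo_buddy_ok(cfs_rq, cfs_rq->next))
		se = cfs_rq->next;
	/* tested second, so an acceptable last buddy overrides next */
	if (demo_buddy_ok(cfs_rq, cfs_rq->last))
		se = cfs_rq->last;

	/* like clear_buddies(): only the pointer that was picked is dropped */
	if (cfs_rq->next == se)
		cfs_rq->next = NULL;
	if (cfs_rq->last == se)
		cfs_rq->last = NULL;
	return se;
}
```

Because last is tested after next, an acceptable last buddy wins over an acceptable next buddy, matching the "Prefer last buddy" comment, and a buddy that was not picked stays in place for later picks.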

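wakeup_preempt_entity() was explained in detail at the end of the earlier post on how CFS handles a newly forked process; the figure from that post is reused here to illustrate the point.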

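As the figure shows, if S3 is left and curr is the next or last buddy, wakeup_preempt_entity() is guaranteed to return 1: the buddy's vruntime is too far ahead of left's, so there is no reason to pick it. If the buddy is S2 or S1 instead, its vruntime is close to left's and the gap does not exceed sysctl_sched_wakeup_granularity, so the buddy can be picked in place of left.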
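The comparison behind this figure can be written out as a three-way function. A sketch, assuming a fixed granularity of 1 ms (DEMO_WAKEUP_GRAN, hypothetical) where the kernel scales sysctl_sched_wakeup_granularity by the load of the entity being compared against; pick_next_entity() passes (buddy, left) and takes the buddy when the result is below 1.

```c
/* Assumed fixed wakeup granularity: 1 ms expressed in ns (hypothetical constant). */
#define DEMO_WAKEUP_GRAN 1000000LL

/*
 * Three-way result modelled on wakeup_preempt_entity(curr, se):
 *  -1  curr's vruntime is not ahead of se's
 *   0  curr is ahead, but by no more than the granularity
 *   1  curr is ahead by more than the granularity
 */
int demo_wakeup_preempt_entity(long long curr_vruntime, long long se_vruntime)
{
	long long vdiff = curr_vruntime - se_vruntime;

	if (vdiff <= 0)
		return -1;
	if (vdiff > DEMO_WAKEUP_GRAN)
		return 1;
	return 0;
}
```

For example, a buddy 4 ms ahead of left yields 1 and is skipped (the S3 case), while a buddy 0.5 ms ahead yields 0 and is picked instead of left.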

These two pointers are cleared at the following points:

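On the scheduler tick: if the running task has run longer than its ideal slice (sysctl_sched_latency divided among the entities in proportion to each entity's load relative to the cfs_rq's total load), and next or last points to this running task, the pointer is cleared so that the pick does not select this sched_entity again just because it is a buddy.
When a sched_entity is dequeued from the run queue, any next or last pointer that refers to it is cleared.
In the case described above: once the next or last buddy has been used by a pick, the corresponding pointer is cleared so that a stale buddy does not influence the following pick.
When a task yields, it voluntarily gives up its turn, so any next or last pointer that refers to its sched_entity is cleared.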
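The clearing rules all reduce to one helper. A minimal sketch, assuming a flat cfs_rq without task groups and an ideal slice computed as the latency period times the entity's share of the queue's load (the kernel additionally stretches the period when many tasks are runnable); demo_clear_buddies(), demo_ideal_runtime(), demo_tick() and demo_dequeue() are hypothetical names. The pick case is the clear_buddies() call in pick_next_entity() above, and a yield would call the same helper as dequeue.

```c
#include <stddef.h>

struct demo_se {
	int id;
};

struct demo_cfs_rq {
	struct demo_se *next;  /* next buddy */
	struct demo_se *last;  /* last buddy */
};

/* Drop whichever buddy pointer refers to se (flat model, no task groups). */
void demo_clear_buddies(struct demo_cfs_rq *cfs_rq, const struct demo_se *se)
{
	if (cfs_rq->next == se)
		cfs_rq->next = NULL;
	if (cfs_rq->last == se)
		cfs_rq->last = NULL;
}

/* Ideal slice: the latency period split by load weight (no period stretching). */
unsigned long long demo_ideal_runtime(unsigned long long latency_ns,
				      unsigned long se_weight, unsigned long rq_weight)
{
	return latency_ns * se_weight / rq_weight;
}

/* Tick: once curr has used up its slice it must not be re-picked as a buddy. */
void demo_tick(struct demo_cfs_rq *cfs_rq, struct demo_se *curr,
	       unsigned long long delta_exec, unsigned long long ideal_runtime)
{
	if (delta_exec > ideal_runtime)
		demo_clear_buddies(cfs_rq, curr);  /* the kernel also reschedules here */
}

/* Dequeue (and, equally, yield): an entity leaving the competition stops being a buddy. */
void demo_dequeue(struct demo_cfs_rq *cfs_rq, struct demo_se *se)
{
	demo_clear_buddies(cfs_rq, se);
	/* removal from the rbtree is omitted */
}
```

With an illustrative 6 ms latency period and three equal-weight entities, each ideal slice is 2 ms, so a task that has already run 2.5 ms loses its buddy status on the next tick.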

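Summary:

In a cfs_rq, last points to the sched_entity that was running when the most recent preempting wakeup occurred (normally the waker), and next points to the sched_entity that was most recently woken. Both are assigned during wakeup; when a new sched_entity is picked, the entities they point to are preferred as long as their vruntime is not too far ahead of the leftmost entity's, which improves the cache hit rate.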
