Some details of computing the vruntime increment in CFS scheduling - cainiao413 - ChinaUnix blog



Category: LINUX

2013-04-22 14:33:22

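The CFS scheduler always picks the task with the smallest vruntime to run next. Updating vruntime involves two things: the time elapsed since vruntime was last updated, and weighting that time by the load of the sched_entity. The load depends on the entity's nice value, and the kernel precomputes the nice-to-load mapping in a table: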

static const int prio_to_weight[40] = {
/* -20 */ 88761, 71755, 56483, 46273, 36291,
/* -15 */ 29154, 23254, 18705, 14949, 11916,
/* -10 */ 9548, 7620, 6100, 4904, 3906,
/* -5 */ 3121, 2501, 1991, 1586, 1277,
/* 0 */ 1024, 820, 655, 526, 423,
/* 5 */ 335, 272, 215, 172, 137,
/* 10 */ 110, 87, 70, 56, 45,
/* 15 */ 36, 29, 23, 18, 15,
};

At first I was not sure why these particular values were chosen; the comment above the table roughly explains it:

/*
* Nice levels are multiplicative, with a gentle 10% change for every
* nice level changed. I.e. when a CPU-bound task goes from nice 0 to
* nice 1, it will get ~10% less CPU time than another CPU-bound task
* that remained on nice 0.
*
* The "10% effect" is relative and cumulative: from _any_ nice level,
* if you go up 1 level, it's -10% CPU usage, if you go down 1 level
* it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
* If a task goes up by ~10% and another task goes down by ~10% then
* the relative distance between them is ~25%.)
*/

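Roughly: raising the nice value by one level (a lower nice value means a higher priority) reduces the entity's CPU time, and the size of that reduction is fixed at about 10%. That 10% step is why the loads of two adjacent nice levels differ by a factor of about 1.25. For example, two CPU-bound sched_entities with the same nice value each get 50% of the CPU. If one of them moves up one nice level, it gets about 10% less CPU time than before, i.e. about 45%, while the other gets about 55%; and 55/45 is roughly 1.25 (with the table values, 1024/820 ≈ 1.25).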

That is how the prio_to_weight array is built: each entry is roughly the previous one divided by 1.25.

There is also a second precomputed array, prio_to_wmult, whose entries are 2^32 / prio_to_weight[i], i.e. (1 << 32) / prio_to_weight[i].


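This array matters because, when vruntime is updated, the elapsed time has to be weighted by load. For nice = 0 no weighting is needed. For nice != 0, the elapsed time is scaled down or up by the ratio between the nice-0 load (1024) and the entity's current load.

The formula is:

delta_vruntime = delta_time * 1024 / load    (1024 is the load of a nice-0 entity)

Here delta_vruntime is the increase in vruntime, delta_time is the time elapsed since vruntime was last updated (the time the entity has just run), and load is the current load of the sched_entity. The formula shows that the larger the load, the more slowly vruntime grows. CFS keeps entities in a red-black tree ordered by vruntime and picks the one that most needs to run, i.e. the one with the smallest vruntime, so an entity with a larger load gets picked more often and runs longer.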

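However, the kernel computes delta_vruntime very often, and division takes far more cycles than multiplication, so doing a real division each time would be slow. The division can be replaced by a multiplication and a shift.

That is what the inverse array is for: it caches 2^32 / load in advance, so delta_time * 1024 / load becomes (delta_time * 1024 * inv_load) >> 32, i.e. multiply by the cached value and shift the product right by 32 bits.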

Here is one line of the code that took me a long time to understand:

/*
* Shift right and round:
*/
#define SRR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))

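The comment means: shift x right by y bits (i.e. divide by 2^y) and round the result to the nearest integer.

1UL << (y-1) is half of 1UL << y, i.e. half of the divisor 2^y. A plain right shift is a division that truncates. Since it is hard to explain in words what adding 1UL << ((y) - 1) to x achieves, here are two examples:

SRR(10, 2), i.e. 10/4 = 2.5: 1010 + 0010 = 1100, 1100 >> 2 = 0011 = 3

SRR(9, 2), i.e. 9/4 = 2.25: 1001 + 0010 = 1011, 1011 >> 2 = 0010 = 2

The examples make it clear. 1UL << (y-1) adds 1 at bit y-1 (counting the rightmost bit as bit 0), which is the highest of the bits the shift discards. If bit y-1 of x is 1, the discarded part is at least half of 2^y and the result should round up: the addition carries into bit y, and the carry survives the shift. If bit y-1 of x is 0, the discarded part is less than half and the result should round down: adding 1UL << (y-1) just sets that bit without producing a carry, so it has no effect on the result.

So adding (1UL << ((y) - 1)) is exactly what makes SRR round to nearest instead of truncating.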

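Now the code:

static unsigned long
calc_delta_mine(unsigned long delta_exec, unsigned long weight,
		struct load_weight *lw)
{
	u64 tmp;

	/*
	 * inv_weight == 0 means the cached inverse is stale (it is reset when
	 * the weight changes) and must be recomputed. If weight >= WMULT_CONST
	 * (2^32 with a 64-bit long), 2^32 / weight would truncate to 0 and the
	 * multiplication below would be meaningless, so the smallest usable
	 * value, 1, is used instead.
	 */
	if (!lw->inv_weight) {
		if (BITS_PER_LONG > 32 && unlikely(lw->weight >= WMULT_CONST))
			lw->inv_weight = 1;
		else	/* I don't fully understand the logic of this line */
			lw->inv_weight = 1 + (WMULT_CONST-lw->weight/2) / (lw->weight+1);
	}

	tmp = (u64)delta_exec * weight;
	/*
	 * Check whether we'd overflow the 64-bit multiplication:
	 * if tmp exceeds WMULT_CONST, multiplying it by inv_weight (up to about
	 * 2^32) could overflow 64 bits, so shift right by 16 bits first,
	 * multiply, then shift right by the remaining 16 bits.
	 */
	if (unlikely(tmp > WMULT_CONST))
		tmp = SRR(SRR(tmp, WMULT_SHIFT/2) * lw->inv_weight, WMULT_SHIFT/2);
	else
		tmp = SRR(tmp * lw->inv_weight, WMULT_SHIFT);

	return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX);
}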

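Open question: when a load weight is updated (update_load_add, update_load_sub), inv_weight is reset to 0, so it gets recomputed the next time delta_vruntime is calculated. What I don't understand is the formula used for that recomputation,

lw->inv_weight = 1 + (WMULT_CONST-lw->weight/2) / (lw->weight+1)

and in particular what the lw->weight/2 term is for. If you understand this line, I'd be grateful for an explanation.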
