
2010-02-05 16:02:38

The analysis of I/O schedulers (2)

AS:

The full name of this I/O scheduler is anticipatory scheduling. In short, when it is time to select the next request to dispatch, the scheduler decides whether it is worth delaying the request it has already chosen, because another request may arrive soon that is more worth dispatching than the one currently selected. For example, suppose two processes are creating I/O requests concurrently. At any moment the disk head can only be at one position; suppose the scheduler is currently serving process A. A process usually reads some data from disk, waits for a moment, then reads again. The interval between the two read operations is called deceptive idleness. The author used the word "deceptive" because the process is not really idle: very soon it will issue I/O requests again. Back to our example: the disk head is serving process A, but a moment later there are no requests from A while process B is creating requests. The scheduler must decide now: should the disk head move to serve B at once, or should it stay for a moment and wait for additional requests from A? Obviously, if A will issue requests right away, waiting lets the disk head avoid unnecessary movement between A and B, and the total seek distance between A and B is reduced.


That is the basic background of anticipatory scheduling. In the Linux kernel, AS is based on DEADLINE; in other words, the basic design idea of AS is CSCAN plus an anticipation component.


So, like DEADLINE, AS has two queues. One is the sorted queue, the other the expired queue (I used the name deadline queue when describing DEADLINE). The basic principle of AS is the same as DEADLINE: whenever the I/O scheduler selects a request to dispatch, it first checks whether the request at the head of the expired queue has expired; if it has, that request is dispatched at once. If it has not, a request is selected from the sorted queue, and the scheduler then runs a routine called as_can_anticipate to decide whether anticipation should be done. A rough sketch of this flow is below.
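To make the flow concrete, here is a rough C sketch of the dispatch decision just described. It is my own illustration, not the real dispatch routine of AS: apart from the concepts (the expired/FIFO check, the sorted queue, as_can_anticipate), all type and helper names below are made up stubs.

/* Illustrative sketch of the AS dispatch decision, not kernel code. */
typedef struct { int dummy; } as_data_t;	/* stand-in for struct as_data */
typedef struct { int dummy; } request_t;	/* stand-in for struct request */

/* Stubs for the steps described in the text; in the kernel these correspond
 * to the expired-queue check, the sorted-queue lookup and as_can_anticipate(). */
request_t *fifo_head_if_expired(as_data_t *ad);
request_t *nearest_in_sorted_queue(as_data_t *ad);
int        worth_anticipating(as_data_t *ad, request_t *rq);
void       start_anticipation_timer(as_data_t *ad);

/* Expired requests go first; otherwise take the nearest request from the
 * sorted queue, unless it is worth holding it back and waiting. */
request_t *dispatch_sketch(as_data_t *ad)
{
	request_t *rq = fifo_head_if_expired(ad);

	if (rq)					/* a deadline has been missed */
		return rq;

	rq = nearest_in_sorted_queue(ad);
	if (rq && worth_anticipating(ad, rq)) {
		start_anticipation_timer(ad);
		return NULL;			/* dispatch nothing for now */
	}
	return rq;
}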


Let's discuss it in detail.


First, the I/O scheduler has to pick the next request to dispatch. But how? Which request in the sorted queue should be next? The routine as_find_next_rq does this job. In short, the heuristic is: the next request should be the one nearest to the current disk head. In DEADLINE, "nearest" simply means the next request in the sorted queue; in AS it is a little more complicated. The scheduler compares the requests behind and in front of the current position: if the distance to the request behind is smaller than half of the distance to the request in front, the request behind is selected. The design idea is simple: in the original CSCAN the disk head always moves in the same direction, but in AS the head has a chance to move backward.
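The back-versus-front comparison can be sketched as below. This is only an illustration of the heuristic as described above, not the actual as_find_next_rq() code; the function and parameter names are made up.

typedef unsigned long long sector_t;	/* stand-in for the kernel type */

/* Return 1 if the request behind the head should be chosen. "head" is the
 * current head position, "back" the sector of the request behind it and
 * "front" the sector of the request in front of it
 * (back <= head <= front is assumed). */
int prefer_back(sector_t head, sector_t back, sector_t front)
{
	sector_t back_dist  = head - back;
	sector_t front_dist = front - head;

	/* only move backward when it is clearly cheaper: the backward
	 * distance must be less than half of the forward distance */
	return back_dist < front_dist / 2;
}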


Once the I/O scheduler has selected a candidate from the sorted queue, it is still not time to dispatch it. AS then looks at the expired queue to check whether the first request there has expired. As said above, if it has, that request is dispatched at once. Here, suppose it has not expired. Since a candidate has been selected, AS now checks whether anticipation should be done; the routine as_can_anticipate does this job. Though I do not understand all of its conditions, I list what I understand about them below; maybe the best way is to follow the source code.

/*
 * There are some other tricks in as_can_anticipate; for example, if the last
 * request was a write, anticipation will not be done. Here I only analyse the
 * key part of the routine. as_can_anticipate actually invokes
 * as_can_break_anticipation; in other words, if as_can_break_anticipation
 * returns 1, anticipation will not be done.
 */

static int as_can_break_anticipation(struct as_data *ad, struct request *rq)
{
	struct io_context *ioc;
	struct as_io_context *aic;

	ioc = ad->io_context;
	BUG_ON(!ioc);
	spin_lock(&ioc->lock);

	/*
	 * This is how I understand AS: if the new request comes from the same
	 * process we are anticipating on, anticipation is no longer needed;
	 * the request should be dispatched to the device driver at once.
	 */
	if (rq && ioc == RQ_IOC(rq)) {
		/* request from same process */
		spin_unlock(&ioc->lock);
		return 1;
	}

	/*
	 * Each time a request is dispatched to the device driver,
	 * ad->ioc_finished is set to 0; when that request completes, and the
	 * last dispatched request came from the same process, it is set to 1.
	 * So ad->ioc_finished == 1 can roughly be read as "the previous
	 * request from the anticipated process has finished". The condition
	 * below therefore means: if the anticipation time for that process has
	 * already expired, anticipation will not be done.
	 */
	if (ad->ioc_finished && as_antic_expired(ad)) {
		/*
		 * In this situation status should really be FINISHED,
		 * however the timer hasn't had the chance to run yet.
		 */
		spin_unlock(&ioc->lock);
		return 1;
	}

	/*
	 * I am not sure what these lines mean. Why can aic be NULL? Is it
	 * because this is the first request from the process?
	 */
	aic = ioc->aic;
	if (!aic) {
		spin_unlock(&ioc->lock);
		return 0;
	}

	/*
	 * aic->nr_queued > 0 means the process already has requests queued in
	 * AS; if the requests are synchronous, a process has at most one
	 * request in the queue. Likewise, aic->nr_dispatched > 0 means the
	 * process has requests that were dispatched to the device driver but
	 * have not finished yet. Anticipation exists to wait for the next
	 * request from the process; since the process already has outstanding
	 * requests, there is no need to anticipate any more.
	 */
	if (atomic_read(&aic->nr_queued) > 0) {
		/* process has more requests queued */
		spin_unlock(&ioc->lock);
		return 1;
	}

	if (atomic_read(&aic->nr_dispatched) > 0) {
		/* process has more requests dispatched */
		spin_unlock(&ioc->lock);
		return 1;
	}

	/*
	 * If the request is synchronous and close to the last request,
	 * anticipation will not be done.
	 */
	if (rq && rq_is_sync(rq) && as_close_req(ad, aic, rq)) {
		/*
		 * Found a close request that is not one of ours.
		 *
		 * This makes close requests from another process update
		 * our IO history. Is generally useful when there are
		 * two or more cooperating processes working in the same
		 * area.
		 */
		/*
		 * Here AS updates ad->exit_prob and ad->exit_no_coop. By their
		 * definitions, ad->exit_prob is the probability that a task
		 * will exit while being waited on, and ad->exit_no_coop is the
		 * probability that an exited task will not be part of a later
		 * cooperating request. Though I am not quite sure of the deeper
		 * meaning of these two parameters, in one sentence: if the
		 * chance that the process has already died is high, it is not
		 * worth waiting for it.
		 */
		if (!test_bit(AS_TASK_RUNNING, &aic->state)) {
			if (aic->ttime_samples == 0)
				ad->exit_prob = (7*ad->exit_prob + 256)/8;

			ad->exit_no_coop = (7*ad->exit_no_coop)/8;
		}

		as_update_iohist(ad, aic, rq);
		spin_unlock(&ioc->lock);
		return 1;
	}

	if (!test_bit(AS_TASK_RUNNING, &aic->state)) {
		/* process anticipated on has exited */
		if (aic->ttime_samples == 0)
			ad->exit_prob = (7*ad->exit_prob + 256)/8;

		if (ad->exit_no_coop > 128) {
			spin_unlock(&ioc->lock);
			return 1;
		}
	}

	if (aic->ttime_samples == 0) {
		if (ad->new_ttime_mean > ad->antic_expire) {
			spin_unlock(&ioc->lock);
			return 1;
		}
		if (ad->exit_prob * ad->exit_no_coop > 128*256) {
			spin_unlock(&ioc->lock);
			return 1;
		}
	} else if (aic->ttime_mean > ad->antic_expire) {
		/* the process thinks too much between requests */
		spin_unlock(&ioc->lock);
		return 1;
	}

	spin_unlock(&ioc->lock);
	return 0;
}


Even though I listed the source code above, I have not answered exactly when anticipation is done. The first condition is the thinktime. The thinktime is the time between two requests. If the previous request is still being served by the device driver when the next request is chosen for dispatch, the thinktime is 0. Note that the next request is not necessarily from the same process; it might be near the current disk head while belonging to another process. On the other hand, if the previous request has already completed, the thinktime is the time difference between the previous request and the current request to be dispatched. AS uses the following formulas to calculate ttime_mean:

aic->ttime_samples = (7*aic->ttime_samples + 256) / 8;
aic->ttime_total = (7*aic->ttime_total + 256*ttime) / 8;	/* ttime is the thinktime */
aic->ttime_mean = (aic->ttime_total + 128) / aic->ttime_samples;

It is obvious that if the thinktime is large, ttime_mean will be large; and if ttime_mean is larger than antic_expire, which is the maximum anticipation time, no anticipation is done. You can understand it this way: if a process is I/O intensive, the interval between its requests is very small, even close to 0 (for synchronous requests, exactly 0 is almost impossible), so ttime_mean stays small, always below antic_expire, which means the process is worth waiting for.
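To see how this exponential moving average behaves, here is a small self-contained C program (my own illustration, not kernel code) that feeds a constant thinktime into the same formulas: ttime_mean starts above the true value and converges toward the constant thinktime after a few samples.

#include <stdio.h>

int main(void)
{
	unsigned long ttime_samples = 0, ttime_total = 0, ttime_mean = 0;
	unsigned long ttime = 2;	/* pretend every thinktime is 2 ms */

	for (int i = 0; i < 10; i++) {
		/* the same weighted averages as in the kernel snippet above */
		ttime_samples = (7*ttime_samples + 256) / 8;
		ttime_total   = (7*ttime_total + 256*ttime) / 8;
		ttime_mean    = (ttime_total + 128) / ttime_samples;
		printf("sample %d: ttime_mean = %lu\n", i, ttime_mean);
	}
	return 0;
}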


The other element that decides whether a process is worth waiting for is seek_mean. Locality of I/O access is usually a property of a process, but not always, especially for random I/O. So AS always considers the distance between two requests: if the distance is small, it may be worth waiting for the next request once the previous one has been served. The formula is below:

aic->seek_samples = (7*aic->seek_samples + 256) / 8;
aic->seek_total = (7*aic->seek_total + (u64)256*sdist) / 8;
total = aic->seek_total + (aic->seek_samples/2);
do_div(total, aic->seek_samples);
aic->seek_mean = (sector_t)total;


The mathematical model used to calculate seek_mean is the same as for ttime_mean. As you may notice, I did not explain the routine as_close_req in the source code above: roughly, if the distance between two requests is smaller than seek_mean, the two requests are considered close to each other. Of course, there is an upper limit on this threshold; you can find it yourself in the source code.
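As a rough illustration of that idea (not the real as_close_req(), which also widens or clamps the "close" threshold depending on how long AS has already been waiting), such a check might look like the sketch below; all names here are made up.

typedef unsigned long long sector_t;	/* stand-in for the kernel type */

/* The new request is considered close if its distance from the last served
 * request is within the process's historical mean seek distance. */
int close_req_sketch(sector_t last_sector, sector_t new_sector,
		     sector_t seek_mean)
{
	sector_t delta = (new_sector > last_sector) ?
			 new_sector - last_sector : last_sector - new_sector;

	return delta <= seek_mean;
}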


AS considers the thinktime, the seek distance, and the exit probability of the process to evaluate whether the process is worth waiting for. I think these three elements, especially the first two, are the key components of the AS algorithm. Each time a new request is added to the queue, the ttime_mean and seek_mean of its process are updated; at the same time, AS checks whether a pending anticipation, if there is one, should be broken. In short, whenever a new request arrives, AS checks whether it is the request being waited for.
