Category: LINUX
2013-07-31 20:37:20
The analysis of I/O schedulers(1)
The I/O scheduler is a key part of the I/O subsystem in an OS. Its task is to decide how to dispatch I/O requests to the device driver efficiently. In the Linux version considered here there are four I/O schedulers: NOOP, AS (anticipatory), DEADLINE, and CFQ. The analysis is based on kernel 2.6.28.9.
You can pick up the concepts easily from Understanding the Linux Kernel, 2nd edition; the difficult part is not the concepts but knowing where to start, i.e. from which point to follow the code path of the I/O schedulers. Let's start with the routine __make_request in block/blk-core.c.
We can see the definition of this function:
static int __make_request(struct request_queue *q, struct bio *bio)
Every block device has a request queue: /dev/sda has one, /dev/sdb has one, and /dev/md0 (the block device of a RAID array) has one too. A bio represents an I/O request; you can think of it as the data structure that describes an I/O request.
Let's see the main structure of __make_request:
static int __make_request(struct request_queue *q, struct bio *bio)
{
        /* ... */
        el_ret = elv_merge(q, &req, bio);
        switch (el_ret) {
        case ELEVATOR_BACK_MERGE:
                /* ... */
                goto out;
        case ELEVATOR_FRONT_MERGE:
                /* ... */
                goto out;
        default:
                ;
        }
        /* ... */
out:
        if (sync)
                __generic_unplug_device(q);
        /* ... */
}
elv is short for elevator, though the concept here may differ from what the books describe. As I remember, in some books the elevator plays the role of C-SCAN, but in the Linux kernel it does not. What __make_request does is this: the elevator first tries to merge the new bio with an adjacent request, then tells the I/O scheduler to dispatch requests to the device driver.
As I have said, every block device has a queue; in the kernel the data structure is request_queue. However, this "queue" is only a concept: different I/O schedulers use different kinds of queues to handle requests. In CFQ, for example, each process that has requests to dispatch gets its own queue. So request_queue is not really a queue; it is a data structure that describes the behavior of the I/O scheduler. For example, it defines all kinds of operations on requests, such as *make_request_fn and *unplug_fn.
So how do you know whether a request can be merged? Each I/O scheduler first puts the request into its own queue or queues (different schedulers use different queues; I will explain them in detail later), and from there it moves requests into the dispatch queue. I consider the dispatch queue to be the queue of the device driver; in other words, the I/O scheduler dispatches requests from its own queues into the dispatch queue. So, for example, a new request can be merged if it is adjacent to an old request that still sits in the scheduler's own queue. Next, after each merge, if the request is sync, the routine __generic_unplug_device() is invoked. What does it do? There is a little trick here which you can trace in the source code yourself; the final result is that elevator_dispatch_fn is invoked. In other words, after receiving a new request from the upper layer, the I/O scheduler should dispatch requests to the device driver, which makes sense.
Now let's look at the structure of the I/O schedulers themselves. Each scheduler implements several routines, such as elevator_dispatch_fn and elevator_add_req_fn. The most important one is elevator_dispatch_fn, from which you can read off the algorithm of the scheduler.
NOOP:
There is no real algorithm in NOOP; you can consider it first in, first out. Even without reading the source, the FIFO property tells you how it must work: put requests into the scheduler's queue without sorting, then dispatch them from that queue to the device driver one by one.
static int noop_dispatch(struct request_queue *q, int force)
{
        struct noop_data *nd = q->elevator->elevator_data;

        if (!list_empty(&nd->queue)) {
                struct request *rq;
                rq = list_entry(nd->queue.next, struct request, queuelist);
                list_del_init(&rq->queuelist);  /* take the oldest request */
                elv_dispatch_sort(q, rq);       /* hand it to the dispatch queue */
                return 1;
        }
        return 0;
}
static void noop_add_request(struct request_queue *q, struct request *rq)
{
        struct noop_data *nd = q->elevator->elevator_data;

        list_add_tail(&rq->queuelist, &nd->queue);  /* append: FIFO order */
}
The routines above are the key components of NOOP: one adds a request to the queue, the other dispatches a request to the device driver. The interesting point is how noop_data is recovered, since elevator_data is declared as void *. The idea is simply to use a pointer to record the memory address of the scheduler-private structure. It is the same pattern as task_struct: in kernel mode every process has its own task_struct describing its properties, and each instance of a data structure lives at a fixed memory address. Through a pointer we can reach that structure from any layer. In other words, if I want to act on the block-device level from the md level, I can use a recorded memory address to reach the corresponding data structure, such as noop_data, cfq_data, or as_data.
DEADLINE:
We can think of it as C-SCAN. The idea is simple: since many requests sit in the scheduler's "queue", why not sort them before sending them to the device driver? That avoids unnecessary disk-head movement and improves performance. To achieve this, the scheduler maintains two queues, which is why I put "queue" in quotes. The scheduler cares about two issues: throughput and starvation. To improve throughput, deadline sorts the requests by LBA. But sorting alone is not enough. Starvation means a request waits too long to be dispatched. Imagine many, many requests on one side of the disk: since dispatch order depends on LBA, requests on the other side of the disk may wait far too long. To handle this, DEADLINE gives every request a deadline; once that time expires, the request must be dispatched immediately. So when a request reaches the scheduler, it is put into two queues at once: a sorted queue, in which requests are ordered by LBA, and a deadline queue, in which they are ordered by arrival time. In other words, the request at the head of the deadline queue is always the one that most urgently needs to be dispatched.
When the scheduler dispatches requests from its "queues", it first checks whether the deadline of the request at the head of the deadline queue has expired; if it has, that request is dispatched. If not, the scheduler picks the first several requests at the head of the sorted queue and dispatches them to the device driver. Once dispatched, a request is removed from both queues.