全部博文(183)
分类: LINUX
2010-02-05 16:03:22
CFQ
CFQ is the default I/O scheduler in the current Linux. Its full name is Completed Fairness Queuing. In my opinion, the most important work for the CFQ is to keep every process has the same fairness to use the I/O bandwidth. Here we do not discuss the processes with different priority. Since CFQ is developed after AS(I am not sure, just guess), if AS is because of anticipation, CFQ is also a kind of anticipation. Let's describe it in general way first. There is a parameter named slice_sync(async is out of our discuss area), you can consider it as time slice. In order to keep fairness, every process should have the same time to make use of I/O bandwidth. So, the processes are inserted into a round-robin list, each process has a time slice to use I/O, one after another. However, in practice, there are always many exceptions need to be considered. First, Suppose during a time slice of a process, there is no I/O request, should the I/O scheduler begin to serve another process or wait? If wait, how long should the process be waited for? Also, still in a time slice, even if there are request from this process, but the accession pattern of this process is very random, the interval distance between each requests is very very large, should this process continue to be served? The mechanism of CFQ to wait is almost the same as AS. Let's discuss it in detail.
The first data structure is service_tree, as I have said, the processes should be put into a round-robin like list. In practice, this is red-black tree. I think this makes sense, the left most element is the first to be served, and the right most element is the last. So, service_tree is actually a red black tree in which contains all processes. The data structure for each process is cfqq, in another word, the requests from different processes should be inserted into their own queues. Each process has its own queue. And these queues are inserted into service_tree of cfqd. Interestingly, in each cfqq, the algorithm is almost the same as AS. As you might remember, there are two queues in AS, one is sorted queue, the other is expired queue. The queue of cfqq is the same, it is actually two queues, one is cfqq->sort_list, the other is cfqq->fifo queue. In order to describe it skimpily, I just use one queue to explain. The mechanism is the same with AS for each cfqq. In order to simplify the procure, I use my own way to describe how to add a new request into cfq “queue”.
/*
When the CFQ receives a new request, this routine will be invoked.
*/
static void cfq_insert_request(struct request_queue *q, struct request *rq)
{
….
cfq_add_rq_rb(rq); // Add the request to cfqq's sort_list queue
list_add_tail(&rq->queuelist, &cfqq->fifo); // Add the request to cfqq's fifo queue
cfq_rq_enqueued(cfqd, cfqq, rq); // Update think_time and seek_time, the same as AS
}
static void cfq_add_rq_rb(struct request *rq)
{
/*
* looks a little odd, but the first insert might return an alias.
* if that happens, put the alias on the dispatch list
*/
// elv_rb_add is the routine which adds the request to the sort queue of cfqq
while ((__alias = elv_rb_add(&cfqq->sort_list, rq)) != NULL)
cfq_dispatch_insert(cfqd->queue, __alias);
// if cfqq itself is not in service_tree of cfqd, add it to this tree.
if (!cfq_cfqq_on_rr(cfqq))
cfq_add_cfqq_rr(cfqd, cfqq);
/*
* check if this request is a better next-serve candidate
*/
// almost the same algorithm as AS, you can get it easily by reading the source code.
cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
BUG_ON(!cfqq->next_rq);
}
Since we have discussed the insight mechanism of AS, it is much easier for us to understand what the CFQ has done. We pass the anticipation part of CFQ since it is the same as AS. Now, we start to discuss how cfq selects requests to dispatch.
Let's see the main source code of it:
static int cfq_dispatch_requests(struct request_queue *q, int force)
{
….
while ((cfqq = cfq_select_queue(cfqd)) != NULL) {
….
dispatched += __cfq_dispatch_requests(cfqd, cfqq, max_dispatch);
}
….
return dispatched;
}
From the source code above, it is very easy for us to understand the flow of CFQ. The I/O scheduler first selects a cfqq, then, dispatches the requests on the queue to device driver, here I don't list the source code of __cfq_dispatch_requests, because it is easy to understand by reading the source code. There is a parameter named max_dispatch, the default value is 4, which means at one time, the maximum number of requests of a cfqq to be dispatched into device driver is 4. If the requests are just sync, actually, there is at most one request in the cfqq. Anyway, the time for dispatching just 4 requests is very short, it is usually several ms(I guess, may be less), but the default slice_sync is 100, which means the time slice for each process is 100ms, which is much longer than several ms. So, after each small period of dispatching, the CFQ should decide which queue should be served next. Let's analysis this insight mechanism.
/*
the decision for which cfqq to be dispatched is made from this routine.
*/
static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
{
struct cfq_queue *cfqq;
/*
If there is no active_queue, try to select a new queue, actually, get the left most cfqq from service_tree of cfqd.
*/
cfqq = cfqd->active_queue;
if (!cfqq)
goto new_queue;
/*
* The active queue has run out of time, expire it and select new.
*/
if (cfq_slice_used(cfqq))
goto expire;
/*
* The active queue has requests and isn't expired, allow it to
* dispatch.
*/
// there is at least one request in the queue, keep the queue
if (!RB_EMPTY_ROOT(&cfqq->sort_list))
goto keep_queue;
/*
* No requests pending. If the active queue still has requests in
* flight or is idling for a new request, allow either of these
* conditions to happen (or time out) before selecting a new queue.
*/
// timer_pending means there is a queue which is doing anticipation
// cfqq->dispatched means there is at least one request being serving in device driver
// Like AS, from evaluating ttime_mean, the enable_idle is set.
if (timer_pending(&cfqd->idle_slice_timer) ||
(cfqq->dispatched && cfq_cfqq_idle_window(cfqq))) {
cfqq = NULL;
goto keep_queue;
}
expire:
cfq_slice_expired(cfqd, 0);
new_queue:
cfqq = cfq_set_active_queue(cfqd);
keep_queue:
return cfqq;
}
Finally, let's answer the questions:
The CFQ has the same mechanism to do anticipation like AS in a single cfqq. So, during a time slice, if there is no request from a cfqq, the CFQ might wait for a moment, it depends on the ttime_mean just the same like AS.
Whenever there is a new request to the CFQ, CFQ will check whether it is worth to preempt the original cfqq, a routine called cfq_should_preempt just do this job. There are some check conditions, one of them is seek_mean. In other word, if the original cfqq does not have access locality, which means seek_mean is very large, this cfqq might be preempted by other cfqq.