分类: LINUX
2009-07-09 11:06:20
Adaptive readahead was in 2005. The patches have been languishing in the -mm tree for one simple reason: their complexity is at such a level that few people are able to review them in any useful way. The new is a response to a request from Andrew Morton for a simpler patch to help get the merge process going. The new code is indeed simpler, having dispensed with much of the logic found in the full adaptive readahead mechanism.
To a great extent, the on-demand patch reimplements what Linux readahead does currently, but in a simpler and more flexible way. Like the current code, the on-demand patch maintains a "readahead window" consisting of a portion of the file starting with the application's last read. Pages inside the readahead window should already be in the page cache - or, at least, under I/O to get there as soon as possible. The window moves forward as the application reads data from the file.
The current code actually implements two windows, being the "current window" (a set of in-cache pages which includes the application's current position) and the "ahead window," being the pages most recently read in by the kernel. Once the application's position crosses from the current window into the ahead window, a new I/O operation is started to make a new ahead window. In this way, the kernel tries to always keep sufficiently far ahead of the application that the file data will be available when requested.
The on-demand patch, instead, has a single readahead window. Rather than maintain a separate "ahead window," the new readahead code marks a page (using a flag in the page structure) as being at the "lookahead index." When an application reads its way into the marked page, the readahead window is extended and a new I/O operation is started. There is some resistance to the idea of using a page flag, since those bits are perennially in short supply. Andrew Morton has using some more approximate heuristics instead. That approach might occasionally make the wrong decision, but the penalty is low and does not affect the correctness of the system's operation as a whole.
While the on-demand patch appears to do relatively little, it does have the advantage of removing a bunch of complexity from the current readahead code. It is able to make its decisions without the overhead of trying to track events like an attempted readahead of pages which are already in the cache. The checks for sequential access are made less strict as well, causing readahead to stay active in situations where the current code would turn it off. The result, according to posted with the patch, is improvements in application speed between 0.1% and 8% or so - with some performance regressions in some cases. Interestingly, some of the best results come with a benchmark running on a MySQL database, which is not where one would normally expect to see a lot of sequential activity.
This patch set is clearly simple enough to be reviewed; in the absence of any strong objections, it could conceivably be ready for 2.6.23. Then, perhaps, Fengguang can start working on adding some of the more complex logic which makes up the full adaptive readahead mechanism.