Category: LINUX
2011-07-24 01:01:47
The answer is that the kernel can be made to behave that way by tweaking a runtime parameter, but it is not necessarily a good idea. Before getting into that, however, it's worth noting that recent 2.6 kernels have a memory management problem which can cause serious problems after an application which reads through entire filesystems (updatedb, say, or a backup) has run. The problem is the slab cache's tendency to request allocations of multiple, contiguous pages; these allocations, when done at the behest of filesystem code, can bring the system to a halt. A patch has been merged which fixes this particular problem for 2.6.6.
The bigger issue remains, however: should the kernel swap out user applications in order to cache more file contents? There are plenty of arguments in favor of this behavior. Quite a few large applications set up big areas of memory which they rarely, if ever use. If application memory is occasionally forced to disk, the unused parts will remain there, and that much physical memory will be freed for more useful contents. Without swapping application memory to disk and seeing what gets faulted back in, it is almost impossible to figure out which pages are not really needed. A large file cache is also a performance enhancer. The speedups that come from having frequently-accessed data in memory are harder to see than the slowdowns caused by having to fault in a large application, but they can lead to better system throughput overall.
Still, there are users who insist that, for example, a system backup should never force OpenOffice out to disk. They don't care how quickly a system maintenance application runs at 3:00 in the morning, but they care a lot about how the system responds when they are at the keyboard. This wish was expressed repeatedly until Andrew Morton :
This helped quiet the debate as the parties involved looked more closely at this particular parameter. Or, perhaps, it was just fear of Andrew's singing. Either way, it has become clear that most people are unaware of what the "swappiness" parameter does; the fact that it has never been documented may have something to do with that.
So... swappiness, which is exported to /proc/sys/vm/swappiness, is a parameter which sets the kernel's balance between reclaiming pages from the page cache and swapping out process memory. The reclaim code works (in a very simplified way) by calculating a few numbers:
With those numbers in hand, the kernel calculates its "swap tendency":
swap_tendency = mapped_ratio/2 + distress + vm_swappiness;
If swap_tendency is below 100, the kernel will only reclaim page cache pages. Once it goes above that value, however, pages which are part of some process's address space will also be considered for reclaim. So, if life is easy, swappiness is set to 60, and distress is zero, the system will not swap process memory until it reaches 80% of the total. Users who would like to never see application memory swapped out can set swappiness to zero; that setting will cause the kernel to ignore process memory until the distress value gets quite high.
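The arithmetic above can be checked with a small Python sketch. This is only an illustration of the formula as quoted, not the kernel's code: the function names are made up for the example, the integer halving is an assumption modeled on C integer division, and the use of `>=` at the threshold is one reading of "reaches 80%".

```python
def swap_tendency(mapped_ratio, distress, vm_swappiness):
    """Evaluate the article's formula for one set of inputs."""
    # Integer halving, as C integer division would do (an assumption here).
    return mapped_ratio // 2 + distress + vm_swappiness


def reclaims_process_memory(mapped_ratio, distress, vm_swappiness=60):
    """True once the tendency reaches 100 (threshold comparison assumed)."""
    return swap_tendency(mapped_ratio, distress, vm_swappiness) >= 100


if __name__ == "__main__":
    # Default swappiness of 60 and no distress, for a few mapped ratios.
    for mapped in (50, 79, 80, 100):
        print(mapped, swap_tendency(mapped, 0, 60),
              reclaims_process_memory(mapped, 0, 60))
```

With swappiness at 0 and no distress, even a fully mapped zone only yields a tendency of 50, which is why that setting keeps process memory resident until the distress value climbs.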
The swappiness parameter should do what a lot of users want, but it does not solve the whole problem. Swappiness is a global parameter; it affects every process on the system in the same way. What a number of people would like to see, however, is a way to single out individual applications for special treatment. Possible approaches include using the process's "nice" value to control memory behavior; a low-priority process would not be able to push out significant amounts of a high-priority process's memory. Alternatively, the VM subsystem and the scheduler could become more tightly integrated. The scheduler already makes an effort to detect "interactive" processes; those processes could be given the benefit of a larger working set in memory. That sort of thing is 2.7 work, however; in the mean time, people who are unhappy with the kernel's swap behavior may want to try playing with the knobs which have been provided.
How do you set 'swappiness'?
Posted May 6, 2004 16:19 UTC (Thu) by southey (subscriber, #9466)
As an ordinary user (with root access), how do you actually set swappiness? Especially at every reboot. Also, what performance problems would be expected? I would guess there would just be intense disk swapping when required.
How do you set 'swappiness'?
Posted May 6, 2004 16:27 UTC (Thu) by corbet (editor, #1)
To set it to zero, type:
echo 0 > /proc/sys/vm/swappiness
All there is to it.
How do you set 'swappiness'?
Posted May 6, 2004 17:32 UTC (Thu) by southey (subscriber, #9466)
Many thanks, I'll give it a whirl. It is not always clear what to do for a less technical user. This is one of the best sections on Linux (web or print based) that allows at least me to understand what the kernel is and what it is doing in the past, present and future!
How do you set 'swappiness'?
Posted May 6, 2004 17:38 UTC (Thu) by thomas_d_stewart (subscriber, #4328)
And if you want it set at every reboot try:
echo "vm/swappiness=0" >> /etc/sysctl.conf
(That's how to do it in Debian and Fedora; it's part of the procps package)
HTH
--
Tom
How do you set 'swappiness'?
Posted May 13, 2004 22:55 UTC (Thu) by ArsonSmith (guest, #5695)
Edit /etc/sysctl.conf and add:
vm.swappiness = <value 1-100>
replacing <value 1-100> with the setting you want.
Many people say just to add an echo
but this won't persist after a reboot. sysctl is a utility provided by most distributions to set this up after reboot. You can also see what all the configurable parameters are by running
sysctl -A
I am a fan of sysctl as it also keeps your runtime kernel configuration stuff in a central location, /etc/sysctl.conf, and not in various places like /etc/init.d/kernel_custom_stuff or /etc/rc5.d/local or whatever other places people like to make up to put this kind of stuff.
How do you set 'swappiness'?
Posted May 13, 2004 22:56 UTC (Thu) by ArsonSmith (guest, #5695)
Sorry that should be value 0-100 not 1-100
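For anyone who would rather script the setting than type the echo command, here is a minimal Python sketch of the same read and write. The path is the one given in the article and the 0-100 range is the one stated in this thread; the function names and the `path` parameter (there so the code can be tried against an ordinary file) are assumptions made for the example. Writing to the real /proc file needs root, and like the echo it does not survive a reboot.

```python
from pathlib import Path

# Path named in the article; pass a different path to experiment safely.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness value as an int."""
    return int(Path(path).read_text().strip())


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Write a new value; the real /proc file requires root."""
    # Range taken from the comment thread above (0-100).
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be in the range 0-100")
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        print("current swappiness:", read_swappiness())
```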
2.6 swapping behavior
Posted May 6, 2004 17:05 UTC (Thu) by guinan (subscriber, #4644)
I've felt this - it is highly annoying when I come down to continue my work in the morning, and each of the 8 tabs in my Galeon window takes 10 seconds to page back in because updatedb and various other things that have no business leaving pages in the cache ran overnight.
I will try setting swappiness to 0, but why couldn't the kernel let processes provide a hint about page cache policy themselves? It would take a while for applications to catch up, but it keeps policy in userspace, on a per-application basis, instead of leaving it up to heuristics in the kernel. Examples,
updatedb,makewhatis,etc. - DISPOSABLE
galeon,evolution,etc. - INTERACTIVE
Use an ioctl(), /proc/self/ entry, whatever.
-Jamie
2.6 swapping behavior
Posted May 6, 2004 19:31 UTC (Thu) by abatters (subscriber, #6932)
Well, there is mlock() et al., but that would be like trying to swat a fly with a sledgehammer.
After closing a memory-hog program that was causing swapping, sometimes I just do "swapoff -a; swapon -a" to get the system responsive again.
mlock
Posted May 8, 2004 17:44 UTC (Sat) by giraffedata (subscriber, #1954)
Actually, mlock is conceptually exactly what's required here. We're talking about a case where the following assumption inherent in real memory allocation policy fails: the pages for which fast access will be most appreciated are those that were most recently used.
Here, we have a user who is willing to let 32MB of memory sit idle overnight, even at the cost of slowing down other things, just so he can have immediate response every time he clicks his web browser. That's what mlock is about.
I do a similar (but rather opposite) thing with a ramdisk. I copy various files that are used in tasks that I want to be responsive into a ramdisk. Ramdisk is just file cache that is locked in memory. That way, no matter how much memory pressure there has been since the last time I used these files, they're always right there when I click for them.
2.6 swapping behavior
Posted May 13, 2004 16:50 UTC (Thu) by jonsmirl (guest, #7874)
What about simply adding "swapoff -a; swapon -a" to the end of the updatedb and prelink cron scripts?
Speculative swap-in?
Posted May 6, 2004 17:45 UTC (Thu) by Ross (subscriber, #4065)
On an otherwise idle system with large amounts of free or cache-only
2.6 swapping behavior
Posted May 6, 2004 17:50 UTC (Thu) by xorbe (guest, #3165)
"Without swapping application memory to disk and seeing what gets faulted back in, it is almost impossible to figure out which pages are not really needed."
Oh come on.
You mark the app's pages inaccessible. When the app touches it, the OS notes that the app really does have permissions, changes, and resumes the app. Pages that are never touched after a while can be dropped to swap.
2.6 swapping behavior
Posted May 6, 2004 18:21 UTC (Thu) by corbet (editor, #1)
"You mark the app's pages inaccessible. When the app touches it, the OS notes that the app really does have permissions, changes, and resumes the app. Pages that are never touched after a while can be dropped to swap."
That sounds vaguely like what the 2.4 VM did. It works, but you have to mess around with a lot of page table entries, keep track of which pages you have invalidated (in affected process's page tables), and know when to get around to cleaning them up.
To an extent, things are pretty much still done that way, actually; pages are pulled from page tables and put into the inactive list. Eventually they find their way to swap. If some process wants them in the mean time, they are soft-faulted back in.
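The scheme discussed in this exchange can be pictured with a toy model in Python. It is a deliberately simplified sketch and not how the kernel is implemented: the class, its three sets, and the `scan`/`touch` names are all invented for illustration. Pages are periodically unmapped onto an inactive list, a touch in the meantime soft-faults the page back in, and anything left untouched for a full scan interval goes to swap.

```python
class ToyPageTracker:
    """Toy model: pages are unmapped, and a touch soft-faults them back in."""

    def __init__(self, pages):
        self.active = set(pages)   # mapped and directly accessible
        self.inactive = set()      # unmapped but still in memory
        self.swapped = set()       # written out to swap

    def scan(self):
        """Push untouched inactive pages to swap, then unmap the rest."""
        self.swapped |= self.inactive
        self.inactive = self.active
        self.active = set()

    def touch(self, page):
        """Access a page and report how the access was satisfied."""
        if page in self.active:
            return "hit"
        if page in self.inactive:
            self.inactive.remove(page)
            self.active.add(page)
            return "soft fault"
        self.swapped.remove(page)  # KeyError if the page was never tracked
        self.active.add(page)
        return "hard fault"


if __name__ == "__main__":
    tracker = ToyPageTracker(["a", "b", "c"])
    tracker.scan()
    print(tracker.touch("a"))  # page is still in memory, only unmapped
    tracker.scan()
    print(tracker.touch("b"))  # page sat untouched for a whole interval
```

Even in this toy form the bookkeeping cost mentioned above is visible: every scan has to visit every tracked page.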
2.6 swapping behavior
Posted May 6, 2004 18:42 UTC (Thu) by Duncan (guest, #6647)
Some weeks ago, as I was reading about yet more machinations and hoops the
2.6 swapping behavior
Posted May 6, 2004 20:47 UTC (Thu) by thyrsus (subscriber, #21004)
In olden days, the sticky bit on binary executables gave the kernel a hint that it should avoid swapping/paging out the memory for that executable. Might that still be appropriate today?
2.6 swapping behavior
Posted May 7, 2004 5:12 UTC (Fri) by maney (subscriber, #12630)
I'm afraid you've been misinformed. The sticky bit told the kernel not to purge the executable from swap when it wasn't running (until there was no non-sticky swap available to avoid an out of memory panic, of course). IIRC, this actually goes back to the days when it wasn't swapping as we know it, but the wholesale paging of an app's executable memory in one chunk. (Recall that on the PDP-11, executable space was less than 64KB maximum, and the memory management didn't support page swapping anyway.)
So the traditional use of the sticky bit is actually rather the opposite of what's wanted here! It's also less than clear that attaching the "swap me only under duress" property statically to the source file is the best choice even if it turns out to be practical to prioritize non-cache pages at that granularity. One obvious complication (that also wasn't present in the PDP-11 paging model) is shared libraries.
2.6 swapping behavior
Posted May 6, 2004 21:47 UTC (Thu) by iabervon (subscriber, #722)
I think the issue is really that stuff used for a minute five hours ago is preferred to stuff used for an hour six hours ago. Stuff that's of lasting significance is more likely to be needed again after a period of the system being idle, although it may be good to evict while the system is busy.
Ideally, things would get swapped out while updatedb ran, and then swapped back in when nothing had used the memory cached for updatedb. But it wouldn't just be program memory getting swapped back in; it would be clever to pull into cache files and directories that get used a lot, so that (for example), your Mozilla cache would be in memory again when you got up.
2.6 swapping behavior
Posted May 7, 2004 7:59 UTC (Fri) by njhurst (guest, #6022)
I don't understand why updatedb needs so much cache memory? Surely it only needs to keep a stack of inodes from root to the current point in the filesystem in memory. Once it has looked at a file that file's memory should be returned to the pool immediately. I don't know how to force the kernel to do this though.
(This is obviously updatedb specific information, but maybe it would be easier to fix updatedb than everything else?)
2.6 swapping behavior
Posted May 7, 2004 21:53 UTC (Fri) by addw (guest, #1771)
Trouble is that the kernel doesn't know that the updatedb is not going to look at those files ever again (well, 'till it runs again tomorrow). But the blocks from the file system are left in memory on the grounds that something recently used is likely to be used again in the near future.
Simple prediction doesn't always work.
2.6 swapping behavior
Posted May 13, 2004 0:19 UTC (Thu) by njhurst (guest, #6022)
I agree, my point is just that maybe some thought could be put into making updatedb more well behaved, rather than trying to get that behaviour directly out of the kernel?
I think it is allowable to have user space programs try to optimise their behaviour with the kernel :)
2.6 swapping behavior
Posted May 14, 2004 11:17 UTC (Fri) by forthy (guest, #1525)
IMHO the initial priority of a just-allocated or just-loaded buffer is too
2.6 swapping behavior
Posted May 11, 2004 1:28 UTC (Tue) by mcelrath (guest, #8094)
updatedb (and many other applications) need to be using O_DIRECT or some other flag that indicates explicitly that files will be read exactly once, and putting the file in the buffer cache isn't necessary.
There is no way for the kernel to predict that some process named 'updatedb' will read every file exactly once, but another process named 'mozilla' likes to read the same file over and over. It's up to the application to specify that.
AFAIK O_DIRECT is not the appropriate flag for this, because read/write buffers must be page aligned to use it. An O_NOCACHE flag has been proposed before (especially by streaming video folks) but has not been added, though I did see an implementation once. I think an O_NOCACHE or O_READONCE is the solution to this...
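To make the "application tells the kernel" idea concrete, here is a hedged Python sketch of a read-once helper. It does not use O_NOCACHE or O_READONCE, which remain only proposals in this thread; it substitutes a different, existing interface, posix_fadvise() with POSIX_FADV_DONTNEED, which Python exposes as `os.posix_fadvise` on platforms that provide it. The advice is only a hint that the kernel is free to ignore, and the function name `read_once` and the chunk size are assumptions made for the example.

```python
import os


def read_once(path, chunk_size=1 << 20):
    """Read a file sequentially, then hint that its cached pages are unneeded."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)  # stand-in for whatever work the scan does
        # Advisory only; skipped where the call is unavailable or refused.
        if hasattr(os, "posix_fadvise"):
            try:
                # Offset 0 with length 0 covers the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass
    finally:
        os.close(fd)
    return total
```

Unlike O_DIRECT, this approach places no alignment requirements on the read buffers, since the data still passes through the page cache before the hint is given.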
2.6 swapping behavior
Posted May 13, 2004 21:00 UTC (Thu) by jzhao (guest, #2865)
Robert Love had a patch which does exactly this: