Category: LINUX
2012-06-22 12:52:28
1
2
· The , implementing Android-style opportunistic suspend (with a different API) has been merged. Associated with this work is a new epoll flag (EPOLLWAKEUP) which causes a wakeup event to be activated, preventing suspend when an event is available for processing.
· The gets the kernel closer to being able to safely run processes as root within a container.
· The tmpfs filesystem now supports hole punching and the SEEK_DATA and SEEK_HOLE lseek() options.
· The removal of old code continues; victims include Microchannel bus support, legacy CRIS RTC drivers, the imxmmc driver, the code, and the mechanism.
Changes visible to kernel developers include:
3
See Documentation/trace/uprobetracer.txt for details
The perf tool has been enhanced to make working with dynamic user-space tracepoints easy.
4
Uses of atime: administrators rely on it to delete rarely read mail and free up space. The mutt email client uses atime to determine whether a mailbox contains unread mail. Programs that clean up temporary directories (tmpreaper or tmpwatch, for example) depend on it as well.
Root of the problem: atime is broken. It turns reads into writes and is generally just nasty.
Background on the old problem and its fix: writing the last-accessed time ("atime") takes up a lot of I/O bandwidth when lots of files are being read. The worst of the atime-related problems have long since been mitigated by moving to the "relatime" mount option by default; relatime only updates atime a maximum of once per day for unchanging files. But now it seems that atime recording can be especially problematic with the btrfs filesystem, and relatime may not help much.
The new problem: on a filesystem with snapshots (such as btrfs), take a snapshot of the root and then grep the whole filesystem. Every inode has to be updated, and because of copy-on-write (COW) that consumes a large amount of space.
Solution: explicitly mount filesystems with the "noatime" option.
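The relatime rule mentioned above can be sketched as a small decision function. This is only an illustration of the policy, not kernel code; the function name and the use of plain seconds for timestamps are assumptions made for the example.

```python
DAY = 24 * 60 * 60  # seconds


def relatime_needs_update(atime, mtime, ctime, now):
    """Return True if a read should write a new atime under relatime.

    All arguments are timestamps in seconds (an assumption of this sketch).
    """
    # The file was modified or changed since it was last read: update.
    if mtime >= atime or ctime >= atime:
        return True
    # Unchanging file: update at most once per day.
    if now - atime >= DAY:
        return True
    return False
```

With noatime none of these cases triggers a write, which is why it avoids the COW space blow-up on a snapshotted filesystem.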
1
Over 2,500 changesets were pulled into the mainline on the first day, and 4,600 have been merged as of this writing. It looks like it will be an interesting cycle with a lot of new stuff coming in and the removal of a bunch of old cruft. As of this writing, user-visible changes pulled for 3.5 include:
2
Nonvolatile memory (NVM) promises bandwidth and latency numbers similar to those offered by dynamic RAM and, being cheaper than DRAM, it is likely to be offered in larger sizes than DRAM is. In addition, its contents would persist across a reboot or a power-down.
Linux will probably support NVM by extending its existing memory interfaces.
How might NVM be used? As a home for various caches: bcache, the page cache, the inode cache, and journals. Vyacheslav Dubeyko described how NVM could eliminate system bootstrap entirely and make the concept of filesystems obsolete; instead, everything would just live in a persistent object environment.
3
A historical wart in perf: four bytes are wasted in order to preserve the ABI. This may change in 3.6.
1
Token ring no longer has any users and will be removed from the kernel.
2
A fix for uid/gid mismatches between the local host and ext filesystems on removable media (still rare; such media usually use vfat). When a filesystem is mounted using these options, files retain their ownership on disk, but they appear to be owned by the specified user and group. Existing files cannot have their ownership changed, but new files will be created with the user and group given at mount time.
3
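First, printk has been converted to a record-based format while its API is still stream-oriented, which makes continuation lines hard to handle. One approach is to track the different message sources (different processes) and merge continuation lines that come from the same process, but that still cannot deal with race conditions between threads.
Second, printk output gains timestamps:
[May12 11:27] foo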
[May12 11:28] bar
[ +5.077527] zoot
[ +10.235225] foo
[ +0.002971] bar
[May12 11:29] zoot
[ +0.003081] foo
In other words, events that are relatively far apart in time would be marked with the absolute time with one-minute precision. When things happen more closely in time, the elapsed time between successive events would be printed instead.
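A rough sketch of that formatting rule follows. The function name and the 30-second threshold separating "far apart" from "close together" are assumptions; the text does not say where the cutoff lies, so this is not claimed to reproduce the sample output above exactly.

```python
from datetime import datetime, timedelta


def format_log(events, threshold=timedelta(seconds=30)):
    """Format (datetime, message) pairs in the mixed absolute/relative style.

    `threshold` is a hypothetical cutoff: larger gaps get an absolute stamp
    with one-minute precision, smaller gaps get the elapsed time.
    """
    lines = []
    previous = None
    for when, message in events:
        if previous is None or when - previous > threshold:
            stamp = when.strftime("[%b%d %H:%M]")
        else:
            stamp = "[ +%.6f]" % (when - previous).total_seconds()
        lines.append(f"{stamp} {message}")
        previous = when
    return lines
```

Calling `format_log` with a list of events sorted by time returns one formatted line per event.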
4
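Bcache is an SSD-based cache that sits between the page cache and the hard disk. It can greatly improve read performance, but caching writes introduces a lot of complexity. In write-through mode bcache does little for writes, while in write-back mode a power failure means that, after reboot, data on the SSD that has not yet been written back must be flushed to the disk, which requires a lot of complicated code. Barriers are also not supported, so journaling filesystems cannot use the feature. Finally, direct I/O would lead to inconsistent data, so the two are mutually exclusive.
The feature is fairly complex, so it may be some time before it reaches the mainline.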
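The difference between the two write modes can be shown with a toy model. This is not how bcache is implemented; the class, its fields, and the `recover` step are hypothetical and only illustrate why write-back needs extra recovery logic.

```python
class CachedDisk:
    """Toy model of an SSD cache in front of a slow disk."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.disk = {}      # slow backing device
        self.ssd = {}       # cache contents (survive a power failure)
        self.dirty = set()  # blocks on the SSD that are not yet on the disk

    def write(self, block, data):
        self.ssd[block] = data
        if self.write_back:
            # Fast path: only the SSD is written, the disk is now stale.
            self.dirty.add(block)
        else:
            # Write-through: every write still pays the disk cost.
            self.disk[block] = data

    def read(self, block):
        if block in self.ssd:
            return self.ssd[block]
        data = self.disk[block]
        self.ssd[block] = data  # populate the cache on a miss
        return data

    def recover(self):
        """After a crash, flush dirty blocks from the SSD to the disk."""
        for block in sorted(self.dirty):
            self.disk[block] = self.ssd[block]
        self.dirty.clear()
```

In write-back mode the disk stays stale until `recover()` (or a normal flush) runs, which is the part that adds the complexity.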
1
Well worth reading.
Kathleen Nichols and Van Jacobson have published a description of a new network queue management algorithm (CoDel) that, it is hoped, will play a significant role in the solution to the bufferbloat problem.
One of the key insights in the design of CoDel is that there is only one parameter that really matters: how long it takes a packet to make its way through the queue and be sent on toward its destination. And, in particular, CoDel is interested in the minimum delay time over a time interval of interest. If that minimum is too high, it indicates a standing backlog of packets in the queue that is never being cleared, and that, in turn, indicates that too much buffering is going on. So CoDel works by adding a timestamp to each packet as it is received and queued. When the packet reaches the head of the queue, the time spent in the queue is calculated; it is a simple calculation of a single value, with no locking required, so it will be fast.
Less time spent in queues is always better, but that time cannot always be zero. Built into CoDel is a maximum acceptable queue time, called target; if a packet's time in the queue exceeds this value, then the queue is deemed to be too long. But an overly-long queue is not, in itself, a problem, as long as the queue empties out again. CoDel defines a period (called interval) during which the time spent by packets in the queue should fall below target at least once; if that does not happen, CoDel will start dropping packets. Dropped packets are, of course, a signal to the sender that it needs to slow down, so, by dropping them, CoDel should cause a reduction in the rate of incoming packets, allowing the queue to drain. If the queue time remains above target, CoDel will drop progressively more packets. And that should be all it takes to keep queue lengths at reasonable values on a CoDel-managed node.
The target and interval parameters may seem out of place in an algorithm that is advertised as having no knobs in need of tweaking. What the authors have found, though, is that a target of 5ms and an interval of 100ms work well in just about any setting.
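The behavior described above can be sketched in a few lines. This is a simplified model, not the published reference code or the kernel implementation: the class and method names are hypothetical, time is passed in explicitly as milliseconds so the queue can be simulated, and "progressively more packets" is modeled by shrinking the gap between drops as interval / sqrt(count).

```python
import math
from collections import deque

TARGET_MS = 5      # acceptable queue delay, from the text
INTERVAL_MS = 100  # window in which the delay must dip below target


class CoDelQueue:
    def __init__(self, target=TARGET_MS, interval=INTERVAL_MS):
        self.target = target
        self.interval = interval
        self.queue = deque()          # (enqueue_time, packet)
        self.first_above_time = None  # deadline for the delay to recover
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.dropped = []

    def enqueue(self, packet, now):
        # Timestamp each packet as it is queued.
        self.queue.append((now, packet))

    def _pop(self, now):
        """Pop the head packet; return (packet, ok_to_drop)."""
        if not self.queue:
            self.first_above_time = None
            return None, False
        enqueued, packet = self.queue.popleft()
        if now - enqueued < self.target:
            self.first_above_time = None  # delay dipped below target
            return packet, False
        if self.first_above_time is None:
            self.first_above_time = now + self.interval
            return packet, False
        return packet, now >= self.first_above_time

    def dequeue(self, now):
        packet, ok_to_drop = self._pop(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False
            while self.dropping and now >= self.drop_next:
                self.dropped.append(packet)
                self.count += 1
                packet, ok_to_drop = self._pop(now)
                if ok_to_drop:
                    self.drop_next += self.interval / math.sqrt(self.count)
                else:
                    self.dropping = False
        elif ok_to_drop:
            # Delay stayed above target for a whole interval: start dropping.
            self.dropped.append(packet)
            packet, _ = self._pop(now)
            self.dropping = True
            self.count = 1
            self.drop_next = now + self.interval
        return packet
```

A queue that drains quickly never drops anything; a standing backlog starts losing packets once a full interval passes without the delay falling below target.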
there is now available
2 Statistics from the 3.4 development cycle
As of this writing, Linus has merged just over 10,700 changes for 3.4; those changes were contributed from 1,259 developers. The total growth of the kernel source this time around is 215,000 lines.
3
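The goal is for a single kernel to boot on all ARM platforms, which requires replacing all of the existing board files with device trees. A lot of progress has been made, but support for platforms that do not use device trees has to be kept.
Board files have a number of tasks: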
1 Some useful perf documentation
posted by Google
2
Currently: the way that balancing is done in current kernels is relatively straightforward: the active list is not allowed to grow larger than the inactive list. The inactive > active rule is only enforced during reclaim; the list sizes do not matter on idle systems.
The patch's prospects are unclear: the kernel's radix tree implementation already has a concept of "exceptional entries" that is used to track tmpfs pages while they are swapped out. The patch uses exceptional entries to record when a page was evicted; when a fault later brings the page back, the kernel knows how long it was gone and uses that time to adjust the sizes of the active and inactive lists.
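A toy model of that idea is sketched below. It is not the patch itself: the class name, the use of an eviction counter as the "time", and the rule "activate on refault if the distance is no larger than the active list" are assumptions chosen to make the mechanism concrete.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy active/inactive page cache with shadow entries for evicted pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()
        self.active = OrderedDict()
        self.shadow = {}    # page -> eviction counter at eviction time
        self.evictions = 0  # acts as the clock

    def _reclaim(self):
        while len(self.inactive) + len(self.active) > self.capacity:
            # The active <= inactive rule is enforced only during reclaim.
            while len(self.active) > len(self.inactive):
                page, _ = self.active.popitem(last=False)
                self.inactive[page] = True
            page, _ = self.inactive.popitem(last=False)
            self.evictions += 1
            self.shadow[page] = self.evictions  # remember when it left

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)
            return "hit"
        if page in self.inactive:
            del self.inactive[page]
            self.active[page] = True  # second access promotes the page
            return "hit"
        stamp = self.shadow.pop(page, None)
        if stamp is not None and self.evictions - stamp <= len(self.active):
            # Evicted only briefly: treat it as part of the working set.
            self.active[page] = True
            result = "refault-activated"
        else:
            self.inactive[page] = True
            result = "miss"
        self._reclaim()
        return result
```

A page that comes back soon after eviction goes straight to the active list, while one that was gone for a long time starts over on the inactive list.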
3
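In this situation, network connections can still be restored. Most of the work is done in user space; a small part needs kernel support.
First seen in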
4
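An effort on the kernel side to keep user-space programs working. In short: if the ABI is not done right at the start, supporting compatibility afterward is a nightmare.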