2012-09-13 10:26:48



The kernel currently lacks one or more kernel regression trackers. Support for some old or oddball architectures (including some ARM platforms), tool chains, and devices may be dropped. Both KVM and Xen are under development for ARM, but neither has gotten to the point of being merged.



Background: the scheduler did have power-aware logic from 2.6.18 through 3.4. That code was removed in 3.5 because it never worked very well and nobody was putting any effort into improving it.

Some discussion is under way, but there is still a long road ahead. Alex Shi started off the conversation on how power awareness might be added back to the scheduler.

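The original approach was to concentrate the load on a few CPUs as far as possible, so that the remaining CPUs could save power. Alex argues instead that tasks should be spread out as much as possible, so that all CPUs become idle at the same time; only then is power actually saved. The supporting argument is that as long as any CPU is working, other components such as the memory controller cannot enter a power-saving state. On x86 systems, maximum power savings are reached only when all CPUs are idle.
The suggestion is for the scheduler to have two modes, "power" and "performance". In a performance-oriented mode, the scheduler might balance tasks more aggressively, trying to keep the load the same on all processors. In a power-savings mode, processes might stay a bit more tightly packed onto a smaller number of CPUs, especially processes that have an observed history of running for very short periods of time.
One problem that needs to be solved is the handling of architecture-specific parameters, which may have to be added. Another is a better understanding of process behavior; work that is almost ready may help in this regard.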
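
The difference between the two modes can be illustrated with a small user-space sketch. This is not scheduler code: the names `sched_policy`, `pick_cpu`, `capacity`, and `task_load` are hypothetical, and the placement rule (spread to the least-loaded CPU versus pack onto the busiest CPU that still has room) is only an assumed reading of the description above.

```c
#include <stddef.h>

/* Hypothetical policy knob; these names are illustrative, not kernel API. */
enum sched_policy { POLICY_PERFORMANCE, POLICY_POWER };

/*
 * Pick a CPU for a new task, given the current load of each CPU.
 *
 * POLICY_PERFORMANCE: spread, i.e. choose the least-loaded CPU.
 * POLICY_POWER: pack, i.e. choose the most-loaded CPU that is already busy
 *               and can still fit the task within `capacity`; if no such
 *               CPU exists, fall back to the least-loaded CPU.
 *
 * Returns the CPU index, or -1 if ncpus is 0.
 */
int pick_cpu(const unsigned *load, size_t ncpus, unsigned capacity,
             unsigned task_load, enum sched_policy policy)
{
    int best = -1;
    size_t i;

    if (policy == POLICY_POWER) {
        for (i = 0; i < ncpus; i++) {
            /* Skip idle CPUs (leave them asleep) and CPUs without room. */
            if (load[i] == 0 || load[i] + task_load > capacity)
                continue;
            if (best < 0 || load[i] > load[(size_t)best])
                best = (int)i;
        }
        if (best >= 0)
            return best;
        /* No busy CPU has room: fall through to the spreading choice. */
    }

    for (i = 0; i < ncpus; i++)
        if (best < 0 || load[i] < load[(size_t)best])
            best = (int)i;
    return best;
}
```

A real implementation would also have to weigh the task's run history and the architecture-specific parameters mentioned above, which this sketch ignores.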


Using GCC features to build a faster kernel: recent GCC versions support "link-time optimization" (LTO). The idea behind LTO is to examine the entire program after the individual files have been compiled and exploit any additional optimization opportunities that appear. The most significant of those opportunities appears to be the inlining of small functions across object files. The compiler can also be more aggressive about detecting and eliminating unused code and data.

Andi's patches have the same basic scope: ensuring that the compiler knows that specific symbols are needed even if they appear to be unused; that prevents the LTO stage from optimizing them away. For example, an exported symbol may never be called from within the kernel itself but may still be called from outside.
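
As a rough illustration of what "telling the compiler a symbol is needed" can look like, the sketch below uses GCC's `used` and `externally_visible` function attributes. The macro name `KEEP_SYMBOL` and the function `exported_helper` are made up for this example; it is not taken from the patches themselves.

```c
/*
 * Sketch: keeping a symbol alive when building with -flto.
 *
 * "used" asks GCC to emit the definition even if nothing appears to
 * reference it; "externally_visible" tells GCC the symbol may be referenced
 * from outside the code that LTO can see. The second attribute is
 * GCC-specific, so it is only applied when not compiling with clang.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_SYMBOL __attribute__((used, externally_visible))
#else
#define KEEP_SYMBOL __attribute__((used))
#endif

/*
 * Hypothetical "exported" helper: nothing in this file calls it, but an
 * external user (a loadable module, in the kernel's case) might.
 */
KEEP_SYMBOL int exported_helper(int x)
{
    return x * 2;
}
```

Without such an annotation, a whole-program optimizer is free to conclude that the function is dead code and drop it.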


3 Ask a kernel developer: maintainer workflow




Tejun Heo recently posted changes to an aspect of workqueue behavior: workqueues used to be reentrant and are now non-reentrant, because reentrancy makes it hard to guarantee the correctness of the various "flush" operations. All workqueues become non-reentrant, and aspects of the API related to reentrant behavior have been simplified. There is, evidently, a slight performance cost to the change, but Tejun says it is small.



Background: how should runtime-loadable firmware be handled across suspend/resume? Firmware tends to live on disk, and the actual firmware loading operation involves the running of a helper process in user space. Neither the disk nor user space is guaranteed to be available at the point in the resume process when a given device wants its firmware back.
Solution: cache firmware blobs, but only during the actual suspend/resume cycle. Ming's patch makes this process automatic and transparent, so individual drivers do not need to hook into it.



A characteristic of local inter-process communication is that overhead on the scale of the TCP/IP stack is unnecessary. That is why many programs have been written specifically to use Unix-domain sockets when communicating with local peers.
Removing that overhead for local TCP connections is the objective of a patch from Bruce Curtis. The idea is simple enough to explain: when both endpoints of a TCP connection are on the same machine, the two sockets are marked as being "friends" in the kernel. Data written to such a socket will be immediately queued for reading on the friend socket, bypassing the network stack entirely.
If it is merged, the result should be faster local communications between processes without the need for special-case code using Unix-domain sockets. It could also be most useful on systems hosting containerized guests where cross-container communications are needed; one suspects that Google's use case looks somewhat like that.
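
For comparison, this is roughly what the existing special-case approach looks like: a minimal sketch of local communication over a connected pair of Unix-domain sockets. It does not show the "friends" patch itself, and the function name `local_roundtrip` is invented for the example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `msg` over one end of a connected pair of Unix-domain stream sockets
 * and read it back from the other end. Returns the number of bytes received,
 * or -1 on error. Intended for short messages that fit in the socket buffer,
 * so the write does not block.
 */
ssize_t local_roundtrip(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n;
    size_t len = strlen(msg);

    /* Both endpoints live in the kernel; no TCP/IP processing is involved. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], msg, len) != (ssize_t)len)
        n = -1;
    else
        n = read(sv[1], out, outlen);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With the proposed change, a program could keep ordinary TCP sockets and still get a similar short path when its peer happens to be local.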




  • The block I/O bandwidth controller has been reworked so that each control group has its own request list, rather than working from a single, global list.

  • A set of changes has been added in an attempt to improve security.



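A large batch of tests by Mel Gorman shows that newer hardware is fairly well optimized for, while older hardware suffers from quite a few regressions. ext3, which is in maintenance mode, also shows more regressions than ext4 and XFS.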



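The kernel contains more than 50 independently implemented hash tables, a situation Sasha Levin is trying to address with his work. It has not yet been merged into the mainline.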


4 Ask a kernel developer

What is the best way to get configuration data into a driver?

For simple cases, use sysfs. For more complex types of configurations, the best thing to use is configfs (see the kernel documentation), which was written specifically for this task. It handles ways to tie configurations to sysfs devices easily, and handles notifying drivers when things have been changed by the user. At this point in time, I strongly recommend using that interface for any reasonably complex configuration task that a driver or subsystem might need.


How do you compile a custom kernel for a system?

To use the "make localmodconfig" option, first boot the distribution kernel, and plug in any devices that you expect to use on the system, which will load the kernel drivers for them. Then go into your kernel source directory, and run "make localmodconfig". That option will dig through your system and find the kernel configuration for the running kernel (which is usually at /proc/config.gz, but can sometimes be located in the boot partition, depending on the distribution). Then, the script will remove all options for kernel modules that are not currently loaded, significantly reducing the number of drivers that will be built. The resulting configuration will be written to the .config file, and then you can build the kernel and install it as normal. The time to build this stripped-down kernel should be very short compared to the full configuration that the distribution provides.



  • Btrfs has also gained the ability to apply disk quotas to subvolumes.

  • The new "coupled cpuidle" code enables better CPU power management on systems where CPUs cannot be powered down individually.

  • The swap-over-NFS patch set has been merged, making the placement of swap files on NFS-mounted filesystems a not entirely insane thing to do.

  • The new __GFP_MEMALLOC flag allows memory allocations to dip into the emergency reserves.

  • The IRQF_SAMPLE_RANDOM interrupt flag no longer does anything; it has been removed from the kernel.



#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))


When it applies: it is only in places where shared data is accessed without locks (or explicit barriers) that a construct like ACCESS_ONCE() is required. Scalability pressures are causing the creation of more of this type of code, but most kernel developers still should not need to worry about ACCESS_ONCE() most of the time.
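
A small user-space sketch of the kind of lockless read where ACCESS_ONCE() matters. The `struct task` and `struct lock` types and the `owner_running()` function are hypothetical; the macro is spelled with `__typeof__`, the same GCC extension as `typeof`, so that it also compiles in strict ISO C modes.

```c
#include <stddef.h>

/* Same construct as the kernel macro, using the __typeof__ spelling. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical types: another CPU may change `owner` at any time. */
struct task { int on_cpu; };
struct lock { struct task *owner; };

/*
 * Return 1 if the lock currently has an owner that is running on a CPU.
 *
 * l->owner is read exactly once into a local variable. Without
 * ACCESS_ONCE(), the compiler would be allowed to re-read l->owner between
 * the NULL check and the dereference, and a concurrent writer could clear
 * it in between.
 */
int owner_running(struct lock *l)
{
    struct task *owner = ACCESS_ONCE(l->owner);

    if (owner == NULL)
        return 0;
    return ACCESS_ONCE(owner->on_cpu) != 0;
}
```

The macro only constrains the compiler; it is not a memory barrier and gives no ordering guarantees against other CPUs by itself.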




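The TCP Fast Open (TFO) feature allows the elimination of one round-trip time (RTT) from certain kinds of TCP conversations: data is sent during the three-way handshake itself. For security, the client first obtains a TFO cookie through a regular three-way handshake, and then uses TFO in subsequent TCP connections. The concrete TFO steps are as follows: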

  i. The client TCP sends a SYN that contains both the TFO cookie (specified as a TCP option) and data from the client application.

  ii. The server TCP validates the TFO cookie by duplicating the encryption process based on the source IP address of the new SYN. If the cookie proves to be valid, then the server TCP can be confident that this SYN comes from the address it claims to come from. This means that the server TCP can immediately pass the application data to the server application.

  iii. From here on, the TCP conversation proceeds as normal: the server TCP sends a SYN-ACK segment to the client, which the client TCP then acknowledges, thus completing the three-way handshake. The server TCP can also send response data segments to the client TCP before it receives the client's ACK.
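
The cookie check in step ii can be modeled with a toy example. This is only a sketch: the text says the server "duplicates the encryption process", and here a keyed FNV-1a hash stands in for that encryption purely for illustration. The function names, the 64-bit cookie size, and the IPv4-only address are all assumptions made for this example.

```c
#include <stdint.h>

/*
 * Toy stand-in for the server's cookie function: a keyed FNV-1a hash over
 * the client's IPv4 address. ASSUMPTION: a real server would use a proper
 * cipher with a secret key; FNV-1a is NOT cryptographically secure.
 */
uint64_t tfo_cookie(uint64_t server_key, uint32_t client_ip)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ server_key; /* FNV offset basis mixed with the key */
    int i;

    for (i = 0; i < 4; i++) {
        h ^= (client_ip >> (8 * i)) & 0xffu;  /* fold in one address byte */
        h *= 0x100000001b3ULL;                /* FNV prime */
    }
    return h;
}

/*
 * Step ii: the server recomputes the cookie from the source address of the
 * incoming SYN and compares it with the cookie the client presented.
 * Returns 1 if they match, 0 otherwise.
 */
int tfo_cookie_valid(uint64_t server_key, uint32_t syn_src_ip,
                     uint64_t presented)
{
    return tfo_cookie(server_key, syn_src_ip) == presented;
}
```

Because the cookie depends on both the server's secret and the client address, a cookie handed to one address does not validate for a SYN claiming to come from another.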

Using TFO requires changes to both the client and the server programs. TFO is currently under consideration with the IETF. Linux is the first operating system that is adding support for TFO.
