2012-09-13 10:26:48
1
Mostly a discussion of global policy rather than specific technical details.
The kernel currently lacks one or more kernel regression trackers. Support for some old/oddball architectures (including some ARM platforms), tool chains, and devices may be dropped. Both KVM and Xen are under development for ARM, but neither has yet reached the point of being merged.
1
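Background: the scheduler did have power-aware logic from 2.6.18 through 3.4. That code was removed in 3.5 because it never worked very well and nobody was putting any effort into improving it.
Some discussion is under way, but there is still a long way to go. Alex Shi started off the conversation with on how power awareness might be added back to the scheduler.
The original approach was to concentrate the load on as few CPUs as possible so that the remaining CPUs could save power. Alex, however, argues that tasks should be spread out as widely as possible, so that all CPUs go idle at the same time, since that is when power is actually saved. The supporting arguments: as long as any CPU is working, other components such as the memory controller cannot enter a power-saving state, and x86 systems only reach their deepest power savings when all CPUs are idle.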
The proposal is for the scheduler to have two modes, "power" and "performance". In a performance-oriented mode, the scheduler might balance tasks more aggressively, trying to keep the load the same on all processors. In a power-savings mode, processes might stay a bit more tightly packed onto a smaller number of CPUs, especially processes that have an observed history of running for very short periods of time.
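The difference between the two modes can be pictured as two placement rules. The sketch below is a hypothetical, heavily simplified illustration, not Alex Shi's patch: `pick_cpu()`, `enum sched_policy`, and the notion of a fixed per-CPU "capacity" counted in runnable tasks are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical policy switch; these names are illustrative, not kernel APIs. */
enum sched_policy { POLICY_PERFORMANCE, POLICY_POWER };

/*
 * Choose a CPU for a newly runnable task, given each CPU's current load
 * (number of runnable tasks) and a per-CPU capacity.
 *  - Performance mode spreads: pick the least-loaded CPU.
 *  - Power mode packs: pick the busiest CPU that still has room, so the
 *    other CPUs are left idle.
 * Returns -1 only when ncpus is 0.
 */
static int pick_cpu(const int *load, size_t ncpus, int capacity,
                    enum sched_policy policy)
{
    int best = -1;

    for (size_t i = 0; i < ncpus; i++) {
        if (policy == POLICY_PERFORMANCE) {
            if (best < 0 || load[i] < load[best])
                best = (int)i;
        } else {
            if (load[i] >= capacity)
                continue;               /* no room to pack another task here */
            if (best < 0 || load[i] > load[best])
                best = (int)i;
        }
    }

    /* Power mode: if every CPU is full, fall back to spreading. */
    if (best < 0 && policy == POLICY_POWER)
        return pick_cpu(load, ncpus, capacity, POLICY_PERFORMANCE);
    return best;
}
```

With loads {2, 0, 3, 1} and a capacity of 4, performance mode picks CPU 1 while power mode picks CPU 2. Note that Alex's argument above is precisely that, on x86, this kind of packing may not save power in practice.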
Open problems include handling architecture-specific parameters, which may need to be added. Another is a better understanding of process behavior; the almost-ready may help in this regard.
2
Using GCC features to build a faster kernel. Recent GCC releases support "link-time optimization" (LTO). The idea behind LTO is to examine the entire program after the individual files have been compiled and exploit any additional optimization opportunities that appear. The most significant of those opportunities appears to be the inlining of small functions across object files. The compiler can also be more aggressive about detecting and eliminating unused code and data.
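Andi's have the same basic scope: ensuring that the compiler knows that specific symbols are needed even if they appear to be unused; that prevents the LTO stage from optimizing them away. For example, some exported symbols may never be called from within the kernel itself but can still be called from outside it, e.g. by loadable modules.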
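A minimal sketch of the kind of annotation involved, using GCC's `used` function attribute. Whether Andi's patches rely on exactly this attribute is an assumption here, and `helper_called_from_outside` is a hypothetical name.

```c
/*
 * Nothing in this translation unit calls this function.  Under whole-program
 * optimization the compiler could treat it as dead and drop it, which would
 * break callers the optimizer cannot see (in the kernel's case, for example,
 * a module calling an exported symbol).  GCC's "used" attribute forces the
 * function to be emitted anyway.
 */
__attribute__((used)) int helper_called_from_outside(int x)
{
    return x * 2;
}
```

The attribute only affects whether the code is emitted; the function's behavior is unchanged.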
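LTO may yield better performance, but it greatly increases build time and requires more hardware resources, and few kernel developers are willing to test it; that is a drawback.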
3 Ask a kernel developer: maintainer workflow
An overview of the workflow of a kernel subsystem maintainer.
1
Tejun Heo recently posted changing an aspect of workqueue behavior: workqueues used to be reentrant and are now non-reentrant, because reentrancy made it hard to guarantee the correctness of the various "flush" operations. All workqueues become non-reentrant, and aspects of the API related to reentrant behavior have been simplified. There is, evidently, a slight performance cost to the change, but Tejun says it is small.
2
Background: how runtime-loadable firmware should be handled across suspend/resume. Firmware tends to live on disk, and the actual firmware loading operation involves the running of a helper process in user space. Neither the disk nor user space is guaranteed to be available at the point in the resume process when a given device wants its firmware back.
Solution: cache firmware blobs, but only during the actual suspend/resume cycle. Ming's patch makes this process automatic and transparent, so individual drivers do not need to opt in.
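The caching idea can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not Ming's implementation or the kernel firmware API: `fw_cache_start()`, `fw_cache_stop()`, `fw_request()`, the loader callback, and the fixed-size array cache are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CACHED 8

struct fw_blob { const char *name; const unsigned char *data; size_t size; };

struct fw_cache {
    bool active;                    /* true only between suspend and resume */
    size_t count;
    struct fw_blob entries[MAX_CACHED];
};

/* Hypothetical normal loader: needs the disk and a user-space helper. */
typedef bool (*fw_load_fn)(const char *name, struct fw_blob *out);

/* On suspend: remember every firmware image the devices have loaded.
 * (This sketch copies descriptors; a real cache would own the data.) */
static void fw_cache_start(struct fw_cache *c,
                           const struct fw_blob *loaded, size_t n)
{
    c->count = 0;
    for (size_t i = 0; i < n && i < MAX_CACHED; i++)
        c->entries[c->count++] = loaded[i];
    c->active = true;
}

/* After resume completes: drop the cache. */
static void fw_cache_stop(struct fw_cache *c)
{
    c->active = false;
    c->count = 0;
}

/* Driver-facing request: served from the cache while it is active,
 * otherwise through the normal loader. */
static bool fw_request(struct fw_cache *c, const char *name,
                       fw_load_fn load, struct fw_blob *out)
{
    if (c->active) {
        for (size_t i = 0; i < c->count; i++) {
            if (strcmp(c->entries[i].name, name) == 0) {
                *out = c->entries[i];
                return true;
            }
        }
    }
    return load(name, out);
}
```

Because drivers keep calling the same request function, the cache stays invisible to them, which mirrors the "automatic and transparent" property described above.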
3
A characteristic of local inter-process communication: much of the overhead of a protocol stack like TCP/IP is unnecessary. That is why many programs have been written specifically to use Unix-domain sockets when communicating with local peers.
An improvement to local communication through the TCP/IP stack:
This is the objective of from Bruce Curtis. The idea is simple enough to explain: when both endpoints of a TCP connection are on the same machine, the two sockets are marked as being "friends" in the kernel. Data written to such a socket will be immediately queued for reading on the friend socket, bypassing the network stack entirely.
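The friend shortcut can be sketched like this. It is a hypothetical toy model, not Bruce Curtis's patch: `struct sock_sketch`, `sock_write()` and `full_stack_xmit()` are invented names, and the "full stack" path is reduced to a counter plus a copy.

```c
#include <stddef.h>
#include <string.h>

#define RCVBUF_SIZE 256

/* Hypothetical, heavily simplified socket. */
struct sock_sketch {
    struct sock_sketch *friend;     /* non-NULL when the peer is on this host */
    char rcv[RCVBUF_SIZE];          /* receive queue as a flat buffer */
    size_t rcv_len;
    unsigned long stack_xmits;      /* writes that took the full stack path */
};

/* Stand-in for the normal path: TCP segmentation, IP, loopback device... */
static void full_stack_xmit(struct sock_sketch *src, struct sock_sketch *dst,
                            const char *buf, size_t len)
{
    src->stack_xmits++;
    /* ...protocol processing elided; the data eventually lands here: */
    memcpy(dst->rcv + dst->rcv_len, buf, len);
    dst->rcv_len += len;
}

/* Write up to len bytes from s to peer; returns the number of bytes queued. */
static size_t sock_write(struct sock_sketch *s, struct sock_sketch *peer,
                         const char *buf, size_t len)
{
    size_t room = RCVBUF_SIZE - peer->rcv_len;

    if (len > room)
        len = room;
    if (s->friend == peer) {
        /* Friend path: queue straight onto the peer's receive queue. */
        memcpy(peer->rcv + peer->rcv_len, buf, len);
        peer->rcv_len += len;
    } else {
        full_stack_xmit(s, peer, buf, len);
    }
    return len;
}
```

The reader sees the same bytes either way; only the path the data takes differs, which is why applications need no changes.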
Benefits of the patch:
If it is merged, the result should be faster local communications between processes without the need for special-case code using Unix-domain sockets. It could also be most useful on systems hosting containerized guests where cross-container communications are needed; one suspects that Google's use case looks somewhat like that.
4
A very worthwhile article; I have already written a separate blog post about it.
1
- The block I/O bandwidth controller has been reworked so that each control group has its own request list, rather than working from a single, global list.
- A set of has been added in an attempt to improve security.
2
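Mel Gorman's extensive set of benchmarks shows that newer hardware is optimized relatively well, while older hardware has quite a few regressions. ext3, which is in maintenance mode, also shows more regressions than ext4 and XFS.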
3
The kernel contains more than 50 independently implemented hash tables; Sasha Levin is trying to consolidate them with his . It has not yet been merged into mainline.
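To illustrate what a single shared implementation could look like, here is a minimal generic, intrusive chained hash table in user-space C. The names (`struct hnode`, `ht_add()`, `ht_find()`) and the multiplicative hash are assumptions for this sketch, not Sasha Levin's interface. The node is embedded in the caller's own structure, as kernel linked lists are, so the same code serves any object type.

```c
#include <stddef.h>

/* Intrusive node: embed this inside whatever structure you want to hash. */
struct hnode {
    struct hnode *next;
    unsigned long key;
};

#define HT_BITS 4
#define HT_SIZE (1u << HT_BITS)

struct hashtable {
    struct hnode *buckets[HT_SIZE];
};

/* Multiplicative (Fibonacci) hash: keep the top HT_BITS bits of the product. */
static unsigned int ht_bucket(unsigned long key)
{
    return (unsigned int)((key * 0x9E3779B97F4A7C15ull) >> (64 - HT_BITS));
}

/* Insert n under key at the head of its bucket's chain. */
static void ht_add(struct hashtable *ht, struct hnode *n, unsigned long key)
{
    unsigned int b = ht_bucket(key);

    n->key = key;
    n->next = ht->buckets[b];
    ht->buckets[b] = n;
}

/* Return the first node with a matching key, or NULL. */
static struct hnode *ht_find(const struct hashtable *ht, unsigned long key)
{
    for (struct hnode *n = ht->buckets[ht_bucket(key)]; n; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```

The caller recovers its own structure from a found node with `offsetof()` arithmetic, in the style of the kernel's `container_of()`.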
What is the best way to get configuration data into a driver?
For simple cases, use sysfs. For more complex types of configuration, the best thing to use is configfs (kernel documentation, ), which was written specifically for this task. It handles ways to tie configurations to sysfs devices easily, and handles notifying drivers when things have been changed by the user. At this point in time, I strongly recommend using that interface for any reasonably complex configuration task that a driver or subsystem might need.
How do I compile a custom kernel for a system?
To use this option, first boot the distribution kernel and plug in any devices that you expect to use on the system, which will load the kernel drivers for them. Then go into your kernel source directory and run "make localmodconfig". That option will dig through your system and find the kernel configuration for the running kernel (which is usually at /proc/config.gz, but can sometimes be located in the boot partition, depending on the distribution). Then the script will remove all options for kernel modules that are not currently loaded, significantly reducing the number of drivers that will be built. The resulting configuration will be written to the .config file, and then you can build and install the kernel as normal. Building this stripped-down kernel should take much less time than building the full configuration that the distribution provides.
1
- Btrfs has also gained the ability to apply disk quotas to subvolumes.
- The new "coupled cpuidle" code enables better CPU power management on systems where CPUs cannot be powered down individually. See for more information on how this feature works.
- The patch set has been merged, making the placement of swap files on NFS-mounted filesystems a not entirely insane thing to do.
- The new __GFP_MEMALLOC flag allows memory allocations to dip into the emergency reserves.
- The IRQF_SAMPLE_RANDOM interrupt flag no longer does anything; it has been removed from the kernel.
2
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
ACCESS_ONCE() prevents the compiler from applying optimizations, such as caching a value in a register or merging repeated reads, that would break code accessing shared data.
When it is needed: it is only in places where shared data is accessed without locks (or explicit barriers) that a construct like ACCESS_ONCE() is required. Scalability pressures are causing the creation of more of this type of code, but most kernel developers still should not need to worry about ACCESS_ONCE() most of the time.
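A typical case is a polling loop on a flag that another CPU or thread sets without taking a lock. The sketch below is user-space C; `stop_requested` and `poll_until_stopped()` are hypothetical names, and the spin limit exists only so the example terminates when run single-threaded.

```c
/* Same definition as above; __typeof__ is GCC's spelling that also works
 * in strict -std modes (the kernel itself writes typeof). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* In real code this would be set by another thread or an interrupt handler. */
static int stop_requested;

/*
 * Without ACCESS_ONCE(), the compiler may load stop_requested once, hoist
 * that load out of the loop, and never notice the flag changing.  The
 * volatile cast forces a fresh load from memory on every iteration.
 */
static unsigned long poll_until_stopped(unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!ACCESS_ONCE(stop_requested) && spins < max_spins)
        spins++;
    return spins;
}
```

ACCESS_ONCE() only constrains the compiler; it does not order accesses between CPUs, which is why the text above pairs it with the absence of locks or explicit barriers.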
3
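Worth reading: an article by the implementers.
The TCP Fast Open (TFO) feature allows the elimination of one round-trip time (RTT) from certain kinds of TCP conversations: data is sent during the three-way handshake itself. For security, the client first obtains a TFO cookie through a regular three-way handshake and then uses TFO on subsequent TCP connections. The TFO steps are as follows: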
i. The client TCP sends a SYN that contains both the TFO cookie (specified as a TCP option) and data from the client application.
ii. The server TCP validates the TFO cookie by duplicating the encryption process based on the source IP address of the new SYN. If the cookie proves to be valid, then the server TCP can be confident that this SYN comes from the address it claims to come from. This means that the server TCP can immediately pass the application data to the server application.
iii. From here on, the TCP conversation proceeds as normal: the server TCP sends a SYN-ACK segment to the client, which the client TCP then acknowledges, thus completing the three-way handshake. The server TCP can also send response data segments to the client TCP before it receives the client's ACK.
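Step ii can be sketched as a server-side check that recomputes the cookie from the SYN's source address and compares it with the cookie the client sent. Assumptions: IPv4 addresses, a 64-bit server secret, and hypothetical function names. The real server encrypts the address with its secret; the toy mixing function below only stands in for that and is NOT cryptographically secure.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy keyed mixing function standing in for the real cookie cipher.
 * Every operation is a bijection on 64-bit values, so for a fixed secret
 * different client addresses always yield different cookies.
 * NOT secure: illustration only.
 */
static uint64_t tfo_cookie(uint64_t secret, uint32_t client_ip)
{
    uint64_t h = secret ^ 0x9E3779B97F4A7C15ull;

    h ^= client_ip;
    h *= 0xff51afd7ed558ccdull;     /* murmur-style multiply/xorshift mixing */
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ull;
    h ^= h >> 33;
    return h;
}

/* Issued to the client during an ordinary three-way handshake. */
static uint64_t tfo_issue_cookie(uint64_t secret, uint32_t client_ip)
{
    return tfo_cookie(secret, client_ip);
}

/* Step ii: accept the SYN's data only if the cookie matches its source address. */
static bool tfo_syn_data_acceptable(uint64_t secret, uint32_t src_ip,
                                    uint64_t cookie_in_syn)
{
    return tfo_cookie(secret, src_ip) == cookie_in_syn;
}
```

If validation fails, a server would simply fall back to the normal handshake and not deliver the SYN's data early.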
Using TFO requires changes to both client and server programs. Currently, with the IETF. Linux is the first operating system that is adding support for TFO.