lwn.net kernel news 2011/11 - baozhao - ChinaUnix blog


Category: Linux

2011-12-09 23:29:09

One item covers a kernel module that adds steganographic encryption to the device mapper. "Steganographic" means that the encrypted data is hidden to the point that its very existence can be denied.

Open vSwitch is a network switch; at its lowest level, it is concerned with routing packets between interfaces. It is aimed at virtualization users, so, naturally, it is used in the creation of virtual networks. A switch can be set up with a number of virtual network interfaces, most of which are used by virtual machines to communicate with each other and the wider world. These virtual networks can be connected across hosts and across physical networks. One of the key features of Open vSwitch appears to be the ability to easily migrate virtual machines between physical hosts and have their network configuration (addresses, firewall rules, open connections, etc.) seamlessly follow.

The current problem is overlap in traffic control: a traffic control system already exists, and duplicating that infrastructure is not a popular idea. An immediate merge into mainline looks difficult.

New hardware:

OMAP4 processors have an onboard face detection module that can be used for camera focus control, "face unlock" features, and more.

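Problem: the submission by Tom Leiming will have a hard time getting into the kernel. The suggestion was that, rather than being implemented as a standalone device, face detection be integrated into the Video4Linux2 framework.

One focus of the discussion was how to design a suitable ABI.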

Trying to anticipate where this kind of hardware will go in an attempt to create the perfect ABI from the outset seems like an exercise in futility.

Bigalloc

Because of the page size used by the paging mechanism, the kernel currently cannot increase the filesystem block size.

The "bigalloc" patch set adds the concept of "block clusters" to the filesystem; rather than allocate single blocks, a filesystem using clusters will allocate them in larger groups. Mapping between these larger blocks and the 4KB blocks seen by the core kernel is handled entirely within the filesystem. The cluster size to use is set by the system administrator at filesystem creation time (using a development version of e2fsprogs).

Clustering reduces the space overhead of the block bitmaps and other management data structures. But, as Ted Ts'o documented back in July, it can also increase performance in situations where large files are in use. Block allocation times drop significantly, but file I/O performance also improves in general as the result of reduced on-disk fragmentation.
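The bitmap saving is easy to estimate with a little arithmetic. The following is a rough sketch, not ext4 code: the 1 TiB filesystem size and the 1 MiB cluster size are made-up example values, and the helper names are hypothetical.

```python
from typing import Tuple

BLOCK_SIZE = 4096  # the 4KB block size seen by the core kernel


def bitmap_bytes(fs_bytes: int, alloc_unit: int) -> int:
    """Bytes of allocation bitmap needed at one bit per allocation unit."""
    units = fs_bytes // alloc_unit
    return (units + 7) // 8


def block_to_cluster(block_nr: int, cluster_size: int) -> Tuple[int, int]:
    """Map a 4KB block number to (cluster number, block index within it)."""
    blocks_per_cluster = cluster_size // BLOCK_SIZE
    return divmod(block_nr, blocks_per_cluster)


if __name__ == "__main__":
    fs_bytes = 1 << 40   # hypothetical 1 TiB filesystem
    cluster = 1 << 20    # hypothetical 1 MiB cluster size
    print("per-block bitmap bytes:  ", bitmap_bytes(fs_bytes, BLOCK_SIZE))
    print("per-cluster bitmap bytes:", bitmap_bytes(fs_bytes, cluster))
    print("block 1000 lives in:     ", block_to_cluster(1000, cluster))
```

With these example numbers the bitmap shrinks by the ratio of cluster size to block size (256 here), which is where the reduced management overhead comes from.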

Inline data

This is work from Taobao. The size of on-disk inodes can be set when a filesystem is created. The default size is 256 bytes, but the on-disk structure (struct ext4_inode) only requires about half of that space. The remainder is normally used to hold extended attributes.

Tao Ma's work may change that situation. The idea is quite simple: very small files can be stored directly in the space between inodes without the need to allocate a separate data block at all. On filesystems with 256-byte on-disk inodes, the entire remaining space will be given over to the storage of small files. If the filesystem is built with larger on-disk inodes, only half of the leftover space will be used in this way, leaving space for late-arriving extended attributes that would otherwise be forced out of the inode.
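The sizing rule described above can be written down as a small calculation. This is only a sketch under stated assumptions: the 128-byte base size stands in for "about half" of a 256-byte inode, and any per-attribute header overhead is ignored, so real limits would be somewhat smaller.

```python
BASE_INODE_BYTES = 128  # assumption: "about half" of a 256-byte inode


def inline_capacity(inode_size: int, base: int = BASE_INODE_BYTES) -> int:
    """Approximate bytes available for inline file data in one inode."""
    leftover = inode_size - base
    if leftover < 0:
        raise ValueError("inode smaller than the base structure")
    if inode_size > 256:
        # Larger inodes: only half of the leftover space holds file data,
        # the rest stays free for extended attributes.
        return leftover // 2
    # 256-byte inodes: the whole leftover space holds file data.
    return leftover


if __name__ == "__main__":
    for size in (256, 512, 1024):
        print(size, "->", inline_capacity(size))
```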

Metadata checksumming

Darrick Wong's patch set aims to ensure that filesystem metadata is correct; it does not cover file data. The patch set attaches checksums to the various data structures found on an ext4 filesystem - superblocks, bitmaps, inodes, directory indexes, extent trees, etc. - and verifies that the checksums match the data read from the filesystem later on. A checksum failure can cause the filesystem to fail to mount or, if it happens on a mounted filesystem, remount it read-only and issue pleas for help to the system log.
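The attach-then-verify pattern can be illustrated in a few lines. This is a user-space sketch, not the ext4 on-disk format: the notes do not say which checksum algorithm is used, so zlib's CRC32 is swapped in as a stand-in, and the record layout (payload followed by a 4-byte checksum) is invented for the example.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 to a metadata record."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify(record: bytes) -> bytes:
    """Return the payload if the stored checksum matches, else raise."""
    if len(record) < 4:
        raise ValueError("record too short to carry a checksum")
    payload = record[:-4]
    (stored,) = struct.unpack("<I", record[-4:])
    if zlib.crc32(payload) != stored:
        # A real filesystem would refuse to mount or remount read-only here.
        raise ValueError("metadata checksum mismatch")
    return payload


if __name__ == "__main__":
    record = seal(b"inode 42 metadata")
    print(verify(record))
    corrupted = bytes([record[0] ^ 0x01]) + record[1:]
    try:
        verify(corrupted)
    except ValueError as err:
        print("detected:", err)
```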

Drivers as documentation

Advice for driver developers: datasheets are not always reliable, so record what you learn about the hardware in the driver itself.

Define descriptive names for registers, bits, and fields rather than putting in hard-coded constants. Note features that are incompletely described, incorrectly described, or entirely science-fictional. Comment operations that have non-obvious ordering requirements or that do not play well together. And, in general, code with a great deal of sympathy for the people who will have to make changes to your work in the future. Some hardware can never be properly documented because the relevant information is simply not available.
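As a small illustration of that advice, here is a sketch for an entirely imaginary device. The register offsets, bit names, the datasheet quirk and the ordering rule are all invented for the example, and FakeBus is a hypothetical stand-in for real register access.

```python
from enum import IntEnum, IntFlag


class Reg(IntEnum):
    """Register offsets of a hypothetical sensor (not a real device)."""
    CTRL = 0x00
    STATUS = 0x04


class Ctrl(IntFlag):
    """Bits of the CTRL register."""
    ENABLE = 1 << 0
    IRQ_EN = 1 << 1
    # Datasheet says bit 2 selects a low-power mode; on this imaginary
    # part it was observed to do nothing, so the driver never sets it.
    LOW_POWER = 1 << 2


class FakeBus:
    """In-memory register file standing in for real hardware access."""

    def __init__(self):
        self.regs = {}

    def write(self, reg, value):
        self.regs[int(reg)] = int(value)

    def read(self, reg):
        return self.regs.get(int(reg), 0)


def start_device(bus):
    # Ordering requirement (invented): interrupts must be enabled before
    # ENABLE is set, otherwise the first interrupt would be lost.
    bus.write(Reg.CTRL, Ctrl.IRQ_EN)
    bus.write(Reg.CTRL, Ctrl.IRQ_EN | Ctrl.ENABLE)


if __name__ == "__main__":
    bus = FakeBus()
    start_device(bus)
    print(hex(bus.read(Reg.CTRL)))
```

Compared with writing the bare constants 0x00 and 0x3, the named version records both what the bits mean and what the documentation got wrong.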

Background: A typical system-on-chip (SOC) will have hundreds of pins (electrical connectors) on it. Many of those pins have a well-defined purpose: supplying power or clocks to the processor, video output, memory control, and so on. But many of these pins - again, possibly hundreds of them - will have no single defined purpose.

The resulting problem: Pin configuration is typically done as part of the board-specific startup code; the system-specific nature of that code prevents a kernel built for one device from running on another even if the same processor is in use. Pin configuration also tends to involve a lot of cut-and-pasted, duplicated code.

Solution: The idea behind the pin control subsystem is to create a centralized mechanism for the management and configuration of multi-function pins, replacing a lot of board-specific code. This subsystem is quite thoroughly documented in Documentation/pinctrl.txt.

A related patch, currently in its third revision, attempts to bring the details of pin configuration into the pin controller core.

Motivation:

Overall performance can be improved if the program doing the caching has a say in what gets removed from the cache. A recent patch from John Stultz attempts to make it easier for applications to offer up caches for reclamation when memory gets tight.

Implementation:

John's patch takes a lot of inspiration from the device implemented for Android by Robert Love. In particular, an application can mark a range of pages in an open file as "volatile" with the POSIX_FADV_VOLATILE operation. Pages that are so marked can be discarded by the kernel if memory gets tight. Crucially, even dirty pages can be discarded - without writeback - if they have been marked volatile. This operation differs from POSIX_FADV_DONTNEED in that the given pages will not (normally) be discarded right away - the application might want the contents of volatile pages in the future, but it will be able to recover if they disappear. However, the patch touches the core memory-management code, so its future is uncertain.
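The semantics are easier to see in a toy model. The class below is a pure user-space simulation of the idea and does not call the proposed kernel interface; the class name, its methods and the regenerate callback are all hypothetical.

```python
class VolatileCache:
    """Toy model of 'volatile' data: it may vanish under memory pressure."""

    def __init__(self):
        self._data = {}
        self._volatile = set()

    def put(self, key, value):
        """Store (or re-store) an entry; it starts out non-volatile."""
        self._data[key] = value
        self._volatile.discard(key)

    def mark_volatile(self, key):
        """Tell the 'kernel' this entry may be thrown away if needed."""
        if key in self._data:
            self._volatile.add(key)

    def memory_pressure(self) -> int:
        """Discard every volatile entry without writing it back."""
        dropped = len(self._volatile)
        for key in self._volatile:
            del self._data[key]
        self._volatile.clear()
        return dropped

    def get(self, key, regenerate):
        """Return the entry, rebuilding it if it was discarded."""
        if key not in self._data:
            self.put(key, regenerate())
        return self._data[key]


if __name__ == "__main__":
    cache = VolatileCache()
    cache.put("thumbnail", b"pixels")
    cache.mark_volatile("thumbnail")
    print("dropped:", cache.memory_pressure())
    print(cache.get("thumbnail", lambda: b"re-rendered"))
```

The point of the model is the last step: the application never assumes volatile data is still there and always has a way to rebuild it.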

Background: if a process is attempting to allocate a transparent huge page, synchronous compaction will be performed. This leads to the following problem: writeback to a slow device plays poorly with compaction; the memory management code cannot migrate a page that is being written back until the I/O operation completes. When synchronous compaction encounters such a page, it will go to sleep waiting for the I/O on that page to complete. If the page is headed to a slow device, and it is far back on a queue of many such pages, that sleep can go on for a long time.

One should not forget that producing a single huge page can involve migrating hundreds of ordinary pages. So once that long sleep completes, the job is far from done; the process stuck performing compaction may find itself at the back of the writeback queue quite a few times before it can finally get its page fault resolved.

Solution: undecided; one approach is to turn off synchronous compaction.

The patch allows administrators to associate an alias name with a particular disk by writing to the /sys/block//alias sysfs file. That way, certain log messages can be made using the user-supplied disk name rather than the raw name of the disk, which may change on each boot.

When this was last discussed, the feeling was already that it should not be merged.

Background:

The ARM architecture has its own implementation of the DMA API.

The work:

Marek Szyprowski's work is to hook ARM into the common DMA mapping framework. That enables the deletion of a certain amount of duplicated code and its replacement with common code. Among other things, this work simplifies the handling of differences within the ARM architecture itself. Through the use of the common struct dma_map_ops, an architecture can provide a set of mapping operations specific to a given situation - different devices can have different DMA operations, for example.

Some ARM-specific DMA operations still go through the common interface by way of dma_attrs attributes.
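The "different devices can have different DMA operations" idea amounts to dispatching through a per-device table of functions. The sketch below models that in user space; it is not the kernel's struct dma_map_ops, the single map_page field is a simplification, and the device names, addresses and helper functions are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DmaMapOps:
    """Toy table of mapping operations (only one operation modelled)."""
    map_page: Callable[[int], int]  # CPU address -> bus address


@dataclass
class Device:
    name: str
    dma_ops: DmaMapOps


def dma_map_page(dev: Device, cpu_addr: int) -> int:
    """Common entry point: dispatch to whatever ops the device carries."""
    return dev.dma_ops.map_page(cpu_addr)


def linear_ops(offset: int) -> DmaMapOps:
    """Ops for a device that sees memory at a fixed offset."""
    return DmaMapOps(map_page=lambda addr: addr + offset)


def iommu_like_ops(base: int, page_size: int = 4096) -> DmaMapOps:
    """Ops for a device behind a remapping unit with its own address space."""
    table = {}

    def map_page(addr: int) -> int:
        if addr not in table:
            table[addr] = base + len(table) * page_size
        return table[addr]

    return DmaMapOps(map_page=map_page)


if __name__ == "__main__":
    uart = Device("uart", linear_ops(0x8000_0000))
    camera = Device("camera", iommu_like_ops(0x1000_0000))
    print(hex(dma_map_page(uart, 0x1000)))
    print(hex(dma_map_page(camera, 0x5000)))
```

Callers only ever use dma_map_page(); which implementation runs is a property of the device, which is what lets architecture-specific code be replaced by common code.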

Lessons learned: the first is that one should always use existing APIs whenever possible. Every developer thinks they can do something better; that may or may not be true, but using the common code works out better in the long run. But, he said, developers should not be afraid of extending core interfaces when the need arises. That is how problems get solved and how the core gets better. The final lesson was "expect it to take some time" when one has to solve problems of this nature.

Problem:

Current kernels have no quota support for tmpfs, making it easy for local users to execute denial-of-service attacks by filling up /tmp or /dev/shm.

The current patch cannot get into mainline.

Davidlohr's patch does not actually implement quotas; instead, it adds a new resource limit (RLIMIT_TMPFSQUOTA) controlling how much space a user can occupy on all mounted tmpfs filesystems. It has some appeal because tmpfs is not a persistent filesystem. Normal filesystem implementations store quotas on the filesystem itself, but tmpfs cannot do that.

Reasons for the objection:
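The difference from a per-filesystem quota is that a single limit is charged across every tmpfs mount. A minimal user-space model of that accounting might look as follows; the class, its method names and the byte values are hypothetical and have nothing to do with the actual patch's code.

```python
from collections import defaultdict


class TmpfsAccounting:
    """Toy model: one per-user limit shared by all tmpfs mounts."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        # uid -> mount point -> bytes in use
        self.usage = defaultdict(lambda: defaultdict(int))

    def total(self, uid: int) -> int:
        """Bytes a user occupies on all tmpfs mounts combined."""
        return sum(self.usage[uid].values())

    def charge(self, uid: int, mount: str, nbytes: int) -> bool:
        """Try to account nbytes; refuse if the user's limit would be hit."""
        if self.total(uid) + nbytes > self.limit:
            return False
        self.usage[uid][mount] += nbytes
        return True


if __name__ == "__main__":
    acct = TmpfsAccounting(limit_bytes=100)
    print(acct.charge(1000, "/tmp", 60))      # fits under the limit
    print(acct.charge(1000, "/dev/shm", 50))  # would exceed it across mounts
    print(acct.total(1000))
```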

Developers would rather see tmpfs behave like other filesystems.

The device mapper has a new "thin provisioning" capability which, among other things, offers improved snapshot support. See Documentation/device-mapper/thin-provisioning.txt for information on how it works.

There is a new memory-mapped virtio device intended to allow virtualized guests to use virtio-based block and network devices in the absence of PCI support.

A patch set that should improve writeback performance for a number of workloads has been merged.

The new GENHD_FL_NO_PART_SCAN device flag suppresses the normal partition scan when a new block device is added to the system.

The venerable block layer function __make_request() has been renamed to blk_queue_bio() and exported to modules.

The TAINT_OOT_MODULE taint flag is now set when out-of-tree modules are inserted into the kernel.

A few macros (EXPORT_SYMBOL_* and THIS_MODULE) have been split out of one header file and placed in a separate, smaller one. Code that only needs to export symbols can now use the latter include file; the result is a reduction in kernel compile time.

Better device power management for 3.2

Power management is no longer limited to the CPU.

The 3.2 kernel will have a new set of APIs intended to allow drivers to let the system find the best operating level for the devices they manage.

There are three separate pieces to the dynamic voltage and frequency scaling (DVFS) API; the article describes how to use the API.

There are currently four faster IPC mechanisms for the kernel, including Cross Memory Attach (CMA) and kernel-dbus (kdbus). Meanwhile, yet another kernel module, "binder", is used by the Android platform.

Goal of the article:

By exposing and contrasting the different solutions and the problems they address, we can take a step closer to finding unifying solutions that address both today's needs and the needs of our grandchildren.

The industrial I/O (IIO) subsystem provides a framework for drivers that deal with all kinds of sensors that measure quantities like voltages, temperatures, acceleration, ambient light, and more.

IIO sensors vary a lot, from simple, low-bandwidth sensors to complex, high-bandwidth devices. The initial IIO move is aimed at the first set. For this kind of sensor, the user-space interface is expected to live entirely in sysfs, under /sys/bus/iio/devices. Each device entry will have a number of attributes; some, like name and sampling_frequency, will be present for all sensors. Others will depend on what the sensor actually measures; there is an attempt to standardize the names of those attributes wherever possible.
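Since the interface is plain sysfs files, a user-space reader needs nothing more than directory listing and file reads. The sketch below uses only the directory and the two attribute names mentioned above; the function names are hypothetical, no assumption is made about how device entries are named, and a missing attribute is reported as None.

```python
from pathlib import Path

IIO_ROOT = Path("/sys/bus/iio/devices")


def read_attr(device_dir: Path, attr: str):
    """Return one sysfs attribute as a stripped string, or None if absent."""
    try:
        return (device_dir / attr).read_text().strip()
    except OSError:
        return None


def list_sensors(root: Path = IIO_ROOT) -> dict:
    """Collect the common attributes of every device entry under root."""
    if not root.is_dir():
        return {}
    sensors = {}
    for dev in sorted(root.iterdir()):
        if dev.is_dir():
            sensors[dev.name] = {
                "name": read_attr(dev, "name"),
                "sampling_frequency": read_attr(dev, "sampling_frequency"),
            }
    return sensors


if __name__ == "__main__":
    for entry, attrs in list_sensors().items():
        print(entry, attrs)
```

On a machine without IIO devices the directory does not exist and the script simply prints nothing.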

The most significant user-visible changes merged for 3.2 include:

The TCP stack now supports a new algorithm which allows for faster recovery after transient network problems.
A new feature for disk devices has been added to the block layer.
A subsystem which uses the trusted platform module to protect a system against offline modifications to files has been merged.
A feature allowing an administrator to set maximum CPU usage for groups of processes has been merged. See its documentation file for information on how to use this feature.
RAID 5 support has been added for object-storage devices. This is the third RAID 5 implementation in the kernel, with another (for btrfs) due to arrive in the near future.
A facility meant to provide for fast interprocess messaging is now in the mainline.
The mremap() system call now works properly with transparent huge pages, reducing the number of page-split operations.
The x86 architecture has gained an SSSE3-optimized implementation of the SHA1 hash algorithm. Optimized implementations of Blowfish and Twofish have been added as well.
There is a new user-space configuration interface for the crypto layer.
Support for the "Hexagon" DSP-based architecture has been merged.
DVFS ("dynamic voltage and frequency scaling") is a new mechanism for controlling devices that can operate at multiple voltage and frequency values, trading off between power consumption and performance as required. It is analogous to the cpufreq governor mechanism used for the CPU.

Changes visible to kernel developers include:

The new "pin control subsystem" allows developers on embedded systems to configure the many multi-purpose pins found on contemporary system-on-chip processors. See Documentation/pinctrl.txt for the details.
The new module_platform_driver() macro can eliminate a bunch of boilerplate code for simple platform drivers.
The power management quality-of-service API has grown a new capability for the management of per-device QOS constraints; it is intended to be used with the new DVFS subsystem. See Documentation/power/pm_qos_interface.txt for details on this API.

The long-rumored btrfsck still has not appeared, though a few related small tools have.

The frontswap pull request was harshly criticized and cannot be merged for now.
