lwn.net kernel news 2010/7 - baozhao - ChinaUnix blog

原上草 (baozhao.blog.chinaunix.net)


Category: LINUX

2010-10-19 21:53:07

1 File creation times
The atime stamp is meant to record the last time that the file was accessed. This information is almost never used, though, and can be quite expensive to maintain, so atime is often disabled on contemporary systems or, at least, rolled back to the infrequently updated "relatime" mode. Mtime, instead, makes a certain amount of sense: it tells the user when the file was last modified. Modification requires writing to the file anyway, so updating this time is often free, and the information is often useful.

ctime
Users who do not look deeply are likely to interpret ctime as "creation time," but that is not what is stored there; ctime, instead, is updated whenever a file's metadata is changed. The main consumer of this information, apparently, is the venerable dump utility, which likes to know that a file's metadata has changed (so that information must be saved in an incremental backup), but the file data itself has not and need not be saved again.
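The atime/mtime/ctime distinction above can be seen from user space with a short sketch built on Python's `os.stat()`, `os.utime()` and `os.chmod()` wrappers around the corresponding system calls. It is an illustration only, and it assumes a Linux filesystem that records ctime. Writing data moves mtime. `utime()` can set atime and mtime to arbitrary values, but the kernel always sets ctime to the current time, which is why a tool like dump can rely on it. A metadata-only change such as `chmod()` leaves mtime alone.

```python
import os
import stat
import tempfile


def mtime_vs_ctime():
    """Return (pinned_mtime, mtime, ctime) in ns after a metadata-only change."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello")               # a data write updates mtime (and ctime)
    try:
        pinned = 1_000_000_000 * 10**9  # 2001-09-09 UTC, in nanoseconds
        # Set atime and mtime to a fixed past value. utime() is itself a
        # metadata change, so the kernel sets ctime to the current time.
        os.utime(path, ns=(pinned, pinned))
        # chmod touches only inode metadata, never the file data,
        # so mtime keeps the pinned value.
        os.chmod(path, stat.S_IRUSR)
        st = os.stat(path)
        return pinned, st.st_mtime_ns, st.st_ctime_ns
    finally:
        os.unlink(path)
```

Calling `mtime_vs_ctime()` should return an mtime equal to the pinned value and a ctime far later than it, because user space cannot set ctime to a chosen value.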

Creation time: Linux has not supported it so far.
Linux systems do not store that time and provide no interface for applications to access it. Some newer filesystems (Btrfs and ext4, for example) have been designed with space for file creation times, and a newly proposed interface may provide access to them.

The final question: should the kernel allow the creation time to be modified? Windows allows it. Allowing the time to be changed on Linux would make it less reliable, but it would also be useful for backup/restore programs that want to restore the original creation time.

2 zcache: a compressed page cache
For now, it will be hard for zcache to get into mainline.
Zcache lives to store compressed copies of pages in memory. It no longer looks like a swap device, though; instead, it is set up as a backing store provider for the Cleancache framework. Cleancache uses a set of hooks into the page cache and filesystem code. When a page is evicted from the cache, it is passed to Cleancache, which might (or might not) save a copy somewhere. When pages are needed again, Cleancache gets a chance to restore them before the kernel reads them from disk. If Cleancache (and its backing store) is able to save and restore pages quickly, the potential exists for a real improvement in system performance.

Zcache uses LZO to compress pages passed to it by Cleancache; only pages which compress to less than half their original size are stored.
There are a couple of obvious tradeoffs to using a mechanism like zcache: memory usage and CPU time.

The other tradeoff is CPU time: it takes processor time to compress and decompress pages of memory. The cost is made worse by any pages that fail to compress to less than 50% of their original size, because the work spent compressing them is thrown away.
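The put/get flow and the "less than half" rule above can be modeled in user space with the minimal sketch below. It is not zcache's code. `zlib` stands in for LZO because it ships with Python, and the class and method names (`ZcacheSketch`, `put_page`, `get_page`) are hypothetical. A rejected page shows the CPU cost described above: it is compressed and then discarded.

```python
import os
import zlib

PAGE_SIZE = 4096


class ZcacheSketch:
    """Toy cleancache backend keeping compressed clean pages in RAM."""

    def __init__(self):
        self.store = {}     # (inode, index) -> compressed bytes
        self.rejected = 0   # pages whose compression effort was wasted

    def put_page(self, key, page):
        """Called when the page cache evicts a clean page."""
        data = zlib.compress(page)
        if len(data) >= len(page) // 2:
            # Did not shrink below half its size: drop it, CPU time is lost.
            self.rejected += 1
            return False
        self.store[key] = data
        return True

    def get_page(self, key):
        """Called before the kernel reads the page from disk; None is a miss."""
        data = self.store.get(key)
        return None if data is None else zlib.decompress(data)


if __name__ == "__main__":
    cache = ZcacheSketch()
    print(cache.put_page(("inode", 0), b"A" * PAGE_SIZE))      # True: stored
    print(cache.put_page(("inode", 1), os.urandom(PAGE_SIZE)))  # False: rejected
```

Random bytes do not compress, so the second page is rejected. A page of repeated bytes shrinks to a few dozen bytes and is kept.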

3 Realtime Linux: academia v. reality
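An article by Thomas Gleixner that is somewhat critical of academic work. It says that engineers implementing things do not read researchers' papers. Well worth reading.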

The Linux Kernel community has a proven track record of being in disagreement with - and disconnected from - the academic operating system research community from the very beginning. This goes back to Linus's microkernel debate.

Why early university projects on adding realtime response to the kernel failed:

Intrusiveness and maintainability
Complexity of usage
Incompleteness: they solved only part of the problem
Lack of interest: nobody considered getting the code into mainline

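Criticism of academia
The author reads the available papers after implementing something, to see whether there is anything to borrow, but has found this to be basically a waste of time. The regrettable points: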
We are solving problems, comparing and contrasting approaches and implementations, but we are either too lazy or too busy to sit down and write a proper paper about it.

IEEE's paywall: I personally consider it as a modern form of robber barony where taxpayers have to pay for work which was funded by tax money in the first place. Universities' rankings are influenced by the number of papers written by their members and accepted at IEEE conferences.

Base concepts in research are often several decades old.
Research often happens on narrow aspects of an already narrow problem space, without considering the whole picture.
Research often happens on artificial application scenarios.
Research often tries to solve yesterday's problems over and over.
Comparing and contrasting research results is almost impossible.
Research and education seem to happen in different universes.

Q: Where can I get more information about the realtime preemption patch ?

A: General information can be found on , , and .

Q: Which technologies in the mainline Linux kernel emerged from the realtime preemption patch?

A: The list includes:

the Generic interrupt handling framework. See: Linux/Documentation/DocBook/genericirq and .
Threaded interrupt handlers, and .
The mutex infrastructure. See: Linux/Documentation/mutex-design.txt
High-resolution timers, including NOHZ idle support. See: Linux/Documentation/timers/highres.txt and .
Priority inheritance support for user space pthread_mutexes. See: Linux/Documentation/pi-futex.txt, Linux/Documentation/rt-mutex.txt, Linux/Documentation/rt-mutex-design.txt, , and this Realtime Linux Workshop paper [PDF].
Robustness support for user-space pthread_mutexes. See: Linux/Documentation/robust-futexes.txt and .
The lock dependency validator, .
The kernel tracing infrastructure, as described in a series of LWN articles: , , , and .
Preemptible and hierarchical RCU, also documented in LWN: , , , and .

1 The ghost of sysfs past
Bad interfaces cannot be removed: mistakes will happen, but, when they become part of the user-space ABI, they can be difficult to get away from.
2 Fixing writeback from direct reclaim
Two concepts:
"Writeback" is the process of writing the contents of dirty memory pages back to their backing store, where that backing store is normally a file or swap area.
If, however, a memory allocation request cannot be satisfied from the free list, the kernel may try to reclaim pages directly in the context of the process performing the allocation. Diverting an allocation request into this kind of cleanup activity is called "direct reclaim."

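Problems caused by direct reclaim: (a) it calls filesystem code directly, which makes call chains very deep; with XFS, the 8KB kernel stack is not enough. (b) Direct reclaim, which reclaims pages wherever it can find them, tends to create seek-intensive I/O, hurting the whole system's I/O performance.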

Mel Gorman's patches:
a If the dirty page is an anonymous (process data) page, writeback happens as before. The reasoning here seems to be that the writeback path for these pages (which will be going to a swap area) will be simpler than it is for file-backed pages.
b For dirty, file-backed pages, direct reclaim will no longer try to write back those pages directly. Instead, it creates a list of the dirty pages it encounters, then hands them over to the appropriate background process for the real writeback work. In some cases (such as when reclaim is trying to free specific larger chunks of memory), the direct reclaim code will wait in the hope that the identified pages will soon become free. The rest of the time, it simply moves on, trying to find free pages elsewhere.
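The two rules above can be expressed as one reclaim pass in a toy user-space model. This is not kernel code. `Page`, `direct_reclaim`, `swap_out` and `flusher_queue` are hypothetical names, and the "wait for specific pages" case is not modeled. Clean pages are freed at once. Dirty anonymous pages still go to swap from this context. Dirty file-backed pages are only queued for the background flusher.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    dirty: bool
    anonymous: bool  # True: process data bound for swap; False: file-backed


def direct_reclaim(pages, swap_out, flusher_queue):
    """One pass of the policy described above (toy model).

    Returns the pages that can be freed right away. Dirty anonymous pages
    are written to swap as before; dirty file-backed pages are appended to
    flusher_queue instead of being written back from this context.
    """
    freed = []
    for page in pages:
        if not page.dirty:
            freed.append(page)          # clean pages need no I/O at all
        elif page.anonymous:
            swap_out(page)              # unchanged path: write to swap
        else:
            flusher_queue.append(page)  # defer file writeback to the flusher
    return freed
```

Because the filesystem's writeback code is never called from inside the allocating task, the deep call chains and scattered, seek-heavy writes are left to the flusher thread.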

3 Adding periods to SCHED_DEADLINE
Work in progress: the Linux kernel currently has no deadline scheduling.
Three parameters:
Deadline scheduling does away with priorities, replacing them with a three-parameter tuple: a worst-case execution time (or budget), a deadline, and a period. In essence, a process tells the scheduler that it will require up to a certain amount of CPU time (the budget) by the given deadline, and that the deadline optionally repeats with the given period. So, for example, a video-processing application might request 1ms of CPU time to process the next incoming frame, expected in 10ms, with a 33ms period thereafter for subsequent frames.

Current approach: the deadline and the period are assumed to be the same.
This scheduler works, but, thus far, it takes a bit of a shortcut: in SCHED_DEADLINE, the deadline and the period are assumed to be the same. This simplification makes the "admission test" - the decision as to whether to accept a new SCHED_DEADLINE task - relatively easy. Each process gets a "bandwidth" parameter, being the ratio of the CPU budget to the deadline/period value. As long as the sum of the bandwidth values for all processes on a given CPU does not exceed 1.0, the scheduler can guarantee that the deadlines will be met.

But if the period and the deadline differ, admission decision algorithms become very hard to implement, and how to handle that case is still undecided.
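The bandwidth-based admission test above can be sketched in a few lines. This is not SCHED_DEADLINE's code, and the function names are hypothetical. Times are in microseconds, and `Fraction` avoids floating-point rounding at the 1.0 boundary. As an illustration of why deadline != period is harder, `admit_density` divides by min(deadline, period). That is a well-known sufficient but pessimistic test for EDF on one CPU. It may reject task sets that are in fact schedulable, and the text does not say SCHED_DEADLINE uses it.

```python
from fractions import Fraction


def bandwidth(budget_us, period_us):
    """CPU share of a task when deadline == period: budget / period."""
    return Fraction(budget_us, period_us)


def admit(existing, new):
    """Accept `new` (budget, period) only if total bandwidth stays <= 1.0."""
    total = sum((bandwidth(b, p) for b, p in existing), Fraction(0))
    return total + bandwidth(*new) <= 1


def density(budget_us, deadline_us, period_us):
    """Pessimistic share when the deadline may be shorter than the period."""
    return Fraction(budget_us, min(deadline_us, period_us))


def admit_density(tasks):
    """Sufficient (not necessary) test for (budget, deadline, period) tasks."""
    return sum((density(*t) for t in tasks), Fraction(0)) <= 1
```

For the video example (1ms budget, 10ms deadline, 33ms period), `bandwidth(1000, 33000)` is 1/33, but `density(1000, 10000, 33000)` charges 1/10. The simple test is cheap, but it overestimates the load of tasks whose deadline is shorter than their period.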

4 Contiguous memory allocation for drivers
Goal: handle drivers' requests for large chunks of memory. The contiguous memory allocation (CMA) patches grab a chunk of contiguous physical memory at boot time (when it's plentiful), then dole it out to drivers in response to allocation requests. What sets CMA apart from earlier boot-time reservation schemes is mainly an elaborate mechanism for defining the memory region(s) to reserve and the policies for handing them out.
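The "reserve at boot, dole out later" idea can be sketched as a first-fit allocator over a reserved range of page frames. This is only a stand-in. `ContiguousRegion`, `alloc` and `free` are hypothetical names, not CMA's interface, and CMA's region-definition mechanism and handout policies are not modeled. Every successful allocation is a physically contiguous run of page frame numbers inside the reservation.

```python
class ContiguousRegion:
    """Boot-time reservation handed out as contiguous page runs (first fit)."""

    def __init__(self, base_pfn, num_pages):
        self.base = base_pfn
        self.used = [False] * num_pages   # one flag per reserved page

    def alloc(self, count):
        """Return the first pfn of `count` free contiguous pages, or None."""
        if count <= 0:
            raise ValueError("count must be positive")
        run = 0
        for i, busy in enumerate(self.used):
            run = 0 if busy else run + 1
            if run == count:
                start = i - count + 1
                for j in range(start, i + 1):
                    self.used[j] = True
                return self.base + start
        return None                       # no hole big enough: fragmentation

    def free(self, pfn, count):
        """Return a previously allocated run to the region."""
        start = pfn - self.base
        for j in range(start, start + count):
            self.used[j] = False
```

With an 8-page region, allocating 4 and then 3 pages leaves a single free page, so a request for 2 fails until the first run is freed. This is the fragmentation problem that a boot-time reservation is meant to keep under control.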

A brand-new conference focusing on future Linux technologies.

1 Kernel development statistics for 2.6.35

Even the core kernel code changes quite a bit.
The core kernel code — those files that all architectures and users use no matter what their configuration is — comprises 5% of the kernel (by lines of code), and you will find that 5% of the total kernel changes happen in that code. Here is the raw number of changes for the "core" kernel files for the 2.6.35-rc5 release.

Action	Lines	% of all changes
Added	27,550	4.50%
Deleted	7,450	1.90%
Modified	6,847	4.93%

I've broken the kernel files down into six different categories:

core : This includes the files in the init, block, ipc, kernel, lib, mm, and virt subdirectories.
drivers : This includes the files in the crypto, drivers, sound, security, include/acpi, include/crypto, include/drm, include/media, include/mtd, include/pcmcia, include/rdma, include/rxrpc, include/scsi, include/sound, and include/video subdirectories.
filesystems : This includes the files in the fs subdirectory.
networking : This includes the files in the net and include/net subdirectories.
architecture-specific : This includes the files in the arch, include/xen, include/math-emu, and include/asm-generic subdirectories.
miscellaneous : This includes all of the rest of the files not included in the above categories.

Based on these categories, the size of the 2.6.35 kernel is as follows:

Category	% Lines
Core	4.37%
Drivers	57.06%
Filesystems	7.21%
Networking	5.03%
Arch-specific	21.92%
Miscellaneous	4.43%
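The six-way categorization above can be written as a path-prefix classifier plus a line-count tally. The directory lists are copied from the categories above. Everything else is an assumption of this sketch: paths are relative to the top of the kernel tree, `include/<dir>` is matched on its first two components, and the function names are hypothetical. The line counts in the usage note are made-up toy inputs, not the 2.6.35 data.

```python
from collections import Counter

CATEGORIES = {
    "core": ["init", "block", "ipc", "kernel", "lib", "mm", "virt"],
    "drivers": ["crypto", "drivers", "sound", "security", "include/acpi",
                "include/crypto", "include/drm", "include/media",
                "include/mtd", "include/pcmcia", "include/rdma",
                "include/rxrpc", "include/scsi", "include/sound",
                "include/video"],
    "filesystems": ["fs"],
    "networking": ["net", "include/net"],
    "architecture-specific": ["arch", "include/xen", "include/math-emu",
                              "include/asm-generic"],
}


def categorize(path):
    """Map a kernel source path to one of the six categories."""
    parts = path.split("/")
    if parts[0] == "include" and len(parts) > 2:
        key = "/".join(parts[:2])         # e.g. include/net
    else:
        key = parts[0]                    # e.g. mm, drivers, Makefile
    for category, dirs in CATEGORIES.items():
        if key in dirs:
            return category
    return "miscellaneous"


def shares(line_counts):
    """Percentage of lines per category, given {path: number_of_lines}."""
    totals = Counter()
    for path, lines in line_counts.items():
        totals[categorize(path)] += lines
    grand = sum(totals.values())
    return {cat: 100.0 * n / grand for cat, n in totals.items()}
```

For example, `include/linux/fs.h` falls into "miscellaneous" because `include/linux` is not in any of the listed sets, while `include/net/sock.h` counts as networking.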

Red Hat plays an important role in the development of every kernel subsystem.

2 A brief history of union mounts
This work has not yet been merged into mainline.

3 The USB composite framework

http://blog.felipebalbi.com/2008/05/25/usb-composite-gadget-framework/

The Linux USB composite framework provides a way to add USB devices in a fairly straightforward way. Before the composite framework came along, developers needed to implement all USB requests for each gadget they wanted to add to the system. The framework handles basic USB requests and separates each USB composite function, which allows gadget authors to think in terms of functions rather than low-level interfaces and communication handling.
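The split described above can be illustrated with a toy model: the framework answers device-level requests, and everything addressed to an interface is routed to the function that owns it. This is not the gadget API. `UsbFunction`, `CompositeDevice` and `CORE_REQUESTS` are hypothetical names, and the real division of requests between core and functions is more detailed than this.

```python
# Device-level standard requests this toy core answers itself (a
# simplification of what the real framework handles).
CORE_REQUESTS = {"GET_DESCRIPTOR", "SET_CONFIGURATION", "GET_CONFIGURATION"}


class UsbFunction:
    """One function of a composite gadget, owning one or more interfaces."""

    def __init__(self, name, num_interfaces):
        self.name = name
        self.num_interfaces = num_interfaces

    def setup(self, request):
        # A real function would decode its own class-specific request here.
        return f"{self.name} handled {request}"


class CompositeDevice:
    """Routes control requests either to the core or to the owning function."""

    def __init__(self):
        self.interface_owner = {}  # interface number -> UsbFunction

    def add_function(self, func):
        """Assign the next free interface numbers; return the first one."""
        first = len(self.interface_owner)
        for intf in range(first, first + func.num_interfaces):
            self.interface_owner[intf] = func
        return first

    def setup(self, request, interface=None):
        if request in CORE_REQUESTS:
            return f"composite handled {request}"
        func = self.interface_owner.get(interface)
        if func is None:
            return "STALL"  # no function owns that interface
        return func.setup(request)
```

A gadget author only writes the `UsbFunction` part, for example a two-interface serial function and a one-interface storage function. Interface numbering and descriptor requests stay in the shared core.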

1 For example, right now I am seeing the following error:
[ 658.831697] [drm:edid_is_valid] *ERROR* Raw EDID:
[ 658.831702] 48 48 48 48 50 50 50 50 20 20 20 20 4c 4c 4c 4c HHHHPPPP LLLL

Where do I start with tracking this down?

$ cd linux-2.6
$ git grep "Raw EDID"
drivers/gpu/drm/drm_edid.c: DRM_ERROR("Raw EDID:\n");
$ ./scripts/get_maintainer.pl -f drivers/gpu/drm/drm_edid.c

2 Bcache: Caching beyond just RAM
Using SSDs as a cache.

Use one or more solid-state storage devices (SSDs) to cache block data (hence bcache, a block device cache).

page cache ==> SSD ==> normal disk
The design of bcache allows the use of more than one SSD to perform caching. It is also possible to cache more than one existing filesystem, or choose instead to just cache a small number of performance-critical filesystems.

Another potential use is using local media to cache remote disks.

Implementation: To intercept filesystem operations, bcache hooks into the top of the block layer, in __generic_make_request(). It thus works entirely in terms of BIO structures. By hooking into the sole function through which all disk requests pass, bcache doesn't need to make any changes to block device naming or filesystem mounting. This approach of intercepting bio requests in the background allows us to start and stop caching on the fly.
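The "page cache ==> SSD ==> normal disk" layering can be modeled with a small sketch that intercepts every read and write to one device. This is not bcache. Dicts stand in for the SSD and the backing disk, and the write-through policy, LRU eviction and all names (`BlockCacheSketch`, `submit_read`, `submit_write`) are assumptions of the sketch. As with bcache's single hook point, callers use the same two entry points whether or not caching is active.

```python
from collections import OrderedDict


class BlockCacheSketch:
    """Toy block-device cache: an LRU 'SSD' in front of a slow 'disk'."""

    def __init__(self, backing, ssd_blocks):
        self.backing = backing        # sector -> bytes, the slow disk
        self.cache = OrderedDict()    # sector -> bytes, the SSD, LRU order
        self.capacity = ssd_blocks
        self.hits = 0
        self.misses = 0

    def submit_read(self, sector):
        if sector in self.cache:
            self.cache.move_to_end(sector)   # mark as most recently used
            self.hits += 1
            return self.cache[sector]
        self.misses += 1
        data = self.backing[sector]          # slow path: go to the disk
        self._insert(sector, data)           # populate the SSD for next time
        return data

    def submit_write(self, sector, data):
        self.backing[sector] = data          # write-through: disk stays current
        self._insert(sector, data)

    def _insert(self, sector, data):
        self.cache[sector] = data
        self.cache.move_to_end(sector)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
```

With write-through, the backing disk always holds current data, so the SSD can be dropped at any time without loss. That makes starting and stopping caching on the fly straightforward.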
