1 File creation times
The atime stamp is meant to record the last time that the file was
accessed. This information is almost never used, though, and can be
quite expensive to maintain, so atime is often disabled on contemporary
systems or, at least, rolled back to the infrequently-updated "relatime"
mode. Mtime, instead, makes a certain amount of sense; it tells the
user when the file was last modified. Modification requires writing to
the file anyway, so updating this time is often free, and the
information is often useful.
ctime
Users who do not look
deeply are likely to interpret ctime as "creation time," but that is not
what is stored there; ctime, instead, is updated whenever a file's
metadata is changed. The main consumer of this information, apparently,
is the venerable dump utility, which likes to know that a file's
metadata has changed (so that information must be saved in an
incremental backup), but the file data itself has not and need not be
saved again.
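The difference between mtime and ctime can be observed from user space. The following is a minimal sketch, assuming a POSIX system (on Windows, Python's `st_ctime` has a different meaning); the helper name `metadata_change_demo` is made up for this illustration. It changes only a file's permissions and compares the timestamps before and after.

```python
import os
import tempfile
import time


def metadata_change_demo():
    """Return (mtime_before, mtime_after, ctime_before, ctime_after) in ns."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello")
        path = f.name
    try:
        before = os.stat(path)
        time.sleep(0.05)        # give the filesystem clock time to advance
        os.chmod(path, 0o644)   # metadata-only change: no file data is written
        after = os.stat(path)
        return (before.st_mtime_ns, after.st_mtime_ns,
                before.st_ctime_ns, after.st_ctime_ns)
    finally:
        os.unlink(path)


mtime_before, mtime_after, ctime_before, ctime_after = metadata_change_demo()
print("mtime changed:", mtime_after != mtime_before)
print("ctime changed:", ctime_after != ctime_before)
```

On a typical Linux filesystem the permission change should leave mtime alone and move ctime forward, which is the signal an incremental backup tool can use to save metadata without saving the data again.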
Creation time: historically not supported by Linux.
Linux systems do not
store that time and provide no interface for applications to access it.
Some newer filesystems (Btrfs and ext4, for example) have been designed
with space for file creation times. Newly added functionality may provide support for this.
A final question: should the kernel allow the creation time to be modified? Windows
allows it. If Linux did as well, allowing the time to be changed would make it less
reliable, but it would also be useful for backup/restore programs which
want to restore the original creation time.
2 zcache: a compressed page cache
It currently looks difficult to get this into the mainline.
Zcache
lives to store compressed copies of pages in memory. It no longer looks
like a swap device, though; instead, it is set up as a backing store
provider for the Cleancache
framework. Cleancache uses a set of hooks into the page cache and
filesystem code; when a page is evicted from the cache, it is passed to
Cleancache, which might (or might not) save a copy somewhere. When pages
are needed again, Cleancache gets a chance to restore them before the
kernel reads them from disk. If Cleancache (and its backing store) is
able to quickly save and restore pages, the potential exists for a real
improvement in system performance.
Zcache uses LZO to compress
pages passed to it by Cleancache; only pages which compress to less than
half their original size are stored.
There are a couple of obvious tradeoffs to using a mechanism like zcache: memory usage and CPU time.
The
other tradeoff is CPU time: it takes processor time to compress and
decompress pages of memory. The cost is made worse by any pages which
fail to compress down to less than 50% of their original size, since those
pages are not stored and the time spent compressing them is wasted.
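The "store only if it compresses to less than half" rule can be sketched as follows. This is an illustration under stated assumptions, not zcache's code: Python's standard `zlib` stands in for LZO, the 4096-byte page size is assumed, and `CompressedStore` is a hypothetical name.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


class CompressedStore:
    """Toy backing store that keeps only well-compressing pages."""

    def __init__(self):
        self.pages = {}

    def put(self, key, page: bytes) -> bool:
        # zlib is a stand-in here; the article says zcache uses LZO.
        compressed = zlib.compress(page)
        if len(compressed) * 2 >= len(page):
            # Not below half the original size: CPU time was spent for nothing.
            return False
        self.pages[key] = compressed
        return True

    def get(self, key):
        compressed = self.pages.get(key)
        return None if compressed is None else zlib.decompress(compressed)


store = CompressedStore()
zero_page = bytes(PAGE_SIZE)         # highly compressible
random_page = os.urandom(PAGE_SIZE)  # effectively incompressible
print("zero page stored:", store.put("zero", zero_page))
print("random page stored:", store.put("random", random_page))
```

The rejected page shows the CPU-time tradeoff directly: the compression work is done before the store can know the result will be thrown away.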
3 Realtime Linux: academia v. reality
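An article by Thomas Gleixner. It offers some criticism of academic work and notes that engineers build their implementations without reading researchers' papers. Well worth reading.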
The
Linux Kernel community has a proven track record of being in
disagreement with - and disconnected from - the academic operating
system research community from the very beginning. This goes back to Linus's microkernel debate.
Reasons why early academic projects for adding realtime response to the kernel failed:
- Intrusiveness and maintainability
- Complexity of usage
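- Incompleteness: only part of the problem was solved
- Lack of interest: no thought was given to getting the code into the mainline
Criticism of academia
The author implements first and only then reads the various papers to see whether anything can be borrowed from them, but has found this to be mostly a waste of time. One regret: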
We
are solving problems, comparing and contrasting approaches and
implementations, but we are either too lazy or too busy to sit down and
write a proper paper about it.
On IEEE's paywall: I personally
consider it as a modern form of robber barony where tax payers have to
pay for work which was funded by tax money in the first place.
Universities' rankings are influenced by the number of papers written by
their members and accepted at IEEE conferences.
- Base concepts in research are often several decades old
- Research often happens on narrow aspects of an already narrow problem space. The whole picture is not considered.
- Research often happens on artificial application scenarios.
- Research often tries to solve yesterday's problems over and over
- Comparing and contrasting research results is almost impossible.
- Research and education seem to happen in different universes.
Q: Where can I get more information about the realtime preemption patch ?
A: General information can be found on , , and .
Q: Which technologies in the mainline Linux kernel emerged from the realtime preemption patch?
A: The list includes:
- The generic interrupt handling framework. See: Linux/Documentation/DocBook/genericirq.
- Threaded interrupt handlers.
- The mutex infrastructure. See: Linux/Documentation/mutex-design.txt
- High-resolution timers, including NOHZ idle support. See: Linux/Documentation/timers/highres.txt.
- Priority inheritance support for user space pthread_mutexes. See: Linux/Documentation/pi-futex.txt, Linux/Documentation/rt-mutex.txt, Linux/Documentation/rt-mutex-design.txt, and a Realtime Linux Workshop paper.
- Robustness support for user-space pthread_mutexes. See: Linux/Documentation/robust-futexes.txt.
- The lock dependency validator.
- The kernel tracing infrastructure, as described in a series of LWN articles.
- Preemptible and hierarchical RCU, also documented in LWN.
1 The ghost of sysfs past
Mistaken interfaces cannot be removed. Mistakes will happen, but, when they become part of the user-space ABI, they can be difficult to get away from.
2 Fixing writeback from direct reclaim
Two concepts:
"Writeback"
is the process of writing the contents of dirty memory pages back to
their backing store, where that backing store is normally a file or swap
area.
If, however, a memory allocation request cannot be satisfied
from the free list, the kernel may try to reclaim pages directly in the
context of the process performing the allocation. Diverting an
allocation request into this kind of cleanup activity is called "direct
reclaim."
Problems caused by direct reclaim:
a It calls directly into filesystem code, which leads to deeply nested function calls; with XFS, the 8KB kernel stack is not enough.
b Direct reclaim, which reclaims pages wherever it can find them, tends to
create seek-intensive I/O, hurting the whole system's I/O performance.
Mel Gorman's proposed fix:
a
If the dirty page is an anonymous (process data) page, writeback
happens as before. The reasoning here seems to be that the writeback
path for these pages (which will be going to a swap area) will be
simpler than it is for file-backed pages.
b For dirty, file-backed
pages, direct reclaim will no longer try to write back those pages
directly. Instead, it creates a list of the dirty pages it encounters,
then hands them over to the appropriate background process for the real
writeback work. In some cases (such as when the kernel
is trying to free specific larger chunks of memory), the direct reclaim
code will wait in the hope that the identified pages will soon become
free. The rest of the time, it simply moves on, trying to find free
pages elsewhere.
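The decision described in (a) and (b) can be pictured as a simple sort of the pages that direct reclaim encounters. This is a toy model only: the `Page` class and `direct_reclaim` function are hypothetical names, and none of it corresponds to actual kernel data structures.

```python
from dataclasses import dataclass


@dataclass
class Page:
    dirty: bool
    anonymous: bool  # True for process data, False for file-backed pages


def direct_reclaim(pages):
    """Split pages into (freed, written_directly, deferred_to_background)."""
    freed, written_directly, deferred = [], [], []
    for page in pages:
        if not page.dirty:
            freed.append(page)             # clean page: can be reclaimed at once
        elif page.anonymous:
            written_directly.append(page)  # (a) swap writeback happens as before
        else:
            deferred.append(page)          # (b) handed to background writeback
    return freed, written_directly, deferred


pages = [
    Page(dirty=False, anonymous=False),
    Page(dirty=True, anonymous=True),
    Page(dirty=True, anonymous=False),
    Page(dirty=True, anonymous=False),
]
freed, written_directly, deferred = direct_reclaim(pages)
print(len(freed), len(written_directly), len(deferred))
```

The point of the split is that the deep filesystem call chain is only ever entered from the background writer, never from the allocating process's own stack.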
3 Adding periods to SCHED_DEADLINE
Work in progress; the Linux kernel currently has no deadline scheduling.
Three parameters:
Deadline
scheduling does away with priorities, replacing them with a
three-parameter tuple: a worst-case execution time (or budget), a
deadline, and a period. In essence, a process tells the scheduler that
it will require up to a certain amount of CPU time (the budget) by the
given deadline, and that the deadline optionally repeats with the given
period. So, for example, a video-processing application might request
1ms of CPU time to process the next incoming frame, expected in 10ms,
with a 33ms period thereafter for subsequent frames.
Current approach: the deadline and the period are assumed to be the same.
This scheduler works, but, thus far, it takes a bit of a shortcut: in
SCHED_DEADLINE, the deadline and the period are assumed to be the same.
This simplification makes the "admission test" - the decision as to
whether to accept a new SCHED_DEADLINE task - relatively easy. Each
process gets a "bandwidth" parameter, being the ratio of the CPU budget
to the deadline/period value. As long as the sum of the bandwidth values
for all processes on a given CPU does not exceed 1.0, the scheduler can
guarantee that the deadlines will be met.
But if the period and the deadline differ, admission decision algorithms become very hard to implement. How this will be handled is still undecided.
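For the simple case where deadline and period are equal, the admission test described above reduces to a sum of ratios. The sketch below assumes tasks are given as (budget, period) pairs in microseconds; the function names are invented for illustration and are not part of any kernel interface. Exact fractions are used so that a total of exactly 1.0 is not rejected by rounding error.

```python
from fractions import Fraction


def bandwidth(budget_us: int, period_us: int) -> Fraction:
    """Ratio of CPU budget to the deadline/period value."""
    return Fraction(budget_us, period_us)


def admit(existing_tasks, new_task) -> bool:
    """Accept new_task only if total bandwidth on this CPU stays <= 1."""
    total = sum((bandwidth(b, p) for b, p in existing_tasks), Fraction(0))
    return total + bandwidth(*new_task) <= 1


# Two tasks already use 10/20 + 5/20 = 3/4 of the CPU.
running = [(10_000, 20_000), (5_000, 20_000)]
print(admit(running, (5_000, 20_000)))  # brings the total to exactly 1
print(admit(running, (6_000, 20_000)))  # would push the total above 1
print(admit(running, (1_000, 33_000)))  # the 1ms-per-33ms video task
```

Once the deadline is allowed to be shorter than the period, a single ratio per task no longer captures the demand, which is why the test stops being this simple.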
4 Contiguous memory allocation for drivers
Goal: deal with the problem of drivers that need to allocate large blocks of memory. The contiguous memory allocation (CMA) patches grab a
chunk of contiguous physical memory at boot time (when it's plentiful),
then doles it out to drivers in response to allocation requests. Where
it differs is mainly in an elaborate mechanism for defining the memory
region(s) to reserve and the policies for handing them out.
A brand-new conference focused on future Linux technologies.
1 Kernel development statistics for 2.6.35
Even the most central core code sees a substantial amount of change.
The
core kernel code — those files that all architectures and users use no
matter what their configuration is — comprises 5% of the kernel (by
lines of code), and you will find that 5% of the total kernel changes
happen in that code. Here is the raw number of changes for the "core"
kernel files for the 2.6.35-rc5 release.
Action | Lines | % of all changes |
---|---|---|
Added | 27,550 | 4.50% |
Deleted | 7,450 | 1.90% |
Modified | 6,847 | 4.93% |
I've broken the kernel files down into six different categories:
- core : This includes the files in the init, block, ipc, kernel, lib, mm, and virt subdirectories.
- drivers : This includes the files in the crypto, drivers, sound, security, include/acpi, include/crypto, include/drm, include/media, include/mtd, include/pcmcia, include/rdma, include/rxrpc, include/scsi, include/sound, and include/video subdirectories.
- filesystems : This includes the files in the fs subdirectory.
- networking : This includes the files in the net and include/net subdirectories.
- architecture-specific : This includes the files in the arch, include/xen, include/math-emu, and include/asm-generic subdirectories.
- miscellaneous : This includes all of the rest of the files not included in the above categories.
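The six categories amount to a prefix lookup on each file's path within the kernel tree. A minimal sketch follows, with the directory lists copied from the definitions above; the names `CATEGORY_PREFIXES` and `classify` are made up here and are not taken from the article's own tooling.

```python
CATEGORY_PREFIXES = {
    "core": ["init", "block", "ipc", "kernel", "lib", "mm", "virt"],
    "drivers": [
        "crypto", "drivers", "sound", "security",
        "include/acpi", "include/crypto", "include/drm", "include/media",
        "include/mtd", "include/pcmcia", "include/rdma", "include/rxrpc",
        "include/scsi", "include/sound", "include/video",
    ],
    "filesystems": ["fs"],
    "networking": ["net", "include/net"],
    "architecture-specific": [
        "arch", "include/xen", "include/math-emu", "include/asm-generic",
    ],
}


def classify(path: str) -> str:
    """Map a path relative to the kernel tree root to one of six categories."""
    for category, prefixes in CATEGORY_PREFIXES.items():
        for prefix in prefixes:
            # Match whole directory components only, so "network/x"
            # is not mistaken for something under "net".
            if path == prefix or path.startswith(prefix + "/"):
                return category
    return "miscellaneous"


print(classify("kernel/sched.c"))
print(classify("drivers/gpu/drm/drm_edid.c"))
print(classify("include/net/tcp.h"))
print(classify("include/linux/fs.h"))
```

Counting changed lines per category is then a matter of running `classify` over the paths in each commit's diff.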
Based on these categories, the size of the 2.6.35 kernel is as follows:
Category | % Lines |
---|---|
Core | 4.37% |
Drivers | 57.06% |
Filesystems | 7.21% |
Networking | 5.03% |
Arch-specific | 21.92% |
Miscellaneous | 4.43% |
Red Hat plays an important role in every subsystem of kernel development.
2 A brief history of union mounts
This work has not yet been merged into the mainline.
3 The USB composite framework
http://blog.felipebalbi.com/2008/05/25/usb-composite-gadget-framework/
The
Linux USB composite framework provides a way to add USB devices in a
fairly straightforward way. Before the composite framework came along,
developers needed to implement all USB requests for each gadget they
wanted to add to the system. The framework handles basic USB requests
and separates each USB composite function, which allows gadget authors
to think in terms of functions rather than low-level interfaces and
communication handling.
1
For example, right now I am seeing the following error:
[ 658.831697] [drm:edid_is_valid] *ERROR* Raw EDID:
[ 658.831702] 48 48 48 48 50 50 50 50 20 20 20 20 4c 4c 4c 4c HHHHPPPP LLLL
Where do I start with tracking this down?
$ cd linux-2.6
$ git grep "Raw EDID"
drivers/gpu/drm/drm_edid.c: DRM_ERROR("Raw EDID:\n");
$ ./scripts/get_maintainer.pl -f drivers/gpu/drm/drm_edid.c
2 Bcache: Caching beyond just RAM
Use an SSD as a cache.
Bcache uses one or more solid-state storage devices (SSDs) to cache block data (hence bcache, a block device cache).
page cache ==> SSD ==> normal disk
The
design of bcache allows the use of more than one SSD to perform
caching. It is also possible to cache more than one existing filesystem,
or choose instead to just cache a small number of performance-critical
filesystems.
Another potential use is using local media to cache remote disks.
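The tiering idea (page cache, then SSD, then normal disk) can be illustrated with a toy read path. This is only a conceptual sketch: dictionaries stand in for devices, `TieredReader` is a hypothetical name, and real bcache works on BIO structures in the block layer rather than anything like this.

```python
class TieredReader:
    """Toy read path: a fast SSD cache in front of a slow backing disk."""

    def __init__(self, backing_disk: dict):
        self.backing_disk = backing_disk  # block number -> data (slow tier)
        self.ssd_cache = {}               # block number -> data (fast tier)
        self.disk_reads = 0               # how often the slow tier was hit

    def read_block(self, block_nr: int) -> bytes:
        if block_nr in self.ssd_cache:
            return self.ssd_cache[block_nr]  # cache hit: disk is not touched
        self.disk_reads += 1
        data = self.backing_disk[block_nr]
        self.ssd_cache[block_nr] = data      # populate the cache on a miss
        return data


reader = TieredReader({7: b"block seven"})
first = reader.read_block(7)
second = reader.read_block(7)
print(reader.disk_reads)
```

Only the first read of a block reaches the slow device; repeated reads are served from the faster tier.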
Implementation
To intercept filesystem operations, bcache hooks into the top of the
block layer, in __generic_make_request(). It thus works entirely in
terms of BIO structures. By hooking into the sole function through which
all disk requests pass, bcache doesn't need to make any changes to
block device naming or filesystem mounting. This approach of
intercepting bio requests in the background allows us to start and stop
caching on the fly.