1.
The original version cannot prevent offline attacks.
The
integrity measurement architecture (IMA) has been a part of Linux for
roughly a year now—it was merged for 2.6.30—and it can be used to attest
to the integrity of a running Linux system.
In its default
configuration, IMA calculates hash values for executables, files which
are mmap()ed for execution, and files open for reading by root. That
list of hashes is consulted each time those files are accessed anew, so
that unexpected changes can be detected. In addition, IMA can be used
with the trusted platform module (TPM) hardware, which is present in
many systems, to sign a collection of these hash values in such a way
that a remote system can verify that only "trusted" code is running
(remote attestation).
But an attacker could modify the contents
of the disk by accessing it under another kernel or operating system.
That could potentially be detected by the remote attestation, but cannot
be detected by the system itself. EVM sets out to change that.
Approach: sign with the security.evm attribute
One of the additions that comes with the EVM patch set maintains
the file's integrity measurement (hash value) as an
extended attribute (xattr) of a file. The security.ima xattr is used to
store the hash, which gets compared to the calculated value each time
the file is opened.
EVM just calculates a hash over the extended attributes in the security
namespace (e.g. security.ima, security.selinux, and security.SMACK64),
uses the TPM to sign it, and stores it as the security.evm attribute on
the file. Currently, the key to be used with the TPM signature gets
loaded onto the root keyring by readevmkey, which just prompts for a
password at the console. Because an attacker doesn't have the key, an
offline attack cannot correctly modify the EVM xattr when it changes
file data.
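The idea above can be sketched in a few lines. This is only a toy model under stated assumptions: a dict stands in for a file's xattrs, a plain byte string stands in for the key that the real system loads onto the root keyring, and an HMAC stands in for the TPM-backed signature. The function names (ima_hash, evm_value, label, verify) are hypothetical and are not kernel interfaces.

```python
import hashlib
import hmac


def ima_hash(data: bytes) -> bytes:
    """Hash of the file contents, playing the role of security.ima."""
    return hashlib.sha1(data).digest()


def evm_value(xattrs: dict, key: bytes) -> bytes:
    """Keyed digest over every security.* xattr except security.evm itself.

    Assumption: HMAC-SHA1 stands in for "hash the xattrs, then sign with the TPM".
    """
    mac = hmac.new(key, digestmod=hashlib.sha1)
    for name in sorted(xattrs):
        if name.startswith("security.") and name != "security.evm":
            mac.update(name.encode())
            mac.update(xattrs[name])
    return mac.digest()


def label(data: bytes, xattrs: dict, key: bytes) -> dict:
    """Return a copy of xattrs with security.ima and security.evm filled in."""
    labelled = dict(xattrs)
    labelled["security.ima"] = ima_hash(data)
    labelled["security.evm"] = evm_value(labelled, key)
    return labelled


def verify(data: bytes, xattrs: dict, key: bytes) -> bool:
    """Check the content hash first, then the keyed digest over the xattrs."""
    if not hmac.compare_digest(xattrs.get("security.ima", b""), ima_hash(data)):
        return False
    return hmac.compare_digest(xattrs.get("security.evm", b""), evm_value(xattrs, key))


key = b"known only to the running system"
attrs = label(b"#!/bin/sh\n", {"security.selinux": b"system_u:object_r:bin_t"}, key)

# Offline attacker: changes the data and recomputes security.ima (no key needed),
# but cannot produce a matching security.evm without the key.
evil = dict(attrs)
evil["security.ima"] = ima_hash(b"#!/bin/sh\nrm -rf /\n")
print(verify(b"#!/bin/sh\n", attrs, key))
print(verify(b"#!/bin/sh\nrm -rf /\n", evil, key))
```

The point of the sketch is the last step: the attacker can keep security.ima consistent with the modified data, yet the stored security.evm no longer matches.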
2 Slab allocator of the week: SLUB+Queuing
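First (impression: it initially moved away from slab and is now moving back toward it):
a single per-CPU queue containing pointers to free objects belonging to the cache. When the queue is full or about to run empty, it is flushed or refilled one batch at a time.
Second: a bitmap tracks the free objects in a slab, replacing the original linked list, which preserves cache performance.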
The
SLUB+Q patches achieve this goal by using a bitmap to track which
objects in a given page are free. If the number of objects which can fit
into a page is small enough, this bitmap can be stored in the page
structure in the system memory map; otherwise it is placed at the end of
the page itself.
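A rough model of the two ideas above (a per-CPU queue that moves objects in batches, plus a per-page bitmap of free objects) might look like the following. The class names, the limit and batch values, and the use of object indexes instead of pointers are all assumptions made for illustration; this is not the SLUB+Q code.

```python
class Slab:
    """One page worth of objects; bit i set in free_bitmap means object i is free."""

    def __init__(self, nr_objects):
        self.nr_objects = nr_objects
        self.free_bitmap = (1 << nr_objects) - 1  # everything starts out free

    def take_batch(self, count):
        """Claim up to `count` free objects by clearing their bits."""
        taken = []
        for i in range(self.nr_objects):
            if len(taken) == count:
                break
            if self.free_bitmap & (1 << i):
                self.free_bitmap &= ~(1 << i)
                taken.append(i)
        return taken

    def give_back(self, indexes):
        """Mark objects free again by setting their bits."""
        for i in indexes:
            self.free_bitmap |= 1 << i


class CpuQueue:
    """Per-CPU queue of free objects, refilled and flushed one batch at a time."""

    def __init__(self, slab, limit=8, batch=4):
        self.slab, self.limit, self.batch = slab, limit, batch
        self.objects = []

    def alloc(self):
        if not self.objects:  # queue empty: refill one batch from the bitmap
            self.objects = self.slab.take_batch(self.batch)
        if not self.objects:
            raise MemoryError("slab exhausted")
        return self.objects.pop()

    def free(self, index):
        if len(self.objects) >= self.limit:  # queue full: flush one batch
            flushed = self.objects[:self.batch]
            self.objects = self.objects[self.batch:]
            self.slab.give_back(flushed)
        self.objects.append(index)
```

In the common case alloc() and free() only touch the queue; the bitmap is consulted once per batch.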
1. Btrfs: broken by design?
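Inline extents can cause heavy internal fragmentation, even though the feature was originally meant to improve space utilization.
Related material
新一代 Linux 文件系统 btrfs 简介 (An introduction to btrfs, the next-generation Linux file system)
2 Concurrency-managed workqueues and thread priorities
See: Overview of concurrency managed workqueue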
The
CMWQ work is intended to address a number of problems with current
kernel workqueues. At the top of the list is the proliferation of kernel
threads; current workqueues can, on a large system, run the kernel out
of process IDs before user space ever gets a chance to run. Despite all
these threads, current workqueues are not particularly good at keeping
the system busy; workqueues may contain a backlog of work while the CPU
sits idle. Workqueues can also be subject to deadlocks if locking is not
handled very carefully. As a result, the kernel has grown a number of
workarounds and some competing deferred-work mechanisms.
To
resolve these problems, the CMWQ code maintains a set of worker threads
on each processor; these threads are shared between workqueues, so the
system is not overrun with workqueue-specific threads. The special
scheduler class once used by CMWQ is long gone, but the code still has
hooks into the scheduler which it can use to track which worker threads
are actually executing at any given time. If all workqueue threads on a
CPU have blocked waiting on some resource, and if there is queued work
to do, the CMWQ code will kick off a new thread to work on it. The CMWQ
code can run multiple jobs from the same CPU concurrently - something
the current workqueue code will not do. In this way, the CPU is always
kept busy as long as there is work to be done.
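The rule described above (start another worker only when queued work exists and every worker on the CPU has blocked) can be modelled with a few counters. This is a toy sketch, not the CMWQ implementation: the class and method names are hypothetical, idle workers are not modelled, and the "scheduler hooks" are ordinary method calls.

```python
from collections import deque


class CpuWorkerPool:
    """Toy model of one CPU's worker pool, shared by all workqueues."""

    def __init__(self):
        self.pending = deque()  # queued work items, from any workqueue
        self.running = 0        # workers currently executing
        self.blocked = 0        # workers sleeping on some resource
        self.started = []       # items handed to a worker, in order

    @property
    def threads(self):
        return self.running + self.blocked

    def queue_work(self, item):
        self.pending.append(item)
        self._kick()

    def worker_blocked(self):
        """Scheduler hook: a running worker went to sleep."""
        self.running -= 1
        self.blocked += 1
        self._kick()

    def worker_woke(self):
        """Scheduler hook: a blocked worker is runnable again."""
        self.blocked -= 1
        self.running += 1

    def worker_finished(self):
        """A worker completed its item and left the pool."""
        self.running -= 1
        self._kick()

    def _kick(self):
        # Hand out work only when nothing is executing on this CPU.
        if self.pending and self.running == 0:
            self.started.append(self.pending.popleft())
            self.running += 1


pool = CpuWorkerPool()
for item in ("a", "b", "c"):
    pool.queue_work(item)
pool.worker_blocked()  # the worker running "a" sleeps, so "b" gets a worker
print(pool.started, pool.threads)
```

Note that queuing three items does not create three threads; a second one appears only once the first blocks.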
1. Improving lost and spurious IRQ handling
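When interrupts are lost or too many of them are spurious, polling turns out to be the effective strategy.
The necessary response when interrupts go bad is returning to polling.
See the patch posting "irq: better lost/spurious irq handling".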
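As a rough illustration of "return to polling when interrupts go bad", the sketch below counts unhandled (spurious) interrupts within a window and consecutive timeouts (lost interrupts), and switches to polling when either count crosses a limit. The class name and every threshold here are hypothetical; they are not taken from the patch set.

```python
class IrqMonitor:
    """Decide when an interrupt line is unreliable enough to poll instead."""

    def __init__(self, window=100, max_unhandled=90, max_timeouts=3):
        self.window = window                # interrupts per accounting window
        self.max_unhandled = max_unhandled  # spurious limit within one window
        self.max_timeouts = max_timeouts    # consecutive lost-IRQ limit
        self.seen = 0
        self.unhandled = 0
        self.timeouts = 0
        self.polling = False

    def note_interrupt(self, handled):
        """Called for every interrupt; handled=False means no handler claimed it."""
        self.timeouts = 0
        self.seen += 1
        if not handled:
            self.unhandled += 1
        if self.seen == self.window:
            if self.unhandled >= self.max_unhandled:
                self.polling = True
            self.seen = self.unhandled = 0

    def note_timeout(self):
        """Called when an expected interrupt never arrived."""
        self.timeouts += 1
        if self.timeouts >= self.max_timeouts:
            self.polling = True
```

A driver-side caller would check `polling` and, once it is set, run its handler from a timer rather than waiting for the interrupt.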
2 The state of realtime Linux
A few pieces of work worth knowing about
Peter Zijlstra's
3 ARM and defconfig files
2.6.36 may remove the defconfig files.
1 Linus's attitude toward patches after -rc2
I absolutely do NOT want any new code. I want regression fixes, fixes for security issues, and fixes for oopses. Nothing else.
2 Another OOM killer rewrite
Leave memory for processes that are exiting
One
change opens up the kernel's final memory reserves to processes which
are either exiting or are about to receive a fatal signal; that should
allow them to clean up and get out of the way, freeing memory quickly.
Take the allocation domain into account
Another
prevents the killing of processes which are in a separate memory
allocation domain from the process which hit the OOM condition; killing
those processes is unfair and unlikely to improve the situation.
Kill the worst child first
It attempts to pick the child which currently has the highest "badness" score.
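The two selection rules noted above (skip processes in a different allocation domain, and prefer the child with the highest badness score) can be combined in a small sketch. The Task type, the string-valued domain field, and the scores are made up for illustration; the real badness() calculation is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    badness: int   # hypothetical precomputed score
    domain: str    # hypothetical label for the memory allocation domain
    children: list = field(default_factory=list)


def select_victim(tasks, oom_domain):
    """Pick a task to kill for an OOM condition hit in `oom_domain`."""
    # Tasks in another allocation domain are never candidates.
    candidates = [t for t in tasks if t.domain == oom_domain]
    if not candidates:
        return None
    worst = max(candidates, key=lambda t: t.badness)
    # Prefer sacrificing the worst child over the parent itself.
    kids = [c for c in worst.children if c.domain == oom_domain]
    if kids:
        return max(kids, key=lambda c: c.badness)
    return worst


server = Task("server", 50, "node0",
              [Task("worker1", 10, "node0"), Task("worker2", 30, "node0")])
database = Task("database", 90, "node1")
print(select_victim([server, database], "node0").name)  # worker2
```

Even though "database" has the highest score overall, it sits in another domain, so killing it would not relieve the domain that ran out of memory.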
Kill children that were created in large numbers within a short time
A new heuristic which has been added is the "forkbomb penalty."
Killing processes does little good when low memory is exhausted
Another
change affects behavior when memory is exhausted in the low memory
zone. So, instead of invoking the OOM killer, low-memory allocation
requests will simply fail unless the __GFP_NOFAIL flag is present.
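The low-memory rule reduces to a single decision, sketched below. The constant's value and the zone names are placeholders chosen for this example; only the flag name __GFP_NOFAIL comes from the text above.

```python
GFP_NOFAIL = 0x1  # placeholder bit standing in for __GFP_NOFAIL


def on_allocation_failure(zone, flags):
    """Return what happens when an allocation cannot be satisfied."""
    if zone == "lowmem" and not (flags & GFP_NOFAIL):
        return "fail"      # the request simply fails; nothing is killed
    return "oom_kill"      # otherwise the OOM killer is invoked


print(on_allocation_failure("lowmem", 0))           # fail
print(on_allocation_failure("lowmem", GFP_NOFAIL))  # oom_kill
print(on_allocation_failure("normal", 0))           # oom_kill
```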
The most controversial part
The
most controversial part of the patch is a complete rewrite of the
badness() function which assigns a score to each process in the system.
Comments on the OOM killer
But
the best thing I've found to do is just put everything in separate cgroups
with memory limits set at around 80% by default, so no single thing can
take out the whole system.
“The old way to do this is
with rlimits. I've always set rlimits on vsize of every process -- my
default is half of real memory (there's a bunch of swap space in reserve
too). Before Linux, rlimits (under a different name) were the norm, but
on Linux the default is unlimited and I think I'm the only one who
changes it.
Rlimits have a severe weakness in that a process just
has to fork to get a whole fresh set of limits, but they do catch the
common case of the single runaway process.”
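For reference, the rlimit approach from the comment above can be tried from Python's standard `resource` module, where RLIMIT_AS limits a process's address space (its "vsize"). The helper names are hypothetical, and the "half of real memory" policy is simply the commenter's default.

```python
import resource


def vsize_limit(total_ram_bytes):
    """The commenter's default: half of real memory."""
    return total_ram_bytes // 2


def apply_vsize_limit(limit_bytes):
    """Lower the soft address-space limit of the calling process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)  # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    return limit_bytes
```

One way to use it is to call apply_vsize_limit() in a child process just before it executes the target program, for example through the preexec_fn argument of subprocess.Popen. As the comment points out, each process is accounted separately, so this catches a single runaway process but not one that forks.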
3 Writing a WMI driver - an introduction
WMI is an extension to ACPI.
First obtain the GUID, then handle methods and events according to ACPI_WMI_METHOD and ACPI_WMI_EVENT.
1 Idling ACPI idle
A major reason is that the BIOS cannot be trusted.
Linux no longer depends on ACPI to handle idle state transitions on Nehalem and Atom processors.
See: [linux-pm] idle-test patches queued for upstream
2 What comes after suspend blockers
Yet another summary; getting things right is really not easy.
3 2.6.35 merge window part 3
The
fsync() member of struct file_operations has lost its struct dentry
pointer argument, which was not used by any implementations.
Patches changing how truncate() is handled in the VFS layer have been merged.