lwn.net kernel news 2012/6 - baozhao - ChinaUnix blog

原上草baozhao.blog.chinaunix.net

lwn.net kernel news 2012/6

Category: LINUX

2012-08-05 15:05:20

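There is a discussion about using QR codes to display kernel crash information; its prospects are unclear. The attraction is that a QR code can compress a fair amount of data into a form that can be digested elsewhere.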

Kernel logging has run into the following problem: when the kernel outputs a partial message (by passing a string to printk() that does not end with a newline), the logging system will buffer the text until the rest of the message arrives.

If a driver does

printk("testing the frobnozzle ...");

do_test();

printk(" OK\n");

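and do_test() hangs up, the partial "testing the frobnozzle ..." text is still sitting in the buffer, so the one message that would point at the failure never reaches the console.

There is no consensus yet on how to deal with the trouble this buffering causes.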
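The buffering effect can be reproduced in a small user-space model. The sketch below is only an illustration and not kernel code: struct line_logger, logger_init() and logger_print() are hypothetical names, and the "console" is just a string that receives text once a newline completes the line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a line-buffering logger; not kernel code. */
struct line_logger {
    char pending[256];   /* partial message waiting for its newline */
    size_t pending_len;
    char console[1024];  /* text that has actually been emitted */
    size_t console_len;
};

static void logger_init(struct line_logger *lg)
{
    memset(lg, 0, sizeof(*lg));
}

/*
 * Append text to the pending line. Nothing reaches the "console" until a
 * newline arrives. Overlong lines are silently truncated in this sketch.
 */
static void logger_print(struct line_logger *lg, const char *text)
{
    for (; *text != '\0'; text++) {
        if (lg->pending_len < sizeof(lg->pending) - 1)
            lg->pending[lg->pending_len++] = *text;
        if (*text == '\n') {
            size_t room = sizeof(lg->console) - 1 - lg->console_len;
            size_t n = lg->pending_len < room ? lg->pending_len : room;

            memcpy(lg->console + lg->console_len, lg->pending, n);
            lg->console_len += n;
            lg->console[lg->console_len] = '\0';
            lg->pending_len = 0;
        }
    }
}
```

If the program stops between the first logger_print() call and the one carrying the newline, console is still empty, which is the situation described for the hung do_test().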

3 Tightening security: not for the impatient

Security patches face a difficult journey into the kernel, one that has lasted more than ten years.

Consider the classic symbolic link vulnerability, wherein an attacker fools a privileged program into writing to a file behind an attacker-controlled symbolic link. Such vulnerabilities can be exploited to overwrite files that the attacker would not otherwise have access to.

Kees Cook has posted patches to deal with this class of vulnerabilities. The approach is based on the observations that symbolic link vulnerabilities almost always involve links placed in /tmp, and that /tmp has the "sticky" bit set in any contemporary distribution. Given that:

The solution is to permit symlinks to only be followed when outside a sticky world-writable directory, or when the uid of the symlink and follower match, or when the directory owner matches the symlink's owner.
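The rule quoted above is a three-way OR, which can be written down as a small predicate. This is a sketch of the stated policy only, not the code from the patches; struct link_check and may_follow_symlink() are names made up for the illustration.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Illustrative inputs only; the field names are assumptions, not kernel structures. */
struct link_check {
    bool  dir_sticky;          /* parent directory has the sticky bit set */
    bool  dir_world_writable;  /* parent directory is world-writable */
    uid_t dir_uid;             /* owner of the parent directory */
    uid_t link_uid;            /* owner of the symbolic link */
    uid_t follower_uid;        /* user on whose behalf the link is followed */
};

static bool may_follow_symlink(const struct link_check *c)
{
    /* Outside a sticky world-writable directory: no restriction. */
    if (!(c->dir_sticky && c->dir_world_writable))
        return true;
    /* The follower owns the link. */
    if (c->link_uid == c->follower_uid)
        return true;
    /* The directory owner created the link. */
    if (c->dir_uid == c->link_uid)
        return true;
    return false;
}
```

The classic attack (an unprivileged user plants a link in a root-owned, sticky, world-writable /tmp and a root process follows it) fails all three tests and is refused.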

So Kees thinks that his current patch set (a variant of an earlier one) should finally be considered for merging. The patches implement the symbolic link restrictions, but also add a new rule for hard links: a hard link to a file can only be created if the user owns the file or has write access to it. Once again, this change eliminates a class of attacks, but at a small cost: older versions of the "at" daemon break unless a small patch is applied.

Another vulnerability: on Linux systems, there is a sysctl knob (suid_dumpable) that controls whether a crashing setuid process generates a core dump or not. Setting it to a non-zero value allows core dumps to happen; setting it to two applies certain restrictions that are intended to make it safe. But, Kees says, that's not the case.

A new mechanism for storage devices lets the OS give the device firmware hints so that it can optimize.
A "context" is a small number attached to I/O requests that is intended to help the device optimize the execution of those requests. Contexts are meant to differentiate different types of I/O, keeping large, sequential operations separate from small, random requests. I/O can be placed into a "large unit" context, where the operating system promises to send large requests and, possibly, not attempt to read the data back until the context has been closed.

However, there is no consensus on how to implement this.
For flash devices, the effect of such an implementation would be to concentrate data written under any one context into the same erase block(s).

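Paolo Bonzini recently posted a patch making a couple of changes to msync(), but getting it merged will not be easy, because it could change the behavior of applications, even if those applications are not necessarily correct.

At present, what MS_ASYNC asks for is already the kernel's default behavior; the patch wants to start the I/O immediately.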
There are a few options to msync(), one of which (MS_ASYNC) asks that the writeback of modified pages be "scheduled," but not necessarily completed immediately. It is meant to be a non-blocking system call that sets the necessary actions in motion, but does not wait for them to complete. Current kernels will write back dirty pages as part of the normal writeback process; the system behaves, in other words, as if msync(MS_ASYNC) were being called on a regular basis on every mapping. Writeback of dirty pages is already scheduled as soon as the page is dirtied. Given that, there's not much work for an explicit MS_ASYNC call from user space to do, and, indeed, the kernel essentially ignores such calls.

The following change will not be easy to merge either.
msync() takes two parameters indicating the starting address and length of the memory area to be written back. But the kernel has always ignored those parameters, choosing instead to just write back all modified pages in the file, and the related metadata as well. Paolo's patch changes the implementation to only synchronize the specific pages requested by the user.
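For reference, this is roughly what the calls under discussion look like from user space. The sketch below uses only standard POSIX calls; mapped_write_and_sync() and the /tmp scratch path are assumptions made for the example, and the final read-back only confirms the file contents as seen through the page cache, not that the data has reached the disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a scratch file, modify it through the mapping, and ask for writeback.
 * Returns 0 on success, -1 on any failure.
 */
static int mapped_write_and_sync(const char *msg)
{
    char path[] = "/tmp/msync-demo-XXXXXX";  /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    size_t len = strlen(msg);
    char check[64] = {0};
    int ret = -1;
    char *map;
    int fd;

    if (page <= 0 || len == 0 || len >= sizeof(check) || len > (size_t)page)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                     /* the file goes away when fd is closed */

    if (ftruncate(fd, page) != 0)
        goto out_close;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    memcpy(map, msg, len);            /* dirty the first page */

    /* Non-blocking request: schedule writeback of this range, do not wait. */
    if (msync(map, (size_t)page, MS_ASYNC) != 0)
        goto out_unmap;

    /* Blocking request: returns once the range has been written back. */
    if (msync(map, (size_t)page, MS_SYNC) != 0)
        goto out_unmap;

    /* Read back through the descriptor; this checks contents, not durability. */
    if (pread(fd, check, len, 0) == (ssize_t)len && memcmp(check, msg, len) == 0)
        ret = 0;

out_unmap:
    munmap(map, (size_t)page);
out_close:
    close(fd);
    return ret;
}
```

The address passed to msync() must be page-aligned, which is why the example syncs from the start of the mapping.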

3 Proposals for Kernel Summit discussions

How Kernel Summit attendees are selected:
Those interested in attending are being asked to describe the technical expertise they will bring to the meeting, as well as to suggest topics for discussion.
So far, the proposed topics lean toward the ecosystem around Linux kernel development, with fewer purely technical topics.
There tends to be a focus on more process and social aspects of the kernel at the summit, mostly because the hardcore technical topics are generally better handled by a more focused group. The summit tries to address global concerns, and there seem to be plenty to choose from.

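A new string-handling interface processes strings one machine word at a time instead of the traditional one character at a time. It adds complexity, though, and is only worthwhile where string operations are frequent. The article is a showcase of C bit-manipulation tricks.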
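The core trick in word-at-a-time string handling is testing a whole word for a zero byte with a few arithmetic operations. The sketch below shows the well-known form of that test in portable user-space C; it is not the kernel's interface, and ONES, HIGHS, word_has_zero_byte() and wordwise_strnlen() are names chosen for this example, with a 64-bit word assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Returns 1 if any of the eight bytes in w is zero, 0 otherwise. */
static int word_has_zero_byte(uint64_t w)
{
    /* A byte's high bit survives only if that byte borrowed, i.e. was zero. */
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/*
 * strnlen()-like scan over at most max bytes. Whole words are skipped while
 * they contain no terminator; the word holding the terminator (and any
 * tail shorter than a word) is finished byte by byte.
 */
static size_t wordwise_strnlen(const char *s, size_t max)
{
    size_t i = 0;

    while (i + sizeof(uint64_t) <= max) {
        uint64_t w;

        memcpy(&w, s + i, sizeof(w));   /* avoids unaligned-access issues */
        if (word_has_zero_byte(w))
            break;
        i += sizeof(w);
    }
    while (i < max && s[i] != '\0')
        i++;
    return i;
}
```

Bounding the scan with max keeps the word reads inside the buffer; the added bookkeeping is an example of the extra complexity the notes mention.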

An old problem: the "holy grail" is a single kernel binary that will boot on any ARM device.

Efforts are under way on four fronts. Cleaning up and consolidating the header files within the various ARM sub-architectures is one, while consolidating ARM drivers is another. In addition, device tree will provide a way to specify the differences between ARM SoCs at runtime. Finally, doing active maintenance of the ARM tree, keeping in mind the big picture, will also help.

ARM's big.LITTLE architecture is an example of asymmetric multiprocessing where all CPUs are instruction-set compatible, but where different CPUs have very different performance and energy-efficiency characteristics.

Earlier work: (a) an approach termed the "big.LITTLE Switcher"; (b) these big.LITTLE systems were the subject of a scheduler minisummit at last February's Linaro Connect in the Bay Area.

One important task is to completely eliminate the overhead of kthread creation, teardown, and migration. Thomas posted a patchset that moves idle-task creation to common code. This patchset has been well received thus far, and went into mainline during the 3.5 merge window. Thomas has since followed up with a new patchset that allows kthreads to be parked and unparked. The new kthread_create_on_cpu(), kthread_should_park(), kthread_park(), and kthread_unpark() APIs can be applied to the per-CPU kthreads that are now created and destroyed on each CPU-hotplug cycle.

Another interesting item is adding minimal scheduler support for asymmetric cores. There has been great progress in a number of areas. First, Paul Turner posted a new version of his patchset, which should allow the scheduler to make better (and more power-aware) task-placement and load-balancing decisions. Second, Morten Rasmussen ran some experiments (including experimental patches) on top of Paul Turner's patchset. Third, Peter Zijlstra posted a proposal for removing sched_mc and another proposing increased scheduler awareness of hardware topology. This should allow the scheduler to better handle asymmetric architectures such as ARM's big.LITTLE architecture. Finally, Juri Lelli posted about a prototype.

A process's directory under /proc now includes a children file containing the IDs of its child processes.
The kcmp() system call has been added. Its purpose is to help user space checkpoint/restore utilities to determine whether two processes share a given resource or not.
Also for checkpoint/restore: the prctl() system call has gained options to set the beginning and end of the argv and environment areas and the executable file a process is running.
The "frontswap" mechanism, part of the transcendent memory family of technologies, sneaked its way into the mainline just after the -rc1 release.

The task_work_add() function, useful for requesting that a function be run in the context of a specific process, has been added.
struct inode_operations has a new update_time() function whose job is to provide any needed special handling for changes to any of the file timestamps. The file_update_time() prototype has been changed: it now returns an int that can indicate that the operation failed. Failures to update the last-access time are now explicitly ignored; this is done to ensure that atime update failures don't make the filesystem unreadable.

Users of the kernel's red-black trees must provide their own functions for inserting nodes into the tree and performing searches. There is some appeal to being able to hand-code the search and insertion functions, but there would also be value in generic implementations. There are currently two competing proposals.
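The trade-off between hand-coded and generic tree functions can be seen even without the kernel's rbtree API. The sketch below uses a plain, unbalanced binary search tree as a stand-in (no rebalancing, no kernel types); struct node, search_int(), search_generic() and cmp_int_key() are all names invented for the illustration and do not correspond to either proposal.

```c
#include <stddef.h>

/* Plain unbalanced binary search tree; stands in for a red-black tree here. */
struct node {
    struct node *left, *right;
    int key;
};

/* Hand-coded search: the key comparison is written inline for this key type. */
static struct node *search_int(struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = key < root->key ? root->left : root->right;
    return root;
}

/* Generic search: the comparison is supplied by the caller as a callback. */
typedef int (*node_cmp)(const void *key, const struct node *n);

static struct node *search_generic(struct node *root, const void *key,
                                   node_cmp cmp)
{
    while (root != NULL) {
        int c = cmp(key, root);

        if (c == 0)
            return root;
        root = c < 0 ? root->left : root->right;
    }
    return NULL;
}

/* Comparison callback for int keys: negative, zero, or positive. */
static int cmp_int_key(const void *key, const struct node *n)
{
    int k = *(const int *)key;

    return (k > n->key) - (k < n->key);
}
```

The hand-coded version lets the compiler inline the comparison, while the generic one is written once and reused at the cost of an indirect call per node visited.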

A "volatile range" is a set of pages in memory containing data that might be useful to an application at some point in the future; a key point is that, if the need arises, the application is able to reacquire (or regenerate) that data from another source.

Usage:
To give up the range (mark it volatile):
fallocate(fd, FALLOCATE_FL_MARK_VOLATILE, offset, len);

After the call completes, the kernel is not obligated to keep that range in memory, and is not obligated to write that range to backing store before reclaiming it.

When the data in the range is actually needed again:
fallocate(fd, FALLOCATE_FL_UNMARK_VOLATILE, offset, len);

If the indicated range is still present in memory, the call will return zero and the application can proceed to work with the data. If, instead, any part of the range has been purged by the kernel since it was marked volatile, a non-zero return value will inform the application that it needs to find that data somewhere else.
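The application-side pattern (mark, later unmark, regenerate if purged) can be modeled without the proposed fallocate() flags. The sketch below is a pure user-space simulation: struct vrange and every vrange_*() function are hypothetical stand-ins, and memory pressure is triggered by hand rather than by the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a volatile range; purely a user-space simulation. */
struct vrange {
    char data[64];
    bool is_volatile;   /* contents may be discarded */
    bool purged;        /* contents were discarded while volatile */
};

static void vrange_mark_volatile(struct vrange *r)
{
    r->is_volatile = true;
}

/* Simulated memory pressure: only a volatile range may be purged. */
static void vrange_simulate_pressure(struct vrange *r)
{
    if (r->is_volatile) {
        memset(r->data, 0, sizeof(r->data));
        r->purged = true;
    }
}

/* Returns 0 if the data survived, nonzero if it was purged. */
static int vrange_unmark_volatile(struct vrange *r)
{
    int was_purged = r->purged;

    r->is_volatile = false;
    r->purged = false;
    return was_purged;
}

/* Application pattern: reclaim the range, regenerate the data if it is gone. */
static const char *cache_get(struct vrange *r, const char *source)
{
    if (vrange_unmark_volatile(r) != 0) {
        strncpy(r->data, source, sizeof(r->data) - 1);
        r->data[sizeof(r->data) - 1] = '\0';
    }
    return r->data;
}
```

The source argument plays the role of "another source" from which the data can be reacquired; in a real application it might be a file to re-read or an image to decode again.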
