Category: LINUX
2011-09-29 16:32:13
1
Jump label allows the optimization of "highly unlikely" code branches to the point that their normal overhead is close to zero. This speedup is achieved through runtime code patching, which is also where the cost lies: enabling or disabling the unlikely case is an expensive operation. Thus, jump label is best used for code which is almost never enabled; tracepoints and statements are obvious cases.
There were a number of complaints about the initial jump label implementation, including the fact that it was somewhat awkward to use. In response, a reworked patch set has been posted which changes the interface considerably. One starts by declaring a "jump key":
#include
struct jump_label_key my_key;
Enabling and disabling the key is a simple matter of calling:
jump_label_inc(struct jump_label_key *key);
jump_label_dec(struct jump_label_key *key);
And using the key to control the execution of rarely-needed code becomes:
if (static_branch(&my_key)) {
/* Unlikely stuff happens here */
}
In the absence of full jump label support, a jump key is represented by an atomic_t value. jump_label_inc() becomes atomic_inc(), jump_label_dec() becomes atomic_dec(), and static_branch() is implemented with atomic_read(). If jump label is configured into the kernel, enabling and disabling a jump key become heavier operations, while static_branch() becomes nearly free. For the intended use cases for jump labels, that is a worthwhile tradeoff.
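The fallback described above (no real jump label support) can be sketched in userspace C11 with an atomic counter standing in for atomic_t. The names struct jump_key_sketch, key_inc(), key_dec() and key_branch() are hypothetical stand-ins for jump_label_inc(), jump_label_dec() and static_branch(), and treating any nonzero count as "enabled" is an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct jump_label_key in the fallback case:
 * the key is just an atomic counter (assumption: nonzero means enabled). */
struct jump_key_sketch {
    atomic_int enabled;
};

static inline void key_inc(struct jump_key_sketch *key)
{
    atomic_fetch_add(&key->enabled, 1);   /* plays the role of atomic_inc() */
}

static inline void key_dec(struct jump_key_sketch *key)
{
    atomic_fetch_sub(&key->enabled, 1);   /* plays the role of atomic_dec() */
}

static inline bool key_branch(struct jump_key_sketch *key)
{
    /* plays the role of atomic_read(): a load and a test on every pass,
     * which is exactly the overhead real jump labels patch away */
    return atomic_load(&key->enabled) > 0;
}

static struct jump_key_sketch my_key;    /* static storage: starts at 0, disabled */
static int rare_hits;                    /* counts trips through the rare path */

static void hot_path(void)
{
    if (key_branch(&my_key))
        rare_hits++;                     /* "Unlikely stuff happens here" */
}
```

Because the key is a counter rather than a boolean, two independent users can each enable it, and the rare path stays active until both have called the decrement.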
2
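The ancient APM code was getting in the way of cpuidle improvements, so it was decided to keep only its most basic functionality and remove the rest.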
3
· Beginning support for user namespaces has been merged. User namespaces are a sort of container where processes can safely be given root access within the container without being able to affect the rest of the system. Full container support is a long-term project, but the user namespace patches get the kernel one step closer.
· It is now possible for a suitably privileged process to write to a process's /proc/pid/mem file.
· A new subsystem, intended to allow the system to export information about the topology of complex media subsystems to user space, has been merged.
· printk() and friends have a new "%pB" format specifier which prints a backtrace symbol and its offset.
· Some low-level interrupt-related functions have changed names:
Old                              | New
get_irq_chip()                   | irq_get_chip()
get_irq_chip_data()              | irq_get_chip_data()
get_irq_msi()                    | irq_get_msi_desc()
irq_data_get_irq_data()          | irq_data_get_irq_handler_data()
set_irq_chained_handler()        | irq_set_chained_handler()
set_irq_chip()                   | irq_set_chip()
set_irq_chip_and_handler_name()  | irq_set_chip_and_handler_name()
set_irq_data()                   | irq_set_handler_data()
set_irq_handler()                | irq_set_handler()
set_irq_nested_thread()          | irq_set_nested_thread()
set_irq_noprobe()                | irq_set_noprobe()
set_irq_type()                   | irq_set_irq_type()
set_irq_wake()                   | irq_set_irq_wake()
4 Dynamic devices and static configuration
Problems raised by the OMAP-based "USB-attached" network port
The traditional approach is to use platform_data, but the USB subsystem does not support it.
The traditional approach is through the creation of "board files". These files are meant to provide the kernel with enough information to understand the topology of the hardware it is running on; information related to specific devices is typically passed through a set of static platform_device structures, and through that structure's platform_data pointer in particular. As the driver initializes the device, it can refer to the platform_data pointer (which points to some sort of device-specific structure) for any information which it cannot get from the hardware itself.
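The board-file pattern can be pictured with a much-simplified userspace sketch. struct fake_platform_device, struct netport_pdata and netport_probe() are hypothetical stand-ins; in the kernel the platform_data pointer actually lives in the struct device embedded in struct platform_device. The sketch also shows why a dynamically discovered USB device is awkward: there is no static structure like board_netport for it to be described by.

```c
#include <stdio.h>

/* Hypothetical device-specific data that the hardware cannot report itself. */
struct netport_pdata {
    unsigned char mac[6];
    int phy_addr;
};

/* Simplified stand-in for struct platform_device. */
struct fake_platform_device {
    const char *name;
    void *platform_data;            /* points at a device-specific structure */
};

/* "Board file": a static description of the hardware the kernel runs on. */
static struct netport_pdata board_netport_pdata = {
    .mac = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 },
    .phy_addr = 1,
};

static struct fake_platform_device board_netport = {
    .name = "netport",
    .platform_data = &board_netport_pdata,
};

/* Driver initialization: consult platform_data for what the hardware lacks. */
static int netport_probe(struct fake_platform_device *pdev)
{
    const struct netport_pdata *pd = pdev->platform_data;

    if (pd == NULL)
        return -1;                  /* no board information available */
    printf("%s: PHY at address %d\n", pdev->name, pd->phy_addr);
    return pd->phy_addr;
}
```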
A better approach is the device tree, but it is a long-term solution that does not help with the immediate problem.
5
How to handle fork bombs launched by local users
The patch starts with the addition of a new process tracking structure. It is organized as a simple tree reflecting the actual family structure of the processes on the system. It differs from existing data structures, though, in that this "history tree" persists even when some processes exit.
The history tree is updated periodically.
How is a fork bomb detected?
The check looks to see whether there have been any memory allocation stalls or kswapd runs since the last check. It also looks at whether the total number of processes on the system has increased.
How it is handled:
Enter the fork bomb killer, which is invoked by the OOM killer. The fork bomb killer will perform a depth-first traversal of the process history tree, filling in each node with information on the total number of processes below that node and the total memory used by those processes. At the end, the process with the highest score is examined; if there are at least ten processes in the history below the high scorer, it is deemed to be a fork bomb; that process and all of its descendants will be killed.
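The traversal described above can be sketched as follows. The node layout, the function names and especially the scoring function are hypothetical: the article does not say how the score is computed, and any score that simply sums over a subtree grows toward the root, so this sketch skips the root and takes the score as a pluggable function. Only the depth-first fill-in and the "at least ten processes below the top scorer" rule come from the text.

```c
#include <stddef.h>

#define FORK_BOMB_MIN 10   /* "at least ten processes" below the top scorer */

/* Hypothetical node of the persistent process history tree. */
struct hist_node {
    int pid;
    long mem_kb;               /* memory used by this process alone */
    struct hist_node *child;   /* first child */
    struct hist_node *sibling; /* next sibling */
    int nr_below;              /* filled in: number of processes below this node */
    long mem_below;            /* filled in: memory used by this whole subtree */
};

typedef long (*score_fn)(const struct hist_node *);

/* Example score (an assumption): total memory of the subtree. */
static long score_by_mem(const struct hist_node *n) { return n->mem_below; }

/* Depth-first pass filling in the per-node totals; returns subtree size. */
static int fill_scores(struct hist_node *n)
{
    n->nr_below = 0;
    n->mem_below = n->mem_kb;
    for (struct hist_node *c = n->child; c != NULL; c = c->sibling) {
        n->nr_below += fill_scores(c);
        n->mem_below += c->mem_below;
    }
    return n->nr_below + 1;
}

/* Highest-scoring node strictly below n, or NULL if n has no children. */
static struct hist_node *best_below(struct hist_node *n, score_fn score)
{
    struct hist_node *best = NULL;
    for (struct hist_node *c = n->child; c != NULL; c = c->sibling) {
        struct hist_node *deeper = best_below(c, score);
        struct hist_node *cand = (deeper && score(deeper) > score(c)) ? deeper : c;
        if (best == NULL || score(cand) > score(best))
            best = cand;
    }
    return best;
}

/* Returns the root of a suspected fork bomb, or NULL. The caller would then
 * kill that process and all of its descendants. */
static struct hist_node *find_fork_bomb(struct hist_node *root, score_fn score)
{
    fill_scores(root);
    struct hist_node *top = best_below(root, score);
    return (top != NULL && top->nr_below >= FORK_BOMB_MIN) ? top : NULL;
}
```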
1
See:
2
Old functions like simple_strtoul() will silently ignore junk at the end of an integer value, so "100xx" successfully converts to an unsigned integer type. Alternatives like strict_strtoul() have been encouraged instead, but they have problems too, including the lack of overflow checks. So what's a kernel hacker to do?
As of 2.6.39, there is a new set of string-to-integer converters which is expected to be used in preference to all others.
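The kernel's new converters are the kstrto*() family (kstrtoul(), kstrtoint() and friends), which return 0 on success, -EINVAL for malformed input and -ERANGE on overflow. The sketch below is a hypothetical userspace analogue built on strtoul(), shown only to illustrate those rules; parse_ulong_strict() is not a real library function, and its choice to tolerate a single trailing newline mirrors how the kernel helpers treat values written through sysfs.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strict converter: 0 on success, -EINVAL on junk, -ERANGE on overflow. */
static int parse_ulong_strict(const char *s, unsigned int base, unsigned long *res)
{
    char *end;
    unsigned long val;

    /* strtoul() silently skips leading blanks and accepts a minus sign;
     * a strict converter refuses both, as well as the empty string. */
    if (!isalnum((unsigned char)s[0]))
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, (int)base);
    if (errno == ERANGE)
        return -ERANGE;          /* overflow: strtoul() clamped to ULONG_MAX */

    if (*end == '\n')            /* tolerate one trailing newline */
        end++;
    if (*end != '\0')
        return -EINVAL;          /* trailing junk such as "100xx" */

    *res = val;
    return 0;
}
```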
3
Some of the more significant user-visible changes include:
· The ipset mechanism has been merged. Ipset allows the creation of groups of IP addresses, port numbers, and MAC addresses in a way which can be quickly matched in iptables rules.
· The size of the initial congestion window in the TCP stack has been increased, a change which should lead to shorter latencies for the loading of web pages and other server-oriented tasks.
· There is a new system call:
  int syncfs(int fd);
  It behaves like sync(), except that only the filesystem containing fd is flushed to persistent storage.
· The USB core has gained support for USB 3.0 hubs.
· The core has been added to the staging tree. Along with it came "zcache," a compressed in-memory caching mechanism.
· There is a new "multi-queue priority scheduler" queueing discipline in the networking layer which enables the offloading of quality-of-service processing work to suitably capable hardware.
· Another packet scheduler and the Stochastic Fair Blue scheduler have been added to the networking code.
· Support for the UniCore 32-bit RISC architecture has been merged.
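A minimal sketch of using syncfs() from user space, assuming glibc 2.14 or later, which declares it in <unistd.h> when _GNU_SOURCE is defined. The helper name write_and_syncfs() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a small file, then flush only the filesystem that contains it. */
static int write_and_syncfs(const char *path, const char *text)
{
    size_t len = strlen(text);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (write(fd, text, len) != (ssize_t)len) {
        perror("write");
        close(fd);
        return -1;
    }
    /* Unlike sync(), other mounted filesystems are left alone. */
    if (syncfs(fd) == -1) {
        perror("syncfs");
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Any descriptor on the target filesystem works, including one opened on a directory, so a tool can flush, say, only the filesystem mounted at its data directory instead of every mounted filesystem.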
Changes visible to kernel developers include:
· Network drivers can now enable hardware support for receive flow steering via the new ndo_rx_flow_steer() method.
· kmem_cache_name(), which returned the name of a slab cache, has been removed from the kernel.
· The SLUB memory allocator now has a lockless fast path for allocations, improving performance considerably. "Sadly this does nothing for the slowpath which is where the main issues with performance in slub are but the best case performance rises significantly."
· Kernel threads can be created on a specific NUMA node with the new kthread_create_on_node() function.
· The new function delete_from_page_cache() does what its name implies; unlike remove_from_page_cache() (which has now been deleted), it also decrements the page's reference count. It thus more closely mirrors add_to_page_cache().
· The new "hwspinlock" framework allows the implementation of synchronization primitives on systems where different cores are running different operating systems. See Documentation/hwspinlock.txt for more information.
3
Uniform control over whether printk() debugging output is emitted
The dynamic debugging interface was added as a way of providing a uniform control interface for debugging output while avoiding cluttering the kernel with various hand-rolled alternatives.
Dynamic debug operates on print statements written with either of:
pr_debug(char *format, ...);
dev_dbg(struct device *dev, char *format, ...);
If the CONFIG_DYNAMIC_DEBUG option is not set, the above functions will be turned into normal printk() statements at the KERN_DEBUG level. If the option is enabled, though, the code sets aside a special descriptor for every call site, noting the module, function, and file names, along with the line number and format string. At system boot, all of these debug statements are turned off, so their output will not appear even if debug-level kernel messages are routed somewhere useful by the syslog daemon.
Turning on dynamic debug causes a new virtual file to appear at /sys/kernel/debug/dynamic_debug/control. Writing to that file will enable or disable specific debugging statements.
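The per-call-site descriptor idea can be pictured with a userspace sketch. Everything here (struct debug_site, my_pr_debug(), debug_control(), demo_fn()) is hypothetical: the real kernel collects its descriptors at build time in a dedicated ELF section, whereas this sketch registers a site the first time it executes, and debug_control() is only a crude stand-in for writing a query such as "func demo_fn +p" to the control file.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor set aside for each debug call site. */
struct debug_site {
    const char *function;
    const char *format;
    int line;
    int enabled;              /* every site starts disabled, as at boot */
    int registered;
    struct debug_site *next;  /* link in the registry of known sites */
};

static struct debug_site *all_sites;  /* registry of call sites seen so far */
static int debug_emitted;             /* number of messages actually printed */

static void site_emit(struct debug_site *s, ...)
{
    if (!s->registered) {             /* make the site visible to the control knob */
        s->registered = 1;
        s->next = all_sites;
        all_sites = s;
    }
    if (!s->enabled)
        return;                       /* disabled: the message is suppressed */
    va_list ap;
    va_start(ap, s);
    vfprintf(stderr, s->format, ap);
    va_end(ap);
    debug_emitted++;
}

/* Each expansion gets its own static descriptor (GNU ##__VA_ARGS__ used). */
#define my_pr_debug(fmt, ...)                                          \
    do {                                                               \
        static struct debug_site site_ = { __func__, fmt, __LINE__ };  \
        site_emit(&site_, ##__VA_ARGS__);                              \
    } while (0)

/* Enable or disable every known site in the named function. */
static int debug_control(const char *function, int on)
{
    int matched = 0;
    for (struct debug_site *s = all_sites; s != NULL; s = s->next)
        if (strcmp(s->function, function) == 0) {
            s->enabled = on;
            matched++;
        }
    return matched;
}

static void demo_fn(int x)
{
    my_pr_debug("demo_fn called with %d\n", x);
}
```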
4
The "pstore" filesystem provides access to platform-specific persistent storage which can be used to carry information across reboots.
"a generic layer for persistent storage usable to pass tens or hundreds of kilobytes of data from the dying breath of a crashing kernel to its successor".
There are other persistent storage methods for kernel log messages, notably drivers/mtd/mtdoops.c and drivers/char/ramoops.c. But those are targeted at the embedded space where NVRAM devices are prevalent, or at platforms where RAM can be reserved that will not be cleared on a restart. Pstore is more flexible, as it can store more than just kernel logs, while the two *oops devices are wired into storing the output of kmsg_dump.
1 Schultz:
Diving into the Linux Networking Stack, Part I
2 2.6.39 merge window
part 1
int name_to_handle_at(int dfd, const char *name, struct file_handle *handle, int *mnt_id, int flag);
int open_by_handle_at(int dirfd, struct file_handle *handle, int flags);
int clock_adjtime(clockid_t which_clock, struct timex *time);
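A minimal sketch of the first of these calls, assuming glibc's wrapper (declared in <fcntl.h> under _GNU_SOURCE, together with struct file_handle and MAX_HANDLE_SZ). The helper get_file_handle() is hypothetical; some filesystems do not support handles and fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Translate a path into an opaque, persistent file handle.
 * Returns a malloc()ed handle, or NULL with errno set. */
static struct file_handle *get_file_handle(const char *path, int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (fh == NULL)
        return NULL;
    fh->handle_bytes = MAX_HANDLE_SZ;   /* in: buffer size; out: bytes used */
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
        int saved = errno;              /* free() may clobber errno */
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```

The handle can be stored and later passed to open_by_handle_at() together with any descriptor on the same filesystem; that second call requires CAP_DAC_READ_SEARCH, which is why it is mainly of interest to privileged user-space file servers.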
Changes visible to kernel developers include:
3 Uprobes: 11th time is the charm?
It may be merged into mainline in the future.
The purpose of the uprobes subsystem is to enable the placement of probes into user-space executable process memory. These probes might be used to support a debugger like gdb or to support user-space tracing.
Implementation details:
The ptrace() interface is tied to processes; uprobes, instead, works with files. A probe is placed at a certain offset within a specific file; it will then trigger for every process which executes through the probe's location. If the code placing the probe is only interested in specific processes, it will need to filter the events itself. The interface may seem a little strange (users will probably almost always be interested in specific processes), but there are some advantages to doing things this way.
Under the hood, uprobes works by faulting in the page which will contain the probe. The instruction at the probe location is copied aside and replaced by a breakpoint. Every process which has that file mapped then gets a pointer in its mm structure pointing to the data describing the probe(s) for that file. When a process executes the breakpoint, the probe's handler function will be called; on that handler's return, the kernel will single-step the displaced instruction, then return to the location following the probe.
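The displaced-instruction mechanism can be modelled with a toy interpreter. This is purely illustrative: the one-byte "instruction set" (every byte except the breakpoint means "add this value to the accumulator"), struct toy_probe and the helper names are hypothetical; 0xCC is used only because it is the x86 int3 breakpoint opcode. Because the probe lives in the shared "file" text, it fires for every process that runs it, and any per-process filtering would be up to the handler.

```c
#include <stddef.h>

#define BRK 0xCC   /* stand-in breakpoint opcode (int3 on x86) */

/* Hypothetical probe record: a file offset plus the displaced instruction. */
struct toy_probe {
    size_t offset;
    unsigned char saved;
    void (*handler)(size_t offset, int pid);
};

/* Install: copy the original instruction aside, then patch in a breakpoint. */
static void probe_install(unsigned char *text, struct toy_probe *p)
{
    p->saved = text[p->offset];
    text[p->offset] = BRK;
}

/* Toy instruction set: any non-breakpoint byte means "acc += byte". */
static long execute_one(unsigned char insn, long acc)
{
    return acc + insn;
}

/* Run the shared "file" text on behalf of one process. */
static long run(const unsigned char *text, size_t len, int pid,
                const struct toy_probe *p)
{
    long acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        if (p != NULL && pc == p->offset && text[pc] == BRK) {
            p->handler(pc, pid);               /* probe fires for every process */
            acc = execute_one(p->saved, acc);  /* "single-step" the displaced insn */
        } else {
            acc = execute_one(text[pc], acc);
        }
    }
    return acc;
}

static int hits;   /* total probe hits across all processes */

static void count_hit(size_t offset, int pid)
{
    (void)offset;
    (void)pid;     /* a real consumer would filter on pid here */
    hits++;
}
```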
4 APIs for sensors
The current problem:
New devices are added with inconsistent interfaces, making life hard for application developers.
Existing subsystems: Video4Linux2 handles cameras, and the hwmon subsystem deals with the specific class of sensors aimed at monitoring the health of the computer itself.
The candidate, IIO, is still in the staging tree and has a long way to go, while driver developers cannot wait for a unified interface.
The industrial I/O (IIO) subsystem is meant "for devices that in some sense are analog to digital converters." IIO tries to handle a wide variety of sensors in some sort of standard way, with support for events, higher-bandwidth I/O, and more.
1 Removed directories and st_nlink
Renaming a directory A onto an existing directory B effectively deletes B:
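#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat buf1, buf2;
    int fd1, fd2;

    mkdir("foo", 0777);
    mkdir("bar", 0777);
    fd1 = open("foo", O_RDONLY | O_DIRECTORY);
    fd2 = open("bar", O_RDONLY | O_DIRECTORY);
    rename("foo", "bar");   /* kill old bar */
    rmdir("bar");           /* kill old foo */
    fstat(fd1, &buf1);
    fstat(fd2, &buf2);
    printf("old foo: st_nlink=%lu, old bar: st_nlink=%lu\n",
           (unsigned long)buf1.st_nlink, (unsigned long)buf2.st_nlink);
    return 0;
}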
Both buf1.st_nlink and buf2.st_nlink should be 0, but many filesystems do not get this right.
2 Protecting /proc/slabinfo
A patch that changed the permissions of /proc/slabinfo to 0400 sparked a discussion. Its rationale: "nearly all recent public exploits for heap issues rely on feedback from /proc/slabinfo to manipulate heap layout into an exploitable state". The discussion concluded that this change does not solve the problem at its root.
Mackall argued that the underlying problem is that it is too easy for programmers to copy the wrong amount of data from user space (which is how most of these object overruns occur), and that the copy_from_user() interface should be examined instead.
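The class of bug being described can be pictured with a hypothetical copy helper that must be told the size of the destination object, so that an oversized length is rejected instead of overrunning the neighbouring heap object. copy_in_checked() is purely illustrative; it is not an existing kernel API and not the specific change proposed in the discussion.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical size-checked copy: refuse lengths larger than the destination. */
static int copy_in_checked(void *dst, size_t dst_size, const void *src, size_t len)
{
    if (len > dst_size)
        return -EINVAL;      /* would overrun the destination object */
    memcpy(dst, src, len);   /* stands in for the actual user-space copy */
    return 0;
}
```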
3 Improving ptrace()
Tejun Heo posted a patch set to improve ptrace().
Background:
The problem lies in the interaction between tracing and job control. In an untraced process, job control is used by the kernel and the shell to stop and restart processes, possibly moving them between the foreground and the background.
The problems that arise once tracing is added, and how they are handled:
4 Delaying the OOM killer
Problems caused by the OOM killer operating on a per-control-group basis.
It is possible for user space to take over OOM-killer duties in the control group context. Each group has a control file called oom_control which can be used in a couple of interesting ways: