Category: LINUX
2011-11-04 11:23:03
1
Matthew Garrett looks into the subtleties of booting Linux with EFI.
2
User-visible changes merged for 3.1 include:
Changes visible to kernel developers include:
3
Discussion: how the rt-tree handles per-CPU variables
Background:
Safe access to per-CPU data requires a couple of constraints, though: the thread working with the data cannot be preempted and it cannot be migrated while it manipulates per-CPU variables. To avoid these hazards, access to per-CPU variables is normally bracketed with calls to get_cpu_var() and put_cpu_var(); the get_cpu_var() call, along with providing the address for the processor's version of the variable, disables preemption.
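As a rough illustration of the bracketing described above, here is a userspace C model. NR_CPUS, counter, current_cpu, preempt_count and the sim_* functions are hypothetical stand-ins, not the kernel's implementation (in the kernel, get_cpu_var() is a macro that yields the variable itself rather than a pointer). The only point is that between the get and the put, the task cannot be moved to another CPU.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define NR_CPUS 4

static long counter[NR_CPUS];  /* one copy of the "per-CPU" variable per CPU */
static int current_cpu;        /* CPU the simulated task is running on */
static int preempt_count;      /* > 0 means preemption (and migration) is off */

/* Like get_cpu_var(): disable preemption, then hand out this CPU's copy. */
static long *sim_get_cpu_var(void)
{
    preempt_count++;
    return &counter[current_cpu];
}

/* Like put_cpu_var(): re-enable preemption. */
static void sim_put_cpu_var(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* The scheduler may only move the task while preemption is enabled. */
static int sim_try_migrate(int cpu)
{
    if (preempt_count > 0)
        return 0;              /* refused: task is inside a per-CPU section */
    current_cpu = cpu;
    return 1;
}

static void sim_count_event(void)
{
    long *c = sim_get_cpu_var();
    (*c)++;                    /* safe: no preemption or migration here */
    sim_put_cpu_var();
}
```

The cost this model makes visible is the one the rt-tree worries about: everything between get and put runs with preemption disabled.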
Current rt-tree approach: In the past, this problem has been worked around by protecting per-CPU variables with spinlocks. These locks keep the code preemptable, but they wreck the scalability that per-CPU variables were created to provide and complicate the code.
Future approach: whenever a process acquires a spinlock or obtains a CPU reference with get_cpu(), the scheduler will refrain from migrating that process to any other CPU. That process remains preemptable (code holding spinlocks can be preempted in the realtime world), but it will not be moved to another processor. This approach presupposes that per-CPU data is already protected by per-CPU locks; given that, it requires smaller changes than the old approach and scales better, but whether it will be adopted is still unknown.
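A minimal model of the proposed rule, assuming a single per-task counter that spin_lock() or get_cpu() raises and the matching unlock or put lowers. struct task, pin_count and the helpers are hypothetical; the actual rt patches may track this state differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of "preemptable but not migratable". */
struct task {
    int cpu;        /* CPU the task currently runs on */
    int pin_count;  /* spinlocks held + outstanding get_cpu() references */
    bool running;   /* false while preempted */
};

/* Stand-ins for spin_lock()/get_cpu() and spin_unlock()/put_cpu(). */
static void task_lock(struct task *t)   { t->pin_count++; }
static void task_unlock(struct task *t) { assert(t->pin_count > 0); t->pin_count--; }

/* Preemption is always allowed in this model, even with locks held. */
static void sched_preempt(struct task *t) { t->running = false; }

/* On resume the scheduler may pick another CPU, but only if the task is unpinned. */
static void sched_resume(struct task *t, int preferred_cpu)
{
    if (t->pin_count == 0)
        t->cpu = preferred_cpu;
    t->running = true;
}
```

Because the pinned task always resumes on the same CPU, its per-CPU data stays valid across a preemption; the per-CPU lock then only has to guard against other tasks on that same CPU.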
4 (key article, not yet fully digested)
The article first gives an overview of the preemptible RCU read-side code, then lists a number of bugs and related commits.
in_irq() can return inaccurate results because it consults the preempt_count() bitmask, which is updated in software. At the start of the interrupt, there is therefore a period of time before preempt_count() is updated to record the start of the interrupt, during which time the interrupt handler has started executing, but in_irq() returns false. Similarly, at the end of the interrupt, there is a period of time after preempt_count() is updated to record the end of the interrupt, during which time the interrupt handler has not completed executing, but again in_irq() returns false. This last is most emphatically the case when the end-of-interrupt processing kicks off softirq handling.
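The events in the passage above can be written as the sequence [real_irq_begin, in_irq_begin, in_irq_end, real_irq_end]. The real interrupt and what in_irq() reports are offset from each other, so in_irq() gives the wrong answer both at the start and at the end of the interrupt. Many RCU bugs are related to this.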
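The four points above can be replayed in a small C model. The bit layout (HARDIRQ_SHIFT, HARDIRQ_MASK) is illustrative only and does not match every kernel version, and step() is a hypothetical helper that reports whether the software view from in_irq() disagrees with what the CPU is really doing.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit layout; the real one varies between kernel versions. */
#define HARDIRQ_SHIFT  16
#define HARDIRQ_OFFSET (1U << HARDIRQ_SHIFT)
#define HARDIRQ_MASK   (0xfU << HARDIRQ_SHIFT)

static unsigned int preempt_count_sim;  /* updated in software, like preempt_count() */
static bool really_in_irq;              /* ground truth: is the CPU in the interrupt path? */

static bool sim_in_irq(void) { return (preempt_count_sim & HARDIRQ_MASK) != 0; }

enum phase { REAL_IRQ_BEGIN, IN_IRQ_BEGIN, IN_IRQ_END, REAL_IRQ_END };

/* Advance the model by one phase; returns true if in_irq() is now wrong. */
static bool step(enum phase p)
{
    switch (p) {
    case REAL_IRQ_BEGIN: really_in_irq = true; break;                /* hardware entered the handler */
    case IN_IRQ_BEGIN:   preempt_count_sim += HARDIRQ_OFFSET; break; /* roughly irq_enter() */
    case IN_IRQ_END:     preempt_count_sim -= HARDIRQ_OFFSET; break; /* roughly irq_exit(); softirqs may follow */
    case REAL_IRQ_END:   really_in_irq = false; break;
    }
    return sim_in_irq() != really_in_irq;
}
```

Calling step() with the four phases in order shows the two windows: it returns true right after REAL_IRQ_BEGIN and right after IN_IRQ_END, and false otherwise.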
1
Hardware vendors still tend to think only about Windows
Matthew Garrett on the subtleties of booting Linux with EFI. Once again, hardware vendors are myopically focusing on Windows. "As we've seen many times in the past, the only thing many hardware vendors do is check that Windows boots correctly."
2
NAT originally appeared as a response to the shortage of IPv4 addresses, but another need keeps it alive in IPv6: "People want to hide the details of the topology of their internal networks, therefore we will have NAT with ipv6 no matter what we think or feel."
3
A bug that could cause files to be lost was only fixed after Linus, Al, and Hugh worked on it together.
“Our once approachable and hackable kernel has, over time, become more complex and difficult to understand.”
4
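To guard against a basic mistake in user-space programs (assuming that setuid() succeeded without checking its return value), the kernel's defenses were proactively strengthened.
That led to a change which made do_execve_common() return an error (EAGAIN) if the user was over their process limit, and which removed the check from set_user(). (set_user() is called from setuid().)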
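A simplified model of the old and new behaviour. struct sim_user, over_limit(), old_setuid(), new_setuid() and sim_execve() are hypothetical, and the limit test is deliberately simplified; the real kernel checks differ in detail.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified model; not the kernel's code. */
struct sim_user {
    int nproc;        /* processes this user currently owns */
    int nproc_limit;  /* RLIMIT_NPROC for this user */
};

static int over_limit(const struct sim_user *u)
{
    return u->nproc >= u->nproc_limit;
}

/* Old behaviour: the check lived in set_user(), so setuid() could fail with
 * EAGAIN and leave a careless caller running with its old (root) identity. */
static int old_setuid(struct sim_user *target, struct sim_user **cred)
{
    if (over_limit(target))
        return -EAGAIN;
    *cred = target;
    return 0;
}

/* New behaviour: setuid() no longer checks the limit, so the switch succeeds... */
static int new_setuid(struct sim_user *target, struct sim_user **cred)
{
    *cred = target;
    return 0;
}

/* ...and the limit is enforced when the program calls execve(). */
static int sim_execve(const struct sim_user *cred)
{
    return over_limit(cred) ? -EAGAIN : 0;
}
```

The design point is that the failure moves to a place where ignoring it is harmless: a program that skips the setuid() check has still dropped privileges, and it is execve() that refuses to go further.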
5
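Earlier efforts were mostly implemented in the kernel and were too invasive to be merged. The approach by Pavel Emelyanov does most of its work in user space; its prospects are unclear.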
1
The kernel Linus releases will be named 3.0, not 3.0.0; stable kernels continue to use the x.y.z style.
2
The usual per-release statistics article; the following trend is worth noting:
The percentage of changes from hobbyists continues to drop; whether that's a bad thing (the kernel is becoming increasingly unapproachable to volunteer developers) or a good thing (it's impossible for anybody who can hack the kernel to remain unemployed) is still not clear.
In addition, two statistics based on long-term data were compiled:
The history from the beginning of the 2.5 development series covers about 9.5 years of development. During this time, some 291,664 changesets were contributed by 8,078 developers; those changes added 10.5 million lines of code.
Since 2.6.0, there have been 264,706 changesets contributed by 7,725 developers adding 8.7 million lines of code.
One other exercise with this data seemed interesting: a determination of who have been the most consistent contributors over those nine years and some. After running a script to track which developers contributed to each major release, twelve developers were found who had contributed to all 41 of them.
3
An old, still unsolved problem; Jonathan Corbet suggests using udev to format the output data.
1
The poll(), select(), and epoll_wait() system calls are all implemented with
the poll() method in the file_operations structure:
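    unsigned int (*poll) (struct file *filp,
                          struct poll_table_struct *pt);

poll() returns a mask of the events that are ready right now, which tells the caller whether it would have to block; if it might, the driver adds its wait queue to pt. There is one optimization: once the poll operation on some file reports that it will not block, the pt argument passed to the poll operations of the remaining files is NULL.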
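The optimization can be modeled in userspace C. All names here are hypothetical; pt->queued++ stands in for the driver calling poll_wait(), and sim_poll_pass() only mimics the shape of one pass of the kernel's poll loop.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_POLLIN 0x1                  /* illustrative event bit */

struct sim_poll_table { int queued; };  /* counts wait-queue registrations */

struct sim_file {
    unsigned int ready_mask;            /* events currently available */
    int saw_null_table;                 /* set if ->poll() got pt == NULL */
};

/* A driver's ->poll(): queue on the wait queue if pt is non-NULL,
 * then report which events are ready right now. */
static unsigned int sim_driver_poll(struct sim_file *f, struct sim_poll_table *pt)
{
    if (pt)
        pt->queued++;                   /* stands in for poll_wait() */
    else
        f->saw_null_table = 1;          /* driver cannot see what was requested */
    return f->ready_mask;
}

/* One pass over the files: once any file is ready the caller will not
 * sleep, so the remaining files are polled with a NULL table. */
static int sim_poll_pass(struct sim_file *files, size_t n, struct sim_poll_table *pt)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (sim_driver_poll(&files[i], pt) & SIM_POLLIN) {
            count++;
            pt = NULL;                  /* no need to queue on the rest */
        }
    }
    return count;
}
```

With three files of which only the first is ready, the second and third drivers are called with pt == NULL, which is exactly the case the problem below is about.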
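Problem: for a device file, the driver needs to know which operations are being requested on it so that it can start the hardware as early as possible, but when pt is NULL it has no way to find out.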
Solution: Hans Verkuil has posted a patch slightly changing the way poll() works, so that the driver can always query the pt structure. With the patch, the poll table is never passed as NULL; instead, the "we will not be blocking" case is marked internally. So the set of events requested by the application is always available.
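A sketch of how a driver might use this behaviour, assuming the table carries the requested events plus a "will not block" flag. These fields and names are hypothetical and are not the layout used by the patch; starting capture on an input request is just one example of "starting the hardware early".

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_POLLIN 0x1            /* illustrative event bit */

/* Hypothetical poll table: always passed, never NULL. */
struct sim_poll_table {
    unsigned int requested;       /* events the application asked for */
    bool will_not_block;          /* replaces the old "pt == NULL" signal */
    int queued;                   /* counts wait-queue registrations */
};

struct sim_dev {
    bool capture_started;         /* has the hardware been started? */
    unsigned int ready_mask;      /* events currently available */
};

static unsigned int sim_dev_poll(struct sim_dev *d, struct sim_poll_table *pt)
{
    /* The requested events are visible even in the non-blocking case,
     * so the driver can start the hardware as early as possible. */
    if ((pt->requested & SIM_POLLIN) && !d->capture_started)
        d->capture_started = true;

    /* Only register on the wait queue if the caller may actually sleep. */
    if (!pt->will_not_block)
        pt->queued++;             /* stands in for poll_wait() */

    return d->ready_mask & pt->requested;
}
```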
2
There is still no consensus on how to expand the functionality of seccomp.
3
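CMA has run into an old problem: the current CMA mechanism is used as an allocator behind dma_alloc_coherent(), but on ARM that function suffers from a multiple-mapping problem. The same physical memory is already mapped (cacheable) in the kernel's low-memory linear mapping, and mapping it again with different cache attributes leads to undefined system behavior.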
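Two solutions are currently on the table: use high memory (not common on ARM, and the ARM implementation has particular difficulties), or unmap the region from low memory (at the cost of splitting huge pages into small pages).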
4
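Background: what one driver manages may actually consist of several pieces of hardware, and their initialization has dependencies on one another.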
Grant's patch takes a simple approach to solving this problem:
drivers which are unable to initialize their devices as the result of missing
resources can request that the operation be retried at some point in the
future. That request is a simple matter of returning -EAGAIN from the probe()
function.
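A userspace model of the retry idea, assuming the core simply re-runs probe() on deferred devices until a full pass makes no progress. The device names, sim_probe() and sim_probe_all() are hypothetical; when and how often the real driver core retries is not specified here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "return -EAGAIN from probe() and be retried later". */
struct sim_device {
    const char *name;
    int depends_on;   /* index of the device it needs, or -1 */
    bool bound;
};

static int sim_probe(struct sim_device *devs, size_t i)
{
    int dep = devs[i].depends_on;
    if (dep >= 0 && !devs[dep].bound)
        return -EAGAIN;   /* resource missing: ask to be retried later */
    devs[i].bound = true;
    return 0;
}

/* Re-run probe() on unbound devices until a pass binds nothing new;
 * returns the number of devices still unbound. */
static size_t sim_probe_all(struct sim_device *devs, size_t n)
{
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++)
            if (!devs[i].bound && sim_probe(devs, i) == 0)
                progress = true;
    }
    size_t unbound = 0;
    for (size_t i = 0; i < n; i++)
        if (!devs[i].bound)
            unbound++;
    return unbound;
}
```

For example, a codec that needs an I2C bus registered later simply fails its first probe with -EAGAIN and binds on a later pass; no driver needs to know the global initialization order.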