理解Linux内核(2)----内核同步机制简介-chg.s-ChinaUnix博客

老虎的试验田lewis.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

chg.s

博客访问： 340084
博文数量： 79
博客积分： 2466
博客等级：大尉
技术积分： 880
用户组：普通用户
注册时间： 2006-02-07 16:47

文章分类

全部博文（79）

算法（3）
信口开河（19）
读书笔记（55）
宝贝筐子（2）
未分配的博文（0）

文章存档

2014年（3）

2012年（7）

2011年（14）

2010年（2）

2009年（2）

2008年（2）

2007年（18）

2006年（31）

我的朋友

相关博文

理解Linux内核(2)----内核同步机制简介

分类： LINUX

2006-05-04 10:29:40

1.6.5.1. Kernel preemption disabling

禁止内核抢夺

To provide a drastically simple solution to synchronization problems, some traditional Unix kernels are nonpreemptive: when a process executes in Kernel Mode, it cannot be arbitrarily suspended and substituted with another process. Therefore, on a uniprocessor system, all kernel data structures that are not updated by interrupts or exception handlers are safe for the kernel to access.

为了给同步问题提供一个有效的解决方案，一些传统的UNIX 系统内核是非抢夺式的，也就是说当一个进程在kernel mode下执行的时候，不允许它被随意挂起或者被其它进程替代。这样，在一个单CPU的系统上，所有的内核数据结构，只要不被中断/异常处理更改，就都是保证安全的。

Of course, a process in Kernel Mode can voluntarily relinquish the CPU, but in this case, it must ensure that all data structures are left in a consistent state. Moreover, when it resumes its execution, it must recheck the value of any previously accessed data structures that could be changed.

当然，允许一个处于Kernel mode的进程主动地放弃对CPU的占有。但在这种情况下，它必须保证，所有的数据结构都都处于一致的状态。此外，当它恢复执行的时候，它必须重新检查那些它曾经访问过、并且有可能被其它进程更改的数据结构是否已经发生了变化。

A synchronization mechanism applicable to preemptive kernels consists of disabling kernel preemption before entering a critical region and reenabling it right after leaving the region.

对抢夺式的内核来说，一个可行的同步机制是，在进入critical region之前禁止内核抢夺，并且在离开critical region之后重新允许内核抢夺。

Nonpreemptability is not enough for multiprocessor systems, because two kernel control paths running on different CPUs can concurrently access the same data structure.

不可抢夺的内核设计在多处理器环境下是不能保证正确的，因为多个kernel control path可能在不同的CPU上同时运行，并且访问同一个数据结构。

1.6.5.2. Interrupt disabling

禁止中断

Another synchronization mechanism for uniprocessor systems consists of disabling all hardware interrupts before entering a critical region and reenabling them right after leaving it. This mechanism, while simple, is far from optimal. If the critical region is large, interrupts can remain disabled for a relatively long time, potentially causing all hardware activities to freeze.

另外一个针对单处理器系统的同步机制是，在进入一个 critical region之前禁止全部的硬件中断，并且在离开这个critical region之后重新打开（允许）硬件中断。这一机制虽然简单，但是效率不高。如果critical region比较大，中断可能被禁止较长的时间，这可能会使硬件被冻结，不能作出响应。

Moreover, on a multiprocessor system, disabling interrupts on the local CPU is not sufficient, and other synchronization techniques must be used.

此外，在多处理器系统上，在当前CPU上禁止中断是不能正确完成同步功能的，必须同时使用附加的同步技术。

1.6.5.3. Semaphores

信号量

A widely used mechanism, effective in both uniprocessor and multiprocessor systems, relies on the use of semaphores . A semaphore is simply a counter associated with a data structure; it is checked by all kernel threads before they try to access the data structure. Each semaphore may be viewed as an object composed of:

信号量作为一种同步机制，在单处理器和多处理器系统里都是有效的，因而得到广泛使用。一个信号量，实际上是一个和数据结构相关联的计数器。任何内核线程试图访问这个数据结构之前都必须先检查这个计数器。每个信号量都可以被看作是由以下组件构成的一个对象：

An integer variable

一个整型变量

A list of waiting processes

一个等待访问某数据结构的进程列表

Two atomic methods: down( ) and up( )

两个原子操作：down()和up()

The down( ) method decreases the value of the semaphore. If the new value is less than 0, the method adds the running process to the semaphore list and then blocks (i.e., invokes the scheduler). The up( ) method increases the value of the semaphore and, if its new value is greater than or equal to 0, reactivates one or more processes in the semaphore list.

Down()方法让信号量的值减少1。如果信号量中变量的新值小于0，这个方法就会把检查这个信号量的进程加到这个信号量的进程列表上，然后阻塞（即调用系统调度器去执行其它进程）。Up()方法让信号量的值增加 1，如果信号量的新值大于等于0，就从信号量的进程列表中取出一个进程并解除对它的阻塞。

Each data structure to be protected has its own semaphore, which is initialized to 1. When a kernel control path wishes to access the data structure, it executes the down( ) method on the proper semaphore. If the value of the new semaphore isn't negative, access to the data structure is granted. Otherwise, the process that is executing the kernel control path is added to the semaphore list and blocked. When another process executes the up( ) method on that semaphore, one of the processes in the semaphore list is allowed to proceed.

每一个需要被保护的数据结构都有它自己的信号量，这个信号量被初始化为1。当一个kernel control path需要访问这个数据结构的时候，它在这个信号量上执行down()操作。如果这个信号量的新值是非负的，对该数据结构的访问就被允许。否则，执行 down()操作的进程就被加到这个信号量的进程列表里，然后被阻塞。当另外一个进程执行了这个信号量的up()操作，那个信号量的进程列表里，就会有一个进程被解除阻塞。

1.6.5.4. Spin locks

回旋锁

In multiprocessor systems, semaphores are not always the best solution to the synchronization problems. Some kernel data structures should be protected from being concurrently accessed by kernel control paths that run on different CPUs. In this case, if the time required to update the data structure is short, a semaphore could be very inefficient. To check a semaphore, the kernel must insert a process in the semaphore list and then suspend it. Because both operations are relatively expensive, in the time it takes to complete them, the other kernel control path could have already released the semaphore.

在多处理器系统中，信号量并不是同步问题的最佳解决方案。有一些内核数据结构必须被严格保护，使他不被运行在不同CPU上的Kernel control path同时访问。这种情况下，如果对该数据结构的访问所需的时间很短，那么信号量也许就不是一种有效的同步机制。为了等待访问一个资源的权限，内核需要把一个进程加到对应信号量的进程列表里，并将其阻塞——这两个操作和访问数据结构这一操作相比，相对代价较高，也许这两个操作还没做完，信号量关联的数据结构已经被正占用它的kernel control path释放了。

In these cases, multiprocessor operating systems use spin locks . A spin lock is very similar to a semaphore, but it has no process list; when a process finds the lock closed by another process, it "spins" around repeatedly, executing a tight instruction loop until the lock becomes open.

这种情况下，针对多处理器的操作系统一般选用回旋锁来进行同步。一个回旋锁和一个信号量很像，但是它没有对应的进程列表。当一个进程发现某个回旋锁被另外一个进程锁住的时候，该进程只是反复的回旋，执行一个密集指令循环来等待该锁被释放。

Of course, spin locks are useless in a uniprocessor environment. When a kernel control path tries to access a locked data structure, it starts an endless loop. Therefore, the kernel control path that is updating the protected data structure would not have a chance to continue the execution and release the spin lock. The final result would be that the system hangs.

当然，回旋锁在单处理器系统中是没有用的。当一个 kernel control path试图访问一个被加锁的数据结构的时候，就会陷入无限循环，这样，锁住该数据结构的kernel control path就得不到CPU去完成对该数据结构的更新，这样这个锁就永远不会被打开。结果是死锁，系统挂起。

1.6.5.5. Avoiding deadlocks

避免死锁

Processes or kernel control paths that synchronize with other control paths may easily enter a deadlock state. The simplest case of deadlock occurs when process p1 gains access to data structure a and process p2 gains access to b, but p1 then waits for b and p2 waits for a. Other more complex cyclic waits among groups of processes also may occur. Of course, a deadlock condition causes a complete freeze of the affected processes or kernel control paths.

需要和其它进程或者kernel control path同步的进程和kernel control path经常陷入死锁状态。最简单的死锁状态发生在以下情况下：进程P1取得访问数据a的权利，进程P2取得访问数据b的权利，但是此时P1等待对b的访问权限并且P2等待对a的访问权限。更复杂的、存在于多个进程之间的循环等待造成的死锁也可能发生。一个死锁状态会造成相关的全部进程和kernel control path被冻结。

As far as kernel design is concerned, deadlocks become an issue when the number of kernel locks used is high. In this case, it may be quite difficult to ensure that no deadlock state will ever be reached for all possible ways to interleave kernel control paths. Several operating systems, including Linux, avoid this problem by requesting locks in a predefined order.

就内核设计而言，在内核锁被大量使用的时候，死锁就成了一个问题。这种情况下，要保证各kernel control path在以可能的任何顺序被交织时没有死锁发生，是非常困难的。很多操作系统，包括linux，预先定义了申请加锁必须依照的顺序，来避免死锁。

阅读(1110) | 评论(0) | 转发(0) |

上一篇：理解linux内核(1)----关于kernel和reentrancy

下一篇：理解linux内核(3): linux的信号，进程间通信，进程管理概览

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6