内核中的互斥之我见-config2010-ChinaUnix博客

config life

首页　| 　博文目录　| 　关于我

config2010

博客访问： 160526
博文数量： 28
博客积分： 2010
博客等级：大尉
技术积分： 239
用户组：普通用户
注册时间： 2009-11-07 01:15

文章分类

全部博文（28）

文章存档

2010年（24）

2009年（4）

我的朋友

相关博文

内核中的互斥之我见

分类：嵌入式

2010-01-07 08:14:06

/*e4gle:在我修改linux源代码的过程中曾被大量的内核互斥现象所困扰，这需要利用内核锁去解决，虽然最后大部分解决，但我觉得应该留下些什么，也没时间写了，偶尔看见这位兄弟的文章，觉得正是我想整理的，所以拿出来给大家分享，关于bottom_half和中断的问题，在tcp/ip半底中绝对不能对文件读写操作，不然就panic，恰恰我在linux中的增强功能就有这个操作，使我郁闷了很久，欢迎大家讨论
*/

内核中的互斥之我见
by  wheelz

看了前面各位的讨论，我也有些想法，与大家商榷。

需要澄清的是，互斥手段的选择，不是根据临界区的大小，而是根据临界区的性质，以及有哪些部分的代码，即哪些内核执行路径来争夺。

从严格意义上说，semaphore和spinlock_XXX属于不同层次的互斥手段，前者的实现有赖于后者，这有点象HTTP和TCP的关系，都是协议，但层次是不同的。

先说semaphore，它是进程级的，用于多个进程之间对资源的互斥，虽然也是在内核中，但是该内核执行路径是以进程的身份，代表进程来争夺资源的。如果竞争不上，会有context switch，进程可以去sleep，但CPU不会停，会接着运行其他的执行路径。从概念上说，这和单CPU或多CPU没有直接的关系，只是在 semaphore本身的实现上，为了保证semaphore结构存取的原子性，在多CPU中需要spinlock来互斥。

在内核中，更多的是要保持内核各个执行路径之间的数据访问互斥，这是最基本的互斥问题，即保持数据修改的原子性。semaphore的实现，也要依赖这个。在单CPU中，主要是中断和bottom_half的问题，因此，开关中断就可以了。在多CPU中，又加上了其他CPU的干扰，因此需要spinlock来帮助。这两个部分结合起来，就形成了spinlock_XXX。它的特点是，一旦CPU进入了spinlock_XXX，它就不会干别的，而是一直空转，直到锁定成功为止。因此，这就决定了被spinlock_XXX锁住的临界区不能停，更不能context switch，要存取完数据后赶快出来，以便其他的在空转的执行路径能够获得spinlock。这也是spinlock的原则所在。如果当前执行路径一定要进行context switch，那就要在schedule()之前释放spinlock，否则，容易死锁。因为在中断和bh中，没有context，无法进行context switch，只能空转等待spinlock，你context switch走了，谁知道猴年马月才能回来。

因为spinlock的原意和目的就是保证数据修改的原子性，因此也没有理由在spinlock
锁住的临界区中停留。

spinlock_XXX有很多形式，有

  spin_lock()/spin_unlock()，
  spin_lock_irq()/spin_unlock_irq()，
  spin_lock_irqsave/spin_unlock_irqrestore()
  spin_lock_bh()/spin_unlock_bh()

  local_irq_disable/local_irq_enable
  local_bh_disable/local_bh_enable

那么，在什么情况下具体用哪个呢？这要看是在什么内核执行路径中，以及要与哪些内核
执行路径相互斥。我们知道，内核中的执行路径主要有：

1  用户进程的内核态，此时有进程context，主要是代表进程在执行系统调用
等。
2  中断或者异常或者自陷等，从概念上说，此时没有进程context，不能进行
context switch。
3  bottom_half，从概念上说，此时也没有进程context。
4  同时，相同的执行路径还可能在其他的CPU上运行。

这样，考虑这四个方面的因素，通过判断我们要互斥的数据会被这四个因素中
的哪几个来存取，就可以决定具体使用哪种形式的spinlock。如果只要和其他CPU
互斥，就要用spin_lock/spin_unlock，如果要和irq及其他CPU互斥，就要用
spin_lock_irq/spin_unlock_irq，如果既要和irq及其他CPU互斥，又要保存
EFLAG的状态，就要用spin_lock_irqsave/spin_unlock_irqrestore，如果
要和bh及其他CPU互斥，就要用spin_lock_bh/spin_unlock_bh，如果不需要和
其他CPU互斥，只要和irq互斥，则用local_irq_disable/local_irq_enable，
如果不需要和其他CPU互斥，只要和bh互斥，则用local_bh_disable/local_bh_enable，
等等。值得指出的是，对同一个数据的互斥，在不同的内核执行路径中，
所用的形式有可能不同(见下面的例子)。

举一个例子。在中断部分中有一个irq_desc_t类型的结构数组变量irq_desc[]，
该数组每个成员对应一个irq的描述结构，里面有该irq的响应函数等。
在irq_desc_t结构中有一个spinlock，用来保证存取(修改)的互斥。

对于具体一个irq成员，irq_desc[irq]，对其存取的内核执行路径有两个，一是
在设置该irq的响应函数时(setup_irq)，这通常发生在module的初始化阶段，或
系统的初始化阶段；二是在中断响应函数中(do_IRQ)。代码如下：

int setup_irq(unsigned int irq, struct irqaction * new)
{
      int shared = 0;
      unsigned long flags;
      struct irqaction *old, **p;
      irq_desc_t *desc = irq_desc + irq;

      /*
      * Some drivers like serial.c use request_irq() heavily,
      * so we have to be careful not to interfere with a
      * running system.
      */
      if (new->;flags & SA_SAMPLE_RANDOM) {
            /*
               * This function might sleep, we want to call it first,
               * outside of the atomic block.
               * Yes, this might clear the entropy pool if the wrong
               * driver is attempted to be loaded, without actually
               * installing a new handler, but is this really a problem,
               * only the sysadmin is able to do this.
               */
            rand_initialize_irq(irq);
      }

      /*
      * The following block of code has to be executed atomically
      */
[1]    spin_lock_irqsave(&desc->;lock,flags);
      p = &desc->;action;
      if ((old = *p) != NULL) {
            /* Can't share interrupts unless both agree to */
            if (!(old->;flags & new->;flags & SA_SHIRQ)) {
[2]                   spin_unlock_irqrestore(&desc->;lock,flags);
                     return -EBUSY;
            }

            /* add new interrupt at end of irq queue */
            do {
                     p = &old->;next;
                     old = *p;
            } while (old);
            shared = 1;
      }

      *p = new;

      if (!shared) {
            desc->;depth = 0;
            desc->;status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
            desc->;handler->;startup(irq);
      }
[3]    spin_unlock_irqrestore(&desc->;lock,flags);

      register_irq_proc(irq);
      return 0;
}

asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
      /*
      * We ack quickly, we don't want the irq controller
      * thinking we're snobs just because some other CPU has
      * disabled global interrupts (we have already done the
      * INT_ACK cycles, it's too late to try to pretend to the
      * controller that we aren't taking the interrupt).
      *
      * 0 return value means that this irq is already being
      * handled by some other CPU. (or is disabled)
      */
      int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code  */
      int cpu = smp_processor_id();
      irq_desc_t *desc = irq_desc + irq;
      struct irqaction * action;
      unsigned int status;

      kstat.irqs[cpu][irq]++;
[4]    spin_lock(&desc->;lock);
      desc->;handler->;ack(irq);
      /*
         REPLAY is when Linux resends an IRQ that was dropped earlier
         WAITING is used by probe to mark irqs that are being tested
         */
      status = desc->;status & ~(IRQ_REPLAY | IRQ_WAITING);
      status |= IRQ_PENDING; /* we _want_ to handle it */

      /*
      * If the IRQ is disabled for whatever reason, we cannot
      * use the action we have.
      */
      action = NULL;
      if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
            action = desc->;action;
            status &= ~IRQ_PENDING; /* we commit to handling */
            status |= IRQ_INPROGRESS; /* we are handling it */
      }
      desc->;status = status;

      /*
      * If there is no IRQ handler or it was disabled, exit early.
         Since we set PENDING, if another processor is handling
         a different instance of this same irq, the other processor
         will take care of it.
      */
      if (!action)
            goto out;

      /*
      * Edge triggered interrupts need to remember
      * pending events.
      * This applies to any hw interrupts that allow a second
      * instance of the same irq to arrive while we are in do_IRQ
      * or in the handler. But the code here only handles the _second_
      * instance of the irq, not the third or fourth. So it is mostly
      * useful for irq hardware that does not mask cleanly in an
      * SMP environment.
      */
      for (;;) {
[5]          spin_unlock(&desc->;lock);
            handle_IRQ_event(irq, ®s, action);
[6]          spin_lock(&desc->;lock);

            if (!(desc->;status & IRQ_PENDING))
                     break;
            desc->;status &= ~IRQ_PENDING;
      }
      desc->;status &= ~IRQ_INPROGRESS;
out:
      /*
      * The ->;end() handler has to deal with interrupts which got
      * disabled while the handler was running.
      */
      desc->;handler->;end(irq);
[7]    spin_unlock(&desc->;lock);

      if (softirq_pending(cpu))
            do_softirq();
      return 1;
}

在setup_irq()中，因为其他CPU可能同时在运行setup_irq()，或者在运行setup_irq()时，
本地irq中断来了，要执行do_IRQ()以修改desc->;status。为了同时防止来自其他CPU和
本地irq中断的干扰，如[1][2][3]处所示，使用了spin_lock_irqsave/spin_unlock_irqrestore()

而在do_IRQ()中，因为do_IRQ()本身是在中断中，而且此时还没有开中断，本CPU中没有
什么可以中断其运行，其他CPU则有可能在运行setup_irq()，或者也在中断中，但这二者
对本地do_IRQ()的影响没有区别，都是来自其他CPU的干扰，因此只需要用spin_lock/spin_unlock，
如[4][5][6][7]处所示。值得注意的是[5]处，先释放该spinlock，再调用具体的响应函数。

再举个例子：

static void tasklet_hi_action(struct softirq_action *a)
{
      int cpu = smp_processor_id();
      struct tasklet_struct *list;

[8]    local_irq_disable();
      list = tasklet_hi_vec[cpu].list;
      tasklet_hi_vec[cpu].list = NULL;
[9]    local_irq_enable();

      while (list) {
            struct tasklet_struct *t = list;

            list = list->;next;

            if (tasklet_trylock(t)) {
                     if (!atomic_read(&t->;count)) {
                              if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->;state))
                                    BUG();
                              t->;func(t->;data);
                              tasklet_unlock(t);
                              continue;
                     }
                     tasklet_unlock(t);
            }

[10]          local_irq_disable();
            t->;next = tasklet_hi_vec[cpu].list;
            tasklet_hi_vec[cpu].list = t;
            __cpu_raise_softirq(cpu, HI_SOFTIRQ);
[11]          local_irq_enable();
      }
}

这里，对tasklet_hi_vec[cpu]的修改，不存在CPU之间的竞争，因为每个CPU有各自独立的数据，
所以只要防止irq的干扰，用local_irq_disable/local_irq_enable即可，如[8][9][10][11]处
所示。

阅读(1710) | 评论(0) | 转发(0) |

上一篇：内核同步方法之自旋锁

下一篇：Linux 线程同步---条件变量

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6