2011-02-22 20:51:55

Summary:
Notes on Chapter 9 of Understanding Linux Network Internals: interrupts in the networking subsystem, and the data structures and routines related to them.

9.1 Decisions and Traffic Direction

Describes the relationship between a packet's direction of travel and the protocol stack. There are three cases:

  • Incoming packets:
    Passed up one layer at a time, from L1 all the way to the application layer.
  • Outgoing packets:
    The opposite: passed down one layer at a time, from the application layer at the top until they reach L1.
  • Forwarded packets:
    Received at L1; at the IP layer (L3) the kernel decides whether the packet is to be forwarded. If so, it is not passed further up: after processing, it is sent back out through L1.


9.2 Notifying Drivers When Frames Are Received

9.2.1 Polling

The kernel just keeps checking the device: busy-waiting.

9.2.2 Interrupts

If someone fitted the mailbox with a device that notified the mail carrier whenever there was mail in it, that would be something like an interrupt. In most cases this satisfies the performance requirements nicely, but when the NIC is under heavy load it burdens the CPU instead: the CPU spends too much time servicing interrupts. In terms of the analogy, when there is too much mail, an interrupt-driven mail carrier is kept constantly running.

With interrupt-driven handling, the processing of an input frame is split into two parts:

  • Driver:
    Copies the frame into an input queue that the kernel can access. This part runs in interrupt context.
  • Kernel:
    Processes the frames on the queue and hands them to the next layer up in the protocol stack.

9.2.3 Processing Multiple Frames in a Single Interrupt

After the kernel receives an interrupt, it can temporarily disable the other interrupt sources and instead keep reading frames from the input queue, thereby handling multiple frames within a single interrupt. Few drivers actually do this, however.

NAPI is a refinement of this technique. With NAPI, once the kernel receives an interrupt from a NIC, it temporarily disables that NIC's interrupts and, for a period of time, handles the device by polling instead. This preserves good efficiency under heavy load.
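
The NAPI idea described above can be sketched in user space. This is an illustration only, not the real NAPI API: fake_nic and napi_style_poll are made-up names, though the budget parameter mirrors the real weight/budget concept used by NAPI poll routines.

```c
/* User-space sketch of the NAPI idea: on the first interrupt, further
 * interrupts from that NIC are masked and the kernel polls the device
 * until its queue is empty or a budget is exhausted.
 * fake_nic and napi_style_poll are illustrative names, not kernel APIs. */

struct fake_nic {
    int irq_enabled;   /* 1 = interrupt-driven, 0 = being polled */
    int backlog;       /* frames waiting in the device queue     */
};

/* One poll round; returns the number of frames processed. */
int napi_style_poll(struct fake_nic *nic, int budget)
{
    int done = 0;

    nic->irq_enabled = 0;              /* mask this NIC's interrupt    */
    while (nic->backlog > 0 && done < budget) {
        nic->backlog--;                /* "process" one frame          */
        done++;
    }
    if (nic->backlog == 0)
        nic->irq_enabled = 1;          /* queue drained: re-enable irq */
    return done;
}
```

Note the key property: as long as the backlog exceeds the budget, the device stays in polled mode and generates no further interrupts.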

9.2.4 Timer-Driven Interrupts

Think of a roadside mailbox: a sign on it says at what times each day it is opened (a mail carrier comes to take the mail away). In other words, the carrier comes to check it at fixed intervals.

9.2.5 Combinations

Start in interrupt mode; once an interrupt has fired, switch to timer-driven handling.

9.3 Interrupt Handlers

9.3.1 Bottom Half Handlers

Interrupts, and the handlers registered for them, are identified by numbers (IRQ numbers).

While an interrupt handler runs, the kernel executes in interrupt context. In that context, because the CPU is servicing an interrupt, the interrupt sources are temporarily disabled. That is, while the CPU is handling one interrupt it cannot accept other interrupt requests, nor can it run any other process: the CPU is fully occupied by the interrupt handler and cannot be preempted. In short, interrupt handlers are non-preemptible and non-reentrant.

This design avoids race conditions, but being non-preemptible clearly hurts overall kernel performance.

Interrupt handling should therefore be as short as possible. In practice, the hardware's buffer space is limited, and data lost there cannot be recovered, whereas user-space processes can usually tolerate a little delay. Modern interrupt handling is therefore split into two parts: a top half and a bottom half. The top half must run while it has exclusive use of the CPU and must finish before releasing it; it handles the hardware-related work, such as copying data into a queue. The bottom half can run after the CPU has been released, at some later time when the CPU is relatively idle. From this perspective, a bottom half can be seen as an asynchronous request: a call to some bottom-half function to carry out the planned work.

Using this approach, we can restate the interrupt-handling model:

  • 1. The device signals the CPU to notify it of the interrupt.
  • 2. Interrupts are disabled and the top half runs, which mainly:
    • Saves in memory all the information the kernel will need later to process the interrupt.
    • Sets a flag of some kind to make sure the kernel will process the interrupt later using the saved information.
    • Re-enables the interrupts that were disabled.
  • 3. When the CPU is idle, it checks the flag set earlier, processes the data (the bottom half), and clears the flag.
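
The three-step model above can be sketched in user space. fake_top_half and fake_bottom_half are made-up names for illustration; a real top half runs with interrupts disabled, which a user-space program cannot reproduce.

```c
/* User-space sketch of the top-half/bottom-half model described above.
 * All names here are illustrative, not kernel APIs. */

#define QUEUE_LEN 8

int frame_queue[QUEUE_LEN];   /* "memory" where the top half records data */
int frame_queue_len;
int pending_flag;             /* flag checked later by the bottom half    */

/* Top half: record the data and set the flag, then return quickly. */
void fake_top_half(int data)
{
    if (frame_queue_len < QUEUE_LEN)
        frame_queue[frame_queue_len++] = data;
    pending_flag = 1;
}

/* Bottom half: runs later, when the CPU is idle; drains the queue and
 * clears the flag. Returns the number of items processed. */
int fake_bottom_half(void)
{
    int processed;

    if (!pending_flag)
        return 0;
    processed = frame_queue_len;
    frame_queue_len = 0;
    pending_flag = 0;
    return processed;
}
```

The division of labor is the point: the top half only records and flags; all real processing is deferred to the bottom half.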

9.3.2 Bottom Halves Solutions

Bottom halves can be implemented in several ways; the approaches differ mainly in the execution context they run in and in how they handle concurrency and locking.

9.3.3 Concurrency and Locking

A summary of the concurrency rules:

  • Only one old-style bottom half can run at any time, no matter how many CPUs the machine has.
  • Only one instance of each tasklet can run at any time; different tasklets can run concurrently on different CPUs.
  • Only one instance of each softirq can run at any time on a given CPU; different softirqs, and instances of the same softirq on different CPUs, can run concurrently.

    Table: A few APIs related to software and hardware interrupts

    in_interrupt
        Returns TRUE if the CPU is currently serving a hardware or software interrupt, or if preemption is disabled.
    in_irq
        Returns TRUE if the CPU is currently serving a hardware interrupt.
    in_softirq
        Returns TRUE if the CPU is currently serving a software interrupt.
    softirq_pending
        Returns TRUE if there is at least one softirq pending (i.e., scheduled for execution) for the CPU whose ID was passed as the input argument.
    local_softirq_pending
        Returns TRUE if there is at least one softirq pending for the local CPU.
    __raise_softirq_irqoff
        Sets the flag associated with the input softirq type to mark it pending.
    raise_softirq_irqoff
        A wrapper around __raise_softirq_irqoff that also wakes up ksoftirqd when in_interrupt() returns FALSE.
    raise_softirq
        A wrapper around raise_softirq_irqoff that disables hardware interrupts before calling it and restores them to their original status afterward.
    local_bh_disable
        Disables bottom halves on the local CPU.
    __local_bh_enable, local_bh_enable
        Enable bottom halves (and thus softirqs/tasklets) on the local CPU; local_bh_enable also invokes invoke_softirq if any softirq is pending and in_interrupt() returns FALSE.
    local_irq_disable, local_irq_enable
        Disable and enable interrupts on the local CPU.
    local_irq_save
        First saves the current state of interrupts on the local CPU and then disables them.
    local_irq_restore
        Restores the state of interrupts on the local CPU using the information previously saved with local_irq_save.
    spin_lock_bh, spin_unlock_bh
        Acquire and release a spinlock, respectively. Both functions disable and then re-enable bottom halves and preemption during the operation.

9.3.4 Kernel Preemption

Functions related to kernel preemption:

  • preempt_disable
    Disables preemption for the current task. Can be called repeatedly, incrementing a reference counter.
  • preempt_enable_no_resched
    The reverse of preempt_disable: decrements the reference counter, which allows preemption to be re-enabled when the counter reaches zero.
  • preempt_enable
    Like preempt_enable_no_resched, but additionally checks whether the counter is zero and, if so, forces a call to schedule() to allow any higher-priority task to run.
  • preempt_check_resched
    Called by preempt_enable; this is what differentiates it from preempt_enable_no_resched.
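
The reference-counter semantics above can be sketched as follows. The my_* functions, preempt_count_sim, and resched_calls are a user-space imitation made up for illustration, not the kernel's implementation.

```c
/* User-space sketch of the preemption reference counter: preemption is
 * allowed only while the counter is zero, and preempt_enable (unlike
 * preempt_enable_no_resched) offers a reschedule point when it drops
 * back to zero. All names here are illustrative. */

int preempt_count_sim;   /* nesting depth of preempt_disable calls */
int resched_calls;       /* how many times "schedule()" was forced */

void my_preempt_disable(void)
{
    preempt_count_sim++;             /* can nest: just count */
}

void my_preempt_enable_no_resched(void)
{
    preempt_count_sim--;             /* only decrement, no resched check */
}

void my_preempt_enable(void)
{
    preempt_count_sim--;
    if (preempt_count_sim == 0)
        resched_calls++;             /* preempt_check_resched -> schedule() */
}
```

Because the counter nests, an inner disable/enable pair inside an already-disabled region does not re-enable preemption prematurely.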

9.3.5 Bottom-Half Handlers

  • Any bottom-half mechanism has to cover these steps:
    • Classify the bottom half.
    • Associate the bottom half with its handler.
    • Schedule the bottom half for execution.
    • Notify the kernel about the bottom halves that need to run.
  • Bottom-half handlers in kernel 2.2
    Covers the bottom-half handlers of the 2.2 kernels; obsolete now.
  • Bottom-half handlers in kernel 2.4 and above: the introduction of the softirq
    Starting with 2.4, the kernel builds bottom halves on softirqs, and by kernel 2.6.32 the old NET_BH-style definitions used in 2.2 are gone. The softirqs are defined in include/linux/interrupt.h as follows:
    /* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
       frequency threaded job scheduling. For almost all the purposes
       tasklets are more than enough. F.e. all serial device BHs et
       al. should be converted to tasklets, not to softirqs. */
    enum
    {
        HI_SOFTIRQ=0,
        TIMER_SOFTIRQ,
        NET_TX_SOFTIRQ,
        NET_RX_SOFTIRQ,
        BLOCK_SOFTIRQ,
        BLOCK_IOPOLL_SOFTIRQ,
        TASKLET_SOFTIRQ,
        SCHED_SOFTIRQ,
        HRTIMER_SOFTIRQ,
        RCU_SOFTIRQ,    /* Preferable RCU should always be the last softirq */

        NR_SOFTIRQS
    };

    The ones relevant to networking are NET_TX_SOFTIRQ and NET_RX_SOFTIRQ.

    • Registering a softirq handler
      The function open_softirq registers a handler. It takes two arguments, an int nr and a function pointer action, where nr is the softirq number and action is the handler. It copies them into the global array softirq_vec, which stores the mapping between softirqs and their handlers. The code:
      /* softirq mask and active fields moved to irq_cpustat_t in
       * asm/hardirq.h to get better cache usage.
       */
      struct softirq_action
      {
          void    (*action)(struct softirq_action *);
      };

      static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;

      void open_softirq(int nr, void (*action)(struct softirq_action *))
      {
          softirq_vec[nr].action = action;
      }
    • Scheduling a softirq
      A softirq can be scheduled for execution on the local CPU with the following functions:
      • __raise_softirq_irqoff
        Sets the flag associated with the softirq so that its handler will be invoked.
      • raise_softirq_irqoff
        A wrapper around __raise_softirq_irqoff that, in addition, wakes up ksoftirqd when the caller is not in interrupt context (in_interrupt() returns FALSE), to make sure the softirq will be handled.
      • raise_softirq
        A wrapper around raise_softirq_irqoff that calls it with hardware interrupts disabled.

9.3.6 SoftIRQ Initialization

Softirq initialization is performed by softirq_init, which is called from the system initialization function start_kernel. The code:

void __init softirq_init(void)
{
    int cpu;

    for_each_possible_cpu(cpu) {
        int i;

        per_cpu(tasklet_vec, cpu).tail =
            &per_cpu(tasklet_vec, cpu).head;
        per_cpu(tasklet_hi_vec, cpu).tail =
            &per_cpu(tasklet_hi_vec, cpu).head;
        for (i = 0; i < NR_SOFTIRQS; i++)
            INIT_LIST_HEAD(&per_cpu(softirq_work_list[i], cpu));
    }

    register_hotcpu_notifier(&remote_softirq_cpu_notifier);

    open_softirq(TASKLET_SOFTIRQ, tasklet_action);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action);
}

The networking softirqs NET_RX_SOFTIRQ and NET_TX_SOFTIRQ are instead registered in net_dev_init:

static int __init net_dev_init(void)
{
    ...
    open_softirq(NET_TX_SOFTIRQ, net_tx_action);
    open_softirq(NET_RX_SOFTIRQ, net_rx_action);
    ...
}

9.3.7 Handling Pending SoftIRQs

The function do_softirq handles pending softirqs. It first checks whether the CPU is currently servicing an interrupt (in_interrupt); if so, it does nothing and returns. Otherwise it saves the interrupt state (local_irq_save) and, if any softirqs are pending (local_softirq_pending), calls __do_softirq to process them:

asmlinkage void do_softirq(void)
{
    __u32 pending;
    unsigned long flags;

    if (in_interrupt())
        return;

    local_irq_save(flags);

    pending = local_softirq_pending();

    if (pending)
        __do_softirq();

    local_irq_restore(flags);
}
  • __do_softirq
    This function first calls local_softirq_pending() to copy the bitmask of softirqs pending on the local CPU into the local variable pending, then immediately calls set_softirq_pending(0) to clear the pending bitmask, and only then starts the actual softirq processing.

    For each bit of the saved copy in pending, a set bit means that softirq needs handling, and the handler stored in softirq_vec is invoked. The loop repeats until all pending handlers have run.

    The code:

    /*
     * We restart softirq processing MAX_SOFTIRQ_RESTART times,
     * and we fall back to softirqd after that.
     *
     * This number has been established via experimentation.
     * The two things to balance is latency against fairness -
     * we want to handle softirqs as soon as possible, but they
     * should not be able to lock up the box.
     */
    #define MAX_SOFTIRQ_RESTART 10

    asmlinkage void __do_softirq(void)
    {
        struct softirq_action *h;
        __u32 pending;
        int max_restart = MAX_SOFTIRQ_RESTART;
        int cpu;

        pending = local_softirq_pending();
        account_system_vtime(current);

        __local_bh_disable((unsigned long)__builtin_return_address(0));
        lockdep_softirq_enter();

        cpu = smp_processor_id();
    restart:
        /* Reset the pending bitmask before enabling irqs */
        set_softirq_pending(0);

        local_irq_enable();

        h = softirq_vec;

        do {
            if (pending & 1) {
                int prev_count = preempt_count();
                kstat_incr_softirqs_this_cpu(h - softirq_vec);

                trace_softirq_entry(h, softirq_vec);
                h->action(h);
                trace_softirq_exit(h, softirq_vec);
                if (unlikely(prev_count != preempt_count())) {
                    printk(KERN_ERR "huh, entered softirq %td %s %p"
                           "with preempt_count %08x,"
                           " exited with %08x?\n", h - softirq_vec,
                           softirq_to_name[h - softirq_vec],
                           h->action, prev_count, preempt_count());
                    preempt_count() = prev_count;
                }

                rcu_bh_qs(cpu);
            }
            h++;
            pending >>= 1;
        } while (pending);

        local_irq_disable();

        pending = local_softirq_pending();
        if (pending && --max_restart)
            goto restart;

        if (pending)
            wakeup_softirqd();

        lockdep_softirq_exit();

        account_system_vtime(current);
        _local_bh_enable();
    }

9.3.8 Per-Architecture Processing of softirq

On some architectures, do_softirq is overridden by an architecture-specific version.

9.3.9 Kernel Thread: ksoftirqd

The kernel runs background threads named ksoftirqd, dedicated to checking for softirq handlers that have not yet been executed and running them when the CPU is not busy servicing interrupts. There is one such thread per CPU, named ksoftirqd_CPU0, ksoftirqd_CPU1, and so on.

The function ksoftirqd is defined as follows:

static int ksoftirqd(void * __bind_cpu)
{
    set_current_state(TASK_INTERRUPTIBLE);

    while (!kthread_should_stop()) {
        preempt_disable();
        if (!local_softirq_pending()) {
            preempt_enable_no_resched();
            schedule();
            preempt_disable();
        }

        __set_current_state(TASK_RUNNING);

        while (local_softirq_pending()) {
            /* Preempt disable stops cpu going offline.
               If already offline, we'll be on wrong CPU:
               don't process */
            if (cpu_is_offline((long)__bind_cpu))
                goto wait_to_die;
            do_softirq();
            preempt_enable_no_resched();
            cond_resched();
            preempt_disable();
            rcu_sched_qs((long)__bind_cpu);
        }
        preempt_enable();
        set_current_state(TASK_INTERRUPTIBLE);
    }
    __set_current_state(TASK_RUNNING);
    return 0;

wait_to_die:
    preempt_enable();
    /* Wait for kthread_stop */
    set_current_state(TASK_INTERRUPTIBLE);
    while (!kthread_should_stop()) {
        schedule();
        set_current_state(TASK_INTERRUPTIBLE);
    }
    __set_current_state(TASK_RUNNING);
    return 0;
}

Process priorities (nice values) range from -20 to +19, where -20 is the highest and +19 the lowest. The ksoftirqd threads are given the lowest priority (nice +19) so that frequent NET_RX_SOFTIRQ activity does not place too heavy a burden on the CPU.

Once started, the thread repeatedly calls do_softirq to process the pending handlers, sleeping when there is nothing to do.

9.3.10 How the Networking Code Uses softirqs

The networking softirqs are registered in net_dev_init; the code is shown earlier.

9.4 The softnet_data Structure

For incoming traffic, each CPU has its own queue, represented by struct softnet_data:

/*
 * Incoming packets are placed on per-cpu queues so that
 * no locking is needed.
 */
struct softnet_data
{
    struct Qdisc        *output_queue;
    struct sk_buff_head input_pkt_queue;
    struct list_head    poll_list;
    struct sk_buff      *completion_queue;
    struct napi_struct  backlog;
};

9.4.1 Fields

  • output_queue
    The list of devices that have something to transmit.
  • completion_queue
    sk_buff buffers whose transmission has completed and which can therefore be released.
  • poll_list
    A doubly linked list of devices that are waiting to be polled.

9.4.2 Initialization of softnet_data

The structure is initialized in net_dev_init, as follows:

for_each_possible_cpu(i) {
    struct softnet_data *queue;

    queue = &per_cpu(softnet_data, i);
    skb_queue_head_init(&queue->input_pkt_queue);
    queue->completion_queue = NULL;
    INIT_LIST_HEAD(&queue->poll_list);
    queue->backlog.poll = process_backlog;
    queue->backlog.weight = weight_p;
    queue->backlog.gro_list = NULL;
    queue->backlog.gro_count = 0;
}

Author:yangyingchao, 2010-06-09