无关风月

首页　| 　博文目录　| 　关于我

jurson

博客访问： 584812
博文数量： 117
博客积分： 10
博客等级：民兵
技术积分： 359
用户组：普通用户
注册时间： 2011-03-13 21:58

个人简介

爱上香烟

文章分类

全部博文（117）

算法（1）
vpn（1）
排错（2）
工作日志（4）
netfilter i（4）
编程（13）

socket套接字（2）

linux C（10）
shell（4）
应用工具（3）
go lang（1）
网络协议（14）

校验值计算（2）
linux（70）

内核与用户层交互（5）

openvpn（0）

内核操作函数（1）

虚拟化（6）

IP分片重组（3）

数据流程（18）

rps（6）
未分配的博文（0）

文章存档

2018年（3）

2017年（8）

2016年（65）

2015年（41）

我的朋友

frankzfz

数据包接收系列 — NAPI的原理和实现

标签：

2014-03-27 17:14 5494人阅读评论(5) 举报

		 分类：
	
		Network（7）  Kernel（37）

		版权声明：本文为博主原创文章，转载请注明出处。
	

		目录(?)[+]
	

本文主要内容：简单分析NAPI的原理和实现。

内核版本：2.6.37

Author：zhangskd @ csdn

概述

NAPI是linux新的网卡数据处理API，据说是由于找不到更好的名字，所以就叫NAPI(New API)，在2.5之后引入。

简单来说，NAPI是综合中断方式与轮询方式的技术。

中断的好处是响应及时，如果数据量较小，则不会占用太多的CPU事件；缺点是数据量大时，会产生过多中断，

而每个中断都要消耗不少的CPU时间，从而导致效率反而不如轮询高。轮询方式与中断方式相反，它更适合处理

大量数据，因为每次轮询不需要消耗过多的CPU时间；缺点是即使只接收很少数据或不接收数据时，也要占用CPU

时间。

NAPI是两者的结合，数据量低时采用中断，数据量高时采用轮询。平时是中断方式，当有数据到达时，会触发中断

处理函数执行，中断处理函数关闭中断开始处理。如果此时有数据到达，则没必要再触发中断了，因为中断处理函

数中会轮询处理数据，直到没有新数据时才打开中断。

很明显，数据量很低与很高时，NAPI可以发挥中断与轮询方式的优点，性能较好。如果数据量不稳定，且说高不高

说低不低，则NAPI则会在两种方式切换上消耗不少时间，效率反而较低一些。

实现

来看下NAPI和非NAPI的区别：

(1) 支持NAPI的网卡驱动必须提供轮询方法poll()。

(2) 非NAPI的内核接口为netif_rx()，NAPI的内核接口为napi_schedule()。

(3) 非NAPI使用共享的CPU队列softnet_data->input_pkt_queue，NAPI使用设备内存(或者

设备驱动程序的接收环)。

(1) NAPI设备结构

				[java] view plaincopy
				
				/* Structure for NAPI scheduling similar to tasklet but with weighting */  
			
				struct napi_struct {  
			
				    /* The poll_list must only be managed by the entity which changes the 
			
				     * state of the NAPI_STATE_SCHED bit. This means whoever atomically 
			
				     * sets that bit can add this napi_struct to the per-cpu poll_list, and 
			
				     * whoever clears that bit can remove from the list right before clearing the bit. 
			
				     */  
			
				    struct list_head poll_list; /* 用于加入处于轮询状态的设备队列 */  
			
				    unsigned long state; /* 设备的状态 */  
			
				    int weight; /* 每次处理的最大数量，非NAPI默认为64 */  
			
				    int (*poll) (struct napi_struct *, int); /* 此设备的轮询方法，非NAPI为process_backlog() */  
			
				#ifdef CONFIG_NETPOLL  
			
				    ...  
			
				#endif  
			
				    unsigned int gro_count;  
			
				    struct net_device *dev;  
			
				    struct list_head dev_list;  
			
				    struct sk_buff *gro_list;  
			
				    struct sk_buff *skb;  
			
				};

(2) 初始化

初始napi_struct实例。

				[java] view plaincopy
				
				void netif_napi_add(struct net_device *dev, struct napi_struct *napi,  
			
				        int (*poll) (struct napi_struct *, int), int weight)  
			
				{  
			
				    INIT_LIST_HEAD(&napi->poll_list);  
			
				    napi->gro_count = 0;  
			
				    napi->gro_list = NULL;  
			
				    napi->skb = NULL;  
			
				    napi->poll = poll; /* 设备的poll函数 */  
			
				    napi->weight = weight; /* 设备每次poll能处理的数据包个数上限 */  
			
				    list_add(&napi->dev_list, &dev->napi_list); /* 加入设备的napi_list */  
			
				    napi->dev = dev; /* 所属设备 */  
			
				#ifdef CONFIG_NETPOLL  
			
				    spin_lock_init(&napi->poll_lock);  
			
				    napi->poll_owner = -1;  
			
				#endif  
			
				    set_bit(NAPI_STATE_SCHED, &napi->state); /* 设置NAPI标志位 */  
			
				}

(3) 调度

在网卡驱动的中断处理函数中调用napi_schedule()来使用NAPI。

				[java] view plaincopy
				
				/** 
			
				 * napi_schedule - schedule NAPI poll 
			
				 * @n: napi context 
			
				 * Schedule NAPI poll routine to be called if it is not already running. 
			
				 */  
			
				static inline void napi_schedule(struct napi_struct *n)  
			
				{  
			
				    /* 判断是否可以调度NAPI */  
			
				    if (napi_schedule_prep(n))  
			
				        __napi_schedule(n);  
			
				}

判断NAPI是否可以调度。如果NAPI没有被禁止，且不存在已被调度的NAPI，

则允许调度NAPI，因为同一时刻只允许有一个NAPI poll instance。

				[java] view plaincopy
				
				/** 
			
				 * napi_schedule_prep - check if napi can be scheduled 
			
				 * @n: napi context 
			
				 * Test if NAPI routine is already running, and if not mark it as running. 
			
				 * This is used as a condition variable insure only one NAPI poll instance runs. 
			
				 * We also make sure there is no pending NAPI disable. 
			
				 */  
			
				static inline int napi_schedule_prep(struct napi_struct *n)  
			
				{  
			
				    return !napi_disable_pending(n) && !test_and_set_bit(NAPI_STATE_SCHED, &n->state);  
			
				}  
			
				static inline int napi_disable_pending(struct napi_struct *n)  
			
				{  
			
				    return test_bit(NAPI_STATE_DISABLE, &n->state);  
			
				}   
			
				enum {  
			
				    NAPI_STATE_SCHED, /* Poll is scheduled */  
			
				    NAPI_STATE_DISABLE, /* Disable pending */  
			
				    NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */  
			
				};

NAPI的调度函数。把设备的napi_struct实例添加到当前CPU的softnet_data的poll_list中，

以便于接下来进行轮询。然后设置NET_RX_SOFTIRQ标志位来触发软中断。

				[java] view plaincopy
				
				void __napi_schedule(struct napi_struct *n)  
			
				{  
			
				    unsigned long flags;  
			
				    local_irq_save(flags);  
			
				    ____napi_schedule(&__get_cpu_var(softnet_data), n);  
			
				    local_irq_restore(flags);  
			
				}  
			
				static inline void ____napi_schedule(struct softnet_data *sd, struct napi_struct *napi)  
			
				{  
			
				    /* 把napi_struct添加到softnet_data的poll_list中 */  
			
				    list_add_tail(&napi->poll_list, &sd->poll_list);  
			
				    __raise_softirq_irqoff(NET_RX_SOFTIRQ); /* 设置软中断标志位 */  
			
				}

(4) 轮询方法

NAPI方式中的POLL方法由驱动程序提供，在通过netif_napi_add()加入napi_struct时指定。

在驱动的poll()中，从自身的队列中获取sk_buff后，如果网卡开启了GRO，则会调用

napi_gro_receive()处理skb，否则直接调用netif_receive_skb()。

POLL方法应该和process_backlog()大体一致，多了一些具体设备相关的部分。

(5) 非NAPI和NAPI处理流程对比

以下是非NAPI设备和NAPI设备的数据包接收流程对比图：

NAPI方式在上半部中sk_buff是存储在驱动自身的队列中的，软中断处理过程中驱动POLL方法调用

netif_receive_skb()直接处理skb并提交给上层。

				[java] view plaincopy
				
				/** 
			
				 * netif_receive_skb - process receive buffer from network 
			
				 * @skb: buffer to process 
			
				 * netif_receive_skb() is the main receive data processing function. 
			
				 * It always succeeds. The buffer may be dropped during processing 
			
				 * for congestion control or by the protocol layers. 
			
				 * This function may only be called from softirq context and interrupts 
			
				 * should be enabled. 
			
				 * Return values (usually ignored): 
			
				 * NET_RX_SUCCESS: no congestion 
			
				 * NET_RX_DROP: packet was dropped 
			
				 */  
			
				int netif_receive_skb(struct sk_buff *skb)  
			
				{  
			
				    /* 记录接收时间到skb->tstamp */  
			
				    if (netdev_tstamp_prequeue)  
			
				        net_timestamp_check(skb);  
			
				    if (skb_defer_rx_timestamp(skb))  
			
				        return NET_RX_SUCCESS;  
			
				#ifdef CONFIG_RPS  
			
				    ...  
			
				#else  
			
				    return __netif_receive_skb(skb);  
			
				#endif  
			
				}

__netif_receive_skb()在上篇blog中已分析过了，接下来就是网络层来处理接收到的数据包了。

来自： http://blog.csdn.net/zhangskd/article/details/21627963

阅读(1832) | 评论(0) | 转发(0) |

上一篇：转载 eth_type_trans(skb, bp->dev);

下一篇：ngrok 内网穿透利器

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6