Linux Kernel Analysis - Networking [4]: Routing Tables -- ChinaUnix Blog

xlzxlz2's ChinaUnix blog: liuzhong.blog.chinaunix.net

Linux Kernel Analysis - Networking [4]: Routing Tables

Category: LINUX

2012-11-10 17:14:12

Reposted:

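Routing table

The kernel keeps two structures: the routing table, fib_table_hash, and the route cache, rt_hash_table. The route cache exists to speed up route lookups: every lookup checks the route cache first and falls back to the routing table only on a miss. It works like any other cache. The cache holds recently used routes, so it is small and fast to search; the routing table holds every route, so it is large and slower to search.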
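The cache-first flow can be sketched in a few lines of user-space C. This is only an illustration of the idea, not kernel code: the direct-mapped cache, the names demo_slots, demo_table_lookup and demo_route, and the fixed answer returned by the slow path are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CACHE_SLOTS 8u

/* One slot of a tiny direct-mapped cache (hypothetical layout). */
struct demo_slot {
    uint32_t daddr;     /* destination this slot answers for */
    uint32_t next_hop;  /* cached result */
    int valid;          /* 0 until the slot has been filled */
};

static struct demo_slot demo_slots[DEMO_CACHE_SLOTS];
static unsigned int demo_slow_lookups;  /* counts full-table lookups */

/* Stand-in for the slow, complete routing table search.
 * It always answers 192.168.123.254, which is an assumption of this sketch. */
static uint32_t demo_table_lookup(uint32_t daddr)
{
    (void)daddr;
    demo_slow_lookups++;
    return 0xC0A87BFEu;
}

/* Check the cache first; on a miss, ask the table and remember the answer. */
static uint32_t demo_route(uint32_t daddr)
{
    unsigned int idx = daddr % DEMO_CACHE_SLOTS;
    struct demo_slot *s;

    assert(idx < DEMO_CACHE_SLOTS);
    s = &demo_slots[idx];
    if (s->valid && s->daddr == daddr)
        return s->next_hop;                 /* cache hit */

    s->daddr = daddr;                       /* miss: fill the slot */
    s->next_hop = demo_table_lookup(daddr);
    s->valid = 1;
    return s->next_hop;
}
```

Calling demo_route() twice with the same address performs only one slow lookup; a different address that lands in the same slot evicts the old entry and triggers another one.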

First, it helps to understand what a routing table means. Below is a routing table as printed by the route command:

Destination	Netmask	Gateway	Flags	Interface	Metric
169.254.0.0	255.255.0.0	*	U	eth0	1
192.168.123.0	255.255.255.0	*	U	eth0	1
default	0.0.0.0	192.168.123.254	UG	eth0	1

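A route tells the host where the next hop is for reaching a given destination address. For example, a packet sent to 192.168.22.3 matches only the default route in the table above, so the lookup yields the next hop 192.168.123.254 and the packet is sent there. Each route entry also has an important attribute, scope, which expresses the distance to the destination network.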
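The 192.168.22.3 example can be reproduced with a small longest-prefix-match sketch over the three routes listed above. The structure demo_route, the DEMO_IP macro and both functions are hypothetical names for this illustration; the kernel's real lookup in fib_table_hash is organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry for illustration; not the kernel's structure. */
struct demo_route {
    uint32_t dest;     /* destination network, host byte order */
    uint32_t mask;     /* netmask, host byte order */
    uint32_t gateway;  /* next hop; 0 means directly connected ("*") */
};

#define DEMO_IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* The three routes from the listing above. */
static const struct demo_route demo_table[] = {
    { DEMO_IP(169, 254, 0, 0),   DEMO_IP(255, 255, 0, 0),   0 },
    { DEMO_IP(192, 168, 123, 0), DEMO_IP(255, 255, 255, 0), 0 },
    { 0,                         0,                         DEMO_IP(192, 168, 123, 254) },
};
#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

/* Return the matching route with the longest netmask, or NULL if none matches. */
static const struct demo_route *demo_lookup(const struct demo_route *table,
                                            size_t n, uint32_t addr)
{
    const struct demo_route *best = NULL;
    size_t i;

    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct demo_route *r = &table[i];

        if ((addr & r->mask) != r->dest)
            continue;
        /* For contiguous netmasks, a numerically larger mask is a longer prefix. */
        if (best == NULL || r->mask > best->mask)
            best = r;
    }
    return best;
}

/* Next hop for addr: the gateway if the route has one, otherwise addr itself.
 * Returns 0 when no route matches. */
static uint32_t demo_next_hop(uint32_t addr)
{
    const struct demo_route *r = demo_lookup(demo_table, DEMO_TABLE_LEN, addr);

    if (r == NULL)
        return 0;
    return r->gateway ? r->gateway : addr;
}
```

With this table, demo_next_hop(DEMO_IP(192, 168, 22, 3)) falls through to the default route and returns 192.168.123.254, while an address inside 192.168.123.0/24 is delivered directly.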

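Possible scope values include RT_SCOPE_UNIVERSE, RT_SCOPE_LINK, and RT_SCOPE_HOST.

During forwarding, each hop must keep the distance to the destination network the same or make it smaller; otherwise the packet could never arrive. Scope enforces this: a routing table lookup only accepts entries whose scope is at least as narrow as the requested scope. For example, when the requested scope is RT_SCOPE_LINK, the entry's scope can only be RT_SCOPE_LINK or RT_SCOPE_HOST.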
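One detail that makes the scope rule easy to misread: in the kernel's rt_scope_t enumeration a narrower scope has a larger numeric value (RT_SCOPE_UNIVERSE is 0, RT_SCOPE_LINK is 253, RT_SCOPE_HOST is 254). A minimal sketch of the comparison, using made-up DEMO_ names and a helper function that is not taken from the kernel source:

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric values as the kernel's rt_scope_t; the names are local. */
enum {
    DEMO_SCOPE_UNIVERSE = 0,    /* RT_SCOPE_UNIVERSE: anywhere */
    DEMO_SCOPE_LINK     = 253,  /* RT_SCOPE_LINK: on the attached link */
    DEMO_SCOPE_HOST     = 254   /* RT_SCOPE_HOST: this host only */
};

/* An entry is acceptable when its scope is at least as narrow as the
 * requested one, which numerically means "greater than or equal". */
static bool demo_scope_ok(int entry_scope, int wanted_scope)
{
    assert(entry_scope >= 0 && entry_scope <= 255);
    assert(wanted_scope >= 0 && wanted_scope <= 255);
    return entry_scope >= wanted_scope;
}
```

So a request for link scope accepts link and host entries but rejects a universe-scope entry, matching the example in the paragraph above.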

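Route cache

The route cache speeds up route lookups: whenever a packet is received or sent, the route cache is queried first. In the kernel it is organized as a hash table, rt_hash_table.

    static struct rt_hash_bucket *rt_hash_table __read_mostly;    /* net/ipv4/route.c */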

Lookups go through ip_route_input(). The cache operation starts by computing a hash value from [src_ip, dst_ip, iif, rt_genid]:

    hash = rt_hash(daddr, saddr, iif, rt_genid(net));

rt_hash_table[hash].chain is then the linked list of cache entries to work on, for example to traverse it:

    for (rth = rt_hash_table[hash].chain; rth; rth = rth->u.dst.rt_next)

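So to find an entry in the cache, first compute the hash value to select a group of entries, then walk the linked list to find the specific entry. The match must be exact on [src_ip, dst_ip, iif, tos, mark, net]. struct rtable has a dedicated member that serves as the cache lookup key, struct flowi:

    /* Cache lookup keys */
    struct flowi fl;

Once the entry is found, its last-access time is updated and its dst is attached to the skb:

    dst_use(&rth->u.dst, jiffies);
    skb_dst_set(skb, &rth->u.dst);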
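The lookup steps above (hash to a bucket, walk the chain, require a full key match, refresh the last-use time) can be sketched in user-space C as follows. Everything here is hypothetical: demo_key is a simplified stand-in for struct flowi that leaves out the network namespace, and demo_hash is a toy function, not the kernel's rt_hash().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified lookup key; a stand-in for struct flowi. */
struct demo_key {
    uint32_t saddr, daddr;
    int iif;
    uint8_t tos;
    uint32_t mark;
};

struct demo_entry {
    struct demo_key key;
    unsigned long lastuse;    /* refreshed on every hit */
    struct demo_entry *next;  /* next entry in the same bucket */
};

#define DEMO_BUCKETS 16u      /* power of two, so masking selects a bucket */

static struct demo_entry *demo_cache[DEMO_BUCKETS];

/* Toy hash over the same three inputs the text names; only the bucket
 * choice matters for this sketch. */
static unsigned int demo_hash(uint32_t daddr, uint32_t saddr, int iif)
{
    uint32_t h = daddr ^ (saddr * 2654435761u) ^ (uint32_t)iif;

    h ^= h >> 16;
    return h & (DEMO_BUCKETS - 1);
}

static int demo_key_equal(const struct demo_key *a, const struct demo_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->iif == b->iif && a->tos == b->tos && a->mark == b->mark;
}

/* Push a caller-owned entry onto the front of its bucket. */
static void demo_cache_add(struct demo_entry *e)
{
    unsigned int h;

    assert(e != NULL);
    h = demo_hash(e->key.daddr, e->key.saddr, e->key.iif);
    e->next = demo_cache[h];
    demo_cache[h] = e;
}

/* Hash to a bucket, walk its chain, and accept only a full key match. */
static struct demo_entry *demo_cache_lookup(const struct demo_key *k,
                                            unsigned long now)
{
    struct demo_entry *e;

    assert(k != NULL);
    for (e = demo_cache[demo_hash(k->daddr, k->saddr, k->iif)]; e; e = e->next) {
        if (demo_key_equal(&e->key, k)) {
            e->lastuse = now;   /* plays the role of dst_use() */
            return e;
        }
    }
    return NULL;
}
```

Note that tos and mark are not part of the hash, so two keys that differ only in tos land in the same bucket and are told apart by the full comparison, which is why the chain walk cannot be skipped.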

Creating the route cache

inet_init() -> ip_init() -> ip_rt_init()

    rt_hash_table = (struct rt_hash_bucket *)
        alloc_large_system_hash("IP route cache",
                                sizeof(struct rt_hash_bucket),
                                rhash_entries,
                                (totalram_pages >= 128 * 1024) ?
                                15 : 17,
                                &rt_hash_log,
                                &rt_hash_mask,
                                rhash_entries ? 0 : 512 * 1024);

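Here rt_hash_mask describes the size of the table: it is the number of buckets minus one, so a hash value ANDed with it gives a valid bucket index. rt_hash_log is the base-2 logarithm of the number of buckets. The structure after creation is shown in the figure.

[Figure: structure of rt_hash_table after creation]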
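The relationship between the mask, the bucket count and the log value can be written down directly. This sketch assumes the usual power-of-two convention, where the mask equals the bucket count minus one; the function names are invented for the illustration.

```c
#include <assert.h>

/* Number of buckets implied by a mask of the form 2^n - 1. */
static unsigned int demo_table_size(unsigned int mask)
{
    return mask + 1;
}

/* Base-2 logarithm of a power-of-two bucket count. */
static unsigned int demo_log2(unsigned int size)
{
    unsigned int log = 0;

    assert(size != 0 && (size & (size - 1)) == 0);  /* powers of two only */
    while (size > 1) {
        size >>= 1;
        log++;
    }
    return log;
}

/* Reduce an arbitrary hash value to a valid bucket index. */
static unsigned int demo_bucket_index(unsigned int hash, unsigned int mask)
{
    return hash & mask;
}
```

For a mask of 0xffff this gives 65536 buckets and a log value of 16, and any hash value is folded into the range 0 to 65535.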

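Inserting an entry into the route cache

Function rt_intern_hash()

The entry to insert is rt and its hash value is hash. First, the hash value locates the bucket:

    rthp = &rt_hash_table[hash].chain;

The bucket is then scanned once, for two purposes: an expired entry is deleted on the spot; an entry with the same key as rt is unlinked from its current position and moved to the head of the chain, after which the function returns.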

    while ((rth = *rthp) != NULL) {
        if (rt_is_expired(rth)) {    // expired entry
            *rthp = rth->u.dst.rt_next;
            rt_free(rth);
            continue;
        }
        if (compare_keys(&rth->fl, &rt->fl) && compare_netns(rth, rt)) {    // duplicate entry
            *rthp = rth->u.dst.rt_next;
            rcu_assign_pointer(rth->u.dst.rt_next, rt_hash_table[hash].chain);
            rcu_assign_pointer(rt_hash_table[hash].chain, rth);
            ……
        }
        ……
        rthp = &rth->u.dst.rt_next;
    }

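After the scan, if rt is still not present, it is inserted at the head:

    rt->u.dst.rt_next = rt_hash_table[hash].chain;
    rcu_assign_pointer(rt_hash_table[hash].chain, rt);

If the newly inserted rt meets certain conditions, it is also bound to the ARP neighbour table.

Hint: each cache bucket is a singly linked list with no head node. The insertion and deletion techniques it uses are worth studying: they are simple and practical.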
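The technique the hint refers to is walking the list with a pointer to the link (a pointer to a pointer) instead of a pointer to the node, so that unlinking never needs a "previous node" variable or a special case for the first element. Below is a stand-alone sketch of the same scan-then-insert pattern. It is not the kernel function: demo_node and demo_intern are hypothetical, there is no locking or RCU, and nodes are owned by the caller so nothing is freed.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
    int key;
    int expired;             /* nonzero when the node is stale */
    struct demo_node *next;
};

/*
 * Insert `fresh` into the headless list at *head.
 * Expired nodes met during the scan are unlinked. If a live node with the
 * same key already exists, it is moved to the front and returned, and
 * `fresh` is left out of the list. Otherwise `fresh` becomes the new first
 * node and is returned.
 */
static struct demo_node *demo_intern(struct demo_node **head,
                                     struct demo_node *fresh)
{
    struct demo_node **pp = head;   /* address of the link being examined */
    struct demo_node *cur;

    assert(head != NULL && fresh != NULL);
    while ((cur = *pp) != NULL) {
        if (cur->expired) {
            *pp = cur->next;        /* unlink; pp already points at the successor's link */
            cur->next = NULL;
            continue;
        }
        if (cur->key == fresh->key) {
            *pp = cur->next;        /* unlink from the current position ... */
            cur->next = *head;      /* ... and relink at the front */
            *head = cur;
            return cur;
        }
        pp = &cur->next;            /* advance to the next link */
    }
    fresh->next = *head;            /* not found: insert at the front */
    *head = fresh;
    return fresh;
}
```

Because pp always holds the address of the link that points at the current node, the statement `*pp = cur->next` works the same way whether the node is first, in the middle, or last.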

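Deleting an entry from the route cache

Function rt_del()

The entry to delete is rt and its hash value is hash. First the hash value locates the bucket, then the chain is walked; any entry that has expired, or that is rt itself, is deleted.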

    rthp = &rt_hash_table[hash].chain;
    spin_lock_bh(rt_hash_lock_addr(hash));
    ip_rt_put(rt);
    while ((aux = *rthp) != NULL) {
        if (aux == rt || rt_is_expired(aux)) {
            *rthp = aux->u.dst.rt_next;
            rt_free(aux);
            continue;
        }
        rthp = &aux->u.dst.rt_next;
    }
    spin_unlock_bh(rt_hash_lock_addr(hash));
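The same deletion loop can be tried outside the kernel. The sketch below keeps only the list manipulation and drops the per-bucket spinlock and reference counting; demo_item and demo_del are made-up names, and removed nodes are simply unlinked because the caller owns them.

```c
#include <assert.h>
#include <stddef.h>

struct demo_item {
    int id;
    int expired;             /* nonzero when the item is stale */
    struct demo_item *next;
};

/* Unlink `victim` and every expired item from the list at *head.
 * Returns the number of items removed. */
static int demo_del(struct demo_item **head, const struct demo_item *victim)
{
    struct demo_item **pp = head;
    struct demo_item *cur;
    int removed = 0;

    assert(head != NULL);
    while ((cur = *pp) != NULL) {
        if (cur == victim || cur->expired) {
            *pp = cur->next;    /* bypass the item; do not advance pp */
            cur->next = NULL;
            removed++;
            continue;
        }
        pp = &cur->next;
    }
    return removed;
}
```

A single pass therefore removes the requested entry and cleans out stale neighbours in the same bucket at no extra cost.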
