Source Code Analysis of NAT/NAPT Rules in Linux - xiaosuo - ChinaUnix Blog

Free Gentux (xiaosuo.blog.chinaunix.net)


Category: LINUX

2006-12-19 22:25:30

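An earlier post analyzed why DNAT in PREROUTING has no effect on locally originated connections. This post follows on from it and takes a deeper look at the rules Linux applies when doing NAT/NAPT.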

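It all started with an article I read. My undergraduate thesis is also about P2P, so I care a lot about firewall traversal techniques like this and find them interesting. When I followed that article's link to "The hole trick", though, I couldn't help feeling a bit let down. It turned out to be a brief introduction to UDP hole punching with nothing new in it, and it didn't even mention Skype's distinctive "supernodes". At best it counts as a popular-science piece.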

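UDP hole punching has been around for a long time and should be well known to people in the field, so I won't belabor it here (nor do I need to pad this post). If you're interested, google "UDP hole punching" (a search in Chinese, "UDP打洞技术", works too). I'm sure those articles explain it better than I would.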

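As I said at the start, I will analyze the Linux kernel source to derive its NAT/NAPT rules and show that it supports UDP hole punching. Enough talk; here is the source: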

File: net/ipv4/netfilter/ip_nat_core.c

```c
unsigned int
ip_nat_setup_info(struct ip_conntrack *conntrack,
                  const struct ip_nat_range *range,
                  unsigned int hooknum)
{
    struct ip_conntrack_tuple curr_tuple, new_tuple;
    struct ip_nat_info *info = &conntrack->nat.info;
    int have_to_hash = !(conntrack->status & IPS_NAT_DONE_MASK);
    enum ip_nat_manip_type maniptype = HOOK2MANIP(hooknum);

    IP_NF_ASSERT(hooknum == NF_IP_PRE_ROUTING
                 || hooknum == NF_IP_POST_ROUTING
                 || hooknum == NF_IP_LOCAL_IN
                 || hooknum == NF_IP_LOCAL_OUT);
    BUG_ON(ip_nat_initialized(conntrack, maniptype));

    /* What we've got will look like inverse of reply. Normally
       this is what is in the conntrack, except for prior
       manipulations (future optimization: if num_manips == 0,
       orig_tp =
       conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple) */
    invert_tuplepr(&curr_tuple,
                   &conntrack->tuplehash[IP_CT_DIR_REPLY].tuple);

    get_unique_tuple(&new_tuple, &curr_tuple, range, conntrack, maniptype);
    /* ... remainder of the function omitted ... */
```

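Here, range is the post-translation range of IP addresses and ports, and hooknum is the nat table hook at which the function is called. We already know that Linux tells SNAT and DNAT apart by hook point, so the NAT type maniptype can be derived from this parameter (via HOOK2MANIP). Inverting the conntrack's reply-direction tuple yields the "original direction" tuple. A tuple is the structure holding the source and destination information, and it is the key an IP packet uses to find its conntrack. Note that this is not necessarily the true original direction: the conntrack may already have gone through DNAT in PREROUTING, so "original" here only means "not yet modified by this kind of NAT". Next, get_unique_tuple() computes the post-NAT tuple.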
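The minimal user-space sketch below makes these two preliminary steps concrete. It makes the following assumptions:

- The simplified struct tuple uses host byte order and holds only IPv4 addresses and ports.
- The enum names and the functions hook_to_manip() and invert_tuple() are hypothetical.
- They mirror what HOOK2MANIP and invert_tuplepr() do, assuming that HOOK2MANIP treats POST_ROUTING and LOCAL_IN as source NAT and PRE_ROUTING and LOCAL_OUT as destination NAT.
- They do not reproduce the kernel's actual types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified tuple: host byte order, IPv4 addresses and
 * ports only (the kernel's struct ip_conntrack_tuple is richer). */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t protonum;
};

/* Hook points, in the same order as the 2.6-era NF_IP_* constants. */
enum hook {
    HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_FORWARD,
    HOOK_LOCAL_OUT, HOOK_POST_ROUTING
};

enum manip_type { MANIP_SRC, MANIP_DST };

/* Mirrors HOOK2MANIP: POST_ROUTING and LOCAL_IN rewrite the source (SNAT);
 * PRE_ROUTING and LOCAL_OUT rewrite the destination (DNAT). */
static enum manip_type hook_to_manip(enum hook h)
{
    /* ip_nat_setup_info() asserts one of the four NAT hooks. */
    assert(h != HOOK_FORWARD);
    return (h == HOOK_POST_ROUTING || h == HOOK_LOCAL_IN) ? MANIP_SRC : MANIP_DST;
}

/* Like invert_tuplepr() for TCP/UDP: swap source and destination and keep
 * the protocol. Applied to the reply tuple, this gives the tuple as it
 * looks before the NAT that is about to be set up. */
static struct tuple invert_tuple(const struct tuple *t)
{
    struct tuple r = {
        .src_ip = t->dst_ip, .dst_ip = t->src_ip,
        .src_port = t->dst_port, .dst_port = t->src_port,
        .protonum = t->protonum,
    };
    return r;
}
```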

File: net/ipv4/netfilter/ip_nat_core.c

```c
static void
get_unique_tuple(struct ip_conntrack_tuple *tuple,
                 const struct ip_conntrack_tuple *orig_tuple,
                 const struct ip_nat_range *range,
                 struct ip_conntrack *conntrack,
                 enum ip_nat_manip_type maniptype)
{
    struct ip_nat_protocol *proto;

    /* 1) If this srcip/proto/src-proto-part is currently mapped,
       and that same mapping gives a unique tuple within the given
       range, use that.

       This is only required for source (ie. NAT/masq) mappings.
       So far, we don't do local source mappings, so multiple
       manips not an issue. */
    if (maniptype == IP_NAT_MANIP_SRC) {
        if (find_appropriate_src(orig_tuple, tuple, range)) {
            DEBUGP("get_unique_tuple: Found current src map\n");
            if (!ip_nat_used_tuple(tuple, conntrack))
                return;
        }
    }
```

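The comment should be clear enough. For SNAT, suppose this source (IP address, port, and so on) has already been mapped, and reusing that mapping still yields a unique tuple. In that case translation is done. Otherwise the function moves on to step 2.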
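Step 1 can be modeled in a few lines. The sketch below is hypothetical and simplifies the kernel in three ways:

- It keeps existing SNAT mappings in a plain array instead of the kernel's bysource hash.
- It ignores the protocol.
- It uses tuple_used() as a stand-in for ip_nat_used_tuple().

The sketch reuses the external address already assigned to the same inside IP:port, but only if that address lies in the range and the resulting tuple is not already taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an SNAT mapping that is already in use. */
struct src_map {
    uint32_t orig_ip; uint16_t orig_port;  /* inside address */
    uint32_t nat_ip;  uint16_t nat_port;   /* address seen outside */
    uint32_t dst_ip;  uint16_t dst_port;   /* peer of that connection */
};

struct range { uint32_t min_ip, max_ip; };

/* Stand-in for ip_nat_used_tuple(): is nat_ip:nat_port -> dst_ip:dst_port taken? */
static bool tuple_used(const struct src_map *maps, size_t n,
                       uint32_t nat_ip, uint16_t nat_port,
                       uint32_t dst_ip, uint16_t dst_port)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].nat_ip == nat_ip && maps[i].nat_port == nat_port &&
            maps[i].dst_ip == dst_ip && maps[i].dst_port == dst_port)
            return true;
    return false;
}

/* Step 1: reuse the mapping already given to the same inside ip:port,
 * provided it is inside the range and the new tuple would be unique. */
static bool reuse_src_mapping(const struct src_map *maps, size_t n,
                              const struct range *r,
                              uint32_t src_ip, uint16_t src_port,
                              uint32_t dst_ip, uint16_t dst_port,
                              uint32_t *out_ip, uint16_t *out_port)
{
    assert(r->min_ip <= r->max_ip);
    for (size_t i = 0; i < n; i++) {
        const struct src_map *m = &maps[i];
        if (m->orig_ip != src_ip || m->orig_port != src_port)
            continue;
        if (m->nat_ip < r->min_ip || m->nat_ip > r->max_ip)
            continue;
        /* Found a candidate: use it only if the tuple stays unique. */
        if (tuple_used(maps, n, m->nat_ip, m->nat_port, dst_ip, dst_port))
            return false;
        *out_ip = m->nat_ip;
        *out_port = m->nat_port;
        return true;
    }
    return false;
}
```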

File: net/ipv4/netfilter/ip_nat_core.c

```c
    /* 2) Select the least-used IP/proto combination in the given
       range. */
    *tuple = *orig_tuple;
    find_best_ips_proto(tuple, range, conntrack, maniptype);
```

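Judging by the comment, this step should pick the least-used IP. Yet find_best_ips_proto does nothing of the sort, so comments cannot always be trusted. Here is the relevant part of that function:

File: net/ipv4/netfilter/ip_nat_core.c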

```c
    /* Hashing source and destination IPs gives a fairly even
     * spread in practice (if there are a small number of IPs
     * involved, there usually aren't that many connections
     * anyway). The consistency means that servers see the same
     * client coming from the same IP (some Internet Banking sites
     * like this), even across reboots. */
    minip = ntohl(range->min_ip);
    maxip = ntohl(range->max_ip);
    j = jhash_2words(tuple->src.ip, tuple->dst.ip, 0);
    *var_ipp = htonl(minip + j % (maxip - minip + 1));
```

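The code above shows that the translated IP address depends on the original source and destination IP addresses. Because the hash depends only on these two addresses, a given client/server pair always gets the same NAT address. The hash function jhash_2words also tries to spread connections evenly across the available addresses. In that sense, the comment's "select the least-used IP" does have some basis after all. Next comes the protocol-specific port translation.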
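The address choice can be reproduced outside the kernel. The sketch below makes these assumptions:

- mix2() is an arbitrary 32-bit mixing function standing in for jhash_2words(). It is not the kernel's jhash.
- pick_nat_ip() is a hypothetical name.
- Addresses are in host byte order.

The point is that the result depends only on the source/destination pair. It is therefore deterministic and always lands inside [min_ip, max_ip].

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for jhash_2words(): any deterministic, well-mixed hash of the
 * two addresses serves for illustration. This is NOT the kernel's jhash. */
static uint32_t mix2(uint32_t a, uint32_t b, uint32_t seed)
{
    uint32_t h = seed ^ 0x9e3779b9u;
    h ^= a; h *= 0x85ebca6bu; h ^= h >> 13;
    h ^= b; h *= 0xc2b2ae35u; h ^= h >> 16;
    return h;
}

/* Step 2 (hypothetical helper): choose an address in [min_ip, max_ip]
 * from the original source and destination, as find_best_ips_proto() does. */
static uint32_t pick_nat_ip(uint32_t src_ip, uint32_t dst_ip,
                            uint32_t min_ip, uint32_t max_ip)
{
    assert(min_ip <= max_ip);
    uint32_t j = mix2(src_ip, dst_ip, 0);
    uint32_t span = max_ip - min_ip;      /* number of addresses minus one */
    if (span == UINT32_MAX)               /* whole address space: avoid % 0 */
        return j;
    return min_ip + j % (span + 1);
}
```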

```c
    /* 3) The per-protocol part of the manip is made to map into
       the range to make a unique tuple. */

    proto = ip_nat_proto_find_get(orig_tuple->dst.protonum);

    /* Only bother mapping if it's not already in range and unique */
    if ((!(range->flags & IP_NAT_RANGE_PROTO_SPECIFIED)
         || proto->in_range(tuple, maniptype, &range->min, &range->max))
        && !ip_nat_used_tuple(tuple, conntrack)) {
        ip_nat_proto_put(proto);
        return;
    }

    /* Last change: get protocol to try to obtain unique tuple. */
    proto->unique_tuple(tuple, range, maniptype, conntrack);

    ip_nat_proto_put(proto);
}
```

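Translation succeeds here if two conditions hold. First, either no port range was specified or the current port already falls within the specified range. Second, the tuple is unique. Otherwise, port translation is left to the protocol's unique_tuple() handler, which is the proto->unique_tuple() call near the end of get_unique_tuple(). Of the protocol handlers, only UDP is analyzed here.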
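The condition guarding that last step can be isolated as a predicate. keep_current_port() below is a hypothetical helper, and its tuple_in_use parameter stands in for the result of ip_nat_used_tuple(). The predicate returns true exactly when get_unique_tuple() would return without calling unique_tuple().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the "only bother mapping if not already in range
 * and unique" test in get_unique_tuple(). */
static bool keep_current_port(uint16_t port, bool proto_specified,
                              uint16_t min_port, uint16_t max_port,
                              bool tuple_in_use)
{
    assert(!proto_specified || min_port <= max_port);
    /* With no port range given, any port counts as "in range". */
    bool in_range = !proto_specified || (port >= min_port && port <= max_port);
    return in_range && !tuple_in_use;
}
```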

File: net/ipv4/netfilter/ip_nat_proto_udp.c

```c
static int
udp_unique_tuple(struct ip_conntrack_tuple *tuple,
                 const struct ip_nat_range *range,
                 enum ip_nat_manip_type maniptype,
                 const struct ip_conntrack *conntrack)
{
    static u_int16_t port;
    u_int16_t *portptr;
    unsigned int range_size, min, i;

    if (maniptype == IP_NAT_MANIP_SRC)
        portptr = &tuple->src.u.udp.port;
    else
        portptr = &tuple->dst.u.udp.port;

    /* If no range specified... */
    if (!(range->flags & IP_NAT_RANGE_PROTO_SPECIFIED)) {
        /* If it's dst rewrite, can't change port */
        if (maniptype == IP_NAT_MANIP_DST)
            return 0;

        if (ntohs(*portptr) < 1024) {
            /* Loose convention: >> 512 is credential passing */
            if (ntohs(*portptr)<512) {
                min = 1;
                range_size = 511 - min + 1;
            } else {
                min = 600;
                range_size = 1023 - min + 1;
            }
        } else {
            min = 1024;
            range_size = 65535 - 1024 + 1;
        }
    } else {
        min = ntohs(range->min.udp.port);
        range_size = ntohs(range->max.udp.port) - min + 1;
    }

    for (i = 0; i < range_size; i++, port++) {
        *portptr = htons(min + port % range_size);
        if (!ip_nat_used_tuple(tuple, conntrack))
            return 1;
    }
    return 0;
}
```

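When no port range is specified, the original port selects one of three segments:

- 1-511 (origPort < 512)
- 600-1023 (512 <= origPort < 1024)
- 1024-65535 (origPort >= 1024)

The search then walks through that segment sequentially until it finds a free port. If you look closely, you'll notice that port is a static variable. This is how context is kept between searches: each search starts from where the previous one stopped, which may reduce the search cost somewhat. If the magic number 1024 puzzles you, think about how TCP/UDP ports are allocated. Ports below 1024 are privileged, so a privileged source port is only ever mapped to another privileged one.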
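The UDP port selection can be sketched in user space as follows:

- port_segment() reproduces the three segments used when no range is given.
- pick_port() mimics the sequential search with a static cursor.

Both names are hypothetical. The used[] table, indexed by port, simplifies ip_nat_used_tuple(), which in the kernel checks the whole tuple rather than the port alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Segment rule from udp_unique_tuple() when no port range is given. */
static void port_segment(uint16_t orig_port, unsigned *min, unsigned *range_size)
{
    if (orig_port < 512) {
        *min = 1;    *range_size = 511 - 1 + 1;
    } else if (orig_port < 1024) {
        *min = 600;  *range_size = 1023 - 600 + 1;
    } else {
        *min = 1024; *range_size = 65535 - 1024 + 1;
    }
}

/* Sequential search with a cursor that persists across calls, like the
 * static "port" variable. used[p] is a hypothetical per-port stand-in for
 * ip_nat_used_tuple(). Returns 0 if every port in the segment is taken. */
static uint16_t pick_port(unsigned min, unsigned range_size, const bool used[])
{
    static uint16_t cursor;
    assert(range_size > 0);
    for (unsigned i = 0; i < range_size; i++, cursor++) {
        uint16_t p = (uint16_t)(min + cursor % range_size);
        if (!used[p])
            return p;   /* cursor is not advanced: the next call retries p first */
    }
    return 0;
}
```

The cursor is not advanced when a free port is returned, so the next search tries that same port first. Consecutive allocations therefore tend to come out in sequence.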

The TCP code is very similar to the UDP code, so I leave it to the reader.

That essentially completes the analysis. The steps can be summarized as follows:

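1. Reuse an existing SNAT mapping (SNAT only)
2. Do NAT (translate the IP address only)
3. Do NPT (also translate the port)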
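The order of these attempts can be captured in a small driver. The sketch is hypothetical:

- struct attempt simply records whether each step would produce a unique tuple.
- run_nat() reports which step succeeded and how many steps were tried.

It shows that step 1 runs only for SNAT and that each later step runs only when the earlier ones fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum nat_outcome { NAT_REUSED_MAPPING, NAT_IP_ONLY, NAT_IP_AND_PORT, NAT_CLASH };

/* Hypothetical summary of whether each step would yield a unique tuple. */
struct attempt {
    bool step1_unique;  /* reusing an existing SNAT mapping */
    bool step2_unique;  /* new IP, original port kept */
    bool step3_unique;  /* protocol handler finds a free port */
};

/* Try the steps in the order get_unique_tuple() does. */
static enum nat_outcome run_nat(bool is_snat, const struct attempt *a, int *steps_tried)
{
    assert(a != NULL && steps_tried != NULL);
    *steps_tried = 0;
    if (is_snat) {
        ++*steps_tried;
        if (a->step1_unique)
            return NAT_REUSED_MAPPING;
    }
    ++*steps_tried;
    if (a->step2_unique)
        return NAT_IP_ONLY;
    ++*steps_tried;
    if (a->step3_unique)
        return NAT_IP_AND_PORT;
    return NAT_CLASH;   /* tuple not unique: dropped later at confirm time */
}
```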

If none of these steps yields a unique post-translation tuple, the packet and its conntrack are dropped at confirm time, because the tuple clashes with that of another conntrack; see __ip_conntrack_confirm().
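This last safety net can be illustrated with a toy connection table. In this hypothetical sketch, confirm() plays the role of __ip_conntrack_confirm(): a new connection is accepted only if neither its original tuple nor its reply tuple is already present. The fixed-size array and the drop-when-full rule are simplifications made for this sketch, not kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tuple { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

#define MAX_CONFIRMED 64

/* Hypothetical table of confirmed connections (both directions). */
struct ct_table {
    struct tuple orig[MAX_CONFIRMED], reply[MAX_CONFIRMED];
    size_t n;
};

static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

static bool table_has(const struct ct_table *t, const struct tuple *x)
{
    for (size_t i = 0; i < t->n; i++)
        if (tuple_eq(&t->orig[i], x) || tuple_eq(&t->reply[i], x))
            return true;
    return false;
}

/* Drop the new connection if either direction clashes with an existing
 * one (a full table also drops here, which is only this sketch's rule). */
static enum verdict confirm(struct ct_table *t,
                            const struct tuple *orig, const struct tuple *reply)
{
    if (table_has(t, orig) || table_has(t, reply) || t->n == MAX_CONFIRMED)
        return VERDICT_DROP;
    t->orig[t->n] = *orig;
    t->reply[t->n] = *reply;
    t->n++;
    return VERDICT_ACCEPT;
}
```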

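From this analysis it is easy to see that Linux NAT is a "cone NAT". Step 1 reuses an existing source mapping, so a host that keeps sending from the same internal IP:port to new peers keeps the same external IP:port, as long as the resulting tuple stays unique. That, of course, makes it friendly to UDP hole punching.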

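Comment from xiaosuo, 2010-08-02 22:31:38:

Having recently gone through this code in detail again, I find that Linux's SNAT ports are fairly predictable. Linux always tries the following, in order:

1. Reuse an existing source IP/source port mapping.
2. Translate the IP address only.
3. Only as a last resort, add port translation to find a unique tuple.

Even the final search for a unique port first tries the port found by the previous such search.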
