Category: LINUX

2011-03-21 20:51:39

19.1. Main IPv4 Data Structures

This section introduces the major data structures used by the IPv4 protocol. Refer to Chapter 23 for a detailed description of their fields.

I have not included a picture to show the relationships among the data structures because most of them are independent and do not keep cross-references.

iphdr structure

IP header. The meaning of its fields has already been covered in Chapter 18.

ip_options structure

This structure, defined in include/linux/ip.h, represents the options for a packet that needs to be transmitted or forwarded. The options are stored in this structure because it is easier to read than the corresponding portion of the IP header itself.

ipcm_cookie structure

This structure combines various pieces of information needed to transmit a packet.

ipq structure

Collection of the fragments of an IP packet. See the section "Organization of the IP Fragments Hash Table" in Chapter 22.

inet_peer structure

The kernel keeps an instance of this structure for each remote host it has been talking to in the recent past. The section "Long-Living IP Peer Information" in Chapter 23 shows how it is used. All instances of inet_peer structures are kept in an AVL tree, a structure optimized for frequent lookups.

ipstats_mib structure

The Simple Network Management Protocol (SNMP) employs a type of object called a Management Information Base (MIB) to collect statistics about systems. A data structure called ipstats_mib keeps statistics about the IP layer. The section "IP Statistics" in Chapter 23 covers this structure in more detail.

in_device structure

The in_device structure stores all the IPv4-related configuration for a network device, such as changes made by a user with the ifconfig or ip command. This structure is linked to the net_device structure via net_device->ip_ptr and can be retrieved with in_dev_get and __in_dev_get. The difference between those two functions is that the first one takes care of all the necessary locking, and the second one assumes the caller has taken care of it already.

Since in_dev_get internally increases a reference count on the in_device structure when it succeeds (i.e., when a device is configured to support IPv4), its caller is supposed to decrement the reference count with in_dev_put when it is done with the structure.

The structure is allocated and linked to the device with inetdev_init, which is called when the first IPv4 address is configured on the device.
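The get/put discipline described for in_dev_get and in_dev_put can be illustrated with a small user-space sketch. All demo_* types and functions below are hypothetical stand-ins, not the kernel's definitions, and the locking that the real in_dev_get performs is left out on purpose.

```c
#include <stddef.h>

/* Hypothetical stand-in for in_device: only the reference count is modeled. */
struct demo_in_device {
    int refcnt;                      /* number of current users */
};

/* Hypothetical stand-in for net_device. */
struct demo_net_device {
    struct demo_in_device *ip_ptr;   /* NULL until IPv4 is configured */
};

/* Modeled on in_dev_get: return the IPv4 configuration block and take a
 * reference on it, or return NULL if the device has no IPv4 configuration. */
static struct demo_in_device *demo_in_dev_get(struct demo_net_device *dev)
{
    struct demo_in_device *in_dev = dev->ip_ptr;

    if (in_dev != NULL)
        in_dev->refcnt++;
    return in_dev;
}

/* Modeled on in_dev_put: drop the reference taken by demo_in_dev_get. */
static void demo_in_dev_put(struct demo_in_device *in_dev)
{
    in_dev->refcnt--;
}

/* Typical caller pattern: get, use, put. Returns 1 if IPv4 is configured. */
static int demo_device_has_ipv4(struct demo_net_device *dev)
{
    struct demo_in_device *in_dev = demo_in_dev_get(dev);

    if (in_dev == NULL)
        return 0;                    /* get failed: nothing to release */
    /* ... the IPv4 configuration would be read here ... */
    demo_in_dev_put(in_dev);
    return 1;
}
```

The point of the pattern is that a successful get is always paired with exactly one put, while a failed get is paired with none.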

in_ifaddr structure

When an IPv4 address is configured on an interface, the kernel creates an in_ifaddr structure that includes the 4-byte address along with several other fields.

ipv4_devconf structure

The ipv4_devconf data structure, whose fields are exported via /proc in /proc/sys/net/ipv4/conf/, is used to tune the behavior of a network device. There is an instance for each device, plus one that stores the default values (ipv4_devconf_dflt). The meanings of its fields are covered in Chapters 28 and 36.

ipv4_config structure

While ipv4_devconf structures are used to store per-device configuration, ipv4_config stores configuration that applies to the host as a whole.

cork

The cork structure is used to handle the socket CORK option. Chapter 21 shows how its fields are used to maintain some context information across consecutive invocations of ip_append_data and ip_append_page to handle data fragmentation.

19.1.1. Checksum-Related Fields from sk_buff and net_device Structures

We saw the routines used to compute the IP and L4 checksums in the checksum section of Chapter 18. In this section, we will see what fields of the sk_buff buffer structure are used to store information about checksums, how devices tell the kernel about their hardware checksumming capabilities, and how the L4 protocols use such information to decide whether to compute the checksum for ingress and egress packets or to let the network interface cards (NICs) do it.

Because the IP checksum is always computed and verified in software by the kernel, the next subsections concentrate on L4 checksum handling and issues.

19.1.1.1. net_device structure

The net_device->features field specifies the capabilities of the device. Among the various flags that can be set, a few are used to define the device's hardware checksumming capabilities. The list of possible features is in include/linux/netdevice.h inside the definition of net_device itself. Here are the flags used to control checksumming:

NETIF_F_NO_CSUM

The device is so reliable that there is no need to use any L4 checksum. This feature is enabled, for instance, on the loopback device.

NETIF_F_IP_CSUM

The device can compute the L4 checksum in hardware, but only for TCP and UDP over IPv4.

NETIF_F_HW_CSUM

The device can compute the L4 checksum in hardware for any protocol. This feature is less common than NETIF_F_IP_CSUM.
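As a rough illustration of how the three flags differ, the sketch below tests a feature bitmask the way a transmit path might. The DEMO_NETIF_F_* names and their numeric values are placeholders chosen for this example, not the definitions from include/linux/netdevice.h, and demo_can_skip_sw_csum is a hypothetical helper.

```c
#include <stdbool.h>

/* Placeholder feature bits named after the flags in the text.
 * The numeric values are assumptions made for this sketch only. */
#define DEMO_NETIF_F_IP_CSUM 0x2u   /* hardware checksum, TCP/UDP over IPv4 only */
#define DEMO_NETIF_F_NO_CSUM 0x4u   /* no L4 checksum needed at all */
#define DEMO_NETIF_F_HW_CSUM 0x8u   /* hardware checksum for any L4 protocol */

/* Hypothetical helper: may software skip computing the full L4 checksum
 * for a packet leaving through a device with these feature bits? */
static bool demo_can_skip_sw_csum(unsigned int features,
                                  bool is_ipv4, bool is_tcp_or_udp)
{
    /* Either the link needs no checksum or the NIC handles any protocol. */
    if (features & (DEMO_NETIF_F_NO_CSUM | DEMO_NETIF_F_HW_CSUM))
        return true;
    /* The restricted offload applies only to TCP and UDP over IPv4. */
    if ((features & DEMO_NETIF_F_IP_CSUM) && is_ipv4 && is_tcp_or_udp)
        return true;
    return false;
}
```

For example, a device advertising only the restricted flag would still leave software to compute the checksum of a non-IPv4 packet.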

19.1.1.2. sk_buff structure

The two fields skb->csum and skb->ip_summed have different meanings depending on whether skb points to a received packet or to a packet to be transmitted out.

When a packet is received, skb->csum may hold its L4 checksum. The oddly named skb->ip_summed field keeps track of the status of the L4 checksum. The status is indicated by the following values, defined in include/linux/skbuff.h. These definitions represent what the device driver tells the L4 layer. Once the L4 receive routine receives the buffer, it may change the value of skb->ip_summed.

CHECKSUM_NONE

The checksum in csum is not valid. This can be due to various reasons:

  • The device does not provide hardware checksumming.

  • The device computed the hardware checksum and found the frame to be corrupted. At this point, the device driver could discard the frame directly. But some device drivers prefer to set ip_summed to CHECKSUM_NONE and let the software compute and verify the checksum again. This is unfortunate, because after all of the overhead of receiving the packet, all that the kernel does is recheck the checksum and discard the packet (see e1000_rx_checksum in drivers/net/e1000/e1000_main.c). Note that if the input frame is to be forwarded, the router should not discard it due to a wrong L4 checksum (a router is not supposed to look at the L4 checksum). It will be up to the destination host to do it. This is another reason why device drivers do not discard frames that fail the L4 checksum, but let the L4 receive routine verify them.

  • The checksum needs to be recomputed and reverified. See the section "Changes to the L4 Checksum" in Chapter 18 for the most common reasons.

CHECKSUM_HW

The NIC has computed the checksum on the L4 header and payload and has copied it into the skb->csum field. The software (i.e., the L4 receive routine) needs only to add the checksum on the pseudoheader to skb->csum and to verify the resulting checksum. This flag can be considered a special case of the following flag, CHECKSUM_UNNECESSARY.
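The split of work in the CHECKSUM_HW case can be sketched in user space with the standard Internet checksum (16-bit one's-complement sum). This is a minimal sketch, assuming the hardware hands over an unfolded sum of the L4 header and payload; the demo_* helpers are hypothetical and only mimic what the kernel's own checksum routines do.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte range to a running one's-complement sum, reading the data as
 * big-endian 16-bit words. An odd trailing byte is padded with a zero. */
static uint32_t demo_csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    return sum;
}

/* Fold a 32-bit accumulator into 16 bits, adding the carries back in. */
static uint16_t demo_csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum of the IPv4 pseudoheader: source address, destination address,
 * protocol number, and L4 length. */
static uint32_t demo_pseudo_sum(const uint8_t saddr[4], const uint8_t daddr[4],
                                uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum = demo_csum_add(sum, saddr, 4);
    sum = demo_csum_add(sum, daddr, 4);
    sum += proto;
    sum += l4_len;
    return sum;
}

/* Receive side with CHECKSUM_HW: hw_sum plays the role of skb->csum, i.e. the
 * sum over the L4 header and payload as computed by the NIC. Software only
 * adds the pseudoheader and checks the result. Returns 1 if valid. */
static int demo_rx_verify(uint32_t hw_sum,
                          const uint8_t saddr[4], const uint8_t daddr[4],
                          uint8_t proto, uint16_t l4_len)
{
    uint32_t total = hw_sum + demo_pseudo_sum(saddr, daddr, proto, l4_len);

    /* A correct segment sums to all ones in one's-complement arithmetic. */
    return demo_csum_fold(total) == 0xffffu;
}
```

The software cost in this path is a handful of additions over the pseudoheader, regardless of how large the payload is.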

CHECKSUM_UNNECESSARY

The NIC has computed and verified the checksum on the L4 header and payload, as well as on the pseudoheader (the checksum on the pseudoheader may optionally be computed by the device driver in software), so the software is relieved from having to do any L4 checksum verification.

CHECKSUM_UNNECESSARY can also be set, for example, when the probability of an error is very low and it would be a waste of time and CPU power to compute and verify the L4 checksum. One example is the loopback device: since the packets sent through this virtual device never leave the local host, the only possible errors would be due to faulty RAM or bugs in the operating system. This option can therefore be used with such special devices, but the standard behavior is to compute the checksum of each received packet and discard corrupted packets at the receiving end.
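The three receive-side values can be summarized as a small decision table. The sketch below is only a model of the behavior described above: the enum names, their numeric values, and both functions are hypothetical, and a real driver derives its decision from hardware status bits rather than from an enum.

```c
/* Status values named after the text; numeric values are arbitrary here. */
enum demo_ip_summed {
    DEMO_CHECKSUM_NONE,
    DEMO_CHECKSUM_HW,
    DEMO_CHECKSUM_UNNECESSARY
};

/* Hypothetical summary of what the NIC reported for one received frame. */
enum demo_hw_result {
    DEMO_HW_NO_OFFLOAD,     /* device has no hardware checksumming */
    DEMO_HW_RAW_SUM,        /* device summed L4 header and payload only */
    DEMO_HW_VERIFIED_OK,    /* device verified the full L4 checksum */
    DEMO_HW_VERIFIED_BAD    /* device found the L4 checksum to be wrong */
};

/* Work left for the L4 receive routine. */
enum demo_rx_work {
    DEMO_RX_FULL_SW,        /* sum pseudoheader, header, and payload */
    DEMO_RX_PSEUDO_ONLY,    /* add the pseudoheader to csum and verify */
    DEMO_RX_NOTHING         /* accept the checksum as already verified */
};

/* Driver side: choose the value stored in ip_summed for a received frame. */
static enum demo_ip_summed demo_driver_rx_status(enum demo_hw_result r)
{
    switch (r) {
    case DEMO_HW_RAW_SUM:
        return DEMO_CHECKSUM_HW;
    case DEMO_HW_VERIFIED_OK:
        return DEMO_CHECKSUM_UNNECESSARY;
    case DEMO_HW_NO_OFFLOAD:
    case DEMO_HW_VERIFIED_BAD:   /* let software recheck, as some drivers do */
    default:
        return DEMO_CHECKSUM_NONE;
    }
}

/* L4 side: decide how much verification software still has to perform. */
static enum demo_rx_work demo_rx_work_needed(enum demo_ip_summed s)
{
    switch (s) {
    case DEMO_CHECKSUM_UNNECESSARY:
        return DEMO_RX_NOTHING;
    case DEMO_CHECKSUM_HW:
        return DEMO_RX_PSEUDO_ONLY;
    case DEMO_CHECKSUM_NONE:
    default:
        return DEMO_RX_FULL_SW;
    }
}
```

Note how a frame that failed the hardware check ends up on the same path as a frame from a device with no offload at all: software recomputes everything and the L4 routine makes the final decision.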

When a packet is transmitted, csum represents a pointer (or more accurately, an offset) to the place inside the buffer where the hardware card has to put the checksum it will compute, not the checksum itself. This field is therefore used during packet transmission only if the checksum is calculated in hardware. This interaction between L4 and L2, bypassing L3, introduces a couple of additional problems to deal with. For example, a feature such as Network Address Translation (NAT) that manipulates the fields of the IP header used by the L4 layer to compute the so-called checksum on the pseudoheader would invalidate that data structure (see the section on checksum changes in Chapter 18).

As in the case of reception, ip_summed represents the status of the L4 checksum. The field is used by the L4 protocols to tell the device whether it needs to take care of checksumming. In particular, this is the meaning of ip_summed during transmissions:

CHECKSUM_NONE

The protocol has already taken care of the checksum; the device does not need to do anything. When you forward an ingress frame, the L4 checksum is already in place because it has been computed by the sender host; therefore, there is no need to compute it. See ip_forward in Chapter 20. When ip_summed is set to CHECKSUM_NONE, csum is meaningless.

CHECKSUM_HW

The protocol has stored into its header the checksum on the pseudoheader only; the device is supposed to complete it by adding the checksum on the L4 header and payload.

ip_summed does not use the CHECKSUM_UNNECESSARY value when transmitting packets (it would be equivalent to CHECKSUM_NONE).

While the feature flags NETIF_F_XXX_CSUM are initialized by the device driver when the NIC is enabled, the CHECKSUM_XXX flags have to be set for every sk_buff buffer that is received or transmitted. At reception time, it is the device driver that initializes ip_summed correctly based on the NETIF_F_XXX_CSUM device capabilities.

At transmission time, the L3 transmission APIs initialize ip_summed based on the checksumming capabilities of the egress device, which can be derived from the routing table: the routing table cache entry that matches the destination includes information about the egress device, and therefore its checksumming capabilities (see ip_append_data for an example).
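The transmit-side handshake can be sketched as follows: the L4 code either finishes the checksum itself or leaves the pseudoheader sum in the header and records the field's offset for the device. This is a user-space model under stated assumptions. struct demo_skb, the DEMO_TX_* values, and every function are hypothetical, and the dev_can_csum argument stands in for the capability check on the egress device's features.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_tx_summed { DEMO_TX_CHECKSUM_NONE, DEMO_TX_CHECKSUM_HW };

/* Hypothetical, heavily reduced model of an outgoing buffer. */
struct demo_skb {
    uint8_t *l4;                    /* start of the L4 header */
    size_t l4_len;                  /* L4 header plus payload, in bytes */
    size_t csum;                    /* on transmit: offset of the checksum field */
    enum demo_tx_summed ip_summed;
};

/* One's-complement sum over big-endian 16-bit words. */
static uint32_t demo_sum(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/* Fold the accumulator into 16 bits, adding the carries back in. */
static uint16_t demo_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Store a 16-bit value in network byte order. */
static void demo_put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* L4 side. pseudo_sum is the unfolded pseudoheader sum; check_off is the
 * offset of the checksum field inside the L4 header. */
static void demo_l4_send_check(struct demo_skb *skb, size_t check_off,
                               uint32_t pseudo_sum, int dev_can_csum)
{
    if (dev_can_csum) {
        /* Leave only the pseudoheader sum and tell the device where to write. */
        demo_put16(skb->l4 + check_off, demo_fold(pseudo_sum));
        skb->csum = check_off;
        skb->ip_summed = DEMO_TX_CHECKSUM_HW;
    } else {
        /* Compute the complete checksum in software over a zeroed field. */
        demo_put16(skb->l4 + check_off, 0);
        demo_put16(skb->l4 + check_off,
                   (uint16_t)~demo_fold(demo_sum(pseudo_sum, skb->l4, skb->l4_len)));
        skb->ip_summed = DEMO_TX_CHECKSUM_NONE;
    }
}

/* Device side: complete the checksum only when asked to. */
static void demo_dev_xmit(struct demo_skb *skb)
{
    if (skb->ip_summed == DEMO_TX_CHECKSUM_HW) {
        /* The field already holds the pseudoheader sum, so summing the whole
         * L4 segment and complementing yields the final checksum. */
        uint16_t full = (uint16_t)~demo_fold(demo_sum(0, skb->l4, skb->l4_len));

        demo_put16(skb->l4 + skb->csum, full);
    }
}
```

Both branches end with the same bytes in the checksum field; the only difference is which side did the summing over the payload.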

Given the meaning of the skb->csum and skb->ip_summed fields and the CHECKSUM_HW flag previously described, you can study, for example, how TCPv4 takes care of the checksum on ingress segments in tcp_v4_checksum_init, and the checksum of egress segments in tcp_v4_send_check.
