Category: System Operations and Maintenance
2012-05-24 18:23:29
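Figure: UDP encapsulation. The IP datagram consists of the IP header followed by the UDP datagram; the UDP datagram consists of the UDP header followed by the UDP data.

IP header | UDP header | UDP data
---|---|---
20 bytes | 8 bytes |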
UDP provides no reliability: it sends the datagrams that the application writes to the IP layer, but there is no guarantee that they ever reach their destination. Given this lack of reliability, we are tempted to think we should avoid UDP and always use a reliable protocol such as TCP.
The application needs to worry about the size of the resulting IP datagram. If it exceeds the network's MTU (Section 2.8), the IP datagram is fragmented. This applies to each network that the datagram traverses from the source to the destination, not just the first network connected to the sending host.
UDP Header
Bits 0-15 | Bits 16-31
---|---
16-bit source port number | 16-bit destination port number
16-bit UDP length | 16-bit UDP checksum
UDP data (if any) |
The
port numbers identify the sending process and the receiving process. In
Figure 1.8 we showed that TCP and UDP use the destination port number
to demultiplex incoming data from IP. Since IP has already demultiplexed
the incoming IP datagram to either TCP or UDP (based on the protocol
value in the IP
header), this means the TCP port numbers are looked
at by TCP, and the UDP port numbers by UDP. The TCP port numbers are
independent of the UDP port numbers.
Despite
this independence, if a well-known service is provided by both TCP and
UDP, the port number is normally chosen to be the same for both
transport layers. This is purely for convenience and is not required by
the protocols.
The
UDP length field is the length of the UDP header and the UDP data in
bytes. The minimum value for this field is 8 bytes. (Sending a UDP
datagram with 0 bytes of data is OK.) This UDP length is redundant. The
IP datagram contains its total length in bytes (Figure 3.1), so the
length of the UDP datagram is this total length minus the length of the
IP header.
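The header layout above can be sketched with Python's `struct` module. This is only an illustration of the four 16-bit fields; the function names `build_udp_datagram` and `parse_udp_datagram` are hypothetical helpers, and the checksum is left at 0 (not computed) unless the caller supplies one.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, UDP length, UDP checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prefix payload with a UDP header; the length covers header plus data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Split a raw UDP datagram into its header fields and its data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    if length < UDP_HEADER.size:
        raise ValueError("UDP length field is below the 8-byte minimum")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": datagram[UDP_HEADER.size:length],
    }


# A datagram with 0 bytes of data is legal: its length field is exactly 8.
empty = build_udp_datagram(1234, 7, b"")
print(parse_udp_datagram(empty)["length"])
```

Note that the length field is computed from the payload rather than supplied by the caller, which mirrors the redundancy described above: the same value could be derived from the IP total length.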
UDP Checksum
The UDP checksum covers the UDP header and the UDP data. Recall that the checksum in the IP header covers only the IP header; it does not cover any data in the IP datagram. Both UDP and TCP have checksums in their headers to cover their header and their data. With UDP the checksum is optional, while with TCP it is mandatory.
Although
the basics for calculating the UDP checksum are similar to the IP
header checksum (the ones complement sum of 16-bit words), there are
differences.
First, the length of the UDP datagram can be an odd number of bytes, while the checksum algorithm adds 16-bit words. The solution is to append a pad byte of 0 to the end, if necessary, just for the checksum computation. (That is, this possible pad byte is not transmitted.)
Next, both UDP and TCP include a 12-byte pseudo-header with the UDP datagram (or TCP segment) just for the checksum computation. This pseudo-header includes certain fields from the IP header. The purpose is to let UDP double-check that the data has arrived at the correct destination (i.e., that IP has not accepted a datagram that is not addressed to this host, and that IP has not given UDP a datagram that is for another upper layer).
Part | Bits 0-15 | Bits 16-31
---|---|---
UDP pseudo-header | 32-bit source IP address (bits 0-31) |
UDP pseudo-header | 32-bit destination IP address (bits 0-31) |
UDP pseudo-header | zero (8 bits), 8-bit protocol (17) | 16-bit UDP length
UDP header | 16-bit source port number | 16-bit destination port number
UDP header | 16-bit UDP length | 16-bit UDP checksum
Data | data |
Data | final data byte, pad byte (0) |
In
this figure we explicitly show a datagram with an odd length, requiring
a pad byte for the checksum computation. Notice that the length of the
UDP datagram appears twice in the checksum computation.
If the calculated checksum is 0, it is stored as all one bits (65535), which is equivalent in ones complement arithmetic. If the transmitted checksum is 0, it indicates that the sender did not compute the checksum.
If the sender did compute a checksum and the receiver detects a checksum error, the UDP datagram is silently discarded. No error message is generated. (This is also what happens if an IP header checksum error is detected by IP.)
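The checksum rules above (pad byte, 12-byte pseudo-header, and the special meaning of 0) can be sketched as follows. This is a minimal illustration rather than any particular kernel's implementation; the function names are hypothetical, and the datagram passed to `udp_checksum` is assumed to have its checksum field set to 0.

```python
import socket
import struct

UDP_PROTOCOL = 17


def ones_complement_sum(data):
    """Ones complement sum of 16-bit words, zero-padding an odd length."""
    if len(data) % 2:
        data += b"\x00"  # pad byte: used for the sum only, never transmitted
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_length):
    """12 bytes: source address, destination address, zero, protocol, length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL, udp_length))


def udp_checksum(src_ip, dst_ip, datagram):
    """Checksum for a datagram whose checksum field is currently 0."""
    covered = pseudo_header(src_ip, dst_ip, len(datagram)) + datagram
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is stored as all one bits


def verify_udp_checksum(src_ip, dst_ip, datagram):
    """True if no checksum was sent (0) or the sum over everything is 0xFFFF."""
    (stored,) = struct.unpack_from("!H", datagram, 6)
    if stored == 0:
        return True  # the sender did not compute a checksum
    covered = pseudo_header(src_ip, dst_ip, len(datagram)) + datagram
    return ones_complement_sum(covered) == 0xFFFF
```

A receiver following the text would silently drop any datagram for which `verify_udp_checksum` returns `False`. Because the pseudo-header contains the IP addresses, a datagram delivered to the wrong host also fails this check.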
Despite UDP checksums being optional, they should always be enabled. During the 1980s some computer vendors turned off UDP checksums by default, to speed up their implementation of Sun's Network File System (NFS), which uses UDP. While this might be acceptable on a single LAN, where the cyclic redundancy check on the data-link frame (e.g., Ethernet or token ring frame) can detect most corruption of the frame, when the datagrams pass through routers, all bets are off. Believe it or not, there have been routers with software and hardware bugs that have modified bits in the datagrams being routed. These errors are undetectable in a UDP datagram if the end-to-end UDP checksum is disabled. Also realize that some data-link protocols (e.g., SLIP) don't have any form of data-link checksum.
The
physical network layer normally imposes an upper limit on the size of
the frame that can be transmitted. Whenever the IP layer receives an IP
datagram to send, it determines which local interface the datagram is
being sent on (routing), and queries that interface to obtain its MTU.
IP compares the MTU with the datagram size and performs fragmentation, if necessary. Fragmentation can take place either at the original sending host or at an intermediate router.
When an
IP datagram is fragmented, it is not reassembled until it reaches its
final destination. (This handling of reassembly differs from some other
networking protocols that require reassembly to take place at the next
hop, not at the final destination.) The IP layer at the destination
performs the reassembly. The goal is to make fragmentation and
reassembly transparent to the transport layer (TCP and UDP), which it
is, except for possible performance degradation. It is also possible for
the fragment of a datagram to again be fragmented (possibly more than
once). The information maintained in the IP header for fragmentation and
reassembly provides enough information to do this.
Recalling the IP header, the following fields are used in fragmentation. The identification
field contains a unique value for each IP datagram that the sender
transmits. This number is copied into each fragment of a particular
datagram. (We now see the use for this field.) The flags field uses one
bit as the "more fragments" bit. This bit is turned on for each fragment
comprising a datagram except the final fragment. The fragment offset
field contains the offset of this fragment from the beginning of the
original datagram. Also, when a datagram is fragmented the total length
field of each fragment is changed to be the size of that fragment.
Finally,
one of the bits in the flags field is called the "don't fragment" bit.
If this is turned on, IP will not fragment the datagram. Instead the
datagram is thrown away and an ICMP error ("fragmentation needed but
don't fragment bit set") is sent to the originator.
When an IP
datagram is fragmented, each fragment becomes its own packet, with its
own IP header, and is routed independently of any other packets. This
makes it possible for the fragments of a datagram to arrive at the final
destination out of order, but there is enough information in the IP
header to allow the receiver to reassemble the fragments correctly.
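The bookkeeping for the fragmentation fields can be sketched as simple arithmetic. The sketch assumes a 20-byte IP header without options and relies on one detail not stated above: the fragment offset field counts 8-byte units, so every fragment except the last must carry a multiple of 8 data bytes. The function name `fragment_datagram` is a hypothetical helper.

```python
def fragment_datagram(ip_payload_len, mtu, ip_header_len=20):
    """Return the per-fragment header values for one IP datagram.

    ip_payload_len is everything after the IP header (for UDP: the
    8-byte UDP header plus the UDP data).
    """
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - ip_header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    fragments = []
    offset = 0
    while offset < ip_payload_len:
        size = min(max_data, ip_payload_len - offset)
        fragments.append({
            "offset_field": offset // 8,             # value stored in the header
            "total_length": ip_header_len + size,    # size of this fragment
            "more_fragments": offset + size < ip_payload_len,
        })
        offset += size
    return fragments


# 1473 bytes of UDP data + 8-byte UDP header = 1481 bytes of IP payload,
# one byte too many for a single packet on a link with an MTU of 1500.
for frag in fragment_datagram(1481, 1500):
    print(frag)
```

Only the first fragment contains the UDP header, because the UDP header is just part of the IP payload as far as IP is concerned.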
Although
IP fragmentation looks transparent, there is one feature that makes it
less than desirable: if one fragment is lost the entire datagram must be
retransmitted. To understand why this happens, realize that IP itself
has no timeout and retransmission; that is the responsibility of the
higher layers. (TCP performs timeout and retransmission, UDP doesn't.
Some UDP applications perform timeout and retransmission themselves.)
When a fragment is lost that came from a TCP segment, TCP will time out
and retransmit the entire TCP segment, which corresponds to an IP
datagram. There is no way to resend only one fragment of a datagram.
Indeed, if the fragmentation was done by an intermediate router, and not
the originating system, there is no way for the originating system to
know how the datagram was fragmented. For this reason alone,
fragmentation is often avoided. [Kent and Mogul 1987] provide arguments
for avoiding fragmentation.
Using UDP it
is easy to generate IP fragmentation. (We'll see later that TCP tries
to avoid fragmentation and that it is nearly impossible for an
application to force TCP to send segments large enough to require
fragmentation.)
Also note the terminology: an IP datagram is the
unit of end-to-end transmission at the IP layer (before fragmentation
and after reassembly), and a packet is the unit of data passed between
the IP layer and the link layer. A packet can be a complete IP datagram
or a fragment of an IP datagram.
ICMP Unreachable Error (Fragmentation Required)
Another
variation of the ICMP unreachable error occurs when a router receives a
datagram that requires fragmentation, but the don't fragment (DF) flag
is turned on in the IP header. This error can be used by a program that
needs to determine the smallest MTU in the path to a destination; this is called
the path MTU discovery mechanism.
Determining the Path MTU Using Traceroute
Although
most systems don't support the path MTU discovery feature, we can
easily modify a version of traceroute (Chapter 8) to let us determine
the path MTU. What we'll do is send packets with the "don't fragment"
bit set. The size of the first packet we send will equal the MTU of the
outgoing interface, and whenever we receive an ICMP "can't fragment"
error (which we described in the previous section) we'll reduce the size
of the packet. If the router sending the ICMP error sends the newer
version that includes the MTU of the outgoing interface, we'll use that
value; otherwise we'll try the next smallest MTU. As RFC 1191 [Mogul and
Deering 1990] states, there are a limited number of MTUs, so our
program has a table of the likely values and moves to the next smallest
value.
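The search loop just described can be sketched without touching the network. In this sketch `send_probe` is a hypothetical callback standing in for "send a packet of this size with the don't fragment bit set": it returns `None` if the packet got through, the MTU reported in a newer-style ICMP error, or 0 for an older-style error that carries no MTU. The table of likely MTUs is illustrative only, and `make_simulated_path` is a made-up stand-in for a real path.

```python
# Illustrative table of likely MTU values, largest first (an assumption,
# not the exact table used by any particular program).
LIKELY_MTUS = [65535, 32000, 17914, 8166, 4352, 2002,
               1500, 1492, 1006, 576, 508, 296, 68]


def discover_path_mtu(first_hop_mtu, send_probe):
    """Shrink the probe size until a probe crosses the path unfragmented."""
    size = first_hop_mtu
    while True:
        reported = send_probe(size)
        if reported is None:
            return size                  # no ICMP error: this size fits
        if 0 < reported < size:
            size = reported              # newer ICMP error includes the MTU
        else:
            smaller = [m for m in LIKELY_MTUS if m < size]
            if not smaller:
                raise RuntimeError("no smaller MTU left to try")
            size = smaller[0]            # next smallest likely value


def make_simulated_path(link_mtus, reports_mtu=True):
    """Build a fake send_probe for a path with the given per-link MTUs."""
    def send_probe(size):
        for mtu in link_mtus:
            if size > mtu:               # this router cannot forward the packet
                return mtu if reports_mtu else 0
        return None
    return send_probe


print(discover_path_mtu(1500, make_simulated_path([1500, 1006, 296])))
```

When routers do not report the MTU, the result is the largest table entry that fits, which can underestimate the true path MTU if that value is not in the table.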
Maximum UDP Datagram Size
Theoretically,
the maximum size of an IP datagram is 65535 bytes, imposed by the
16-bit total length field in the IP header. With an IP header of 20
bytes and a UDP header of 8 bytes, this leaves a maximum of 65507 bytes
of user data in a UDP datagram. Most implementations, however, provide
less than this maximum.
There are
two limits we can encounter. First the application program may be
limited by its programming interface. The sockets API (Section 1.15)
provides a function that the application can call to set the size of the
receive buffer and the send buffer. For a UDP socket, this size is
directly related to the maximum size
UDP datagram the application can
read or write. Most systems today provide a default of just over 8192
bytes for the maximum size of a UDP datagram that can be read or
written. (This default is because 8192 is the amount of user data that
NFS reads and writes by default.)
The next limitation comes from the kernel's implementation of TCP/IP. There may be implementation
features (or bugs) that limit the size of an IP datagram to less than 65535 bytes.
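The theoretical limit and the buffer-size interface mentioned above can be sketched with Python's socket module, which exposes the sockets API options `SO_SNDBUF` and `SO_RCVBUF`. The helper name `udp_buffer_sizes` is hypothetical, and the values reported depend on the operating system, which may round or adjust whatever size is requested.

```python
import socket

MAX_IP_DATAGRAM = 0xFFFF   # largest value of the 16-bit total length field
IP_HEADER_LEN = 20         # IP header without options
UDP_HEADER_LEN = 8
MAX_UDP_DATA = MAX_IP_DATAGRAM - IP_HEADER_LEN - UDP_HEADER_LEN  # 65507


def udp_buffer_sizes(requested=None):
    """Return (send, receive) buffer sizes of a fresh UDP socket.

    If requested is given, ask the kernel for that size first; the
    kernel is free to grant a different value.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if requested is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
                sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


print("theoretical maximum UDP data:", MAX_UDP_DATA)
print("default (send, receive) buffers:", udp_buffer_sizes())
```

An application that wants to send datagrams near the theoretical maximum would typically need to raise both buffers and still might run into the kernel limits described above.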
Datagram Truncation
Just
because IP is capable of sending and receiving a datagram of a given
size doesn't mean the receiving application is prepared to read that
size. UDP programming interfaces allow the application to specify the
maximum number of bytes to return each time. What happens if the
received datagram exceeds the size the application is prepared to deal
with?
Unfortunately the answer depends on the programming interface and the implementation. The traditional Berkeley version of the sockets API truncates the datagram, discarding any excess data. Whether the application is notified depends on the version. (4.3BSD Reno and later can notify the application that the datagram was truncated.) The sockets API under SVR4 (including Solaris 2.x) does not truncate the datagram. Any excess data is returned in subsequent reads. The application is not notified that multiple reads are being fulfilled from a single UDP datagram.
The TLI API does not discard the data. Instead a flag is returned indicating that more data is available, and subsequent reads by the application return the rest of the datagram.
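The truncating behavior can be observed over the loopback interface. This sketch assumes a Unix-like system where Python provides `socket.recvmsg` and where the implementation truncates and sets the `MSG_TRUNC` flag; as noted above, other implementations may behave differently. The function name `receive_with_limit` is hypothetical.

```python
import socket


def receive_with_limit(payload, bufsize):
    """Send payload to ourselves over UDP and read at most bufsize bytes.

    Returns the bytes actually read and whether the kernel flagged the
    datagram as truncated.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a port
        receiver.settimeout(2.0)          # do not hang if the datagram is lost
        sender.sendto(payload, receiver.getsockname())
        data, _ancdata, flags, _addr = receiver.recvmsg(bufsize)
        return data, bool(flags & socket.MSG_TRUNC)


data, truncated = receive_with_limit(b"x" * 100, 10)
print(len(data), truncated)
```

On a truncating implementation the remaining 90 bytes are gone: a second read would block waiting for the next datagram rather than return the rest of this one.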
When we discuss TCP we'll see that it provides a continuous stream of bytes to the application, without
any message boundaries. TCP passes the data to the application in whatever size reads the application asks
for; there is never any data loss across this interface.
ICMP Source Quench Error
Using
UDP we are also able to generate the ICMP "source quench" error. This
is an error that may be generated by a system (router or host) when it
receives datagrams at a rate that is too fast to be processed. Note the
qualifier "may." A system is not required to send a source quench, even
if it runs out of buffers and throws datagrams away.
Although RFC
1009 [Braden and Postel 1987] requires a router to generate source
quenches when it runs out of buffers, the new Router Requirements RFC
[Almquist 1993] changes this and says that a router must not originate
source quench errors. The current feeling is to deprecate the source
quench error, since it consumes network bandwidth and is an ineffective
and unfair fix for congestion.
UDP Server Design
There
are some implications in using UDP that affect the design and
implementation of a server. The design and implementation of clients is
usually easier than that of servers, which is why we talk about server
design and not client design. Servers typically interact with the
operating system and most servers need a way to handle multiple clients
at the same time.
Normally a client starts,
immediately communicates with a single server, and is done. Servers, on
the other hand, start and then go to sleep, waiting for a client's
request to arrive. In the case of UDP, the server wakes up when a
client's datagram arrives, probably containing a request message of some
form from the client.
Client IP Address and Port Number
What arrives from the client is a UDP datagram. The IP header contains the source and destination IP addresses, and the UDP header contains the source and destination UDP port numbers. When an application receives a UDP datagram, it must be told by the operating system who sent the message: the source IP address and port number.
This
feature allows an iterative UDP server to handle multiple clients. Each
reply is sent back to the client that sent the request.
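An iterative server of this kind can be sketched in a few lines. With the sockets API, `recvfrom` returns the client's address and port along with the data, and `sendto` uses that address for the reply. The uppercase "service" and the names `serve` and `demo` are made up for illustration; the demo runs entirely over loopback.

```python
import socket


def serve(sock, request_count):
    """Handle request_count requests, one at a time, on a single socket."""
    for _ in range(request_count):
        request, client = sock.recvfrom(2048)   # client is (IP address, port)
        sock.sendto(request.upper(), client)    # reply goes back to the sender


def demo():
    """Two clients send requests; one server socket answers both in turn."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client1, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client2:
        server.bind(("127.0.0.1", 0))           # kernel picks a free port
        for sock in (server, client1, client2):
            sock.settimeout(2.0)
        client1.sendto(b"first", server.getsockname())
        client2.sendto(b"second", server.getsockname())
        serve(server, 2)                        # both requests were queued
        return client1.recvfrom(2048)[0], client2.recvfrom(2048)[0]


print(demo())
```

The server never creates a per-client socket or process; the source address returned with each datagram is all it needs to route the reply.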
Destination IP Address
Some
applications need to know who the datagram was sent to, that is, the
destination IP address. For example, the Host Requirements RFC states
that a TFTP server should ignore received datagrams that are sent to a
broadcast address.
This requires the operating system to pass
the destination IP address from the received UDP datagram to the
application. Unfortunately, not all implementations provide this
capability.
The sockets API provides this
capability with the IP_RECVDSTADDR socket option. Of the systems used in
the text, only BSD/386, 4.4BSD, and AIX 3.2.2 support this option.
SVR4, SunOS 4.x, and Solaris 2.x don't support it.
UDP Input Queue
Most
UDP servers are iterative servers. This means a single server process
handles all the client requests on a single UDP port (the server's
well-known port).
Normally there is a limited size input queue
associated with each UDP port that an application is using. This means
that requests that arrive at about the same time from different clients
are automatically queued by UDP. The received UDP datagrams are passed
to the application (when it asks for the next one) in the order they
were received. It is possible, however, for this queue to overflow,
causing the kernel's UDP module to discard incoming datagrams.
First,
the application is not told when its input queue overflows. The excess
datagrams are just discarded by UDP. Also, from the tcpdump output we
see that nothing is sent back to the client to tell it that its datagram
was discarded. There is nothing like an ICMP source quench sent back to
the sender. Finally, it appears that the UDP input queue is FIFO
(first-in, first-out), whereas we saw that the ARP input queue was LIFO
(last-in, first-out).
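The queue behavior described above can be modeled with a bounded FIFO. This is a toy model under stated assumptions: the limit is counted in datagrams (real implementations typically limit by buffer space), and the class name `UdpInputQueue` is hypothetical.

```python
from collections import deque


class UdpInputQueue:
    """Toy model of a per-port UDP input queue with a fixed capacity."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()
        self.dropped = 0          # visible here, but not to the application

    def datagram_arrived(self, datagram):
        """Called by the 'kernel'; overflow is discarded without any notice."""
        if len(self.queue) >= self.limit:
            self.dropped += 1
            return
        self.queue.append(datagram)

    def read_next(self):
        """Called by the application; returns the oldest datagram (FIFO)."""
        return self.queue.popleft() if self.queue else None


q = UdpInputQueue(limit=2)
for request in (b"a", b"b", b"c"):
    q.datagram_arrived(request)
print(q.read_next(), q.read_next(), q.read_next(), q.dropped)
```

In this model the third datagram is simply lost: neither the reader nor the sender learns about it, which is why UDP applications that care about delivery need their own timeout and retransmission.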
Restricting Local IP Address
Most
UDP servers wildcard their local IP address when they create a UDP end
point. This means that an incoming UDP datagram destined for the
server's port will be accepted on any local interface.
When the
server creates its end point it can specify one of the host's local IP
addresses, including one of its broadcast addresses, as the local IP
address for the end point. Incoming UDP datagrams will then be passed to
this end point only if the destination IP address matches the specified
local address.
There is a priority implied when an end point
with a wildcard address exists. An end point with a specific IP address
that matches the destination IP address is always chosen over a
wildcard. The wildcard end point is used only when a specific match is
not found.
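The priority rule can be sketched as a small lookup function. This models only the matching logic, not a real kernel's data structures; end points are represented as hypothetical `(local_ip, port)` tuples, with `"0.0.0.0"` standing for the wildcard address.

```python
WILDCARD = "0.0.0.0"


def select_endpoint(endpoints, dst_ip, dst_port):
    """Pick the end point for an incoming datagram, or None if nobody listens.

    endpoints is an iterable of (local_ip, port) tuples.
    """
    wildcard_match = None
    for local_ip, port in endpoints:
        if port != dst_port:
            continue
        if local_ip == dst_ip:
            return (local_ip, port)            # specific match always wins
        if local_ip == WILDCARD:
            wildcard_match = (local_ip, port)  # remember as a fallback
    return wildcard_match


endpoints = [(WILDCARD, 7777), ("140.252.13.35", 7777)]
print(select_endpoint(endpoints, "140.252.13.35", 7777))
print(select_endpoint(endpoints, "127.0.0.1", 7777))
```

The order in which end points were created does not matter in this sketch: the wildcard is consulted only after every specific address has failed to match.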
Restricting Foreign IP Address
Most
implementations allow a UDP end point to restrict the foreign address.
This means the end point will only receive UDP datagrams from that
specific IP address and port number.
There is a side effect of
specifying the foreign IP address and foreign port on Berkeley-derived
systems: if the local address has not been chosen when the foreign
address is specified, the local address is chosen automatically. Its
value becomes the IP address of the interface chosen by IP routing to
reach the specified foreign IP address.
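With the sockets API, the foreign address of a UDP end point is specified by calling `connect`, which sends nothing on the network for UDP. The side effect on the local address can be observed with `getsockname`. The sketch below uses the loopback address and an arbitrary port (9) purely as an example; the helper name `addresses_after_connect` is hypothetical, and the behavior shown is what Berkeley-derived and similar stacks are expected to do.

```python
import socket


def addresses_after_connect(foreign_ip, foreign_port):
    """Connect an unbound UDP socket and report its (local, foreign) addresses."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # No bind() call: the local address has not been chosen yet.
        sock.connect((foreign_ip, foreign_port))
        # The kernel has now picked the local IP address (from the route
        # to the foreign address) and an ephemeral local port.
        return sock.getsockname(), sock.getpeername()


local, foreign = addresses_after_connect("127.0.0.1", 9)
print("local:", local, "foreign:", foreign)
```

After `connect`, datagrams from any other address or port are not delivered to this socket, which is the restriction described at the start of this section.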
Multiple Recipients per Port
Although
it's not specified in the RFCs, most implementations allow only one
application end point at a time to be associated with any one local IP
address and UDP port number. When a UDP datagram arrives at a host
destined for that IP address and port number, one copy is delivered to
that single end point. The IP address of the end point can be the
wildcard.
On systems that support
multicasting (Chapter 12), this changes. Multiple end points can use the
same local IP address and UDP port number, although the application
normally must tell the API that this is OK (i.e., our -A flag to specify
the SO_REUSEADDR socket option).
4.4BSD, which supports
multicasting, requires the application to set a different socket option
(SO_REUSEPORT) to allow multiple end points to share the same port.
Furthermore each end point must specify this option, including the first
one to use the port.
When a UDP datagram
arrives whose destination IP address is a broadcast or multicast
address, and there are multiple end points at the destination IP address
and port number, one copy of the incoming datagram is passed to each
end point. (The end point's local IP address can be the wildcard, which
matches any destination IP address.) But if a UDP datagram arrives whose
destination IP address is a unicast address, only a single copy of the
datagram is delivered to one of the end points. Which end point gets the
unicast datagram is implementation dependent.
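The delivery rule for shared ports can be modeled as follows. This is a simulation of the rule, not real socket code: the `EndPoint` class and `deliver` function are hypothetical, the set of broadcast addresses is passed in by the caller, and for unicast the sketch arbitrarily picks the first matching end point, since the real choice is implementation dependent.

```python
import ipaddress
from dataclasses import dataclass, field

WILDCARD = "0.0.0.0"


@dataclass
class EndPoint:
    name: str
    local_ip: str
    port: int
    received: list = field(default_factory=list)


def deliver(endpoints, dst_ip, dst_port, payload,
            broadcast_addrs=("255.255.255.255",)):
    """Hand payload to the matching end points; return their names."""
    matches = [ep for ep in endpoints
               if ep.port == dst_port and ep.local_ip in (dst_ip, WILDCARD)]
    shared = (dst_ip in broadcast_addrs
              or ipaddress.ip_address(dst_ip).is_multicast)
    if not shared:
        matches = matches[:1]   # unicast: one copy only (arbitrary choice here)
    for ep in matches:
        ep.received.append(payload)
    return [ep.name for ep in matches]


eps = [EndPoint("a", WILDCARD, 5000), EndPoint("b", WILDCARD, 5000)]
print(deliver(eps, "224.0.0.1", 5000, b"to the group"))
print(deliver(eps, "10.0.0.5", 5000, b"to one host"))
```

An application sharing a port therefore cannot rely on seeing unicast traffic; only broadcast and multicast datagrams are guaranteed to reach every end point.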