Category: System Operations and Maintenance
2012-06-15 13:16:52
From an application's point of view, access to the DNS is through a
resolver. On Unix hosts the resolver is accessed primarily through two
library functions, gethostbyname(3) and gethostbyaddr(3), which are
linked with the application when the application is built. The first
takes a hostname and returns an IP address, and the second takes an IP
address and looks up a hostname. The resolver contacts one or more name
servers to do the mapping. The resolver is normally part of the
application. It is not part of the operating system kernel as are the
TCP/IP protocols. Another fundamental point is that an application must
convert a hostname to an IP address before it can ask TCP to open a
connection or send a datagram using UDP. The TCP/IP protocols within the
kernel know nothing about the DNS.
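As a rough illustration of the two resolver calls, Python's standard socket module exposes functions with the same names as the C library routines. The wrapper names name_to_address and address_to_name are made up for this sketch, and the results of real lookups depend on the local resolver configuration.

```python
import socket

def name_to_address(hostname):
    """Hostname -> IPv4 address string, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:          # gaierror and herror are subclasses of OSError
        return None

def address_to_name(address):
    """IP address -> primary hostname, or None if the lookup fails."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(address)
        return name
    except OSError:
        return None

# The application does this conversion itself before opening a connection:
# TCP and UDP in the kernel accept only addresses, never names.
# A dotted-decimal string is returned unchanged, without contacting a server.
addr = name_to_address("127.0.0.1")
```

An application would pass the returned address, not the hostname, to connect() or sendto().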
DNS Basics
The DNS name space is hierarchical, similar to the Unix filesystem.
Every node has a label of up to 63 characters. The root of the tree is a
special node with a null label. Any comparison of labels considers
uppercase and lowercase characters the same. The domain name of any node
in the tree is the list of labels, starting at that node, working up to
the root, using a period ("dot") to separate the labels. (Note that
this is different from the Unix filesystem, which forms a pathname by
starting at the top and going down the tree.) Every node in the tree
must have a unique domain name, but the same label can be used at
different points in the tree.
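The naming rules above can be sketched in a few lines. The function names domain_name and same_name are hypothetical; the sketch joins labels from a node up toward the root and writes the null root label as a trailing period.

```python
MAX_LABEL = 63  # maximum label length, as stated in the text

def domain_name(labels):
    """Join labels, listed from the node up to (not including) the root,
    into a domain name. The null root label appears as a trailing period."""
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError(f"bad label length: {label!r}")
    return ".".join(labels) + "."

def same_name(a, b):
    """Compare two domain names, treating uppercase and lowercase alike."""
    return a.lower() == b.lower()

name = domain_name(["sun", "tuc", "noao", "edu"])
```

Note that the label order is the reverse of a Unix pathname: the most specific label comes first.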
A domain name that ends with a
period is called an absolute domain name or a fully qualified domain
name (FQDN). An example is sun.tuc.noao.edu. (the final period is part of the name). If the domain name does
not end with a period, it is assumed that the name needs to be
completed. How the name is completed depends on the DNS software being
used. If the uncompleted name consists of two or more labels, it might
be considered to be complete; otherwise a local addition might be added
to the right of the name. For example, the name sun might be completed
by adding the local suffix .tuc.noao.edu. to form an FQDN. The top-level domains are
divided into three areas:
1. arpa is a special domain used for address-to-name mappings.
2. The seven 3-character domains are called the generic domains. Some texts call these the organizational domains.
3. All the 2-character domains are based on the country codes found in ISO 3166. These are called the country domains, or the geographical domains.
The normal classification of the seven generic domains is:
com: commercial organizations
edu: educational institutions
gov: U.S. governmental organizations (nonmilitary)
int: international organizations
mil: U.S. military
net: networks
org: other organizations
DNS
folklore says that the 3-character generic domains are only for U.S.
organizations, and the 2-character country domains for everyone else,
but this is false. There are many non-U.S. organizations in the generic
domains, and many U.S. organizations in the .us country domain. (RFC
1480 [Cooper and Postel 1993] describes the .us domain in more detail.)
The only generic domains that are restricted to the United States are
.gov and .mil.
Many countries form
second-level domains beneath their 2-character country code similar to
the generic domains: .ac.uk, for example, is for academic institutions
in the United Kingdom and .co.uk is for commercial organizations in the
United Kingdom.
No single entity manages every label in the tree.
Instead, one entity (the NIC) maintains a portion of the tree (the
top-level domains) and delegates responsibility to others for specific
zones.
A zone is a subtree of the DNS
tree that is administered separately. A common zone is a second-level
domain, noao.edu, for example. Many second-level domains then divide
their zone into smaller zones. For example, a university might divide
itself into zones based on departments, and a company might divide
itself into zones based on branch offices or internal divisions.
If
you are familiar with the Unix filesystem, notice that the division of
the DNS tree into zones is similar to the division of a logical Unix
filesystem into physical disk partitions. Just as we can't tell from a
picture of the DNS tree where the zones of authority lie, we can't tell
from a similar picture of a Unix filesystem which directories are on
which disk partitions.
Once
the authority for a zone is delegated, it is up to the person
responsible for the zone to provide multiple name servers for that zone.
Whenever a new system is installed in a zone, the DNS administrator
for the zone allocates a name and an IP address for the new system and
enters these into the name server's database. This is where the need
for delegation becomes obvious. At a small university, for example, one
person could do this each time a new system was added, but in a large
university the responsibility would have to be delegated (probably by
departments), since one person couldn't keep up with the work.
A
name server is said to have authority for one zone or multiple zones.
The person responsible for a zone must provide a primary name server for
that zone and one or more secondary name servers. The primary and
secondaries must be independent and redundant servers so that
availability of name service for the zone isn't affected by a single
point of failure.
The main difference
between a primary and secondary is that the primary loads all the
information for the zone from disk files, while the secondaries obtain
all the information from the primary. When a secondary obtains the
information from its primary we call this a zone transfer.
When a
new host is added to a zone, the administrator adds the appropriate
information (name and IP address minimally) to a disk file on the system
running the primary. The primary name server is then notified to reread
its configuration files. The secondaries query the primary on a regular
basis (normally every 3 hours) and if the primary contains newer data,
the secondary obtains the new data using a zone transfer.
What
does a name server do when it doesn't contain the information requested?
It must contact another name server. (This is the distributed nature of
the DNS.) Not every name server, however, knows how to contact every
other name server. Instead every name server must know how to contact
the root name servers. As of April 1993 there were eight root servers
and all the primary servers must know the IP address of each root
server. (These IP addresses are contained in the primary's configuration
files. The primary servers must know the IP addresses of the root
servers, not their DNS names.) The root servers then know the name and
location (i.e., the IP address) of each authoritative name server for
all the second-level domains. This implies an iterative process: the
requesting name server must contact a root server. The root server tells
the requesting server to contact another server, and so on.
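The iterative process can be sketched with a toy model. Everything here is hypothetical illustration data: the SERVERS table stands in for real name servers, the server names are invented, and no network traffic is involved.

```python
# Hypothetical toy data: each "server" either knows the answer
# or refers the requester to another server for a domain suffix.
SERVERS = {
    "root":        {"referrals": {"noao.edu.": "noao-server"}},
    "noao-server": {"referrals": {"tuc.noao.edu.": "tuc-server"}},
    "tuc-server":  {"answers": {"sun.tuc.noao.edu.": "140.252.13.33"}},
}

def ask(server, name):
    """Return ("answer", address), ("referral", next_server) or ("error", None)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    return "error", None

def resolve_iteratively(name, start="root", max_hops=10):
    """Follow referrals from the root until some server answers."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        if kind == "error":
            return None, path
        server = value              # contact the server we were referred to
    return None, path

address, path = resolve_iteratively("sun.tuc.noao.edu.")
```

The requesting server only needs to know how to reach the root; every later hop is learned from the previous reply.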
You
can fetch the current list of root servers using anonymous FTP. Obtain
the file netinfo/root-servers.txt from either ftp.rs.internic.net or
nic.ddn.mil.
A fundamental property of
the DNS is caching. That is, when a name server receives information
about a mapping (say, the IP address of a hostname) it caches that
information so that a later query for the same mapping can use the
cached result and not result in additional queries to other servers.
DNS Message Format
There is one DNS message defined for both queries and responses.
bits 0~15 | bits 16~31 |
identification | flags |
number of questions | number of answer RRs |
number of authority RRs | number of additional RRs |
questions (variable length, full width) |
answers (variable number of resource records, full width) |
authority (variable number of resource records, full width) |
additional information (variable number of resource records, full width) |
The identification is set by the client and returned by the server. It lets the client match responses to requests.
The 16-bit flags field is divided into numerous pieces:
field: QR | opcode | AA | TC | RD | RA | (zero) | rcode |
bits:  1 | 4 | 1 | 1 | 1 | 1 | 3 | 4 |
1. QR is a 1-bit field: 0 means the message is a query, 1 means it's a response.
2. opcode is a 4-bit field. The normal value is 0 (a standard query). Other values are 1 (an inverse query) and 2 (server status request).
3. AA is a 1-bit flag that means "authoritative answer." The name server is authoritative for the domain in the question section.
4. TC is a 1-bit field that means "truncated." With UDP this means the total size of the reply exceeded 512 bytes, and only the first 512 bytes of the reply were returned.
5. RD is a 1-bit field that means "recursion desired." This bit can be set in a query and is then returned in the response. This flag tells the name server to handle the query itself, called a recursive query. If the bit is not set, and the requested name server doesn't have an authoritative answer, the requested name server returns a list of other name servers to contact for the answer. This is called an iterative query.
6. RA is a 1-bit field that means "recursion available." This bit is set to 1 in the response if the server supports recursion. Most name servers provide recursion, except for some root servers.
7. There is a 3-bit field that must be 0.
8. rcode is a 4-bit field with the return code. The common values are 0 (no error) and 3 (name error). A name error is returned only from an authoritative name server and means the domain name specified in the query does not exist.
The next four 16-bit fields specify the number of entries in the four variable-length fields that complete the record. For a query, the number of questions is normally 1 and the other three counts are 0. Similarly, for a reply the number of answers is at least 1, and the remaining two counts can be 0 or nonzero.
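The fixed part of the message (identification, flags, and the four counts) is six 16-bit fields in network byte order. The following is a minimal sketch of packing and unpacking it with Python's struct module; the function names are made up for illustration, and the bit positions follow the flags layout described above.

```python
import struct

def pack_flags(qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0, rcode=0):
    """Assemble the 16-bit flags field, QR in the most significant bit."""
    return ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
            | (rd << 8) | (ra << 7) | rcode)   # 3 zero bits sit above rcode

def unpack_flags(flags):
    """Split the 16-bit flags field back into its pieces."""
    return {
        "qr":     (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa":     (flags >> 10) & 0x1,
        "tc":     (flags >> 9) & 0x1,
        "rd":     (flags >> 8) & 0x1,
        "ra":     (flags >> 7) & 0x1,
        "rcode":  flags & 0xF,
    }

def pack_header(ident, flags, questions=1, answers=0, authority=0, additional=0):
    """Build the 12-byte fixed header; the defaults match a typical query."""
    return struct.pack("!6H", ident, flags, questions, answers,
                       authority, additional)

def unpack_header(data):
    """Return (ident, flags, questions, answers, authority, additional)."""
    return struct.unpack("!6H", data[:12])

query_header = pack_header(1, pack_flags(rd=1))   # standard query, recursion desired
```

A client would compare the identification in each reply against the value it sent to match responses to requests.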
Question Portion of DNS Query Message
The format of each question in the question section:
bits 0~15 | bits 16~31 |
query name (variable length, full width) |
query type | query class |
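The query name is the name being looked up. It is stored as a sequence of labels, each preceded by a 1-byte count of the bytes that follow, and it is terminated by a byte of 0. Unlike many other message formats that we've encountered, this field is allowed to end on a boundary other than a 32-bit boundary.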
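A minimal sketch of building one question follows. It assumes the standard wire encoding of a name (each label preceded by a 1-byte length, the whole name terminated by a zero byte for the root) and does not use name compression. The function names are hypothetical.

```python
import struct

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    if name in ("", "."):
        return b"\x00"                      # the root: just the null label
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def encode_question(name, qtype=1, qclass=1):
    """Query name followed by 16-bit query type and 16-bit query class.
    The defaults are type A (1) and class Internet (1)."""
    return encode_name(name) + struct.pack("!HH", qtype, qclass)

question = encode_question("sun.tuc.noao.edu.")
```

Because the name is a run of variable-length labels, the encoded question here is 22 bytes long, which is not a multiple of 4.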
Each question has a query type and each response (called a resource record) has a type. There are about 20 different values, some of which are now obsolete. The query type is a superset of the type: two of the values we show can be used only in questions.
Name | Numeric value | Description | type? | query type? |
A | 1 | IP address | * | * |
NS | 2 | name server | * | * |
CNAME | 5 | canonical name | * | * |
PTR | 12 | pointer record | * | * |
HINFO | 13 | host info | * | * |
MX | 15 | mail exchange record | * | * |
AXFR | 252 | request for zone transfer | | * |
* or ANY | 255 | request for all records | | * |
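The answers, authority, and additional information sections share a common format, the resource record (RR):
bits 0~15 | bits 16~31 |
domain name (variable length, full width) |
type | class |
time to live (32 bits, full width) |
resource data length | resource data (variable length) |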
The domain name is the name to which the following
resource data corresponds. It is in the same format as we described
earlier for the query name field.
The type specifies one of the
RR type codes. These are the same as the query type values that we
described earlier. The class is normally 1 for Internet data.
The time-to-live field is the number of seconds that the RR can be cached by the client. RRs often have a TTL of 2 days.
The resource data length specifies the amount of
resource data. The format of this data depends on the type. For a type
of 1 (an A record) the resource data is a 4-byte IP address.
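The fixed fields of a resource record can be decoded as sketched below. To keep the example short it parses only the part that follows the domain name (type, class, time to live, resource data length, resource data), so name decoding is left out. The function name parse_rr_body is made up for this sketch.

```python
import socket
import struct

TYPE_A = 1   # numeric value of an A record

def parse_rr_body(data):
    """Parse the part of a resource record that follows the domain name."""
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", data[:10])
    rdata = data[10:10 + rdlength]
    if rtype == TYPE_A and rdlength == 4:
        value = socket.inet_ntoa(rdata)      # 4-byte IP address -> dotted decimal
    else:
        value = rdata                        # other types: leave the raw bytes
    return {"type": rtype, "class": rclass, "ttl": ttl, "value": value}

# An A record for 140.252.13.33 with a TTL of 2 days (172800 seconds).
body = struct.pack("!HHIH", 1, 1, 2 * 24 * 60 * 60, 4) + bytes([140, 252, 13, 33])
record = parse_rr_body(body)
```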
The file /etc/resolv.conf contains info like:
nameserver 140.252.1.54
domain tuc.noao.edu
The
first line gives the IP address of the name server - the host noao.edu.
Up to three nameserver lines can be specified, to provide backup in
case one is down or unreachable. The domain line specifies the default
domain. If the name being looked up is not a fully qualified domain name
(it doesn't end with a period) then the default domain .tuc.noao.edu is
appended to the name.
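The two directives shown above can be read and applied as in the following sketch. The function names are hypothetical, only the nameserver and domain keywords are handled, and the completion rule is the simple one described in the text: real resolvers may apply additional rules, for example when the name already contains several labels.

```python
MAX_NAMESERVERS = 3   # up to three nameserver lines are used

def parse_resolv_conf(text):
    """Return (list of name server addresses, default domain or None)."""
    servers, domain = [], None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0][0] in "#;":
            continue                         # blank, malformed, or comment line
        if fields[0] == "nameserver" and len(servers) < MAX_NAMESERVERS:
            servers.append(fields[1])
        elif fields[0] == "domain":
            domain = fields[1]
    return servers, domain

def complete(name, domain):
    """Append the default domain unless the name is already fully qualified."""
    if name.endswith(".") or domain is None:
        return name
    return name + "." + domain.strip(".") + "."

servers, domain = parse_resolv_conf("nameserver 140.252.1.54\ndomain tuc.noao.edu\n")
full_name = complete("sun", domain)
```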
Pointer Queries
A perpetual stumbling block in
understanding the DNS is how pointer queries are handled - given an IP
address, return the name (or names) corresponding to that address.
When
an organization joins the Internet and obtains authority for a portion
of the DNS name space, such as noao.edu, they also obtain authority for a
portion of the in-addr.arpa name space corresponding to their IP
address on the Internet. In the case of noao.edu it is the class B
network ID 140.252. The level of the DNS tree beneath in-addr.arpa must
be the first byte of the IP address (140 in this example), the next
level is the next byte of the IP address (252), and so on. But remember
that names are written starting at the bottom of the DNS tree, working
upward. This means the DNS name for the host sun, with an IP address of
140.252.13.33, is 33.13.252.140.in-addr.arpa.
We have to write
the 4 bytes of the IP address backward because authority is delegated
based on network IDs: the first byte of a class A address, the first and
second bytes of a class B address, and the first, second, and third
bytes of a class C address.
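Building the name used in a pointer query is a matter of reversing the four bytes and appending the special domain. A small sketch follows; the function name reverse_name is made up for illustration.

```python
def reverse_name(ipv4):
    """Dotted-decimal IPv4 address -> name to look up under in-addr.arpa."""
    octets = ipv4.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ipv4!r}")
    # Names are written from the bottom of the tree upward, so the
    # last byte of the address becomes the first label of the name.
    return ".".join(reversed(octets)) + ".in-addr.arpa."

ptr_name = reverse_name("140.252.13.33")
```

Python's ipaddress module offers a similar reverse_pointer attribute on address objects, without the trailing period.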
If there was not a separate branch
of the DNS tree for handling this address-to-name translation, there
would be no way to do the reverse translation other than starting at the
root of the tree and trying every top-level domain. This could
literally take days or weeks, given the current size of the Internet.
The in-addr.arpa solution is a clever one, although the reversed bytes
of the IP address and the special domain are confusing.
Resource Records
We've
seen a few different types of resource records (RRs) so far: an IP
address has a type of A, and PTR means a pointer query. We've also seen
that RRs are what a name server returns: answer RRs, authority RRs, and
additional information RRs. There are about 20 different types of
resource records. Also, more RR types are being added over time.
Caching
To
reduce the DNS traffic on the Internet, all name servers employ a
cache. With the standard Unix implementation, the cache is maintained in
the server, not the resolver. Since the resolver is part of each
application, and applications come and go, putting the cache into the
program that lives the entire time the system is up (the name server)
makes sense. This makes the cache available to any applications that use
the server. Any other hosts at the site that use this name server also
share the server's cache.
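A cache of this kind can be sketched as a table keyed by name and record type, where each entry expires when its time to live runs out. The class name RecordCache and the injectable clock are assumptions made for this example; they are not taken from any particular name server.

```python
import time

class RecordCache:
    """Cache of looked-up values that honors each record's time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}          # (name, rtype) -> (expiry time, value)

    def put(self, name, rtype, value, ttl):
        """Remember a value for ttl seconds."""
        self._entries[(name.lower(), rtype)] = (self._clock() + ttl, value)

    def get(self, name, rtype):
        """Return the cached value, or None if absent or expired."""
        key = (name.lower(), rtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: a fresh query is needed
            return None
        return value
```

A server would consult get() before sending any query, and call put() with the TTL carried in each resource record it receives.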
UDP or TCP
We've
mentioned that the well-known port numbers for DNS name servers are UDP
port 53 and TCP port 53. This implies that the DNS supports both UDP
and TCP. But all the examples that we've watched with tcpdump have used
UDP. When is each protocol used and why?
When the resolver issues a query and the response
comes back with the TC bit set ("truncated") it means the size of the
response exceeded 512 bytes, so only the first 512 bytes were returned
by the server. The resolver normally issues the request again, using
TCP. This allows more than 512 bytes to be returned.
Since TCP
breaks up a stream of user data into what it calls segments, it can
transfer any amount of user data, using multiple segments.
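The fallback decision can be sketched as follows. The TC bit position comes from the flags layout described earlier; the 2-byte length prefix on each message sent over TCP is how RFC 1035 delimits messages on a byte stream and is stated here as background, not something taken from this text. The function names are hypothetical.

```python
import struct

TC_BIT = 0x0200      # "truncated" bit within the 16-bit flags field

def is_truncated(message):
    """Check the TC bit in the flags field (bytes 2-3 of the header)."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)

def frame_for_tcp(message):
    """Prefix a DNS message with its 2-byte length for use over TCP."""
    return struct.pack("!H", len(message)) + message

def next_transport(udp_response):
    """Reissue the query over TCP only if the UDP reply was truncated."""
    return "tcp" if is_truncated(udp_response) else "udp"

# A response header with QR, TC and RD set, and one question.
truncated_reply = struct.pack("!6H", 1, 0x8300, 1, 0, 0, 0)
```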
Also, when a secondary name server for a domain starts
up it performs a zone transfer from the primary name server for the
domain. We also said that the secondary queries the primary on a regular
basis (often every 3 hours) to see if the primary has had its tables
updated, and if so, a zone transfer is performed.
Zone transfers are done using TCP, since there is much
more data to transfer than a single query or response. Since the DNS
primarily uses UDP, both the resolver and the name server must perform
their own timeout and retransmission. Also, unlike many other Internet
applications that use UDP (TFTP, BOOTP, and SNMP), which operate mostly
on local area networks, DNS queries and responses often traverse wide
area networks. The packet loss rate and variability in round-trip times
are normally higher on a WAN than a LAN, increasing the importance of a
good retransmission and timeout algorithm for DNS clients.
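One simple retransmission strategy is to double the timeout after each unanswered attempt. The sketch below is only an illustration of that idea: the initial timeout, the number of tries, and the send_and_wait callable (which is expected to send the query, wait up to the given number of seconds, and raise TimeoutError if no reply arrives) are all assumptions, not the algorithm of any particular resolver.

```python
def query_with_retries(send_and_wait, initial_timeout=2.0, max_tries=4):
    """Call send_and_wait(timeout) until it returns a reply.

    Returns (reply, number of attempts used). The timeout doubles after
    every attempt that ends in TimeoutError.
    """
    timeout = initial_timeout
    for attempt in range(1, max_tries + 1):
        try:
            return send_and_wait(timeout), attempt
        except TimeoutError:
            timeout *= 2            # back off before retransmitting
    raise TimeoutError(f"no reply after {max_tries} attempts")
```

Passing the transport in as a callable keeps the retry policy separate from the socket code, so the same loop could drive a real UDP exchange or a test double.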
Rlogin
We start an Rlogin client, connecting to an Rlogin server in some other domain.
The following 11 steps take place, assuming none of the information is already cached by the client or server:
1. The client starts and calls its resolver function to convert the hostname that we typed into an IP address. A query of type A is sent to a root server.
2. The root server's response contains the name servers for the server's domain.
3. The client's resolver reissues the query of type A to the server's name server. This query normally has the recursion-desired flag set.
4. The response comes back with the IP address of the server host.
5. The Rlogin client establishes a TCP connection with the Rlogin server. Three packets are exchanged between the client and server TCP modules.
6. The Rlogin server receives the connection from the client and calls its resolver to obtain the name of the client host, given the IP address that the server receives from its TCP. This is a PTR query issued to a root name server. This root server can be different from the root server used by the client in step 1.
7. The root server's response contains the name servers for the client's in-addr.arpa domain.
8. The server's resolver reissues the PTR query to the client's name server.
9. The PTR response contains the FQDN of the client host.
10. The server's resolver issues a query of type A to the client's name server, asking for the IP addresses corresponding to the name returned in the previous step. This may be done automatically by the server's gethostbyaddr function, otherwise the Rlogin server does this step explicitly. Also, the client's name server is often the same as the client's in-addr.arpa name server, but this isn't required.
11. The response from the client's name server contains the A records for the client host. The Rlogin server compares the A records with the IP address from the client's TCP connection request.
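The check performed by the server in steps 6 through 11 can be summarized in a short sketch. The two lookup functions are passed in as parameters so the logic can be shown without a network; their names, and the name verify_client, are hypothetical.

```python
def verify_client(peer_ip, lookup_ptr, lookup_a):
    """Return the client's hostname if the reverse and forward lookups agree.

    lookup_ptr(ip)  -> hostname or None   (the PTR query, steps 6-9)
    lookup_a(name)  -> list of addresses  (the A query, steps 10-11)
    """
    name = lookup_ptr(peer_ip)
    if name is None:
        return None
    # Accept the name only if one of its A records matches the address
    # that the server's TCP reported for the connection.
    return name if peer_ip in lookup_a(name) else None

# Toy data standing in for the client's name servers.
PTR = {"140.252.13.33": "sun.tuc.noao.edu."}
A = {"sun.tuc.noao.edu.": ["140.252.13.33"]}
hostname = verify_client("140.252.13.33", PTR.get, lambda n: A.get(n, []))
```

The forward lookup guards against a PTR record that claims a name the client's address does not actually belong to.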