Category: BSD
2007-08-23 13:56:46
There is a whole chain of things that act on the basic request. The resolver library creates the DNS query and decodes the answer for the application. The master and slave nameservers hold the permanent answers to the questions, and there may be several intermediate nameservers that either have the answer in cache or need to pass the request on.
Meanwhile, at the application level, many applications are not designed to handle a set of answers to a query. In the case of name to address translation, it is common practice to simply pick the first one in the list and try it as if it were the only one in the list. For most applications, all that comes back is the IP address, and all the other information that DNS had is lost.
In the early version of the nameserver, the order of RRs in an RRset was fixed. This meant that even though you might have several RRs, the way all the pieces in the DNS chain worked ensured that no one would ever use anything but the first entry. It was realized that since the spec said there was no order in an RRset, it would break nothing to shuffle the order. The simplest way to do this was to roll the order each time you answered a query for this RRset. Since almost all the issues are for A records, unless otherwise noted we will assume host address lookups.
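The rolling described above can be sketched in a few lines. This is only an illustration, not BIND's code: the class name `RoundRobinRRset` and the addresses (taken from the documentation range 192.0.2.0/24) are assumptions made for the example.

```python
from collections import deque


class RoundRobinRRset:
    """Hypothetical holder for the A records of a single name."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        """Return the current ordering, then roll it by one for the next query."""
        response = list(self._records)
        self._records.rotate(-1)  # the first record moves to the end
        return response


# Illustrative addresses only.
rrset = RoundRobinRRset(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# An application that only ever uses the first entry now sees each
# address in turn instead of always the same one.
first_choices = [rrset.answer()[0] for _ in range(4)]
```

Because most applications only look at the first entry, rolling the list by one per query is enough to spread connections evenly across the addresses.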
This is the current "round robin" function, and it has been used quite successfully for several years in sharing load among several machines. But this does not meet everyone's needs. People want to do other things beyond the simple even distribution. Some people want to detect and remove dead servers from the list (if 1 of 4 servers dies, 25% of the connections will fail). Others want to dynamically change the rate at which various servers are on top of the list, to spread work rather than just connections. Others are more interested in minimizing delay to the user, and want to sort the list to get the closest machine on top.
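The 25% figure can be checked with a small simulation. This is a sketch under simple assumptions: four hypothetical servers, one of them dead, clients that only ever try the first address, and a made-up `filter_dead` switch standing in for whatever health check would remove dead servers from the list.

```python
from collections import deque


def rolled_answers(servers, queries):
    """Yield the answer list for each query, rolling the order by one each time."""
    order = deque(servers)
    for _ in range(queries):
        yield list(order)
        order.rotate(-1)


def failure_rate(servers, dead, queries, filter_dead=False):
    """Fraction of queries whose first address points at a dead server."""
    failures = 0
    for answer in rolled_answers(servers, queries):
        if filter_dead:
            # Hypothetical health check: drop servers known to be dead.
            answer = [s for s in answer if s not in dead]
        if answer and answer[0] in dead:
            failures += 1
    return failures / queries


servers = ["s1", "s2", "s3", "s4"]  # illustrative labels
dead = {"s3"}

plain = failure_rate(servers, dead, queries=400)                       # 0.25
filtered = failure_rate(servers, dead, queries=400, filter_dead=True)  # 0.0
```

Note that in this naive version the server listed after the dead one inherits its share of first places, so removing dead servers and keeping the distribution even are separate problems.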
So which ordering mechanism do we choose to implement? Given that the spec says that the data is not ordered, and there are different ideas of what kind of reordering is needed, we see a situation where this becomes an area where we can never do it "right". Whatever we choose, someone else will want something different.
No matter what type of better ordering you want, there needs to be some form of external preference. The A records have no place to put a preference (this presents an obvious solution that we'll skip for the moment). So you need an external file or automated query hook to get this information. But who can do this? The master can certainly do this, but how do the slaves get this from the master? It would either have to be out of band from the normal zone transfers, or you would have to use an existing RR type and a convention to pass it. Even that can't work with caching nameservers, because they only know about standard DNS rules and will round robin anything they cache. To make life worse, applications do not typically receive the TTL on the RR when they get an answer, so they may hold onto an answer longer than the TTL and not operate correctly, or make more DNS requests than are necessary, which can confuse the distribution.
So a fundamental problem is how to get this extra information into BIND and how to pass it around. This would depend on the type of ordering done, the dynamic nature of the data and the degree of control over the nameservers involved. Again, we see ourselves in a no-win situation.
One possibility that was considered is a callout from BIND to a separate child process that would do the reordering, but it had many difficulties and was discarded. The first was a concern about performance. The root nameservers are taking 1200 queries per second and climbing, and the extra code path and possible context switch could overwhelm heavily used machines.
The second and more fundamental reason was the response form issue. The response only has two forms inside BIND, the database query form and the wire form. Both of these are complex. To handle this, the child process would need to decode the information, reorder it, and re-encode it. Then BIND would need to decode what it got back and verify that it was valid to send on.
The final issue is one of error handling. What should BIND do if the child process returns a bad response? What if the child dies (or fails to start on a fork)? BIND was designed as a single entity and adding ancillary processing would cause a major code structure problem for BIND.
The first recommendation is to stop using A records for things you want to reorder. A new type of RR, the SRV record, was created for this. It adds a weight integer to the normal fields, which is specifically designed to describe the ordering. This has become a chicken and egg situation for deployment. Netscape and Internet Explorer don't want to implement this until hosts support it and people use it. People don't want to use it unless Netscape and IE implement it.
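To make the idea of a weight concrete, here is a minimal sketch of how a client could turn weights into an ordering. It is a simplified proportional selection, not the exact procedure from the SRV specification, and it assumes all records share the same priority. The `SrvRecord` type, the function name and the example.com targets are made up for the illustration.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """Hypothetical, trimmed-down view of an SRV record."""
    weight: int
    port: int
    target: str


def order_by_weight(records, rng=random):
    """Return records in a weighted random order; higher weights tend to come first."""
    remaining = list(records)
    ordered = []
    while remaining:
        total = sum(r.weight for r in remaining)
        if total == 0:
            # Only zero-weight records are left: order them uniformly at random.
            rng.shuffle(remaining)
            ordered.extend(remaining)
            break
        pick = rng.uniform(0, total)
        running = 0
        for i, record in enumerate(remaining):
            running += record.weight
            # Select the first positive-weight record whose running sum covers the pick.
            if record.weight > 0 and pick <= running:
                ordered.append(remaining.pop(i))
                break
    return ordered


records = [
    SrvRecord(weight=3, port=80, target="big.example.com"),
    SrvRecord(weight=1, port=80, target="small.example.com"),
]
ordered = order_by_weight(records)
```

With these weights, the first record should end up on top of the list in roughly three out of four lookups, which is the kind of uneven distribution that plain A records cannot express.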
The second recommendation is to use a different nameserver to handle zones with ordered RRsets. To do this, you would make the name to be balanced a CNAME for a name in a separate zone, and then delegate that zone (reorder.foo.com in this example) to a nameserver that implements the ordering you want. This allows any type of reordering in a well-partitioned way.
The final solution would be to have a user program accept all the DNS queries for the machine and handle all the load-balanced queries directly. The rest would be passed to BIND running on another port, and the response forwarded back to the query sender. In either of the last two solutions, the other program must understand DNS wire format, but that was true of the callout case as well.
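The routing decision in such a front-end program might look like the sketch below. It only covers the part that needs to understand the wire format: pulling the query name out of the question section and deciding whether to answer locally or forward to BIND. The socket handling, the actual forwarding and the answer construction are left out, and the function names and example.com names are assumptions for the illustration.

```python
import struct


def encode_query(name, qtype=1, qclass=1, query_id=0x1234):
    """Build a minimal DNS query packet (12-byte header plus one question)."""
    # Flags 0x0100 sets only the "recursion desired" bit; one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


def question_name(packet):
    """Extract the lower-cased query name from the first question."""
    labels = []
    pos = 12  # the DNS header is always 12 bytes long
    while True:
        length = packet[pos]
        if length == 0:
            break
        if length & 0xC0:
            # Compression pointers and reserved label types are not handled here.
            raise ValueError("unsupported label type in question")
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii").lower())
        pos += 1 + length
    return ".".join(labels)


def route(packet, balanced_names):
    """Decide whether the front end answers a query itself or passes it to BIND."""
    return "local" if question_name(packet) in balanced_names else "forward"


balanced = {"www.example.com"}  # hypothetical set of load-balanced names
decision_www = route(encode_query("www.example.com"), balanced)    # "local"
decision_mail = route(encode_query("mail.example.com"), balanced)  # "forward"
```

A real front end would also have to relay the forwarded response back to the original sender and build well-formed answers for the names it handles itself, which is where most of the wire-format work lies.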