BIND and Load Balancing

Technical Background

DNS data consists of a set of resource records (RRs) arranged in a hierarchical structure. Each resource record has a name, a type, and data specific to that type. DNS allows more than one RR to have the same name and type; such a group is called an RRset. According to the DNS specification, the order of RRs within an RRset is not defined. Each RR has a TTL value that says how long it is allowed to live in caches.
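As a rough illustration of the terms above, the following Python sketch models an RR as a small tuple and groups records into RRsets by name and type. The `RR` type, the `group_rrsets` helper, and the example names and addresses are assumptions made for this sketch, not anything taken from BIND; the record class is left out, as in the description above.

```python
from collections import defaultdict
from typing import NamedTuple


class RR(NamedTuple):
    """Hypothetical in-memory model of one resource record."""
    name: str   # owner name, e.g. "www.example.com."
    rtype: str  # record type, e.g. "A"
    ttl: int    # seconds a cache may keep the record
    rdata: str  # type-specific data (an IPv4 address for an A record)


def group_rrsets(records):
    """Group records into RRsets keyed by (name, type)."""
    rrsets = defaultdict(list)
    for rr in records:
        # DNS names compare case-insensitively, so normalize the key.
        rrsets[(rr.name.lower(), rr.rtype)].append(rr)
    return dict(rrsets)


records = [
    RR("www.example.com.", "A", 300, "192.0.2.1"),
    RR("www.example.com.", "A", 300, "192.0.2.2"),
    RR("www.example.com.", "MX", 300, "10 mail.example.com."),
]
rrsets = group_rrsets(records)
```

Here the two A records for the same name end up in one RRset, while the MX record forms an RRset of its own.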

A whole chain of components acts on the basic request. The resolver library creates the DNS query and decodes the answer for the application. The master and slave nameservers hold the permanent answers to the questions, and there may be several intermediate nameservers that either have the answer in cache or need to pass the request on.

Meanwhile, at the application level, many applications are not designed to handle a set of answers to a query. In the case of name-to-address translation, it is common practice to simply pick the first address in the list and try it as if it were the only one. For most applications, all that comes back is the IP address, and all the other information that DNS had is lost.

Load balancing today

In early versions of the nameserver, the order of RRs in an RRset was fixed. This meant that even though you might have several RRs, because of the way all the pieces in the DNS chain worked, no one would ever use anything but the first entry. It was realized that since the spec said there was no order in an RRset, it would break nothing to shuffle the order. The simplest way to do this was to roll the order each time a query for the RRset was answered. Since almost all the issues concern A records, we will assume host address lookups unless otherwise noted.
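The "roll the order on each answer" idea can be sketched in a few lines of Python. This is only an illustration of the technique, not BIND's code; the `RoundRobinRRset` class and the example addresses are made up for the sketch.

```python
from collections import deque


class RoundRobinRRset:
    """Hypothetical RRset that rolls its order after every answer."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current ordering, then roll it by one for the next query."""
        current = list(self._addresses)
        self._addresses.rotate(-1)  # move the first entry to the end
        return current


rrset = RoundRobinRRset(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# An application that only ever uses the first address still ends up
# spread evenly across all three servers.
first_choices = [rrset.answer()[0] for _ in range(6)]
```

Each address appears on top of the list in turn, so clients that blindly take the first entry are distributed evenly across the set.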

This is the current "round robin" function, and it has been used quite successfully for several years to share load among several machines. But it does not meet everyone's needs. People want to do other things beyond simple even distribution. Some want to detect dead servers and remove them from the list (if 1 of 4 servers dies, 25% of the connections will fail). Others want to dynamically change the rate at which various servers appear on top of the list, so that work rather than just connections is spread better. Others are more interested in minimizing delay to the user, and want to sort the list so that the closest machine is on top.
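The 25% figure can be checked with a small simulation. The sketch below assumes clients always connect to the first address returned and that the set of dead servers is already known from some external health check; that check, the `failure_rate` helper, and the addresses are all hypothetical.

```python
from itertools import cycle, islice

servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
dead = {"192.0.2.3"}  # assumed to come from an external health check


def failure_rate(servers, dead, queries):
    """Fraction of queries whose first-choice address is a dead server."""
    # Plain round robin: the first choice simply cycles through the list.
    first_choices = islice(cycle(servers), queries)
    failures = sum(1 for address in first_choices if address in dead)
    return failures / queries


# Round robin over all four servers, one of which is dead.
rate_unfiltered = failure_rate(servers, dead, 1000)

# Round robin over only the servers believed to be alive.
alive = [s for s in servers if s not in dead]
rate_filtered = failure_rate(alive, dead, 1000)
```

With one of four servers dead, a quarter of the first choices land on it; dropping it from the list before answering removes those failures, which is the behavior some people want from the nameserver.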

So which ordering mechanism do we choose to implement? Given that the spec says the data is not ordered, and that there are different ideas of what kind of reordering is needed, this is an area where we can never do it "right". Whatever we choose, someone else will want something different.

More information is needed

No matter what type of better ordering you want, there needs to be some form of external preference. The A records have no place to put a preference (this suggests an obvious solution that we'll skip for the moment). So you need an external file or an automated query hook to get this information. But who can do this? The master certainly can, but how do the slaves get the information from the master? It would either have to travel out of band from the normal zone transfers, or you would have to use an existing RR type and a convention to pass it. Even that can't work with caching nameservers: they only know about standard DNS rules and will round robin anything they cache. To make life worse, an application typically does not receive the TTL on the RR when it gets an answer, so it may hold onto an answer longer than the TTL and not operate correctly, or it may make more DNS requests than necessary, which can confuse the distribution.
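To illustrate the TTL problem at the application level, here is a minimal sketch of what an application-side cache would have to do if the TTL were available to it. The `TTLCache` class and its injectable clock are assumptions for illustration only; typical resolver interfaces hand the application just the addresses.

```python
import time


class TTLCache:
    """Hypothetical application-side cache that honors each answer's TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behavior can be tested
        self._entries = {}    # name -> (addresses, expiry time)

    def put(self, name, addresses, ttl):
        """Store an answer together with the moment it stops being valid."""
        self._entries[name] = (list(addresses), self._clock() + ttl)

    def get(self, name):
        """Return cached addresses, or None if missing or past the TTL."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: force a fresh DNS query
            return None
        return addresses
```

An application that keeps an address without this kind of expiry keeps using a stale ordering, while one that never caches issues a query for every connection; both behaviors distort whatever distribution the nameserver is trying to achieve.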

So a fundamental problem is how to get this extra information into BIND and how to pass it around. The answer would depend on the type of ordering done, the dynamic nature of the data, and the degree of control over the nameservers involved. Again, we find ourselves in a no-win situation.

Can we have a child process do the reordering?

This possibility was considered, but it had many difficulties and was discarded. The first was a concern about performance. The root nameservers are taking 1200 queries per second and climbing; the extra code path and possible context switch could overwhelm heavily used machines.

The second and more fundamental reason was the response form issue. The response has only two forms inside BIND: the database query form and the wire form. Both of these are complex. To handle this, the child process would need to decode the information, reorder it, and re-encode it. Then BIND would need to decode what it got back and verify that it was valid to send on.
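To give a feel for the decode, reorder, and re-encode round trip, the sketch below handles a deliberately simplified case: a sequence of IN A records in the standard RR wire layout (name, type, class, TTL, RDLENGTH, RDATA), with uncompressed names. A real response also carries a header, a question section, name compression, and other record types, and BIND's internal forms are more involved; every function name here is hypothetical.

```python
import socket
import struct

TYPE_A, CLASS_IN = 1, 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"  # the zero-length root label ends the name


def encode_a(name, ttl, address):
    """Encode one IN A record: name, type, class, TTL, RDLENGTH, RDATA."""
    rdata = socket.inet_aton(address)
    fixed = struct.pack("!HHIH", TYPE_A, CLASS_IN, ttl, len(rdata))
    return encode_name(name) + fixed + rdata


def decode_a_records(wire):
    """Decode a byte string holding only uncompressed IN A records."""
    records, pos = [], 0
    while pos < len(wire):
        labels = []
        while wire[pos] != 0:
            length = wire[pos]
            labels.append(wire[pos + 1:pos + 1 + length].decode("ascii"))
            pos += 1 + length
        pos += 1  # skip the root label
        rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", wire, pos)
        pos += 10
        if (rtype, rclass, rdlength) != (TYPE_A, CLASS_IN, 4):
            raise ValueError("not an IN A record")
        records.append((".".join(labels) + ".", ttl,
                        socket.inet_ntoa(wire[pos:pos + 4])))
        pos += 4
    return records


def reorder(wire):
    """Decode, roll the record order by one, and re-encode."""
    records = decode_a_records(wire)
    records = records[1:] + records[:1]
    return b"".join(encode_a(*rr) for rr in records)


original = (encode_a("www.example.com.", 300, "192.0.2.1")
            + encode_a("www.example.com.", 300, "192.0.2.2"))
rolled = reorder(original)
```

Even in this reduced form, the helper process has to parse and rebuild every record, and the caller still has to validate the bytes it receives before sending them on.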

The final issue is one of error handling. What should BIND do if the child process returns a bad response? What if the child dies (or fails to start on a fork)? BIND was designed as a single entity and adding ancillary processing would cause a major code structure problem for BIND.

Recommended solutions: