Chinaunix首页 | 论坛 | 博客
  • 博客访问: 105861
  • 博文数量: 11
  • 博客积分: 2000
  • 博客等级: 大尉
  • 技术积分: 260
  • 用 户 组: 普通用户
  • 注册时间: 2007-03-31 21:46
文章分类

全部博文(11)

文章存档

2011年(1)

2008年(10)

我的朋友

分类: LINUX

2008-05-05 10:43:54

Red-Black Trees

Contents:





The information on this page is a drawn in part from Introduction to Algorithms, second edition, by Cormen, Leiserson, Rivest and Stein (CLRS); Purely Functional Data Structures, by Chris Okasaki; ; Chee Yap's lecture notes, which were on the web, but the link is now broken; and comments from the students in my Algorithms class.

Red-Black Tree Definition

As defined in CLRS, a red-black tree is a binary search tree (BST) that satisfies the following properties:

  1. Every node is either red or black.
  2. The root is black.
  3. Every leaf (NIL) is black.
  4. If a node is red, then both its children are black. (Or in other words, two red nodes may not be adjacent.)
  5. For each node, all paths from the node to descendant leaves contain the same number of black nodes.
Data items are only stored at internal nodes. What we are calling a "leaf" would probably be represented as a null pointer in the parent node. It helps in describing the insertion and deletion algorithms to think of these as actual nodes, colored black.

Theorem: The height h of a red-black tree with n internal nodes is no greater than 2log(n+1).
Proof: By property 5, every root-to-leaf path in the tree has the same number of black nodes; let this number be B. So there are no leaves in this tree at depth less than B, which means the tree has at least as many internal nodes as a complete binary tree of height B. Therefore, n≥2B-1. This implies B≤log(n+1). By property 4, at most every other node on a root-to-leaf path is red. Therefore, h≤2B. Putting these together, we have h≤2log(n+1). Q.E.D.

The remainder of this page shows how insertion and deletion can be done without violating the above properties, and in time proportional to the height of the tree, or O(log n). Parenthetical remarks relate this description to that in CLRS. If you don't have CLRS you can ignore these.

Okasaki's Insertion Method

A node is inserted the same way as it would for a BST, and colored red. (The children of the new node will be leaves, which are by definition black.) At that point either property 2 (root is black) or property 4 (no two adjacent red nodes) might be violated.

If the inserted node is the root (meaning the tree was previously empty) then we simply change its color to black and we're done.

If property 4 is violated, it must be because the new node's parent is also red. Since the root must be black, the new node must also have a grandparent in the tree, and by property 4 it must be black. There are four possibilities for how this subtree can be structured, vis a vis the new node, its parent and its grandparent, as shown in the diagram below. In Okasaki's method, each possible subtree is transformed into the subtree shown in the middle of the diagram.

(A, B, C and D refer to arbitrary subtrees. We have said that the children of the new node must both be leaves, but we will see in a moment that the diagram applies more generally).

First, notice that the ordering <AxByCzD> is preserved by these transformations.

Second, notice that these transformations do not change the number of black nodes on any path from the parent of this subtree (if one exists) to the leaves. We are again in the position where the only properties that can be violated are property 2 (if y is the root) and property 4 (if y's parent is red). But the advantage is that we are now two steps closer to the root. We can repeat this operation until either y's parent is black, in which case we're done, or y is the root, in which case we color y black and we're done. (Coloring the root black increases the number of black nodes on every path from the root to the leaves by the same amount, so property 5 will not be violated after that re-coloring if it wasn't violated before.)

This procedure preserves the red-black properties, and it takes time proportional to the height of the tree, which is O(log n).

Rotations

Restructuring operations on red-black trees (and many other kinds of trees) can often be expressed more clearly in terms of the rotation operation, diagrammed below.

Clearly the order <AxByC> is preserved by the rotation operation. Therefore, if we start with a BST, and only restructure using rotations, then we will still have a BST. In the remainder of this tutorial, we will only restructure with rotations, and so we won't need to say anything more about the proper ordering of the elements.

In the diagram below, the transformations in Okasaki's insertion method are shown as one or two rotations. For comparison, is the previous diagram. It makes a nice visualization to put the two diagrams in about the same place on your screen, and go back and forth between them using your browser's forward and back buttons.

The CLRS Insertion Method

CLRS gives an insertion method that is more complicated than Okasaki's, but slightly more efficient. It's time complexity is still O(log n), but the constant embedded in the big-Oh is smaller. You can safely skip this section and go to .

As with Okasaki's method, this method starts with standard BST insertion, and colors the new node red. It differs in how it handles violations of property 4 (no two red nodes adjacent). We distinguish two cases, based on the color of the uncle of the bottom red node. (The bottom red node is the child node in the red-parent/red-child pair. Its uncle is the red-parent's sibling.) First, let's consider the case where the uncle is black. There are four subcases, depending on whether each red node is a left or right child. The diagram below shows how the tree should be restructured and re-colored. (This is cases 2 and 3 in CLRS.)

It is interesting to see how this diagram compares with . (Again, it is interesting to go back and forth between these two images using your browser's forward and back buttons.) There are two differences. The first is in how the final subtree (in the middle) is colored. In Okasaki's method, the root y of the subtree is red and its children are black, while here y is black and its children are red. Making y black means that property 4, no two adjacent red nodes, will not be violated at y, so the procedure does not need to continue up the tree toward the root. The CLRS insertion method never requires more than two rotations.

The second difference is that in the CLRS method this case has a precondition that the uncle of the bottom red node is black. It can clearly be seen in the diagram that if the uncle (which is the root of either subtree A or D) were red, then there would be two adjacent red nodes in the final tree, and so this method could not be used.

It is straightforward to verify that the red-black properties are satisfied in the final tree, if the only violation initially is the adjacency of the two red nodes shown.

It remains to consider the case where the uncle of the bottom red node is red. (This is case 1 in CLRS.) In that case the top red node and its sibling (the uncle) are turned black, and their parent is turned red. The tree is not restructured. There are four cases, depending on whether the bottom red node is a left or right child, and whether the top red node is a left or right child, but they are all essentially the same. One case is pictured here:


It can easily be seen that the number of black nodes on a path from the root of the tree to the leaves is not changed by this operation. Afterwards, the only violation of the red-black properties would be if the root of the subtree is the root of the tree, or if it has a red parent. In other words, we might have to start over, but two steps closer to the root. So, the procedure is repeated until either (i) the parent of z is black, in which case we're done; (ii) z is the root, in which case we color it black and we're done; or (iii) we encounter the black-uncle case, in which case we do one or two rotations and we're done. In the worst case we will have to re-color nodes all the way up to the root, requiring O(log n) operations.

Deletion

To delete an item from a red-black tree, we start by performing the standard BST delete operation (see CLRS, chapter 12). To review, standard BST deletion has three cases:

  1. The node to be removed has no children. In this case we simply remove it. If it is the root, then the tree becomes empty; otherwise its parent's appropriate child pointer is set to null.
  2. The node to be removed has one child. Again, we simply remove it. If it is the root, then its child becomes the root; otherwise, its parent's appropriate child pointer is set to point to the removed node's child.
  3. The node to be removed has two children. In this case we find the node's successor, which is the minimum node in its right subtree. The data elements in these two nodes are swapped, and then we remove the successor node. Since the successor node cannot have a left child, one of the first two cases will apply.

The node that is removed from the tree might not be the node that originally contained the deleted data item. For the purposes of re-establishing the red-black properties we are only concerned with the node that is removed. Call this node v, and let its parent be p(v).

At least one of v's children must be a leaf. If v has a non-leaf child, its place in the structure of the tree is taken over by that child; otherwise its place is taken over by a leaf. Let u be the child that replaces v in the tree structure after BST deletion. If u is a leaf, then we know it is black.

If v is red, we are done - no violations of the red-black properties will be introduced. So assume v is black. All paths from the root to leaves that descended from v will have fewer black nodes than other root-to-leaf paths in the tree, which is a violation of property 5. If p(v) and u are both red then property 4 will also be violated, but it will turn out that our solution to the violation of property 5 will also solve the property 4 violation without doing any more work, so for now we will focus on the property 5 problem.

Let's imagine assigning a black token to u. The token indicates that the simple paths to leaves descending from the node holding the token are short a black node (initially that is because v was removed). We will move the token up the tree until we encounter a situation where property 5 may be restored. The token is represented as a black square in the diagrams below. If the node with the token is black, then we call it a doubly black node.

Note that the token is a conceptual device, and is not physically implemented in the tree data structure.

We distinguish four mutually exclusive cases.

  1. If the node with the token is red or the root of the tree (or both), simply set its color to black and we're done. Note that this removes any violation of property 4 (no two red nodes adjacent). It also removes a violation of property 5, for the following reason. The token indicated that leaves descending from this node needed another black node on their paths to the root in order to match other root-to-leaf paths in the tree. By changing this red node to black, we add one more black node on precisely those paths that were short a black node.

    If the token is at the root and the root is black, the token is simply dropped. In this case, every root-to-leaf path in the tree has had it's number of black nodes reduced by one, and property 5 still holds.

    For the remaining cases, we can assume that the node with the token is black, and is not the root.

  2. If the sibling and both nephews of the doubly black node are also black, then we change the sibling to red and move the token up one step towards the root. (This is case 2 in CLRS.)

    In the diagram below, which shows the two possible subcases, the dashed line around y indicates that we don't care at this point what color y is, and the small circles above A, B, C or D indicate that the root of that subtree is black.

    Changing the sibling to red removes one black node from the paths to its descendant leaves, so that those paths now have the same number of black nodes as those descending from the doubly black node. We move the token up to the parent y to indicate that now all paths descending from y are short a black node. We haven't solved the problem, but we have pushed it up towards the root one step.

    Clearly this operation can only be done if both nephews are black, since otherwise we would end up with adjacent red nodes.

  3. If the sibling of the doubly black node is red, then we perform a rotation and change colors. (This is CLRS case 1.) The two possibilities are shown in the following diagram:

    This step doesn't change the number of black nodes on the path from the root to any leaf, but it does guarantee that the doubly black node's sibling is black, which enables either case B or case D in the next step.

    It might seem that we are going backwards, since the token is now farther from the root than it was before. But the parent of the doubly black node is now red, so if the next step is case B, the token will be passed up to a red node, which will then turn black and we will be done. Furthermore, as shown below, case D always consumes the token and finishes the operation. So this apparent step backward is actually a sign that we are almost done.

  4. Finally, we have the case where the doubly black node has a black sibling and at least one red nephew. Define the near nephew of a node x to be the left child of x's sibling if x is a left child, and the right child of x's sibling if x is a right child; define x's far nephew to be its other nephew. (In the diagram, x's near nephew is closer to x than its far nephew.)

    We now have two subcases: (i) the doubly black node's far nephew is black, so its near nephew must be red; and (ii) the far nephew is red, so the near nephew can be either color. (This is cases 3 and 4 in CLRS.) As shown in the following diagram, subcase (i) is transformed to subcase (ii) by a rotation and change of colors, and subcase (ii) is solved by another rotation and change of colors. The two rows are symmetrical, depending on whether the doubly black node is a left or right child.

    In this case we generate an extra black node, the token is dropped, and we are done. It can easily be verified by looking at the diagram that the number of black nodes on paths to leaves below the token is increased by one, while the number of black nodes on other paths is unchanged. And it is also clear that none of the other red-black properties is violated.

Putting all of these cases together, we can see that in the worst case we are following a path from leaf to root, and performing a constant number of operations on each step, so the time complexity for deletion is O(log n).

Exercise

Starting with an empty red-black tree, successively insert the following values: 260, 240, 140, 320, 250, 170, 60, 100, 20, 40, 290, 280, 270, 30, 265, 275, 277, and 278. If you use Okasaki's insertion method, and if I haven't made any mistakes, you should end up with this tree:


 
阅读(1167) | 评论(0) | 转发(0) |
0

上一篇:linux 2.6.25 changelog

下一篇:IA-32

给主人留下些什么吧!~~