Chinaunix首页 | 论坛 | 博客
  • 博客访问: 51808
  • 博文数量: 13
  • 博客积分: 1465
  • 博客等级: 上尉
  • 技术积分: 130
  • 用 户 组: 普通用户
  • 注册时间: 2006-06-01 10:15
文章分类

全部博文(13)

文章存档

2011年(2)

2008年(11)

我的朋友

分类: Sybase

2008-11-10 11:19:15

Preface

I'd like to talk about the difference between a clustered and a non-clustered index. The two are very different and it's very important to understand the difference between the two to in order to know when and how to use each.

I've pondered hard to find the best analogy that I could think of and I've come up with ... the phone book. Yes, a phone book.

Imagine that each page in our phone book is equivalent to a Sybase 2K data page. Every time we read a page from our phone book it is equivalent to one disk I/O.

Since we are imagining, let's also imagine that our mythical ASE (that runs against the phone book) has only enough data cache to buffer 200 phone pages. When our data cache gets full we have to flush an old page out so we can read in a new one.

Fasten your seat belts, because here we go...

Clustered Index

A phone book lists everyone by last name. We have an A section, we have a B section and so forth. Within each section my phone book is clever enough to list the starting and ending names for the given page.

The phone book is clustered by last name.

create clustered index on phone_book (last_name)

It's fast to perform the following queries on the phone book:

  • Find the address of those whose last name is Cisar.
  • Find the address of those whose last name is between Even and Fa

Searches that don't work well:

  • Find the address of those whose phone number is 440-1300.
  • Find the address of those whose prefix is 440

In order to determine the answer to the two above we'd have to search the entire phone book. We can call that a table scan.

Non-Clustered Index

To help us solve the problem above we can build a non-clustered index.

create nonclustered index on phone_book (phone_number)

Our non-clustered index will be built and maintained by our Mythical ASE as follows:

  1. Create a data structure that will house a phone_number and information where the phone_number exists in the phone book: page number and the row within the page.

    The phone numbers will be kept in ascending order.

  2. Scan the entire phone book and add an entry to our data structure above for each phone number found.
  3. For each phone number found, note along side it the page number that it was located and which row it was in.

any time we insert, update or delete new numbers, our M-ASE will maintain this secondary data structure. It's such a nice Server.

Now when we ask the question:

Find the address of those whose phone number is 440-1300

we don't look at the phone book directly but go to our new data structure and it tells us which page and row within the page the above phone number can be found. Neat eh?

Draw backs? Well, yes. Because we probably still can't answer the question:

Find the address of those whose prefix is 440

This is because of the data structure being used to implement non-clustered indexes. The structure is a list of ordered values (phone numbers) which point to the actual data in the phone book. This indirectness can lead to trouble when a range or a match query is issued.

The structure may look like this:

------------------------------------
|Phone Number   |  Page Number/Row |
====================================
| 440-0000      |  300/23          |
| 440-0001      |  973/45          |
| 440-0002      |   23/2           |
| ...           |                  |
| 440-0030      |  973/45          |
| 440-0031      |  553/23          |
| ...           |                  |
------------------------------------ 

As one can see, certain phone numbers may map to the same page. This makes sense, but we need to consider one of our constraints: our Server only has room for 200 phone pages.

What may happen is that we re-read the same phone page many times. This isn't a problem if the phone page is in memory. We have limited memory, however, and we may have to flush our memory to make room for other phone pages. So the re-reading may actually be a disk I/O.

The Server needs to decide when it's best to do a table scan versus using the non-clustered index to satisfy mini-range type of queries. The way it decides this is by applying a heuristic based on the information maintained when an update statistics is performed.

In summary, non-clustered indexes work really well when used for highly selective queries and they may work for short, range type of queries.

Suggested Uses

Having suffered many table corruption situations (with 150 ASEs who wouldn't? :-)), I'd say always have a clustered index. With a clustered index you can fish data out around the bad spots on the table thus having minimal data loss.

When you cluster, build the cluster to satisfy the largest percentage of range type queries. Don't put the clustered index on your primary key because typically primary keys are increasing linearly. What happens is that you end up inserting all new rows at the end of the table thus creating a hot spot on the last data page.

For detail rows, create the clustered index on the commonly accessed foreign key. This will aid joins from the master to it.

Use nonclustered index to aid queries where your selection is very selective. For example, primary keys. :-)

阅读(1347) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~