tpop(2.1): Searching - maunix - ChinaUnix Blog



Category: C/C++

2011-03-15 08:53:39

2 Algorithms and Data Structures

In the end, only familiarity with the tools and techniques of the field will provide the right solution for a particular problem, and only a certain amount of experience will provide consistently professional results.

Raymond Fielding. The Technique of Special Effects Cinematography

The study of algorithms and data structures is one of the foundations of computer science, a rich field of elegant techniques and sophisticated mathematical analyses.

A good algorithm or data structure might make it possible to solve a problem in seconds that could otherwise take years.

If you are developing programs in a field that's new to you, you must find out what is already known, lest you waste your time doing poorly what others have already done well.

Accordingly, for most programmers, the task is to know what appropriate algorithms and data structures are available and to understand how to choose among alternatives.

There are only a handful of basic algorithms that show up in almost every program (primarily searching and sorting), and even those are often included in libraries. Similarly, almost every data structure is derived from a few fundamental ones.

2.1 Searching

2.1.1 Sequential Search

Nothing beats an array for storing static tabular data. Compile-time initialization makes it cheap and easy to construct such arrays. In a program to detect words that are used rather too much in bad prose, we can write

char *flab[] = {
    "actually",
    "just",
    "quite",
    "really",
    NULL
};

The search routine needs to know how many elements are in the array. One way to tell it is to pass the length as an argument; another, used here, is to place a NULL marker at the end of the array:

#include <stddef.h>   /* NULL */
#include <string.h>   /* strcmp */

/* lookup: sequential search for word in array */
int lookup(char *word, char *array[])
{
    int i;

    for (i = 0; array[i] != NULL; i++)
        if (strcmp(word, array[i]) == 0)
            return i;
    return -1;
}
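
For comparison, here is a minimal sketch of the other option the text mentions: passing the length as an argument instead of relying on a NULL sentinel. The names lookup_n and flab_nosentinel are hypothetical, introduced only for this illustration.

```c
#include <string.h>

/* lookup_n: hypothetical variant of lookup that is told the array
   length n instead of stopping at a NULL marker. */
static int lookup_n(const char *word, char *array[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(word, array[i]) == 0)
            return i;
    return -1;
}

/* The same words as flab, but with no NULL terminator. */
static char *flab_nosentinel[] = { "actually", "just", "quite", "really" };
```

A call such as lookup_n("really", flab_nosentinel, 4) returns 3. The trade-off is that the caller must keep n in step with the array, whereas the sentinel version carries its own end marker.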

In C and C++, a parameter that is an array of strings can be declared as char *array[] or char **array. Although these forms are equivalent, the first makes it clearer how the parameter will be used.
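
One way to see that the two parameter forms really are the same type is to declare a function with one form and define it with the other; a C compiler accepts the pair only if the types match. This is a small sketch, and count_words is a hypothetical helper, not part of the original program.

```c
#include <stddef.h>

/* Declared with the array form... */
static int count_words(char *array[]);

/* ...and defined with the pointer form. This compiles because C adjusts
   an array-of-T parameter to pointer-to-T, so both spell char **. */
static int count_words(char **array)
{
    int n = 0;

    while (array[n] != NULL)   /* stop at the NULL marker */
        n++;
    return n;
}

static char *flab[] = { "actually", "just", "quite", "really", NULL };
```

Here count_words(flab) returns 4, the number of words before the NULL marker.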

This search algorithm is called sequential search because it looks at each element in turn to see if it's the desired one. When the amount of data is small, sequential search is fast enough.
Standard library functions like strchr and strstr search for the first instance of a given character or substring in a C or C++ string. The Java String class has an indexOf method, and the generic C++ find algorithms apply to most data types. If such a function exists for the data type you've got, use it.
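
As a small sketch of leaning on the library, the two hypothetical helpers below (find_offset and find_char are illustrative names) turn the pointers returned by the standard strstr and strchr into array-style indexes, with -1 meaning "not found" as in lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in
   haystack, or -1 if absent. The scan itself is done by strstr. */
static long find_offset(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p == NULL ? -1 : (long)(p - haystack);
}

/* Hypothetical helper: the same idea for a single character, via strchr. */
static long find_char(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p == NULL ? -1 : (long)(p - s);
}
```

For example, find_offset("it is really quite good", "quite") returns 13, and find_char("hello", 'l') returns 2.
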
Sequential search is easy but the amount of work is directly proportional to the amount of data to be searched; doubling the number of elements will double the time to search if the desired item is not present. This is a linear relationship—run-time is a linear function of data size—so this method is also known as linear search.
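
The linear relationship can be made concrete by counting comparisons. The sketch below assumes an int array for simplicity, and probes_linear is a hypothetical name; it returns how many elements were compared rather than where the key was found.

```c
/* Hypothetical helper: sequential search over n ints that reports the
   number of comparisons made instead of the index found. */
static int probes_linear(const int a[], int n, int key)
{
    int i;

    for (i = 0; i < n; i++)
        if (a[i] == key)
            return i + 1;   /* found after i + 1 comparisons */
    return n;               /* absent: every element was compared */
}

static const int evens[] = { 0, 2, 4, 6, 8, 10, 12, 14 };
```

Searching for the absent key 1 costs 4 comparisons over the first 4 elements and 8 over all 8: doubling the data doubles the work when the item is not present.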

2.1.2 Binary Search

Here's an excerpt from an array of more realistic size from a program that parses HTML, which defines textual names for well over a hundred individual characters:

typedef struct Nameval Nameval;
struct Nameval {
    char *name;
    int   value;
};

/* HTML characters, e.g. AElig is ligature of A and E. */
/* Values are Unicode/ISO10646 encoding. */
Nameval htmlchars[] = {
    { "AElig",  0x00c6 },
    { "Aacute", 0x00c1 },
    { "Acirc",  0x00c2 },
    /* ... */
    { "zeta",   0x03b6 },
};

For a larger array like this, it's more efficient to use binary search. The binary search algorithm is an orderly version of the way we look up words in a dictionary. Check the middle element. If that value is bigger than what we are looking for, look in the first half; otherwise, look in the second half. Repeat until the desired item is found or determined not to be present.
For binary search, the table must be sorted, as it is here (that's good style anyway; people find things faster in sorted tables too), and we must know how long the table is. The NELEMS macro from Chapter 1 can help:

#define NELEMS(array) (sizeof(array) / sizeof(array[0]))

printf("The HTML table has %zu words\n", NELEMS(htmlchars));
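
Because binary search quietly returns wrong answers on an unsorted table, it can be worth checking the ordering once, for instance at startup or in a test. This is a minimal sketch; table_is_sorted and the two sample tables are hypothetical, and it assumes names should be unique, so it demands strictly increasing strcmp order.

```c
#include <string.h>

typedef struct Nameval Nameval;
struct Nameval {
    char *name;
    int   value;
};

/* Hypothetical helper: returns 1 if tab[0..ntab-1] is in strictly
   increasing strcmp order (what the binary search relies on), else 0. */
static int table_is_sorted(const Nameval tab[], int ntab)
{
    int i;

    for (i = 1; i < ntab; i++)
        if (strcmp(tab[i - 1].name, tab[i].name) >= 0)
            return 0;
    return 1;
}

/* Upper case sorts before lower case in ASCII, so "AElig" < "Aacute". */
static Nameval sorted_tab[]   = { { "AElig", 0x00c6 }, { "Aacute", 0x00c1 }, { "Acirc", 0x00c2 } };
static Nameval unsorted_tab[] = { { "Acirc", 0x00c2 }, { "AElig", 0x00c6 } };
```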

A binary search function for this table might look like this:

/* lookup: binary search for name in tab; return index */
int lookup(char *name, Nameval tab[], int ntab)
{
    int low, high, mid, cmp;

    low = 0;
    high = ntab - 1;
    while (low <= high) {
        mid = (low + high) / 2;
        cmp = strcmp(name, tab[mid].name);
        if (cmp < 0)
            high = mid - 1;
        else if (cmp > 0)
            low = mid + 1;
        else    /* found match */
            return mid;
    }
    return -1;  /* no match */
}

Putting all this together, to search htmlchars we write

half = lookup("frac12", htmlchars, NELEMS(htmlchars));

to find the array index of the character 1/2.
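
Put together as one self-contained file, the pieces look roughly like this. The table is an abbreviated stand-in for htmlchars with only a few real entries (frac12 is U+00BD); it is kept in strcmp order, which places upper case before lower case.

```c
#include <string.h>

#define NELEMS(array) (sizeof(array) / sizeof(array[0]))

typedef struct Nameval Nameval;
struct Nameval {
    char *name;
    int   value;
};

/* Abbreviated table for illustration; still sorted by strcmp. */
static Nameval htmlchars[] = {
    { "AElig",  0x00c6 },
    { "Aacute", 0x00c1 },
    { "Acirc",  0x00c2 },
    { "frac12", 0x00bd },
    { "zeta",   0x03b6 },
};

/* lookup: binary search for name in tab; return index or -1 */
static int lookup(char *name, Nameval tab[], int ntab)
{
    int low = 0, high = ntab - 1;

    while (low <= high) {
        int mid = (low + high) / 2;
        int cmp = strcmp(name, tab[mid].name);
        if (cmp < 0)
            high = mid - 1;     /* name sorts before tab[mid] */
        else if (cmp > 0)
            low = mid + 1;      /* name sorts after tab[mid] */
        else
            return mid;         /* found match */
    }
    return -1;                  /* no match */
}
```

With this table, lookup("frac12", htmlchars, NELEMS(htmlchars)) returns 3, and htmlchars[3].value is 0x00bd.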

Binary search eliminates half the data at each step. The number of steps is therefore proportional to the number of times we can divide $n$ by 2 before we're left with a single element. Ignoring roundoff, this is $\log_2 n$.
If we have 1000 items to search, linear search takes up to 1000 steps, while binary search takes about 10 (since $2^{10} = 1024$). The more items, the greater the advantage of binary search over sequential search.
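
To check the "about 10" figure, the sketch below counts probes in a binary search over a sorted int array, alongside the worst case for a table of n items, which is the number of halvings needed to reach zero, $\lfloor \log_2 n \rfloor + 1$. The names probes_binary, worst_case_probes and odds are hypothetical, and ints stand in for the string table.

```c
/* Hypothetical helper: binary search over a sorted int array that
   returns how many probes (loop iterations) it made, found or not. */
static int probes_binary(const int a[], int n, int key)
{
    int low = 0, high = n - 1, probes = 0;

    while (low <= high) {
        int mid = (low + high) / 2;
        probes++;
        if (key < a[mid])
            high = mid - 1;
        else if (key > a[mid])
            low = mid + 1;
        else
            break;              /* found */
    }
    return probes;
}

/* Worst-case probes for n >= 1 items: how many times n can be
   halved (integer division) before reaching 0. */
static int worst_case_probes(int n)
{
    int steps = 0;

    while (n > 0) {
        steps++;
        n /= 2;
    }
    return steps;
}

static const int odds[] = { 1, 3, 5, 7, 9, 11, 13 };
```

worst_case_probes(1000) evaluates to 10, matching the estimate above, while an unsuccessful sequential search over the same 1000 items compares all 1000.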

Beyond some size of input (which varies with the implementation), binary search is faster than linear search.
