内容摘要 pattern match
本文转自:
BM算法为了确定在不匹配的情况下最大的可移位距离而使用了两种启发,即“坏字符”启发和“好后缀”启发,两种启发均能导致最大为m的移动距离。但是,由于“好后缀”启发的预处理和计算过程都比较复杂,Horspol于1980年发表了改进与简化BM 算法的论文,即BoyerMooreHorspoo(BMH)算法。BMH算法在移动模式时仅考虑了“坏字符”策略。它首先比较文本指针所指字符和模式串的最后一个字符,如果相等再比较其余m一1个字符。无论文本中哪个字符造成了匹配失败,都将由文本中和模式串最后一个位置对应的字符来启发模式向右的移动。关于“坏字符”启发和“好尾缀”启发的对比,孙克雷的研究表明:“坏字符”启发在匹配过程中占主导地位的概率为94.O3 ,远远高于“好尾缀”启发。在一般情况下,BMH算法比BM有更好的性能,它简化了初始化过程,省去了计算“好尾缀”启发的移动距离,并省去了比较“坏字符”和“好尾缀”的过程。
以下是BMH算法的具体实现过程: BMH.h文件:
/* ------------------------------------------------------ */
/* FUNCTION BM : */
/* The Boyer-Moore String Searching Program. Given a */
/* text string text[] and a pattern string pat[], this */
/* function will find the first occurrence of pat[] in */
/* test[] by using the naive Boyer-Moore algorithm. */
/* */
/* Copyright Ching-Kuang Shene July/18/1989 */
/* ------------------------------------------------------ */
#include
#define NOT_FOUND -1 void get_jump(unsigned char [], int []);
int BM(unsigned char [], unsigned char []); int BM(unsigned char text[], unsigned char pat[])
{
int jump_tab[256];
int t_len = strlen(text);
int p_len = strlen(pat);
int i, j, k;
int n; get_jump(pat, jump_tab); /* compute the jump table */
for (i = p_len - 1; i < t_len; )
{ /* comp. downward */
for (j=p_len-1, k=i; j >= 0 && text[k] == pat[j]; k--,j--);
if (j < 0) /* if pat[] exhausted ? */
{
return k + 1;
} /* YES, pattern found. */
else /* NO, update using jump_tab*/
{
printf("\n%s\n", text);
for (n = 0; n < k-j; n++)
printf(" ");
printf("%s", pat);
i += jump_tab[text[i]];
}
}
return NOT_FOUND; /* text[] exhausted. */
}
/* ------------------------------------------------------ */
/* FUNCTION get_jump : */
/* Given the pattern string pat[], this function */
/* computes a jump table, jump_tab[], by look at each char*/
/* in the pattern string. */
/* ------------------------------------------------------ */
void get_jump(unsigned char pat[], int jump_tab[])
{
int length = strlen(pat);
int i; for (i = 1; i < 256; i++) /* assume length is the gap*/
jump_tab[i] = length;
for (i = 0; i < length - 1; i++) /* now adjust them. */
{
jump_tab[(int)pat[i]] = length - i - 1;
}
}
BMH.c文件:
#include
#include
#include "bm.h"
#define MAXSIZE 100
void main(void)
{
unsigned char text[MAXSIZE];
unsigned char pat[MAXSIZE];
int answer, i; printf("\nBoyer-Moore String Searching Program");
printf("\n====================================");
printf("\n\nText String --> ");
gets(text);
printf( "\nPattern String --> ");
gets(pat); if ((answer = BM(text, pat)) >= 0)
{
printf("\n");
printf("%s\n", text);
for (i = 0; i < answer; i++)
printf(" ");
printf("%s", pat);
printf("\n\nPattern Found at location %d\n", answer);
}
else
printf("\nPattern NOT FOUND.\n");
}