Summary of Pattern String Matching Problems (zz) - CUDev - ChinaUnix Blog

Summary of Pattern String Matching Problems (zz)

2007-04-27 21:01:44

Today I saw that someone had posted an article on 中国源码网 (China Source Code Network) introducing the Sunday algorithm.

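1. Function name: strstr
Purpose: find the first occurrence of a given string within another string
Usage: char *strstr(const char *str1, const char *str2);
Example program:
#include <stdio.h>
#include <string.h>
int main(void)
{
      const char *str1 = "Borland International", *str2 = "nation";
      /* strstr returns NULL when str2 does not occur in str1 */
      const char *ptr = strstr(str1, str2);
      printf("The substring is: %s\n", ptr ? ptr : "(not found)");
      return 0;
}
// It is said that strstr is about as efficient as the KMP algorithm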
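
Since strstr returns a pointer rather than an index, it is often handy to turn the result into an offset. The following is only a small sketch: the helper name find_offset and the choice of -1 as the "not found" value are assumptions made for illustration, not part of the C library.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a library function): returns the offset of the
// first occurrence of needle in haystack, or -1 if it does not occur.
long find_offset(const char* haystack, const char* needle) {
    const char* hit = std::strstr(haystack, needle);
    // Pointer subtraction gives the distance from the start of haystack.
    return hit ? static_cast<long>(hit - haystack) : -1;
}
```

With the strings from the example above, find_offset("Borland International", "nation") yields 13, the position where "national" begins.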

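2. SUNDAY, an improved variant of the BM algorithm: the Boyer-Moore-Horspool-Sunday algorithm

In practice, BM generally outperforms KMP.

SUNDAY algorithm description:

Among string search algorithms, the two best known are KMP (Knuth-Morris-Pratt) and BM (Boyer-Moore). Both have linear worst-case search time (for BM this holds for its refined variants). In practice, however, KMP is not much faster than the plain C library function strstr(), while BM is often 3 to 5 times faster than KMP. BM is still not the fastest option, though; this section introduces a search algorithm that is somewhat faster than BM.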

For example, suppose we want to find "search" in "substring searching algorithm". At the start, the pattern is aligned with the left end of the text:

substring searching algorithm
search
 ^

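A mismatch is found at the second character, so the pattern has to be moved to the right. But by how much? This is where the algorithms differ. The simplest approach moves it by one character; KMP uses information about the part already matched to decide the shift; BM compares from right to left and derives the shift from the part already matched. The method introduced here looks at the character immediately after the current window in the text ('i' in the figure above).

Clearly, however far we shift, this character must take part in the next comparison. In other words, if the next alignment is to match, this character must lie inside the pattern. So we can shift the pattern until its rightmost occurrence of this character lines up with it. Here the pattern 'search' contains no 'i', which means we can skip a large stretch at once and start the next comparison from the character after 'i', as shown below:

substring searching algorithm
       search
       ^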
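
The rule just described can be precomputed as a table indexed by character. The sketch below is one possible way to build it, assuming single-byte characters; the function name build_shift is hypothetical and chosen only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: shift[c] is how far to move the pattern when c is the
// text character immediately after the current window.
std::array<std::size_t, 256> build_shift(const std::string& patt) {
    std::array<std::size_t, 256> shift;
    // A character absent from the pattern lets us jump past it entirely.
    shift.fill(patt.size() + 1);
    // Later assignments overwrite earlier ones, so the rightmost
    // occurrence of each character determines its shift.
    for (std::size_t i = 0; i < patt.size(); ++i)
        shift[static_cast<unsigned char>(patt[i])] = patt.size() - i;
    return shift;
}
```

For the pattern "search", this gives a shift of 7 for 'i' (not in the pattern) and 3 for 'r', which are the two shifts used in this walk-through.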

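This time the very first character fails to match. We look again at the character after the window, which is 'r'. It is the third character from the end of the pattern, so we move the pattern forward three positions to line up the two 'r's, as follows:

substring searching algorithm
          search
             ^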
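
To check the walk-through mechanically, the sketch below records every alignment the search tries until the first match. It assumes single-byte characters, and the name sunday_alignments is hypothetical, introduced only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns each window start position tried, stopping
// after the first match (or when the text is exhausted).
std::vector<std::size_t> sunday_alignments(const std::string& text,
                                           const std::string& patt) {
    std::vector<std::size_t> visited;
    const std::size_t n = text.size(), m = patt.size();
    if (m == 0 || m > n) return visited;

    std::array<std::size_t, 256> shift;
    shift.fill(m + 1);
    for (std::size_t i = 0; i < m; ++i)
        shift[static_cast<unsigned char>(patt[i])] = m - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        visited.push_back(pos);
        if (text.compare(pos, m, patt) == 0) break;  // first match found
        if (pos + m == n) break;                     // no character after the window
        pos += shift[static_cast<unsigned char>(text[pos + m])];
    }
    return visited;
}
```

For the text "substring searching algorithm" and the pattern "search", the recorded positions are 0, 7 and 10: two shifts, as in the figures.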

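Ha! This time the match succeeds. Looking back over the whole process, we moved the pattern only twice before finding the match, which is rather neat. The largest possible shift is the pattern length plus one, one more than the maximum of BM's bad-character rule, so this algorithm is often faster than BM on ordinary text, although a larger shift is not guaranteed at every step.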

#include <cstddef>
#include <cstring>
#include <iostream>

using namespace std;

int main()
{
    const char *text = "substring searching algorithm search";
    const char *patt = "search";
    size_t shift[256];

    size_t patt_size = strlen(patt);
    cout << "size : " << patt_size << endl;

    // Default: the character is not in the pattern (every entry is 7 for this example)
    for (size_t i = 0; i < 256; i++)
        shift[i] = patt_size + 1;
    // Later occurrences overwrite earlier ones, so the rightmost one wins
    for (size_t i = 0; i < patt_size; i++)
        shift[(unsigned char)patt[i]] = patt_size - i;
    // shift['r'] = 6 - 3 = 3, i.e. move three positions;
    // shift['s'] = 6, shift['e'] = 5, and so on

    size_t text_size = strlen(text);
    if (text_size >= patt_size) {
        size_t limit = text_size - patt_size + 1;

        // text[i + patt_size] is the character just after the window;
        // at the last window it is the terminating '\0', which is safe to read
        for (size_t i = 0; i < limit; i += shift[(unsigned char)text[i + patt_size]]) {
            /*           ^13 -- this 'r' is at position 13, counting from 0
               substring searching algorithm
                         search
                         searching --> this 's' is at position 10, counting from 0
               Compare the whole window against the pattern
            */
            if (memcmp(text + i, patt, patt_size) == 0)
                cout << "the no is " << i << endl;
        }
    }

    cout << endl;

    return 0;
}
/*
size : 6
the no is 10
the no is 30

Press any key to continue
*/

PS: You can also refer to the corresponding code in Snort; a CUer has already written this up.

http://www.cublog.cn/u/22679/showart_223817.html
