Using Regular Expressions in C - wujiajia - ChinaUnix Blog



2011-08-26 14:31:52

Original article: Using Regular Expressions in C, by jiushen

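#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(int argc, char ** argv)
{
    if (argc != 3)
    {
        printf("Usage: %s RegexString Text\n", argv[0]);
        return 1;
    }
    /* Take the pattern and the text from the command line, e.g.
       ./a.out '^[0-9a-zA-Z/]+$' 'aaaaa$$$$bbbbb'   (this pair does not match) */
    const char * pRegexStr = argv[1];
    const char * pText = argv[2];
    size_t nmatch = 10;
    regmatch_t pmatch[10];      /* pmatch[0] = whole match, pmatch[1..9] = () sub-matches */
    regex_t oRegex;
    int nErrCode = 0;
    char szErrMsg[1024] = {0};

    /* REG_EXTENDED selects POSIX extended syntax, so + and ( ) need no backslashes */
    if ((nErrCode = regcomp(&oRegex, pRegexStr, REG_EXTENDED)) != 0)
    {
        /* regerror() truncates the message to fit the buffer and NUL-terminates it */
        regerror(nErrCode, &oRegex, szErrMsg, sizeof(szErrMsg));
        printf("ErrMsg: %s\n", szErrMsg);
        return 1;               /* compilation failed: oRegex must not be passed to regfree() */
    }

    if ((nErrCode = regexec(&oRegex, pText, nmatch, pmatch, 0)) == 0)
    {
        printf("%s matches %s\n", pText, pRegexStr);
    }
    else
    {
        /* REG_NOMATCH (or any other error code) is turned into a readable message */
        regerror(nErrCode, &oRegex, szErrMsg, sizeof(szErrMsg));
        printf("ErrMsg: %s\n", szErrMsg);
    }
    regfree(&oRegex);
    return nErrCode == 0 ? 0 : 1;
}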

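pmatch holds the start and end offsets of the overall match and of each parenthesized () sub-match; the discussion below explains this in detail.
To require the entire string to match, anchor the pattern with ^ and $.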

I have recently been experimenting with the GNU C library regular
expression functions and noticed a problem with pattern matching. It
seems to recognize only the first match and ignore the rest of them.
An example:

mikko.c:
-----

#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(int argc, char *argv[]) {
    regex_t p;
    regmatch_t pm[2];
    regcomp(&p, "k", 0);
    regexec(&p, "mikko", 2, pm, 0);
    /* regoff_t is not int on every platform, so cast for %d */
    printf("start=%d end=%d\n", (int)pm[0].rm_so, (int)pm[0].rm_eo);
    printf("start=%d end=%d\n", (int)pm[1].rm_so, (int)pm[1].rm_eo);
    regfree(&p);
    return 0;
}

-----

This is intended to match the regular expression 'k' against the string
'mikko' and return the start and end offsets of the first two matches in
the regmatch_t array pm. The output, however, is:

$ ./mikko
start=2 end=3
start=-1 end=-1

instead of the expected

start=2 end=3
start=3 end=4

Is this a bug in the GNU library, or have I overlooked something? I have
not found any examples on the Internet of matching multiple
subexpressions. With more complicated regular expressions it usually
seems to return only the first match, as here; with wildcards it returns
the longest match, but still only one of them.

Thanks,

Mikko Nummelin
---------------------------------------------

The problem is that you misunderstand what a match is.

If the regex matches, then pm[0] contains the offsets of the (first)
match for the whole regex. But pm[1], ... do not contain the offsets of
subsequent matches of the whole regex; instead they contain the offsets
of any parenthesized subexpressions that matched (within the match
recorded in pm[0]).

For example, try:

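#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void)
{
    regex_t p;
    regmatch_t pm[2];
    /* Basic regex syntax (no REG_EXTENDED): \( \) delimits a subexpression,
       written "\\(" and "\\)" inside a C string literal */
    regcomp(&p, "k\\(.\\)", 0);
    regexec(&p, "mikko", 2, pm, 0);
    printf("start=%d end=%d\n", (int)pm[0].rm_so, (int)pm[0].rm_eo);   /* whole match "kk" */
    printf("start=%d end=%d\n", (int)pm[1].rm_so, (int)pm[1].rm_eo);   /* subexpression "k" */
    regfree(&p);
    return 0;
}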

$ ./a
start=2 end=4
start=3 end=4

---------------------------------------
On 2 Apr 2008 at 8:37, mikko.n wrote:

Is there then a simple alternative which would work so that it returns
all the matches of the original regexp in the text?

Just use a loop, like this:

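#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void)
{
    regex_t p;
    regmatch_t pm;
    const char *s = "mikko mikko";
    regoff_t last_match = 0;
    regcomp(&p, "k", 0);
    /* Search the rest of s; offsets in pm are relative to s + last_match */
    while (regexec(&p, s + last_match, 1, &pm, 0) == 0) {
        printf("start=%d end=%d\n", (int)(pm.rm_so + last_match), (int)(pm.rm_eo + last_match));
        /* Resume one character after the start of this match
           (with longer patterns this also reports overlapping matches) */
        last_match += pm.rm_so + 1;
    }
    regfree(&p);
    return 0;
}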
