
2011-08-26 14:31:52

Original article: Using Regular Expressions in C (C语言正则式的使用), author: jiushen

#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(int argc, char ** argv)
{
    if (argc != 3)
    {
        printf("Usage: %s RegexString Text\n", argv[0]);
        return 1;
    }

    /* e.g. RegexString '^[0-9a-zA-Z/]+$' and Text 'aaaaa$$$$bbbbb'
       (single-quote them so the shell does not expand the $ signs) */
    const char * pRegexStr = argv[1];
    const char * pText = argv[2];
    size_t nmatch = 10;
    regmatch_t pmatch[10];

    regex_t oRegex;
    int nErrCode = 0;
    char szErrMsg[1024] = {0};

    /* Compile the pattern as a POSIX extended regular expression. */
    if ((nErrCode = regcomp(&oRegex, pRegexStr, REG_EXTENDED)) != 0)
    {
        /* regerror() NUL-terminates the message, truncating it if needed. */
        regerror(nErrCode, &oRegex, szErrMsg, sizeof(szErrMsg));
        printf("ErrMsg: %s\n", szErrMsg);
        return 1;   /* oRegex was not compiled, so it must not be passed to regfree() */
    }

    /* On success pmatch[0] holds the whole match, pmatch[1..] the () subexpressions. */
    if ((nErrCode = regexec(&oRegex, pText, nmatch, pmatch, 0)) == 0)
    {
        printf("%s matches %s\n", pText, pRegexStr);
        regfree(&oRegex);
        return 0;
    }

    /* REG_NOMATCH (or another error): print the library's message. */
    regerror(nErrCode, &oRegex, szErrMsg, sizeof(szErrMsg));
    printf("ErrMsg: %s\n", szErrMsg);

    regfree(&oRegex);
    return 1;
}
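pmatch stores the start and end offsets of the whole match and of each submatch captured by a () group. See the discussion below for details.
To require the entire string to match, anchor the pattern with ^ and $.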



I have recently been experimenting with the GNU C library regular
expression functions and noticed a problem with pattern matching. They
seem to recognize only the first match and ignore the rest.
An example:

mikko.c:
-----

#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(int argc, char *argv[]) {
    regex_t p;
    regmatch_t pm[2];
    regcomp(&p, "k", 0);
    regexec(&p, "mikko", 2, pm, 0);
    /* regoff_t is not necessarily int, so cast for %d */
    printf("start=%d end=%d\n", (int)pm[0].rm_so, (int)pm[0].rm_eo);
    printf("start=%d end=%d\n", (int)pm[1].rm_so, (int)pm[1].rm_eo);
    regfree(&p);
    return 0;
}

-----

This is meant to match the regular expression 'k' against the string 'mikko'
and return the start and end offsets of the first two matches in the array pm
of regmatch_t structures. The output, however, is:

$ ./mikko
start=2 end=3
start=-1 end=-1

instead of the expected

start=2 end=3
start=3 end=4

Is this a bug in the GNU library, or have I overlooked something? I have
not found any examples on the Internet of multiple subexpression
matching. With more complicated regular expressions it usually seems to return
only the first match, as here, and with wildcards the longest match, but
still only one of them.

Thanks,

Mikko Nummelin
---------------------------------------------

The problem is that you misunderstand what a match is.

If the regex matches, then pm[0] contains the offsets of the (first)
match for the whole regex. But pm[1], ... don't contain the offsets of
subsequent matches of the whole regex. Instead, they contain the offsets of
any parenthesized subexpressions that matched (within the match recorded in
pm[0]).

For example, try:

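#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void)
{
    regex_t p;
    regmatch_t pm[2];
    /* Basic regex: \( \) marks a subexpression, captured in pm[1] */
    regcomp(&p, "k\\(.\\)", 0);
    regexec(&p, "mikko", 2, pm, 0);
    printf("start=%d end=%d\n", (int)pm[0].rm_so, (int)pm[0].rm_eo);
    printf("start=%d end=%d\n", (int)pm[1].rm_so, (int)pm[1].rm_eo);
    regfree(&p);
    return 0;
}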


$ ./a
start=2 end=4
start=3 end=4

---------------------------------------
On 2 Apr 2008 at 8:37, mikko.n wrote:
> Is there then a simple alternative which would work so that it returns
> all the matches of the original regexp in the text?

Just use a loop, like this:


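#include <stdio.h>
#include <string.h>
#include <regex.h>

int main(void)
{
    regex_t p;
    regmatch_t pm;
    const char *s = "mikko mikko";
    size_t len = strlen(s);
    size_t last_match = 0;   /* offset in s where the next search starts */
    int eflags = 0;

    if (regcomp(&p, "k", 0) != 0)
        return 1;
    while (last_match <= len && regexec(&p, s + last_match, 1, &pm, eflags) == 0) {
        /* Offsets in pm are relative to s + last_match. */
        printf("start=%d end=%d\n", (int)(pm.rm_so + last_match), (int)(pm.rm_eo + last_match));
        /* Resume after this match; step past an empty match so the loop terminates. */
        last_match += (pm.rm_eo > pm.rm_so) ? pm.rm_eo : pm.rm_so + 1;
        /* Later searches do not start at the beginning of the line. */
        eflags = REG_NOTBOL;
    }
    regfree(&p);
    return 0;
}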