Calling the PCRE regex library from C - 郝姬友 - ChinaUnix blog

Calling the PCRE regex library from C

Category: C/C++

2016-09-27 20:04:19

Original article: Calling the PCRE regex library from C, by 冷寒生

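The C library on Linux ships a regular-expression library (the POSIX interface declared in regex.h; the C library on Windows has no such interface), so using it only takes an include. After a few days with it, however, I found that it supports neither Perl-style shorthand classes such as \d nor non-greedy matching. For example: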

str:   1.1.1.1
regex: (\d*.\d*.\d*.\d*)

This regular expression uses the shorthand \d to match digits, but the regex.h library does not recognize it, so the match fails.

str:   \123\456\
regex: \(.+?)\

This regular expression uses a non-greedy quantifier, but the regex.h library treats it as greedy and captures "123\456" instead of "123".
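If staying with regex.h is a requirement, both patterns can usually be rewritten in POSIX extended syntax: [0-9] in place of \d, and a negated bracket expression in place of the non-greedy quantifier. The following is only a minimal sketch of that workaround; first_group is a hypothetical helper name introduced here, not part of regex.h.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of regex.h): match `pattern` (POSIX
 * extended syntax) against `subject` and copy the first capture group
 * into `out`. Returns 0 on success, -1 if the pattern does not compile,
 * nothing matches, the group is unset, or `out` is too small. */
static int first_group(const char *pattern, const char *subject,
                       char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int status = -1;

    if (outlen == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
            status = 0;
        }
    }
    regfree(&re);
    return status;
}
```

Called with the C string literals "\\\\([^\\\\]+)\\\\" as the pattern and "\\123\\456\\" as the subject, the helper stores "123" in the output buffer, because [^\\]+ cannot run past the next backslash. This only imitates non-greedy matching when the delimiter is a single character; anything more general is a reason to switch to PCRE.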

In the end I looked up how to use the PCRE library and, following examples found online, wrote this test program:

/* Compile thuswise:
*
*   gcc -Wall pcre1.c -I/usr/local/include -L/usr/local/lib -R/usr/local/lib -lpcre
*
*/

#include <stdio.h>
#include <string.h>
#include <pcre.h>

#define OVECCOUNT 30 /* should be a multiple of 3 */
#define EBUFLEN 128
#define BUFLEN 1024

int main(void)
{
    pcre *re;
    const char *error;
    int erroffset;
    int ovector[OVECCOUNT];
    int rc, i;

    char src[] = "123.123.123.123:80|1.1.1.1:88";
    char pattern[] = "(\\d*.\\d*.\\d*.\\d*):(\\d*)";

    printf("String : %s\n", src);
    printf("Pattern: \"%s\"\n", pattern);

    re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
    if (re == NULL) {
        printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
        return 1;
    }

    /* Search repeatedly, restarting just past the previous match.
     * A negative rc means no match (PCRE_ERROR_NOMATCH) or an error. */
    char *p = src;
    while ( *p && ( rc = pcre_exec(re, NULL, p, (int)strlen(p), 0, 0, ovector, OVECCOUNT)) >= 0 )
    {
        if ( rc == 0 )
        {
            /* ovector was too small: only this many pairs were stored */
            rc = OVECCOUNT / 3;
        }

        printf("\nOK, has matched ...\n\n");

        for (i = 0; i < rc; i++)
        {
            /* ovector[2*i] and ovector[2*i+1] are the start and end offsets of group i */
            char *substring_start = p + ovector[2*i];
            int substring_length = ovector[2*i+1] - ovector[2*i];

            printf( "%d: %.*s\n", i, substring_length, substring_start );
        }

        if ( ovector[1] == 0 )
        {
            /* empty match at the start: stop rather than loop forever */
            break;
        }
        p += ovector[1];
    }
    pcre_free(re);
    return 0;
}

The program prints:

String : 123.123.123.123:80|1.1.1.1:88
Pattern: "(\d*.\d*.\d*.\d*):(\d*)"

OK, has matched ...

0: 123.123.123.123:80
1: 123.123.123.123
2: 80

OK, has matched ...

0: 1.1.1.1:88
1: 1.1.1.1
2: 88

The code relies mainly on two functions, pcre_compile and pcre_exec. Their prototypes are:

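(1) pcre_compile

pcre *pcre_compile(const char *pattern, int options, const char **errptr,
                   int *erroffset, const unsigned char *tableptr);

Purpose: compiles the given regular expression.
Parameters:
    pattern    input; the regular expression to compile, as a string
    options    input; compile-time options
    errptr     output; receives the error message
    erroffset  output; offset in pattern at which the error was found
    tableptr   input; character tables to use; normally NULL, which selects the default tables
Return value: PCRE's internal representation of the compiled regular expression, or NULL if compilation fails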

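(2) pcre_exec

int pcre_exec(const pcre *code, const pcre_extra *extra, const char *subject,
              int length, int startoffset, int options,
              int *ovector, int ovecsize);

Purpose: checks whether a string matches the given compiled regular expression.
Parameters:
    code         input; pointer to the structure produced by pcre_compile
    extra        input; pointer to a structure carrying extra data for pcre_exec
    subject      input; the string to match against
    length       input; length of the subject string
    startoffset  input; offset in subject at which matching starts
    options      input; match-time options
    ovector      output; array that receives the offsets of the matched substrings
    ovecsize     input; number of elements in ovector
Return value: a non-negative number if the match succeeds, a negative number if it fails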
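The return value is easier to use once its three cases are spelled out. In PCRE, a positive value is the number of offset pairs written to ovector, zero means a match was found but ovector was too small to hold every pair, and a negative value (such as PCRE_ERROR_NOMATCH) means no match or an error. A small sketch of that interpretation follows; usable_pairs is a hypothetical helper name, not a PCRE function.

```c
/* Hypothetical helper (not part of PCRE): given the value returned by
 * pcre_exec() and the ovecsize that was passed to it, return how many
 * (start, end) pairs at the front of ovector hold valid results. */
static int usable_pairs(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;            /* whole match plus captured groups */
    if (rc == 0)
        return ovecsize / 3;  /* matched, but ovector was too small */
    return 0;                 /* negative: no match or an error */
}
```

With the values from the test program, usable_pairs(3, 30) is 3. The division by three reflects that PCRE reserves the last third of ovector as workspace, so only two thirds of it ever carry offsets.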

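The ovector parameter deserves a closer look. When pcre_exec finds a match, it writes the start and end offsets of each matched substring into ovector. After the first match in the code above, for example, ovector holds: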

$1 = {0, 18, 0, 15, 16, 18, 134513344, 134513569, 671417176, 671662336, 671589112, -1077941576, 671417750, 671662336, 671589112, -1077941512, 671405544, 671632420, 1,
-1077941528, 1, 0, 0, 1, 0, 0, 0, 673030784, 16, 0}

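Because the code defines OVECCOUNT as 30, the array has 30 elements and all 30 are listed here. PCRE uses the last third of the array as workspace, so 30 elements leave room for at most 10 pairs of offsets. In this run pcre_exec filled in only 3 pairs (the first 6 values); the remaining values are leftover memory contents. The variable rc holds the number of pairs that were set: the whole match plus the two captured groups. The start and end offsets of the three results are: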

0,18 = 123.123.123.123:80
0,15 = 123.123.123.123
16,18 = 80

So the matched substrings can be extracted directly from the offsets in ovector.
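The extraction step can be written as a small bounds-checked function. This is a sketch only: copy_group is a hypothetical name introduced here (PCRE itself provides pcre_copy_substring for the same job), and it needs nothing beyond the subject string and the offsets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy capture group `group` (0 = whole match) out
 * of `subject` using the offset pairs in `ovector`. `pairs` is the number
 * of valid pairs, i.e. the positive value returned by pcre_exec().
 * Returns the substring length, or -1 if the group is out of range,
 * unset, or does not fit in `buf`. */
static int copy_group(const char *subject, const int *ovector, int pairs,
                      int group, char *buf, size_t buflen)
{
    if (group < 0 || group >= pairs)
        return -1;

    int start = ovector[2 * group];
    int end = ovector[2 * group + 1];   /* one past the last character */
    if (start < 0 || end < start)       /* unset groups are reported as -1 */
        return -1;

    size_t len = (size_t)(end - start);
    if (len >= buflen)
        return -1;

    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

Fed the subject "123.123.123.123:80|1.1.1.1:88" and the six offsets 0, 18, 0, 15, 16, 18 with pairs set to 3, group 1 yields "123.123.123.123" and group 2 yields "80". Refusing to truncate keeps a short buffer from silently producing a partial address.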

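Also note that the regular expression "(\d*.\d*.\d*.\d*):(\d*)" contains two pairs of parentheses. Each pair saves the text it matched as a separate result, so this expression produces three results: the whole match plus the two groups. If the goal is only to find the "IP address:port" text as a whole, the parentheses can be dropped, giving "\d*.\d*.\d*.\d*:\d*", which produces a single result. Be aware that an unescaped . matches any character, not just a literal dot; write \. to match only a dot.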

References:

http://hi.baidu.com/ajoe/blog/item/83dd800a7178e335b0351d6c.html

http://hi.baidu.com/joec3/blog/item/5375d3da14ab07d6b7fd487b.html/cmtid/79872dc323e4935bb219a8f2
