Python's Extended Regular Expressions - linewer - ChinaUnix Blog

linewer (system-independent) perlbox.blog.chinaunix.net



Category: Python/Ruby

2009-09-27 15:33:51

All of this can be found in help(re). I compared it with Perl and took some notes along the way.
In Perl (see the Extended Patterns section of perldoc perlre for more detail):

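(0) Modifiers: /i ignore case, /s let . match a newline, /m let ^ and $ match at every line, /g match globally, /o compile (interpolate) the pattern only once, /e evaluate the replacement as an expression before substituting, /x ignore whitespace in the pattern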

(1) (?pimsx-imsx) (a '-' turns the flags after it off)
    /(?option)pattern/ is equivalent to /pattern/option, where option is one of i, s, m, x
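For comparison, Python's re module supports the same inline-flag idea. The minimal sketch below uses made-up sample strings. It shows that a leading (?i) behaves like passing re.IGNORECASE and that a leading (?x) behaves like re.VERBOSE:

```python
import re

# A leading (?i) inline flag ...
inline = re.compile(r'(?i)perl')
# ... has the same effect as passing the flag argument
flagged = re.compile(r'perl', re.IGNORECASE)

for text in ['perl', 'PERL', 'PeRl']:
    # Both patterns accept exactly the same inputs
    assert bool(inline.fullmatch(text)) == bool(flagged.fullmatch(text))

# The inline flag is also recorded in the compiled pattern's flags
print(bool(inline.flags & re.IGNORECASE))  # True

# (?x) turns on verbose mode inline: whitespace in the pattern is ignored
verbose = re.compile(r'(?x) \d+  -  \d+')
print(verbose.fullmatch('12-34') is not None)  # True
```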

(2) (?:string)
    Groups without storing the matched text: in /(?:a|b|c)(d|e)f\1/, \1 refers to the (d|e) match

    (?imsx-imsx:pattern) is equivalent to (?:(?imsx-imsx)pattern)
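The same Perl pattern carries over to Python unchanged. Scoped inline flags of the (?i:...) form are also accepted (Python 3.6 and later). A small sketch with made-up input strings:

```python
import re

# (?:a|b|c) groups without capturing, so (d|e) is group 1 and \1 refers to it
p = re.compile(r'(?:a|b|c)(d|e)f\1')
m = p.search('xxbdfdyy')
print(m.group(0), m.groups())  # bdfd ('d',)

# A scoped flag (?i:...) applies only inside its own group
scoped = re.compile(r'(?i:abc)d')
print(bool(scoped.fullmatch('ABCd')))  # True
print(bool(scoped.fullmatch('ABCD')))  # False: the final d is still case-sensitive
```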

(3) Positive lookahead: /pattern(?=string)/ matches pattern only when it is followed by string.
   The negative form (?!string) matches pattern only when it is NOT followed by string.

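(4) Positive lookbehind: /(?<=string)pattern/ matches pattern only when it is preceded by string. The negative form (?<!string)pattern matches pattern only when it is NOT preceded by string.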
(5) Use (?# comment) to add a comment
    if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
        ...
    }
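Here is the same test written with Python's re module, as a rough equivalent of the Perl snippet. The value of `string` is a hypothetical sample input:

```python
import re

string = 'Ok'  # hypothetical sample input

# (?i) makes the match case-insensitive; (?#...) is a comment the engine skips
pattern = r'(?i)[a-z]{2,3}(?# match two or three alphabetic characters)'

m = re.search(pattern, string)
if m:
    print('matched:', m.group())  # matched: Ok
```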

(6) (?<NAME>pattern) and (?'NAME'pattern) define a named group; \k<NAME> or \k'NAME' refers back to it. Perl also supports the Python-style named groups (?P<NAME>pattern) and (?P=NAME).

(7) (?|pattern) Branch reset
# before ---------------branch-reset----------- after
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
    # 1            2        2 3        2     3     4
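As far as I know, Python's built-in re module has no (?|...) branch reset. The sketch below replaces it with an ordinary non-capturing (?:...) over the same alternatives. It shows the plain left-to-right numbering that branch reset would change: the trailing ( z ) becomes group 7 instead of group 4.

```python
import re

# Same alternatives as the Perl example, but (?:...) instead of (?|...).
# Without branch reset every "(" opens a new group, numbered left to right.
p = re.compile(r'''
    ( a )
    (?: x ( y ) z | (p (q) r) | (t) u (v) )
    ( z )
''', re.VERBOSE)

print(p.groups)      # 7 capture groups in total
m = p.fullmatch('axyzz')
print(m.groups())    # ('a', 'y', None, None, None, None, 'z')
```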
(8) (?(condition)yes-pattern|no-pattern)
     (?(condition)yes-pattern) Same as the explanation of Python item (10) below

Some examples of special regex constructs (from the CU forum):
Backreferences and lookaround:
i) Lookahead (?=..)
echo "ab2c121a" |perl -ne 'print $1 if /(.*?)(?=2)/;' #print ab
ii) Lookbehind (?<=..)
echo "ab2c121a" |perl -ne 'print $1 if /(?<=2)(.*)(?=2)/;'  #print c1
iii) Negative lookahead/lookbehind (?!..) (?<!..)  # must NOT match ..
echo "ab2c121a" |perl -ne 'print $1 if /(?…
echo "ab2c121a" |perl -ne 'print $1 if /(?…
iv) Conditional (?(id)yes-pattern|no-pattern)
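# ?() example
# <p> and </p> must appear together
echo "<p>xx</p>"|perl -ne 'print $2 if /(<p>)?(\w*)(?(1)<\/p>)/'
#print xx
echo "<p>xx"|perl -ne 'print $2,"\n" if /(<p>)?(\w*)(?(1)<\/p>)/' #print empty: with no </p> the optional (<p>) is dropped and (\w*) matches "" at position 0
echo "xx"|perl -ne 'print $2 if /(<p>)?(\w*)(?(1)<\/p>)/' #print xx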
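# ?()| example, again based on the one above:
# if <p> is present it must be closed by </p>; otherwise the word must end with a digit
echo "<p>xx1</p>"|perl -ne 'print $2 if /(<p>)?(\w*)(?(1)<\/p>|\d)/' #print xx1
echo "<p>xx1"|perl -ne 'print $2 if /(<p>)?(\w*)(?(1)<\/p>|\d)/'     #print xx (no </p>, so the match starts at xx and the trailing 1 satisfies \d)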
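The runnable one-liners above translate directly to Python's re module. This sketch reuses the same sample strings. Note one difference: Python strings here carry no trailing newline, whereas `echo` adds one. That newline does not change any of these results.

```python
import re

s = 'ab2c121a'
# i) lookahead: shortest prefix that is followed by "2"
print(re.search(r'(.*?)(?=2)', s).group(1))       # ab
# ii) lookbehind + lookahead: text between a "2" and the last later "2"
print(re.search(r'(?<=2)(.*)(?=2)', s).group(1))  # c1

# iv) conditional: if group 1 (<p>) matched, require </p>
tag = re.compile(r'(<p>)?(\w*)(?(1)</p>)')
print([tag.search(t).group(2) for t in ['<p>xx</p>', '<p>xx', 'xx']])
# ['xx', '', 'xx']

# conditional with a no-branch: </p> after <p>, otherwise a trailing digit
tag_or_digit = re.compile(r'(<p>)?(\w*)(?(1)</p>|\d)')
print([tag_or_digit.search(t).group(2) for t in ['<p>xx1</p>', '<p>xx1']])
# ['xx1', 'xx']
```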

In Python:
(Besides implementing these Perl extensions, Python has extensions of its own. A 'P' immediately after the '?' marks a Python-specific extension.)
(1) (?iLmsux) Set the I, L, M, S, U, or X flag for the RE. This is normally placed at the very beginning of the expression (Python 3.11 and later reject global inline flags anywhere else).

    I IGNORECASE  Ignore case.
    L LOCALE      Make \w, \W, \b, \B dependent on the current locale.
    M MULTILINE   "^" matches the beginning of lines (after a newline) as well as the beginning of the string,
                  i.e. a string containing \n is treated as a "multi-line" string.

                  "$" matches the end of lines (before a newline) as well as the end of the string.
    S DOTALL      "." matches any character, including a newline (the string is treated as a "single line").
    X VERBOSE     Whitespace and # comments in the pattern are ignored.
    U UNICODE     Make \w, \W, \b, \B dependent on the Unicode character properties database.
For example:
import re

pattern = """
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """
re.search(pattern, 'M', re.VERBOSE)   # matches: 'M' is a valid Roman numeral (1000)
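The difference between MULTILINE and DOTALL is easiest to see on a two-line string. Here is a short sketch with a made-up sample text:

```python
import re

text = 'first line\nsecond line'

# Without MULTILINE, ^ only matches at the start of the whole string
print(re.findall(r'^\w+', text))                # ['first']
# With MULTILINE, ^ also matches right after each newline
print(re.findall(r'^\w+', text, re.MULTILINE))  # ['first', 'second']
# $ behaves the same way at line ends
print(re.findall(r'\w+$', text, re.M))          # ['line', 'line']

# Without DOTALL, . stops at the newline; with DOTALL it crosses it
print(re.search(r'first.*', text).group())      # first line
print(repr(re.search(r'first.*', text, re.S).group()))  # 'first line\nsecond line'
```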
(2) (?:...)       Non-capturing group (it cannot be back-referenced)
      For example: p=re.compile('(?:a|b|c)(d|e)')
           re.search(p,'adfg').groups() --->('d',)
(3) (?P<name>...) Named group. Its match can be retrieved with the MatchObject method group('name'), and inside the expression it can be referenced as (?P=name).
(4) (?P=name)     Backreference to a named group; to capture that repeated match as well, wrap it in another ()
For example: p = re.compile(r'(?P<word>\b\w+\b).*(?P=word)')
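This sketch shows the named group and its backreference on a made-up sentence. It also shows the "wrap it in another ()" trick for capturing the repeated occurrence:

```python
import re

# (?P<word>...) names the group; (?P=word) must match the same text again
p = re.compile(r'(?P<word>\b\w+\b).*(?P=word)')
m = p.search('the cat saw the dog')
print(m.group('word'))  # the
print(m.group(1))       # the (named groups are numbered too)
print(m.groupdict())    # {'word': 'the'}

# Wrapping the backreference in () captures the second occurrence as group 2
q = re.compile(r'(?P<word>\b\w+\b).*((?P=word))')
m2 = q.search('the cat saw the dog')
print(m2.group(2), m2.start(2))  # the 12
```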
(5) (?#...)       Comment
    For example: p = re.compile(r'(\b\w+\b)(?#this is a comment)')
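(6) (?=...) Lookahead assertion: asserts that the text after the current position matches the expression; that text is not included in the match result.
(7) (?!...) Negative lookahead assertion: asserts that the text after the current position does not match the expression; that text is not included in the match result.
(8) (?<=...) Lookbehind assertion: asserts that the text before the current position matches the expression; that text is not included in the match result.
(9) (?<!...) Negative lookbehind assertion: asserts that the text before the current position does not match the expression.
     For example: p=re.compile(r'Rui (?=Zhang)')   p=re.compile(r'Rui (?!Zhang)')

         p=re.compile(r'(?<=Rui )Zhang')   p=re.compile(r'Rui (?…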
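Here are all four assertions run against one hypothetical list of names. The (?<!Rui )Zhang pattern is my own illustration of the negative lookbehind. Keep in mind that Python requires a lookbehind to have a fixed width.

```python
import re

names = 'Rui Zhang, Rui Li, Wei Zhang'  # hypothetical sample text

# (?=...): "Rui " only where Zhang follows; Zhang itself is not consumed
print([m.start() for m in re.finditer(r'Rui (?=Zhang)', names)])   # [0]
# (?!...): "Rui " only where Zhang does NOT follow
print([m.start() for m in re.finditer(r'Rui (?!Zhang)', names)])   # [11]
# (?<=...): "Zhang" only where "Rui " precedes it
print([m.start() for m in re.finditer(r'(?<=Rui )Zhang', names)])  # [4]
# (?<!...): "Zhang" only where "Rui " does NOT precede it
print([m.start() for m in re.finditer(r'(?<!Rui )Zhang', names)])  # [23]

# The asserted text is not part of the match
print(repr(re.search(r'Rui (?=Zhang)', names).group()))  # 'Rui '
```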

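(10) (?(id/name)yes|no) If the group with the given id or name has already matched, match yes; otherwise match no (no is optional). To capture this match as well, wrap it in another ().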

For example: p=re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)')
    p=re.compile(r'(<p>)?(\w*)(?(1)<\/p>|\d)')
    re.search(p,'<p>xx</p>').groups() ---> ('<p>', 'xx')
    re.search(p,'xx1').groups() ---> (None, 'xx')
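The e-mail pattern from the example above can be exercised the same way. The addresses are hypothetical samples. An opening "<" forces a closing ">", while a bare address needs neither:

```python
import re

# If group 1 captured "<", the address must be closed by ">"
p = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)')

print(p.fullmatch('<user@host.com>').groups())  # ('<', 'user@host.com')
print(p.fullmatch('user@host.com').groups())    # (None, 'user@host.com')
print(p.fullmatch('<user@host.com'))            # None: "<" without ">"
```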

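Regular expression operator precedence (from highest to lowest):
Operator                      Description
\                             Escape character
(), (?:), (?=), []            Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}     Quantifiers
^, $, \anymetacharacter       Anchors and sequences
|                             Alternation ("or")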
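Two consequences of this precedence order trip people up most often. Alternation binds loosest, and a quantifier applies only to the single item before it. A small sketch with made-up inputs:

```python
import re

# | has the lowest precedence: ^ab|cd$ means (^ab) OR (cd$), not ^(ab|cd)$
print(bool(re.search(r'^ab|cd$', 'abXXX')))      # True, via the ^ab branch
print(bool(re.search(r'^(?:ab|cd)$', 'abXXX')))  # False, the group changes it

# Quantifiers bind tighter than concatenation: ab* is a(b*), not (ab)*
print(re.fullmatch(r'ab*', 'abbb') is not None)      # True
print(re.fullmatch(r'ab*', 'abab') is not None)      # False
print(re.fullmatch(r'(?:ab)*', 'abab') is not None)  # True
```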

I also found a post with a more detailed introduction:
