Category: LINUX

2008-10-26 00:00:33

* Samples
Swap two items
s/(\S+)\s+(\S+)/$2 $1/
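
A minimal sketch of the swap sample in Python's re module. The function name swap_first_two is made up for illustration, and count=1 is an assumption chosen to mirror s/// without the g flag.

```python
import re

def swap_first_two(text):
    # \1 and \2 in the replacement play the role of $1 and $2 in Perl.
    # count=1 replaces only the first match, like s/// without g.
    return re.sub(r"(\S+)\s+(\S+)", r"\2 \1", text, count=1)

print(swap_first_two("hello world"))
```

Dropping count=1 would swap every consecutive pair of items, which corresponds to adding the g flag.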

Search C identifier
m/[_A-Za-z][_A-Za-z0-9]*/
m/[_[:alpha:]][_[:alnum:]]*/
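
As a rough illustration, the first identifier pattern can be tried in Python as below. Python's re module does not accept POSIX bracket classes such as [[:alpha:]], so only the explicit-range form is used; the names C_IDENT and find_identifiers are hypothetical.

```python
import re

# Explicit ASCII ranges: a letter or underscore, then letters, digits, underscores.
C_IDENT = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")

def find_identifiers(source):
    # Returns every identifier-shaped run, keywords included.
    return C_IDENT.findall(source)

print(find_identifiers("int x1 = _y + 42;"))
```

Note that the pattern is unanchored, so in a token such as 9abc it would still find abc; adding \b on both sides is one way to tighten it.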

Empty Line
/^$/

Word
\b\w+\b
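
The empty-line and word samples above can be sketched in Python as follows. The helper names empty_line_numbers and words are assumptions for illustration, not part of any library.

```python
import re

def empty_line_numbers(text):
    # Test each line separately against ^$ and report 1-based line numbers.
    return [n for n, line in enumerate(text.split("\n"), start=1)
            if re.match(r"^$", line)]

def words(text):
    # \b marks the edge between a word character and a non-word character.
    return re.findall(r"\b\w+\b", text)

print(empty_line_numbers("first\n\nthird"))
print(words("Don't panic, ok?"))
```

Because the apostrophe is not a word character, "Don't" is reported as two words.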

* Questions

* Reference
perlre (bytes and utf8)
regex.h (regcomp regexec regfree regerror) (single byte only)
java (unicode only)
python (bytes and unicode)

* Basic Structure

* Syntax
m/regex/ismx
s/regex/replacement/ismxg

* Flags
i case-insensitive
s single-line or dot-match-all (only affects .)
m multi-line (only affects ^ and $)
x allows whitespace and comments in the pattern (Perl and Perl-style engines)
g global substitution
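
A small sketch of how the five flags behave, using Python's re module as a stand-in for Perl. The mapping i = re.I, s = re.S, m = re.M, x = re.X is Python's; g has no flag there because re.sub replaces all matches by default.

```python
import re

text = "Foo\nbar"

# i: letters match regardless of case
case_hit = re.search(r"foo", text, re.I) is not None

# s: without it "." stops at a newline; with it "." crosses the line break
dot_plain = re.search(r"Foo.bar", text) is not None
dot_all = re.search(r"Foo.bar", text, re.S) is not None

# m: ^ and $ also match at the start and end of each line
lines_plain = re.findall(r"^\w+$", text)
lines_multi = re.findall(r"^\w+$", text, re.M)

# x: whitespace and comments inside the pattern are ignored
number = re.compile(r"""
    \d+            # integer part
    (?: \. \d+ )?  # optional fraction
""", re.X)
is_number = number.fullmatch("3.14") is not None

# g: re.sub is global by default; count=1 behaves like s/// without g
first_only = re.sub(r"o", "0", "foo boo", count=1)
all_hits = re.sub(r"o", "0", "foo boo")

print(case_hit, dot_plain, dot_all, lines_plain, lines_multi, is_number)
print(first_only, all_hits)
```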

* Alternations
m/ABC|XYZ/

* Sequence
m/ABC/

* Repetition
(greedy)
a? 0 or 1
a* 0 or more
a+ 1 or more
a{m} m
a{m,} m or more
a{m,n} m to n (inclusively)

(lazy)
a??
a*?
a+?
a{m}?
a{m,}?
a{m,n}?

Example: matching against "aa"
(a?)(a*) $1 => "a", $2 => "a"
(a??)(a*) $1 => "", $2 => "aa"
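
The greedy and lazy results above can be checked with a short Python sketch. The second pair of patterns, run against a tag-like string, is an added illustration and not from the original notes.

```python
import re

# Greedy a? takes one "a" when it can; lazy a?? takes none and leaves both to a*.
greedy_groups = re.match(r"(a?)(a*)", "aa").groups()
lazy_groups = re.match(r"(a??)(a*)", "aa").groups()

# The same difference on a longer input: .+ runs to the last ">", .+? to the first.
greedy_tags = re.findall(r"<.+>", "<b>x</b>")
lazy_tags = re.findall(r"<.+?>", "<b>x</b>")

print(greedy_groups, lazy_groups)
print(greedy_tags, lazy_tags)
```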

* Atoms
Character = a b c
Character Class
Escape = \ + non-alpha, such as \\, \+, \( (except \ + digit, which is a backreference)
Meta Escape = \ + alpha [a-zA-Z]
Groups = (...)

* Character Class
[abc] [a-b] [^abc] [^abc0-9]
A - directly after [ (as in [-a]) and a ] directly after [ (as in []]) are treated as literals
[-a] = - or a
[^\-]

[[]
[]]
[ ]
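
A quick sketch of the literal cases in Python. One assumption to note: current Python versions may warn about a bare [ at the start of a class, so the [[] case is written as [\[] here, which matches the same character.

```python
import re

# "-" placed first in the class is a literal hyphen, not a range operator.
hyphen_or_a = re.findall(r"[-a]", "a-b")

# "]" placed first in the class is a literal closing bracket.
closing = re.findall(r"[]]", "x]y")

# Escaped "[" inside a class; equivalent to the [[] form in the notes.
opening = re.findall(r"[\[]", "x[y")

# Negated class with an escaped hyphen: everything except "-".
not_hyphen = re.findall(r"[^\-]", "a-b")

# A class holding a single space.
spaces = re.findall(r"[ ]", "a b c")

print(hyphen_or_a, closing, opening, not_hyphen, spaces)
```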

* Posix Character Class
[[.a.]] collation
[[=a=]] equivalence
[[:alpha:]]

* Meta
. anything except newlines (normal mode)
. anything (s mode, singleline, dotall)
^ start of string, or start of line (m mode)
$ end of string (or just before a trailing newline), or end of line (m mode)
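
The trailing-newline behaviour of $ is easy to miss, so here is a small Python sketch. Python's $ follows the same rule as Perl's in this respect, and \Z is Python's anchor for the very end of the string; the sample strings are made up.

```python
import re

# $ matches at the very end, and also just before one final newline.
dollar_plain = re.search(r"end$", "the end") is not None
dollar_newline = re.search(r"end$", "the end\n") is not None

# \Z in Python matches only at the very end, so the trailing newline blocks it.
strict_end = re.search(r"end\Z", "the end\n") is not None

print(dollar_plain, dollar_newline, strict_end)
```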

* Meta Escape
\t \n \r \f \a \e
\0nn \xnn
\cA (using algorithm ch ^ 0x40)
\cM
\N{name}
\l lowercase next char
\u uppercase next char
\L...\E lowercase until \E
\U...\E uppercase until \E
\Q...\E quote until \E
\w \W word char / non-word char
\s \S whitespace / non-whitespace
\d \D digit / non-digit
\b \B word boundary / non-boundary
\p{property}
\P{property}
\X combining character sequence
\C single byte (perl)
\< start of word (emacs)
\> end of word (emacs)
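
Two of the entries above lend themselves to a short Python sketch: the ch ^ 0x40 rule behind \cA and \cM, and the quoting done by \Q...\E. The helper control_char is hypothetical, and re.escape is used as Python's nearest counterpart, since Python's re module has no \Q...\E.

```python
import re

def control_char(letter):
    # \cX is the character whose code is ord(X) XOR 0x40 (X taken as uppercase).
    return chr(ord(letter.upper()) ^ 0x40)

ctrl_a = control_char("A")   # code 0x01
ctrl_m = control_char("M")   # code 0x0D, carriage return
ctrl_q = control_char("?")   # code 0x7F, DEL

# re.escape quotes regex metacharacters so the text is matched literally.
literal = re.escape("1+1")
found = re.search(literal, "1+1=2") is not None
unquoted = re.search(r"1+1", "1+1=2") is not None

print(repr(ctrl_a), repr(ctrl_m), repr(ctrl_q), found, unquoted)
```

Without quoting, 1+1 means "one or more 1, then 1", which does not match the literal text 1+1.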

* Groups
(abc) for capture group

* Special group
(?#comment)
(?imsx-imsx) embedded flags
(?:pattern) for non-capture
(?imsx-imsx:pattern) subpattern
(?=pattern) positive look ahead
(?!pattern) negative look ahead
(?<=pattern) positive look behind
(?<!pattern) negative look behind
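
A brief sketch of the special groups in Python, which accepts the same syntax for these constructs. The sample strings and currency markers are invented for illustration.

```python
import re

prices = "10 USD, 20 EUR"
costs = "cost $5 or 7"

# (?=...) positive look ahead: digits followed by " USD", which is not consumed
usd = re.findall(r"\d+(?= USD)", prices)

# (?!...) negative look ahead: whole numbers not followed by " USD"
not_usd = re.findall(r"\d+\b(?! USD)", prices)

# (?<=...) positive look behind: digits preceded by "$"
dollars = re.findall(r"(?<=\$)\d+", costs)

# Negative look behind: numbers not preceded by "$"
plain = re.findall(r"(?<!\$)\b\d+", costs)

# (?:...) groups without capturing, so findall returns the whole match
repeated = re.findall(r"(?:ab)+", "ababab")

# (?i) embedded flag: same effect as passing the i flag
embedded = re.search(r"(?i)abc", "ABC") is not None

print(usd, not_usd, dollars, plain, repeated, embedded)
```

The \b in the negative look-ahead pattern keeps the engine from backtracking to a shorter number such as 1 in order to satisfy the assertion.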

* Reference for capture
m/(x)\1/
s/(x)/$1$1/
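
The two reference forms can be sketched in Python, where \1 is used both inside the pattern and in the replacement string (Perl's $1). The inputs are made up, and count=1 is an assumption mirroring s/// without g.

```python
import re

# Inside the pattern, \1 must match the same text that group 1 captured.
doubled = re.search(r"(x)\1", "axxb").group()

# In the replacement, \1 inserts the captured text; here it is written twice.
expanded = re.sub(r"(x)", r"\1\1", "xyx", count=1)

print(doubled, expanded)
```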

* Traditional vs Extended
\{m,n\} vs {m,n}
\(xxx\) vs (xxx)
Emacs still uses traditional regular expressions

* Special extension
\< start of word (emacs)
\> end of word (emacs)

* New Lines

\n \v \r \r\n \f \x85 \x{2028} \x{2029} \x1A
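
One possible way to treat most of these sequences as line breaks is a single alternation, sketched here in Python. The names NEWLINE and split_lines are hypothetical, and leaving out \x1A is an assumption made for this sketch.

```python
import re

# \r\n comes first so a CRLF pair counts as one break, not two.
NEWLINE = re.compile(r"\r\n|[\n\v\f\r\x85\u2028\u2029]")

def split_lines(text):
    # Splits on any of the line terminators listed in NEWLINE.
    return NEWLINE.split(text)

print(split_lines("a\r\nb\nc\u2028d"))
```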

