2008-05-17 09:20:12


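The following concrete examples show how regular expressions are used. In each example the sample text comes first, followed by the pattern and a description of what it matches.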

One of the simplest examples is /all/. Consider the following text:

John’s ball fell into the hole

John cried because it is all his life.

 

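This regular expression contains no metacharacters. It searches for the string all, which may stand alone as a word or be part of another word, so /all/ matches both the all in ball and the complete word all.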
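As a minimal sketch of this behaviour, the snippet below uses Python's re module (an assumption; the article itself targets vi and sed style tools) to list the words of the sample text that contain all. A straight apostrophe is used in John's for simplicity.

```python
import re

text = "John's ball fell into the hole\nJohn cried because it is all his life."

# Split the text into words, then keep those containing the literal substring "all".
words = re.findall(r"[A-Za-z']+", text)
hits = [w for w in words if re.search(r"all", w)]
print(hits)  # ['ball', 'all']
```

The pattern has no anchors, so it is found both inside ball and as the standalone word all.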

The rest of this section concentrates on how metacharacters are used in regular expressions.

 

3.1            Beginning-of-line and end-of-line anchors

Beginning-of-line anchor ^

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/^Bobby/

Matches Bobby at the beginning of a line.
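A small sketch of the beginning-of-line anchor, assuming Python's re module as the test bench and a handful of lines taken from the tongue twister:

```python
import re

lines = [
    "Bobby Bippy bought a bat.",
    "But so boldly Bobby banged it",
    'That he burst his rubber ball, "Boo!" cried Bobby',
    "Bobby Bippy's blowing bubbles.",
]

# ^ anchors the match to the start of the line being searched.
hits = [line for line in lines if re.search(r"^Bobby", line)]
for line in hits:
    print(line)
```

Only the first and last lines are printed; the Bobby in the middle or at the end of a line is ignored.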

 

End-of-line anchor $

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/Bobby$/

Matches Bobby at the end of a line.
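The end-of-line anchor can be tried the same way. This is a sketch that assumes Python's re module and a few lines from the sample text:

```python
import re

lines = [
    "Bobby Bippy bought a bat.",
    "But so boldly Bobby banged it",
    'That he burst his rubber ball, "Boo!" cried Bobby',
    "Bad luck ball, Bad luck Bobby, bad luck ball",
]

# $ anchors the match to the end of the line being searched.
hits = [line for line in lines if re.search(r"Bobby$", line)]
for line in hits:
    print(line)
```

Only the line that ends with cried Bobby is printed.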

 

3.2            Word-start and word-end anchors

Word-start anchor \<

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/\<Bo/

Matches the string Bo at the start of a word.
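Python's re module does not support the \< anchor, so the sketch below substitutes \b, which asserts a word boundary on either side of a word; placed before Bo it behaves like a word-start anchor. The word JimBob is invented for contrast and does not appear in the sample text.

```python
import re

line = 'That he burst his rubber ball, "Boo!" cried Bobby'

# \bBo matches Bo only at the start of a word; \w* extends the hit to the whole word.
starts = re.findall(r"\bBo\w*", line)
print(starts)  # ['Boo', 'Bobby']

# In the invented word "JimBob", Bo sits mid-word, so the anchored pattern skips it.
mid_word = re.search(r"\bBo", "JimBob")
print(mid_word)  # None
```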

 

Word-end anchor \>

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby

Bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/ball\>/

Matches the string ball at the end of a word.
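A sketch of the word-end anchor, again assuming Python's re module, where \b placed after the pattern stands in for \>. The word ballroom is an invented counterexample.

```python
import re

line = "Bad luck ball, Bad luck Bobby, bad luck ball"

# ball\b matches ball only when a non-word character or the end of the string follows.
ends = re.findall(r"ball\b", line)
print(len(ends))  # 2

# In the invented word "ballroom", ball is followed by a letter, so there is no match.
inside = re.search(r"ball\b", "ballroom")
print(inside)  # None
```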

 

Combining the word-start and word-end anchors in one expression

John’s ball fell into the hole

John cried because it is his whole life

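/\<hole\>/

Matches the pattern hole where h begins a word and e ends it. In other words, the h must be preceded by a character that separates words (such as a space or a newline), and the e must be followed by one as well. In this example only the complete word hole is matched; the word whole is not.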
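To see the difference between the anchored and unanchored forms, here is a short sketch that assumes Python's re module, with \b on both sides playing the role of the word-start and word-end anchors:

```python
import re

text = "John's ball fell into the hole\nJohn cried because it is his whole life"

# Without anchors, hole is found twice: once on its own and once inside whole.
unanchored = re.findall(r"hole", text)
print(len(unanchored))  # 2

# With a boundary on each side, only the standalone word qualifies.
anchored = [m.start() for m in re.finditer(r"\bhole\b", text)]
print(len(anchored))  # 1
```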

 

3.3            Matching a single character

Matching any single character .

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/B...y/

Matches a string that starts with B, followed by any three characters, followed by a y. In this example both Bobby and Bippy are matched.
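A quick check of a B, three arbitrary characters, then y pattern, sketched with Python's re module (an assumption; the dot has the same meaning there):

```python
import re

line = "Bobby Bippy bought a bat."

# Each dot stands for exactly one arbitrary character.
hits = re.findall(r"B...y", line)
print(hits)  # ['Bobby', 'Bippy']
```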

 

Matching zero or more of the preceding character *

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck balll
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/ al*/

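Here the asterisk (*) matches zero or more occurrences of the character immediately before it. As mentioned earlier, * in a regular expression behaves very differently from * in the shell. In the shell, * stands for any number of arbitrary characters; in a regular expression, * stands only for any number (including zero) of the preceding character. You can think of * as glued to the character in front of it: it qualifies that one character and nothing else. In this expression, l* matches a single l or several consecutive l's, and it also matches when there is no l at all, so a lone a is matched too.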
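The following sketch, assuming Python's re module, shows what a followed by zero or more l's picks up. It uses al* without the leading blank, so that the a inside words such as ball is found as well.

```python
import re

line = "Bad luck ball, Bad luck Bobby, bad luck balll"

# l* binds only to the l before it: zero, one, or many l's may follow the a.
hits = re.findall(r"al*", line)
print(hits)  # ['a', 'all', 'a', 'a', 'alll']
```

The bare a in Bad and bad is matched because zero occurrences of l are allowed.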

 

3.4            Matching multiple characters

Matching any one character from a set [ ]

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck balll
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/[bw]all/

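Square brackets match any one character from a set. This expression looks for a string whose first letter is b or w, followed immediately by all, so in this example both wall and ball are matched.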
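A minimal sketch of the bracket set, assuming Python's re module and two lines from the sample text:

```python
import re

text = "With his bat Bob banged the ball\nBanged it bump against the wall"

# [bw] matches exactly one character, which must be either b or w.
hits = re.findall(r"[bw]all", text)
print(hits)  # ['ball', 'wall']
```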

 

Matching a character within a range [x-y]

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/B[a-z]p/

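A hyphen (-) inside square brackets matches one character within a range. This expression looks for a string whose first character is B, whose second character has an ASCII code between a and z (a lowercase letter), and whose third character is p.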
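Sketched with Python's re module (an assumption), the range pattern finds only the start of Bippy in the first line of the tongue twister, because Bobby has a b rather than a p in third position:

```python
import re

line = "Bobby Bippy bought a bat."

# [a-z] matches one lowercase ASCII letter between the B and the p.
hits = re.findall(r"B[a-z]p", line)
print(hits)  # ['Bip']
```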

 

Matching a character outside a set [^ ]

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.

With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

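/all[^a-zA-Z0-9]/

A caret (^) at the start of a bracket expression negates it. This expression looks for all followed by one character that is not a lowercase letter, not an uppercase letter, and not a digit from 0 to 9; for example, a punctuation mark or a space.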
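One detail worth checking is that a negated set still has to consume a character. The sketch below assumes Python's re module and uses a set that excludes lowercase letters, uppercase letters and digits:

```python
import re

pattern = r"all[^a-zA-Z0-9]"

# The comma after ball satisfies the negated set.
with_comma = re.findall(pattern, 'That he burst his rubber ball, "Boo!" cried Bobby')
print(with_comma)  # ['all,']

# Here ball is the last thing in the string, so there is no character left to match.
at_end = re.search(pattern, "With his bat Bob banged the ball")
print(at_end)  # None
```

When a whole file is searched as one string, the newline after ball would count as such a character; line-oriented tools strip it first.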

 

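Matching by the number of times character x occurs: x\{m\}  x\{m,\}  x\{m,n\}

x\{m\} matches exactly m occurrences of x, x\{m,\} matches at least m, and x\{m,n\} matches between m and n. For example, the expression /Go\{2,5\}gle/ matches a G followed by at least 2 and at most 5 o's, then gle. Google and Goooogle are matched; Gogle and Goooooogle are not.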
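The counts can be verified with a short sketch. It assumes Python's re module, where the interval is written {2,5} without backslashes, and it builds the test words programmatically to keep the number of o's unambiguous.

```python
import re

# Words with 1, 2, 4 and 6 o's: Gogle, Google, Goooogle, Goooooogle.
words = ["G" + "o" * n + "gle" for n in (1, 2, 4, 6)]

# o{2,5} accepts between two and five consecutive o's.
matched = [w for w in words if re.fullmatch(r"Go{2,5}gle", w)]
print(matched)  # ['Google', 'Goooogle']
```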

 

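3.5            The escape character

If the string you want to match contains a regular-expression metacharacter, escape it with a backslash, just as a single quote is written \' in C. For example, suppose we want to find the string google.com. The string contains the metacharacter ".", so the expression must be written /google\.com/. Without the \, the expression would find google, followed by any single character, followed by com, which is not necessarily what we are looking for.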
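The effect of the backslash is easy to see in a sketch that assumes Python's re module. The string googleXcom is invented to stand for an unwanted match.

```python
import re

# Unescaped, the dot accepts any character, so the invented string still matches.
loose = re.search(r"google.com", "googleXcom")
print(loose is not None)  # True

# Escaped, the dot must be a literal period.
strict_bad = re.search(r"google\.com", "googleXcom")
strict_good = re.search(r"google\.com", "www.google.com")
print(strict_bad, strict_good is not None)  # None True
```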

 

3.6            Tagged patterns

Consider the following text:

Occurence and happening are the most general. I mean, the words occurence and happening are most generally used.

This text contains two misspelled words, Occurence and occurence (the correct spelling is occurrence). We can fix them in vi with the following command:

 

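:1,$s/\([Oo]ccur\)ence/\1rence/g

Leaving aside the details of this vi command for now (it is a substitute command, which comes up again in the discussion of sed; the trailing g makes it replace every match on a line instead of only the first), let us pick out the two expressions it contains: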

 

/\([Oo]ccur\)ence/

\1rence

 

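The first is a regular expression. The command replaces whatever the regular expression matches with the contents of the second expression. vi searches for the word Occurence or occurence; when it finds one, it tags the part enclosed in the escaped parentheses (Occur or occur). Because this is the first tagged pattern, it is called tag 1, and it is saved in a memory register called register 1. In the replacement expression, \1 refers to the contents of register 1: \1 is replaced by the saved text, followed by rence. As a result, the misspelled Occurence and occurence are corrected to Occurrence and occurrence.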
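The same substitution can be sketched in Python, assuming the re module: the parentheses are written without backslashes there, \1 has the same meaning in the replacement, and re.sub replaces every match by default.

```python
import re

text = ("Occurence and happening are the most general. "
        "I mean, the words occurence and happening are most generally used.")

# Group 1 captures Occur or occur; \1 puts it back and rence completes the word.
fixed = re.sub(r"([Oo]ccur)ence", r"\1rence", text)
print(fixed)
```

Because the captured text is reused, the capital O of the first word and the lowercase o of the second are both preserved.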

 

3.7            Examples combining metacharacters

 

1. /\<Bob.*all\>/

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

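/\<Bob.*all\>/

Matches a string that starts with Bob at the beginning of a word, followed by any number of arbitrary characters, and ends with all at the end of a word. To repeat: in the shell, * means any number of arbitrary characters, whereas in a regular expression it means any number of the preceding character. Combined with ".", it means any number (including zero) of arbitrary characters. In fact, * can also repeat a group of characters zero or more times; such a group (which may consist of a single character) is called an "atom". When an atom contains several characters, they must be enclosed in parentheses, and the parentheses must be escaped; when the atom is a single character, the parentheses can be omitted. In this example, "." stands for one arbitrary character, and the * that follows repeats that arbitrary character zero or more times. By contrast, the expression

/\(sup\)*info/

matches strings in which info is preceded by zero or more copies of sup, so supinfo, info and supsupinfo are all matched.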
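Both forms of repetition can be tried in a short sketch. It assumes Python's re module, so \b replaces the word anchors and the group is written (sup) without backslashes.

```python
import re

line = "With his bat Bob banged the ball"

# .* repeats "any character"; the match runs from Bob to the last all that ends a word.
span = re.findall(r"\bBob.*all\b", line)
print(span)  # ['Bob banged the ball']

# (sup)* repeats the whole three-letter atom zero or more times.
words = ["info", "supinfo", "supsupinfo"]
all_match = all(re.fullmatch(r"(sup)*info", w) for w in words)
print(all_match)  # True
```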

 

2. /B[a-z][bp]*y$/

Here is a tongue twister:

Bobby Bippy bought a bat.

Bobby Bippy bought a ball.
With his bat Bob banged the ball
Banged it bump against the wall
But so boldly Bobby banged it
That he burst his rubber ball, "Boo!" cried Bobby
Bad luck ball, Bad luck Bobby, bad luck ball
Now to drown his many troubles
Bobby Bippy's blowing bubbles.

/B[a-z][bp]*y$/

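This expression matches a string whose first character is B and whose second character is a lowercase letter, followed by zero or more b or p characters, then a y, with the whole string located at the end of a line.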
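As a sketch of how the pieces interact, the snippet below (assuming Python's re module and three lines from the tongue twister) collects the text matched on each line:

```python
import re

lines = [
    "Bobby Bippy bought a bat.",
    'That he burst his rubber ball, "Boo!" cried Bobby',
    "Bad luck ball, Bad luck Bobby, bad luck ball",
]

hits = []
for line in lines:
    # B, one lowercase letter, any run of b or p, then y at the very end of the line.
    m = re.search(r"B[a-z][bp]*y$", line)
    if m:
        hits.append(m.group())
print(hits)  # ['Bobby']
```

Only the second line qualifies: in the other two, Bobby and Bippy are followed by more text, so the $ anchor rejects them.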
