regular expressions, How Much Text Gets Changed? - dengjin


Category: LINUX

2010-02-06 22:22:54

One issue we haven't discussed yet is the question "how much text matches?" Really, there are two questions; the second is "where does the match start?" When doing simple text searches, such as with grep or egrep, neither question matters. All you want to know is whether a line matched, and if so, to see the line. Where in the line the match starts, or how far into the line it extends, is irrelevant.

However, knowing the answer to these questions becomes vitally important when doing text substitution with sed or programs written in awk. (Understanding this is also important for day-to-day use when working inside a text editor, although we don't cover text editing in this book.)
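awk can report both answers directly. Its standard match() function sets RSTART to the position where the match begins (counting from 1) and RLENGTH to the number of characters matched. The sketch below assumes a POSIX awk such as gawk, mawk, or BWK awk; the sample input strings are made up for illustration:

```shell
# Where does the match start (RSTART), and how much text does it cover (RLENGTH)?
echo 'xaaay' | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }'
# prints: 2 3    (starts at the 2nd character and covers "aaa")

echo 'Tolstoy is worldly' | awk '{ if (match($0, /T.*y/)) print RSTART, RLENGTH }'
# prints: 1 18   (.* keeps going until the last y on the line)
```

The second case previews the substitution problem discussed below: the match starts where you expect but extends much farther.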

The answer to both questions is that a regular expression matches the longest, leftmost substring of the input text that can match the entire expression. In addition, a match of the null string is considered to be longer than no match at all. (Thus, as we explained earlier, when the regular expression ab*c is matched against the text ac, the b* successfully matches the null string between a and c.) Furthermore, the POSIX standard states: "Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string." (Subpatterns are the parts enclosed in parentheses in an ERE. GNU programs often apply the same rule to \(...\) subpatterns in BREs as well.)
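A quick way to watch the rule at work is to wrap each match in brackets; in a sed replacement, & stands for the entire matched text. The sketch below assumes a sed that supports -E for EREs (GNU sed and current BSD seds do); the outputs shown are what a POSIX leftmost-longest matcher should produce, so check them on your own system:

```shell
# A null match still counts: b* matches the empty string between a and c.
echo ac | sed 's/ab*c/[&]/'
# prints: [ac]

# Leftmost beats longest: a* already matches the null string at the very
# start, so the longer run of a's further right is never considered.
echo xaaa | sed 's/a*/[&]/'
# prints: []xaaa

# Among matches starting at the same place, the longest wins,
# even though the shorter alternative is written first.
echo abcd | sed -E 's/ab|abcd/[&]/'
# prints: [abcd]
```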

If sed is going to be replacing the text matched by a regular expression, it's important to be sure that the regular expression doesn't match too little or too much text. Here's a simple example:

$ echo Tolstoy writes well | sed 's/Tolstoy/Camus/'     # Use fixed strings
Camus writes well

Of course, sed can use full regular expressions. This is where understanding the "longest leftmost" rule becomes important:

$ echo Tolstoy is worldly | sed 's/T.*y/Camus/'         # Try a regular expression
Camus                                                  # What happened?

The apparent intent was to match just Tolstoy. However, since the match extends over the longest possible amount of text, it went all the way to the y in worldly! What's needed is a more refined regular expression:

$ echo Tolstoy is worldly | sed 's/T[[:alpha:]]*y/Camus/'
Camus is worldly
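Before committing to a replacement, it can help to see exactly how much text each candidate pattern grabs. A small sketch: replace the match with itself wrapped in brackets (& is the whole matched text in a sed replacement) and compare the two patterns on the same input:

```shell
# Greedy .* runs all the way to the last y on the line.
echo Tolstoy is worldly | sed 's/T.*y/[&]/'
# prints: [Tolstoy is worldly]

# [[:alpha:]] cannot match a space, so the match stays inside the first word.
echo Tolstoy is worldly | sed 's/T[[:alpha:]]*y/[&]/'
# prints: [Tolstoy] is worldly
```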

In general, and especially while you are still learning the subtleties of regular expressions, test scripts that do a lot of text slicing and dicing very carefully, verifying each step as you write it.
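One lightweight way to verify each step is to pair every sed command with the output you expect from it. The function below is a hypothetical helper (the name check and its argument order are illustrative, not a standard tool); it runs one sed script on one line of input and reports whether the result matches:

```shell
# check INPUT SCRIPT EXPECTED
# Hypothetical helper: run one sed step on one input line and report
# whether the result equals the expected output.
check() {
    actual=$(printf '%s\n' "$1" | sed "$2")
    if [ "$actual" = "$3" ]; then
        printf 'ok:   %s\n' "$2"
    else
        printf 'FAIL: %s -> got "%s", expected "%s"\n' "$2" "$actual" "$3"
    fi
}

check 'Tolstoy is worldly' 's/T.*y/Camus/' 'Camus is worldly'
check 'Tolstoy is worldly' 's/T[[:alpha:]]*y/Camus/' 'Camus is worldly'
```

The first call reports FAIL (sed produced just Camus) and the second reports ok, which is exactly the difference discussed above. The helper always returns success, so it will not abort a script run under set -e.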

Finally, as we've seen, it's possible to match the null string when doing text searching. This is also true when doing text replacement, allowing you to insert text:

$ echo abc | sed 's/b*/1/'         # Replace first match
1abc

$ echo abc | sed 's/b*/1/g'        # Replace all matches
1a1c1

Note how b* matches the null string at the front and at the end of abc, and matches the b itself in the middle.
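The same behavior can be used on purpose. If the pattern can only ever match the null string, for example x* applied to text that contains no x, it matches at every position, so a global substitution inserts the replacement before every character and at the end. A sketch; the outputs are what GNU sed produces, and other POSIX seds should agree, but it is worth checking yours:

```shell
# x never occurs, so x* can only match the null string,
# which it finds before a, before b, before c, and at the end.
echo abc | sed 's/x*/-/g'
# prints: -a-b-c-

# Without g, only the first (leftmost) null match is replaced.
echo abc | sed 's/x*/-/'
# prints: -abc
```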
