Category: LINUX

2010-02-06 22:22:54

One issue we haven't discussed yet is the question "how much text matches?" Really, there are two questions; the second is "where does the match start?" For simple text searches, such as with grep or egrep, both questions are irrelevant. All you want to know is whether a line matched, and if so, to see the line. Where in the line the match starts, or how far along the line it extends, doesn't matter.

However, knowing the answer to these questions becomes vitally important when doing text substitution with sed or programs written in awk. (Understanding this is also important for day-to-day use when working inside a text editor, although we don't cover text editing in this book.)

The answer to both questions is that a regular expression matches the longest, leftmost substring of the input text that can match the entire expression. In addition, a match of the null string is considered to be longer than no match at all. (Thus, as we explained earlier, given the regular expression ab*c, matching the text ac, the b* successfully matches the null string between a and c.) Furthermore, the POSIX standard states: "Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string." (Subpatterns are the parts enclosed in parentheses in an ERE. For this purpose, GNU programs often extend this feature to \(...\) in BREs too.)
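The rule above can be checked one clause at a time with sed. The following is only a sketch: the input strings and the variable names (leftmost, longest, nullmatch, subpat) are made up for illustration, and the outputs in the comments are what a POSIX-conforming sed should print.

```shell
# Leftmost wins: the earlier candidate "abbb" is chosen even though
# a longer candidate "abbbbbb" appears later in the line.
leftmost=$(echo 'xxabbbc abbbbbbc' | sed 's/ab*/X/')
echo "$leftmost"      # xxXc abbbbbbc

# Longest wins at that starting position: b* takes every b it can.
longest=$(echo 'abbbc' | sed 's/ab*/X/')
echo "$longest"       # Xc

# A null match beats no match: b* matches the null string between a and c.
nullmatch=$(echo 'ac' | sed 's/ab*c/X/')
echo "$nullmatch"     # X

# Subpatterns, from left to right, each take the longest possible string:
# the first \(a*\) takes all three a's, leaving the null string for the second.
subpat=$(echo 'aaa' | sed 's/\(a*\)\(a*\)/[\1][\2]/')
echo "$subpat"        # [aaa][]
```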

If sed is going to be replacing the text matched by a regular expression, it's important to be sure that the regular expression doesn't match too little or too much text. Here's a simple example:

$ echo Tolstoy writes well | sed 's/Tolstoy/Camus/'     Use fixed strings

Camus writes well

Of course, sed can use full regular expressions. This is where understanding the "longest leftmost" rule becomes important:

$ echo Tolstoy is worldly | sed 's/T.*y/Camus/'         Try a regular expression

Camus                                                  What happened?

The apparent intent was to match just Tolstoy. However, since the match extends over the longest possible amount of text, it went all the way to the y in worldly! What's needed is a more refined regular expression:

$ echo Tolstoy is worldly | sed 's/T[[:alpha:]]*y/Camus/'

Camus is worldly
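There is usually more than one way to refine such an expression. As a sketch, the first command below uses a negated bracket expression instead of [[:alpha:]], and the second shows that the longest-match rule still applies inside the refined pattern. The input Tolstoyevsky and the variable names are made up for illustration.

```shell
# Stop at the first space: [^ ]* cannot run past the end of the word.
nospace=$(echo 'Tolstoy is worldly' | sed 's/T[^ ]*y/Camus/')
echo "$nospace"       # Camus is worldly

# Still the longest match: with two y's in the same word,
# the match extends to the last one, so the whole word is replaced.
inside=$(echo 'Tolstoyevsky is worldly' | sed 's/T[[:alpha:]]*y/Camus/')
echo "$inside"        # Camus is worldly
```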

In general, and especially if you're still learning the subtleties of regular expressions, when developing scripts that do lots of text slicing and dicing, you'll want to test things very carefully, and verify each step as you write it.
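One way to verify a step, offered here as a suggestion rather than as the book's own method, is to mark the matched text instead of replacing it. In a sed replacement, & stands for whatever the regular expression matched, so wrapping it in brackets shows exactly how far the match extends. The variable names are made up for illustration.

```shell
# Mark the match with brackets before committing to a real replacement.
greedy=$(echo 'Tolstoy is worldly' | sed 's/T.*y/[&]/')
echo "$greedy"        # [Tolstoy is worldly]

refined=$(echo 'Tolstoy is worldly' | sed 's/T[[:alpha:]]*y/[&]/')
echo "$refined"       # [Tolstoy] is worldly
```

Once the brackets enclose only the intended text, the [&] can be swapped for the real replacement.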

Finally, as we've seen, it's possible to match the null string when doing text searching. This is also true when doing text replacement, allowing you to insert text:

$ echo abc | sed 's/b*/1/'         Replace first match

1abc

$ echo abc | sed 's/b*/1/g'        Replace all matches

1a1c1

Note how b* matches the null string at the front and at the end of abc.
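The same idea gives a few common insertion idioms. This is a sketch with made-up variable names; the last output is what GNU sed prints, and other implementations are assumed, not verified, to agree.

```shell
# ^ matches the null string at the start of the line: insert a prefix.
front=$(echo 'abc' | sed 's/^/> /')
echo "$front"         # > abc

# $ matches the null string at the end of the line: append a suffix.
back=$(echo 'abc' | sed 's/$/!/')
echo "$back"          # abc!

# x* never finds an x here, so it matches the null string at every position.
every=$(echo 'abc' | sed 's/x*/-/g')
echo "$every"         # -a-b-c-
```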
