Category: Python/Ruby

2011-08-20 16:26:39

7.5. Verbose Regular Expressions

So far you've just been dealing with what I'll call “compact” regular expressions. As you've seen, they are difficult to read, and even if you figure out what one does, that's no guarantee that you'll be able to understand it six months later. What you really need is inline documentation.

Python allows you to do this with something called verbose regular expressions. A verbose regular expression is different from a compact regular expression in two ways:

  • Whitespace is ignored. Spaces, tabs, and carriage returns are not matched as spaces, tabs, and carriage returns; they are not matched at all. (If you want to match a space in a verbose regular expression, escape it by putting a backslash in front of it, or put it inside a character class such as [ ].)
  • Comments are ignored. A comment in a verbose regular expression is just like a comment in Python code: it starts with a # character and goes until the end of the line. In this case it's a comment within a multi-line string instead of within your source code, but it works the same way.
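Here is a minimal Python 3 sketch of both rules. It uses toy patterns invented for illustration, not taken from the original text. Whitespace and `#` comments disappear in verbose mode. An escaped space, or a space inside a character class, is still matched literally.

```python
import re

# In verbose mode the newlines, indentation and "# ..." comments are dropped,
# so this is equivalent to the compact pattern "ab".
ignored = re.compile(r"""
    a    # first letter
    b    # second letter
""", re.VERBOSE)

# A literal space survives if it is escaped with a backslash...
escaped = re.compile(r"a\ b    # escaped space", re.VERBOSE)
# ...or if it sits inside a character class, where whitespace is kept.
in_class = re.compile(r"a[ ]b    # space inside a class", re.VERBOSE)

for name, rx in [("ignored", ignored), ("escaped", escaped), ("in_class", in_class)]:
    # Does the pattern match "ab" (no space) and "a b" (one space)?
    print(name, bool(rx.fullmatch("ab")), bool(rx.fullmatch("a b")))
```

The `ignored` pattern accepts only "ab". The other two accept only "a b".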

This will be more clear with an example. Let's revisit the compact regular expression you've been working with, and make it a verbose regular expression. This example shows how.

Example 7.9. Regular Expressions with Inline Comments

    >>> import re
    >>> pattern = """
        ^                   # beginning of string
        M{0,4}              # thousands - 0 to 4 M's
        (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                            #            or 500-800 (D, followed by 0 to 3 C's)
        (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                            #        or 50-80 (L, followed by 0 to 3 X's)
        (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                            #        or 5-8 (V, followed by 0 to 3 I's)
        $                   # end of string
        """
    >>> re.search(pattern, 'M', re.VERBOSE)                 # (1)
    <_sre.SRE_Match object at 0x008EEB48>
    >>> re.search(pattern, 'MCMLXXXIX', re.VERBOSE)         # (2)
    <_sre.SRE_Match object at 0x008EEB48>
    >>> re.search(pattern, 'MMMMDCCCLXXXVIII', re.VERBOSE)  # (3)
    <_sre.SRE_Match object at 0x008EEB48>
    >>> re.search(pattern, 'M')                             # (4)

1

The most important thing to remember when using verbose regular expressions is that you need to pass an extra argument: re.VERBOSE, a constant defined in the re module that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has quite a bit of whitespace (all of which is ignored) and several comments (all of which are ignored). Once you ignore the whitespace and the comments, this is exactly the same regular expression as you saw in the previous section, but it's a lot more readable.

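This Python 3 sketch checks the claim that the verbose and compact forms are the same expression. `COMPACT` was written by hand by deleting the whitespace and comments from Example 7.9, and the comments in `VERBOSE` are shortened. The stripping regex near the end is deliberately naive. It only works because this particular pattern has no escaped `#`, escaped spaces, or character classes.

```python
import re

# Example 7.9 with the whitespace and comments removed by hand.
COMPACT = r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"

# The same tokens laid out in verbose style (comments shortened).
VERBOSE = r"""
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
"""

# Naively delete whitespace and "# ..." comments; safe only because this
# pattern has no escaped '#', escaped spaces or character classes.
stripped = re.sub(r"\s+|#[^\n]*", "", VERBOSE)
print(stripped == COMPACT)

# Both forms accept and reject the same inputs.
for s in ["M", "MCMLXXXIX", "MMMMDCCCLXXXVIII", "XLIX", "IIII", "MCMC", ""]:
    a = re.search(COMPACT, s) is not None
    b = re.search(VERBOSE, s, re.VERBOSE) is not None
    print(f"{s!r:20} compact={a} verbose={b}")
```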

2

This matches the start of the string, then one M (of a possible four), then CM, then L followed by three X (of a possible three), then IX, then the end of the string.

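The three parenthesized groups in the pattern record which alternative matched. Asking for them is a quick way to confirm the reading above. This is a Python 3 sketch; the pattern is Example 7.9 with shortened comments, and `roman` is just an illustrative name.

```python
import re

roman = re.compile(r"""
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's (not captured)
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
""", re.VERBOSE)

m = roman.search("MCMLXXXIX")
print(m.group(0))   # MCMLXXXIX: the whole string matched
print(m.groups())   # ('CM', 'LXXX', 'IX'): hundreds, tens, ones
```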

3

This matches the start of the string, then four M (of a possible four), then D followed by three C (of a possible three), then L followed by three X (of a possible three), then V followed by three I (of a possible three), then the end of the string.

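Here is the same check for the longest numeral this pattern can match. This Python 3 sketch also shows that a compiled pattern remembers the `re.VERBOSE` flag, so later `search()` calls do not need to repeat it. The layout of the comments is illustrative only.

```python
import re

roman = re.compile(r"""
    ^ M{0,4}              # thousands
    (CM|CD|D?C{0,3})      # hundreds
    (XC|XL|L?X{0,3})      # tens
    (IX|IV|V?I{0,3}) $    # ones, then end of string
""", re.VERBOSE)

# The flag is stored on the compiled object.
print(bool(roman.flags & re.VERBOSE))     # True

m = roman.search("MMMMDCCCLXXXVIII")
print(m.groups())                         # ('DCCC', 'LXXX', 'VIII')

# A fifth M goes past M{0,4}, so the anchored pattern fails.
print(roman.search("MMMMMDCCCLXXXVIII"))  # None
```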

4

This does not match. Why? Because it doesn't have the re.VERBOSE flag, so the re.search function is treating the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can't auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose.

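This Python 3 sketch shows the failing call, plus two other ways to state explicitly that a pattern is verbose:

* the short alias `re.X`, which is identical to `re.VERBOSE`;
* the inline flag `(?x)`.

The inline flag should be the very first thing in the pattern. Newer Python versions reject global inline flags placed anywhere else.

```python
import re

pattern = """
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """

# No flag: the newlines, spaces and comment text are part of the pattern,
# so it cannot match a bare "M".
print(re.search(pattern, "M"))                       # None

# re.X is an alias for re.VERBOSE.
print(re.search(pattern, "M", re.X) is not None)     # True

# The inline flag (?x) placed at the very start of the pattern.
print(re.search("(?x)" + pattern, "M") is not None)  # True
```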
