无聊之人 -- nothing but technology, and more technology; you get the idea
Category: Python/Ruby
2011-08-20 16:26:39
7.5. Verbose Regular Expressions
So far you've been dealing with what I'll call “compact” regular expressions. As you've seen, they are difficult to read, and even if you figure out what one does, there is no guarantee that you'll still understand it six months later. What you really need is inline documentation.
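Python allows you to do this with something called verbose regular expressions. A verbose regular expression is different from a compact regular expression in two ways: whitespace in the pattern is ignored (to match a literal space you have to escape it with a backslash), and comments are ignored (a comment starts with a # character and runs to the end of the line).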
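A minimal sketch of those two differences. The two-word pattern and the sample names are made up for illustration and are not part of the original text.

```python
import re

# Hypothetical pattern, for illustration only: two words separated by one space.
compact = r"^\w+ \w+$"

verbose = r"""
    ^       # start of string
    \w+     # first word
    \       # a literal space: escaped, otherwise it would be ignored
    \w+     # second word
    $       # end of string
"""

# Both forms accept a two-word string.
print(re.search(compact, "Guido Rossum") is not None)              # True
print(re.search(verbose, "Guido Rossum", re.VERBOSE) is not None)  # True

# Under re.VERBOSE an unescaped space matches nothing, so the compact
# pattern now accepts a string with no space in it at all.
print(re.search(compact, "GuidoRossum", re.VERBOSE) is not None)   # True
```

The escaped space is the detail that is easiest to forget when converting a compact pattern by hand.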
This will be clearer with an example. Let's revisit the compact regular expression you've been working with and turn it into a verbose regular expression. This example shows how.
Example 7.9. Regular Expressions with Inline Comments
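The listing for this example is not included in this copy of the text. What follows is a reconstruction, assuming the Roman numeral pattern that the notes below describe (thousands, hundreds, tens, ones); the exact comment wording and the test strings are inferred from those notes.

```python
import re

# Reconstructed sketch: the comments inside the string are part of the
# pattern text and are ignored only when re.VERBOSE is passed.
pattern = r"""
    ^                   # beginning of string
    M{0,4}              # thousands: 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds: 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #           or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens: 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #       or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones: 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #       or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
"""

print(re.search(pattern, "M", re.VERBOSE))                 # a match object
print(re.search(pattern, "MCMLXXXIX", re.VERBOSE))         # a match object
print(re.search(pattern, "MMMMDCCCLXXXVIII", re.VERBOSE))  # a match object
print(re.search(pattern, "M"))                             # None: flag omitted
```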
The most important thing to remember when using verbose regular expressions is that you need to pass an extra argument when working with them: re.VERBOSE, a constant defined in the re module that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has quite a bit of whitespace and several comments, all of which are ignored. Once you ignore the whitespace and the comments, this is exactly the same regular expression as you saw in the previous section, but it's a lot more readable.
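To make the "exactly the same regular expression" claim concrete, here is a rough sketch. It assumes the compact Roman numeral pattern shown below is the one from the previous section, and `strip_verbose` is a hypothetical helper written for this illustration; it is a naive stand-in that does not handle escaped spaces or a # inside a character class, so it is not how the re module itself works.

```python
import re

# Assumed compact form of the pattern from the previous section.
compact = r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"

verbose = r"""
    ^                   # beginning of string
    M{0,4}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
"""

def strip_verbose(pattern):
    """Naively drop what re.VERBOSE ignores: comments first, then whitespace."""
    without_comments = re.sub(r"#.*", "", pattern)
    return re.sub(r"\s+", "", without_comments)

# The stripped verbose pattern is character-for-character the compact one.
print(strip_verbose(verbose) == compact)  # True

# Both forms also agree on a handful of sample strings.
samples = ["", "M", "MCMLXXXIX", "MMMMDCCCLXXXVIII", "MMMMM", "IIII", "abc"]
agree = all(
    bool(re.search(compact, s)) == bool(re.search(verbose, s, re.VERBOSE))
    for s in samples
)
print(agree)  # True
```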
This matches MCMLXXXIX: the start of the string, then one of a possible four M characters, then CM, then L and three of a possible three X characters, then IX, then the end of the string.
This matches MMMMDCCCLXXXVIII: the start of the string, then four of a possible four M characters, then D and three of a possible three C characters, then L and three of a possible three X characters, then V and three of a possible three I characters, then the end of the string.
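The two breakdowns above can be checked by asking the match object for its groups. This is a sketch only: the extra group around the thousands and the helper name `split_numeral` are additions made for this illustration, not part of the original pattern.

```python
import re

# Roman numeral pattern with one extra group around the thousands
# (added for this sketch) so that every part can be inspected.
pattern = r"""
    ^
    (M{0,4})            # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $
"""

def split_numeral(numeral):
    """Return (thousands, hundreds, tens, ones), or None if there is no match."""
    match = re.search(pattern, numeral, re.VERBOSE)
    return match.groups() if match else None

print(split_numeral("MCMLXXXIX"))         # ('M', 'CM', 'LXXX', 'IX')
print(split_numeral("MMMMDCCCLXXXVIII"))  # ('MMMM', 'DCCC', 'LXXX', 'VIII')
```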
This does not match. Why? Because the call doesn't pass the re.VERBOSE flag, so the re.search function treats the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can't auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose.
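Since the flag always has to be stated explicitly, it may help to see the ways of stating it. The sketch below uses a shortened version of the Roman numeral pattern; `re.X` (the short alias for `re.VERBOSE`), the inline `(?x)` flag, and `re.compile` are standard features of the re module, but they are not mentioned in the text above.

```python
import re

pattern = r"""
    ^ M{0,4}            # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $
"""

# No flag: whitespace and '#' are taken literally, so nothing matches.
print(re.search(pattern, "M"))                          # None

# Three explicit ways to say "this pattern is verbose".
print(re.search(pattern, "M", re.VERBOSE) is not None)  # True
print(re.search(pattern, "M", re.X) is not None)        # True (re.X is an alias)
print(re.search("(?x)" + pattern, "M") is not None)     # True (inline flag)

# Compiling once keeps the flag attached to the pattern object,
# so later calls cannot forget it.
roman = re.compile(pattern, re.VERBOSE)
print(roman.search("MCMLXXXIX") is not None)            # True
```

Compiling the pattern with the flag is one way to avoid the silent non-match described above, because the flag travels with the compiled object.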