Category: Python/Ruby

2011-08-19 18:20:11

7.4. Using the {n,m} Syntax

In the previous section, you were dealing with a pattern where the same character could be repeated up to three times. There is another way to express this in regular expressions, which some people find more readable. First, look at the method already used in the previous example.

Example 7.5. The Old Way: Every Character Optional

  >>> import re
  >>> pattern = '^M?M?M?$'
  >>> re.search(pattern, 'M')       # (1)
  <_sre.SRE_Match object at 0x008EE090>
  >>> pattern = '^M?M?M?$'
  >>> re.search(pattern, 'MM')      # (2)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> pattern = '^M?M?M?$'
  >>> re.search(pattern, 'MMM')     # (3)
  <_sre.SRE_Match object at 0x008EE090>
  >>> re.search(pattern, 'MMMM')    # (4)
  >>>

(1)

This matches the start of the string, and then the first optional M, but not the second and third M (but that's okay because they're optional), and then the end of the string.

(2)

This matches the start of the string, and then the first and second optional M, but not the third M (but that's okay because it's optional), and then the end of the string.

(3)

This matches the start of the string, and then all three optional M characters, and then the end of the string.

(4)

This matches the start of the string, and then all three optional M characters, but then does not match the end of the string (because there is still one unmatched M), so the pattern does not match and re.search returns None.
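The four cases above can be checked in one loop. This is a minimal sketch rather than part of the original example; the `results` name is chosen here for illustration. It relies only on the fact that `re.search` returns a match object on success and `None` on failure.

```python
import re

pattern = '^M?M?M?$'

# Map each candidate string to whether the pattern matches it.
results = {s: re.search(pattern, s) is not None
           for s in ['M', 'MM', 'MMM', 'MMMM']}

for s, matched in results.items():
    print(s, matched)
```

Only the four-character string is rejected, which agrees with the interactive session.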

Example 7.6. The New Way: From n to m

  >>> pattern = '^M{0,3}$'          # (1)
  >>> re.search(pattern, 'M')       # (2)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MM')      # (3)
  <_sre.SRE_Match object at 0x008EE090>
  >>> re.search(pattern, 'MMM')     # (4)
  <_sre.SRE_Match object at 0x008EEDA8>
  >>> re.search(pattern, 'MMMM')    # (5)
  >>>

(1)

This pattern says: “Match the start of the string, then anywhere from zero to three M characters, then the end of the string.” The 0 and 3 can be any numbers; if you want to match at least one but no more than three M characters, you could say M{1,3}.

(2)

This matches the start of the string, then one M out of a possible three, then the end of the string.

(3)

This matches the start of the string, then two M out of a possible three, then the end of the string.

(4)

This matches the start of the string, then three M out of a possible three, then the end of the string.

(5)

This matches the start of the string, then three M out of a possible three, but then does not match the end of the string. The regular expression allows at most three M characters before the end of the string, but you have four, so the pattern does not match and re.search returns None.
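The difference between M{0,3} and M{1,3} only shows up on the empty string. The following sketch illustrates that; the `matches` helper and the two variable names are introduced here for illustration and do not appear in the original text.

```python
import re

def matches(pattern, s):
    """Return True if pattern matches the string s."""
    return re.search(pattern, s) is not None

zero_to_three = '^M{0,3}$'
one_to_three = '^M{1,3}$'

# Compare both patterns on the boundary cases.
for s in ['', 'M', 'MMM', 'MMMM']:
    print(repr(s), matches(zero_to_three, s), matches(one_to_three, s))
```

Both patterns accept one to three M characters and reject four; only the zero-to-three form accepts the empty string.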

 

 

 

Note

Python gives you no built-in way to determine programmatically that two regular expressions are equivalent. The best you can do is write a lot of test cases to make sure they behave the same way on all relevant inputs. You'll learn more about writing test cases later in this book.
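One way to act on this advice is to run both patterns over a generated set of inputs and collect any strings on which they disagree. This is only a sketch: the `candidates` generator, the two-letter alphabet, and the length limit are assumptions made for illustration, and passing such a check shows agreement on the tested inputs, not equivalence in general.

```python
import itertools
import re

old = re.compile('^M?M?M?$')
new = re.compile('^M{0,3}$')

def candidates(alphabet='MC', max_len=5):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield ''.join(chars)

# Collect the inputs where one pattern matches and the other does not.
disagreements = [s for s in candidates()
                 if bool(old.search(s)) != bool(new.search(s))]
print(disagreements)
```

An empty list means the two patterns agree on every generated string.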

 

 

7.4.1. Checking for Tens and Ones

Now let's expand the Roman numeral regular expression to cover the tens and ones places. This example shows the check for tens.

Example 7.7. Checking for Tens

  >>> pattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)$'
  >>> re.search(pattern, 'MCMXL')       # (1)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MCML')        # (2)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MCMLX')       # (3)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MCMLXXX')     # (4)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MCMLXXXX')    # (5)
  >>>

(1)

This matches the start of the string, then the first optional M, then CM, then XL, then the end of the string. Remember, the (A|B|C) syntax means “match exactly one of A, B, or C”. You match XL, so you ignore the XC and L?X?X?X? choices, and then move on to the end of the string. MCMXL is the Roman numeral representation of 1940.

(2)

This matches the start of the string, then the first optional M, then CM, then L?X?X?X?. Of the L?X?X?X?, it matches the L and skips all three optional X characters. Then you move to the end of the string. MCML is the Roman numeral representation of 1950.

(3)

This matches the start of the string, then the first optional M, then CM, then the optional L and the first optional X, skips the second and third optional X, then the end of the string. MCMLX is the Roman numeral representation of 1960.

(4)

This matches the start of the string, then the first optional M, then CM, then the optional L and all three optional X characters, then the end of the string. MCMLXXX is the Roman numeral representation of 1980.

(5)

This matches the start of the string, then the first optional M, then CM, then the optional L and all three optional X characters, then fails to match the end of the string because there is still one more X unaccounted for. So the entire pattern fails to match, and re.search returns None. MCMLXXXX is not a valid Roman numeral.
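To see which alternative was chosen in each case, you can inspect the groups of the match object. This sketch is an addition for illustration; it uses the same pattern and the standard `group` method, where group 1 is the hundreds part and group 2 is the tens part.

```python
import re

pattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)$'

for numeral in ['MCMXL', 'MCML', 'MCMLX', 'MCMLXXX', 'MCMLXXXX']:
    m = re.search(pattern, numeral)
    if m is None:
        print(numeral, 'no match')
    else:
        # group(1) is the hundreds part, group(2) is the tens part.
        print(numeral, m.group(1), m.group(2))
```

For MCMXL the tens group holds XL, while for MCML it holds just L, matching the walkthrough above.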

The expression for the ones place follows the same pattern. I'll spare you the details and show you the end result.

>>> pattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

So what does that look like using this alternate {n,m} syntax? This example shows the new syntax.

Example 7.8. Validating Roman Numerals with {n,m}

  >>> pattern = '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
  >>> re.search(pattern, 'MDLV')                # (1)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MMDCLXVI')            # (2)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'MMMMDCCCLXXXVIII')    # (3)
  <_sre.SRE_Match object at 0x008EEB48>
  >>> re.search(pattern, 'I')                   # (4)
  <_sre.SRE_Match object at 0x008EEB48>

(1)

This matches the start of the string, then one of a possible four M characters, then D?C{0,3}. Of that, it matches the optional D and zero of three possible C characters. Moving on, it matches L?X{0,3} by matching the optional L and zero of three possible X characters. Then it matches V?I{0,3} by matching the optional V and zero of three possible I characters, and finally the end of the string. MDLV is the Roman numeral representation of 1555.

(2)

This matches the start of the string, then two of a possible four M characters, then the D?C{0,3} with a D and one of three possible C characters; then L?X{0,3} with an L and one of three possible X characters; then V?I{0,3} with a V and one of three possible I characters; then the end of the string. MMDCLXVI is the Roman numeral representation of 2666.

(3)

This matches the start of the string, then four out of four M characters, then D?C{0,3} with a D and three out of three C characters; then L?X{0,3} with an L and three out of three X characters; then V?I{0,3} with a V and three out of three I characters; then the end of the string. MMMMDCCCLXXXVIII is the Roman numeral representation of 4888, and it's the longest Roman numeral you can write without extended syntax.

(4)

Watch closely. (I feel like a magician. “Watch closely, kids, I'm going to pull a rabbit out of my hat.”) This matches the start of the string, then zero out of four M, then matches D?C{0,3} by skipping the optional D and matching zero out of three C, then matches L?X{0,3} by skipping the optional L and matching zero out of three X, then matches V?I{0,3} by skipping the optional V and matching one out of three I. Then the end of the string. Whoa.
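As a rough check that the {n,m} pattern accepts the same numerals as the long-hand pattern, you can generate every numeral the patterns are meant to cover and try both. This is a sketch under stated assumptions: the `to_roman` helper, the `ROMAN_PAIRS` table, and the short list of invalid strings are written here for illustration and are not taken from the text above.

```python
import re

long_form = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
short_form = '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

# Value/symbol pairs, largest first, for a simple greedy conversion.
ROMAN_PAIRS = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
               (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
               (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def to_roman(n):
    """Convert an integer in 1..4999 to a Roman numeral (illustrative helper)."""
    parts = []
    for value, symbol in ROMAN_PAIRS:
        while n >= value:
            parts.append(symbol)
            n -= value
    return ''.join(parts)

numerals = [to_roman(n) for n in range(1, 5000)]
invalid = ['MMMMM', 'MCMC', 'IIII', 'VX', 'XLXL']

# Every generated numeral should match both patterns...
all_valid_match = all(re.search(long_form, s) and re.search(short_form, s)
                      for s in numerals)
# ...and neither pattern should accept the invalid strings.
none_invalid_match = not any(re.search(long_form, s) or re.search(short_form, s)
                             for s in invalid)
print(all_valid_match, none_invalid_match)
```

Note that both patterns also accept the empty string, because every part of them is optional; a real validator would probably want to reject that case separately.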

If you followed all that and understood it on the first try, you're doing better than I did. Now imagine trying to understand someone else's regular expressions, in the middle of a critical function of a large program. Or even imagine coming back to your own regular expressions a few months later. I've done it, and it's not a pretty sight.

In the next section you'll explore an alternate syntax that can help keep your expressions maintainable.
