A Bored Person: nothing but technology, and more technology, you know
Category: Python/Ruby
2011-08-18 19:58:28
7.3. Case Study: Roman Numerals
You've most likely seen Roman numerals, even if you didn't recognize them. You may have seen them in copyrights of old movies and television shows (“Copyright MCMXLVI” instead of “Copyright 1946”), or on the dedication walls of libraries or universities (“established MDCCCLXXXVIII” instead of “established 1888”). You may also have seen them in outlines and bibliographical references. It's a system of representing numbers that really does date back to the ancient Roman empire (hence the name).
In Roman numerals, there are seven characters that are repeated and combined in various ways to represent numbers: I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, and M = 1000.
The following are some general rules for constructing Roman numerals:
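The usual construction rules (symbols add up, numerals are written highest to lowest, I, X, C and M repeat at most three times, and 4 and 9 use subtractive pairs such as IV and IX) can be sketched as a small converter. This is only an illustration: the names `ROMAN_PAIRS` and `to_roman` are hypothetical, and the 1..3999 range is an assumption rather than something stated in the text.

```python
# Value/symbol pairs, highest first. The subtractive pairs
# (CM, CD, XC, XL, IX, IV) are listed explicitly.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert an integer in 1..3999 to a Roman numeral string."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be in 1..3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Append each symbol as many times as its value still fits.
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


print(to_roman(1946))  # MCMXLVI
print(to_roman(1888))  # MDCCCLXXXVIII
```

Listing the subtractive pairs in the table keeps the loop purely additive, so no special case is needed for 4 or 9.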
7.3.1. Checking for Thousands
What would it take to validate that an arbitrary string is a valid Roman numeral? Let's take it one digit at a time. Since Roman numerals are always written highest to lowest, let's start with the highest: the thousands place. For numbers 1000 and higher, the thousands are represented by a series of M characters.
Example 7.3. Checking for Thousands
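The checks discussed in this example can be sketched as follows. The pattern string is assembled from the pieces the text names (^, M?M?M?, and $); the loop and the variable names are illustrative, not the original listing.

```python
import re

# Start of string, up to three optional M characters, end of string.
pattern = "^M?M?M?$"

for candidate in ["M", "MM", "MMM", "MMMM", ""]:
    # re.search returns a match object on success and None on failure.
    matched = re.search(pattern, candidate) is not None
    print(repr(candidate), matched)
```

Run as written, this reports a match for 'M', 'MM', 'MMM' and the empty string, and no match for 'MMMM'.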
This pattern (^M?M?M?$) has three parts: ^ matches what follows only at the beginning of the string, M?M?M? optionally matches up to three M characters in a row, and $ matches what precedes only at the end of the string.
The essence of the re module is the search function, which takes a regular expression (pattern) and a string ('M') to try to match against the regular expression. If a match is found, search returns an object that has various methods to describe the match; if no match is found, search returns None, the Python null value. All you care about at the moment is whether the pattern matches, which you can tell by just looking at the return value of search. 'M' matches this regular expression, because the first optional M matches and the second and third optional M characters are ignored.
'MM' matches because the first and second optional M characters match and the third M is ignored.
'MMM' matches because all three M characters match.
'MMMM' does not match. All three M characters match, but then the regular expression insists on the string ending (because of the $ character), and the string doesn't end yet (because of the fourth M). So search returns None.
Interestingly, an empty string also matches this regular expression, since all the M characters are optional.
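Since search returns either a match object or None, its result can be tested directly in a condition. The sketch below assumes the same thousands pattern; `is_thousands` is a hypothetical helper name, and group() and span() are two of the match-object methods the text alludes to.

```python
import re

pattern = "^M?M?M?$"


def is_thousands(s):
    """Return True if s consists of zero to three M characters and nothing else."""
    return re.search(pattern, s) is not None


m = re.search(pattern, "MM")
if m:
    print(m.group())  # the matched text: MM
    print(m.span())   # start and end offsets: (0, 2)

print(is_thousands("MMMM"))  # False
print(is_thousands(""))      # True, because every M is optional
```

If the empty string should be rejected, one option is an explicit length check before calling search.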
7.3.2. Checking for Hundreds
The hundreds place is more difficult than the thousands, because there are several mutually exclusive ways it could be expressed, depending on its value.
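The ways the hundreds place can be written, depending on its value, can be laid out as a small table. The mapping below is the standard Roman notation for 100 through 900; the names `HUNDREDS` and `hundreds_pattern` are illustrative assumptions, and the sub-pattern checks the hundreds place in isolation.

```python
import re

# Standard Roman notation for each hundreds value (0 means the place is absent).
HUNDREDS = {
    0: "", 100: "C", 200: "CC", 300: "CCC", 400: "CD",
    500: "D", 600: "DC", 700: "DCC", 800: "DCCC", 900: "CM",
}

# Three mutually exclusive alternatives: 900, 400,
# or an optional D followed by up to three C characters.
hundreds_pattern = "^(CM|CD|D?C?C?C?)$"

for value, numeral in HUNDREDS.items():
    matched = re.search(hundreds_pattern, numeral) is not None
    print(value, repr(numeral), matched)
```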
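So there are four possible patterns: CM; CD; zero to three C characters (zero if the hundreds place is 0); and D, followed by zero to three C characters.
The last two patterns can be combined: an optional D, followed by zero to three C characters.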
This example shows how to validate the hundreds place of a Roman numeral.
Example 7.4. Checking for Hundreds
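The checks discussed in this example can be sketched as follows. The pattern string is assembled from the pieces the text names (^, M?M?M?, the parenthesized alternatives CM, CD and D?C?C?C?, and $); the loop and variable names are illustrative, not the original listing.

```python
import re

# Thousands place, then one of three mutually exclusive hundreds alternatives.
pattern = "^M?M?M?(CM|CD|D?C?C?C?)$"

for candidate in ["MCM", "MD", "MMMCCC", "MCMC", ""]:
    matched = re.search(pattern, candidate) is not None
    print(repr(candidate), matched)
```

Run as written, this reports a match for 'MCM', 'MD', 'MMMCCC' and the empty string, and no match for 'MCMC'.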
This pattern starts out the same as the previous one, checking for the beginning of the string (^), then the thousands place (M?M?M?). Then it has the new part, in parentheses, which defines a set of three mutually exclusive patterns, separated by vertical bars: CM, CD, and D?C?C?C? (which is an optional D followed by zero to three optional C characters). The regular expression parser checks for each of these patterns in order (from left to right), takes the first one that matches, and ignores the rest (unless the remainder of the pattern then fails, in which case it backtracks and tries the next alternative).
'MCM' matches because the first M matches, the second and third M characters are ignored, and the CM matches (so the CD and D?C?C?C? patterns are never even considered). MCM is the Roman numeral representation of 1900.
'MD' matches because the first M matches, the second and third M characters are ignored, and the D?C?C?C? pattern matches D (each of the three C characters is optional and is ignored). MD is the Roman numeral representation of 1500.
'MMMCCC' matches because all three M characters match, and the D?C?C?C? pattern matches CCC (the D is optional and is ignored). MMMCCC is the Roman numeral representation of 3300.
'MCMC' does not match. The first M matches, the second and third M characters are ignored, and the CM matches, but then the $ does not match because you're not at the end of the string yet (you still have an unmatched C character). The C does not match as part of the D?C?C?C? pattern, because the mutually exclusive CM pattern has already matched.
Interestingly, an empty string still matches this pattern, because all the M characters are optional and ignored, and the empty string matches the D?C?C?C? pattern where all the characters are optional and ignored.
Whew! See how quickly regular expressions can get nasty? And you've only covered the thousands and hundreds places of Roman numerals. But if you followed all that, the tens and ones places are easy, because they're exactly the same pattern. But let's look at another way to express the pattern.
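Because the tens and ones places follow the same shape, with X, L, C and then I, V, X taking the roles that C, D, M play in the hundreds place, a full pattern can be sketched by repeating the hundreds group twice more. This extrapolates from the statement above rather than reproducing a listing from the text, and `is_roman` is a hypothetical helper name.

```python
import re

# Thousands, hundreds, tens, ones: each place uses the same three-way alternation.
pattern = "^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$"


def is_roman(s):
    """Return True if s matches the sketched Roman numeral pattern."""
    return re.search(pattern, s) is not None


for candidate in ["MCMXLVI", "MDCCCLXXXVIII", "MMMCMXCIX", "IIII", "IC"]:
    print(candidate, is_roman(candidate))
```

The first three candidates match; 'IIII' and 'IC' do not, since I cannot repeat four times and cannot be subtracted directly from C.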