Category: Python/Ruby

2011-09-08 21:50:23

8.9. Putting it all together

It's time to put everything you've learned so far to good use. I hope you were paying attention.

Example 8.20. The translate function, part 1

    def translate(url, dialectName="chef"):
        import urllib
        # Python 2 API; in Python 3 the equivalent call is urllib.request.urlopen
        sock = urllib.urlopen(url)
        htmlSource = sock.read()
        sock.close()

The translate function has an optional argument dialectName, which is a string that specifies the dialect you'll be using. You'll see how this is used in a minute.

Hey, wait a minute, there's an import statement in this function! That's perfectly legal in Python. You're used to seeing import statements at the top of a program, which means that the imported module is available anywhere in the program. But you can also import modules within a function, which means that the imported module is only available within the function. If you have a module that is only ever used in one function, this is an easy way to make your code more modular. (When you find that your weekend hack has turned into an 800-line work of art and decide to split it up into a dozen reusable modules, you'll appreciate this.)
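
As a small illustration of the scoping described above, here is a minimal sketch in Python 3. The function name `is_leap` and the choice of the standard `calendar` module are only stand-ins for this example; they do not appear in the original program.

```python
def is_leap(year):
    # Imported inside the function: the name "calendar" is bound only in
    # this function's local scope, not in the module's global namespace.
    import calendar
    return calendar.isleap(year)


leap = is_leap(2012)

# The module-level namespace never received the name "calendar".
calendar_is_global = "calendar" in globals()
```

Only the name is local. Python still caches the loaded module, so calling the function again does not reload it.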

Now you get the source of the given URL.

Example 8.21. The translate function, part 2: curiouser and curiouser

        parserName = "%sDialectizer" % dialectName.capitalize()
        parserClass = globals()[parserName]
        parser = parserClass()

capitalize is a string method you haven't seen before; it simply capitalizes the first letter of a string and forces everything else to lowercase. Combined with some string formatting, you've taken the name of a dialect and transformed it into the name of the corresponding Dialectizer class. If dialectName is the string 'chef', parserName will be the string 'ChefDialectizer'.
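
To see what capitalize does to differently cased input, here is a short sketch. The helper name `parser_name_for` is hypothetical and exists only for this illustration.

```python
def parser_name_for(dialect_name):
    # capitalize() uppercases the first character and lowercases the rest,
    # so differently cased inputs all map to the same class name.
    return "%sDialectizer" % dialect_name.capitalize()


names = [parser_name_for(n) for n in ("chef", "CHEF", "fUDD")]
# names == ['ChefDialectizer', 'ChefDialectizer', 'FuddDialectizer']
```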

You have the name of a class as a string (parserName), and you have the global namespace as a dictionary (globals()). Combined, you can get a reference to the class which the string names. (Remember, classes are objects, and they can be assigned to variables just like any other object.) If parserName is the string 'ChefDialectizer', parserClass will be the class ChefDialectizer.

Finally, you have a class object (parserClass), and you want an instance of the class. Well, you already know how to do that: call the class like a function. The fact that the class is being stored in a local variable makes absolutely no difference; you just call the local variable like a function, and out pops an instance of the class. If parserClass is the class ChefDialectizer, parser will be an instance of the class ChefDialectizer.
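
The three steps (build the name, look it up, call the result) can be tried in isolation. In the sketch below the two classes are empty stand-ins for the real Dialectizer classes, and `make_parser` is a hypothetical wrapper; both are assumptions made so the example runs on its own.

```python
class ChefDialectizer:
    """Empty stand-in; only here so the lookup has a target."""


class FuddDialectizer:
    """Second empty stand-in."""


def make_parser(dialect_name="chef"):
    parser_name = "%sDialectizer" % dialect_name.capitalize()
    parser_class = globals()[parser_name]  # look the class up by its name
    return parser_class()                  # call the class to get an instance


parser = make_parser("fudd")
```

Here `parser` ends up as a FuddDialectizer instance, even though the name FuddDialectizer never appears inside `make_parser`.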

Why bother? After all, there are only 3 Dialectizer classes; why not just use a case statement? (Well, there's no case statement in Python, but why not just use a series of if statements?) One reason: extensibility. The translate function has absolutely no idea how many Dialectizer classes you've defined. Imagine if you defined a new FooDialectizer tomorrow; translate would work by passing 'foo' as the dialectName.

Even better, imagine putting FooDialectizer in a separate module, and importing it with from module import. You've already seen that this includes it in globals(), so translate would still work without modification, even though FooDialectizer was in a separate file.
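
A quick way to check this for yourself is shown below in Python 3. The standard library's `HTMLParser` is used purely as a stand-in for a class imported from a separate module; the hypothetical FooDialectizer would behave the same way.

```python
# Stand-in for "from foodialect import FooDialectizer".
from html.parser import HTMLParser

# The imported name is now an ordinary entry in this module's global
# namespace, so a lookup by string finds the very same class object.
found = globals()["HTMLParser"]
same_object = found is HTMLParser
```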

Now imagine that the name of the dialect is coming from somewhere outside the program, maybe from a database or from a user-inputted value on a form. You can use any number of server-side Python scripting architectures to dynamically generate web pages; this function could take a URL and a dialect name (both strings) in the query string of a web page request, and output the “translated” web page.
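
When the dialect name comes from outside the program, keep in mind that globals() holds every module-level name, not just the Dialectizer classes. The sketch below shows one possible guard. The common base class `Dialectizer` and the helper `find_dialectizer` are assumptions for illustration, not part of the original program.

```python
class Dialectizer:
    """Hypothetical common base class for all dialect parsers."""


class ChefDialectizer(Dialectizer):
    pass


def find_dialectizer(dialect_name):
    parser_name = "%sDialectizer" % dialect_name.capitalize()
    candidate = globals().get(parser_name)
    # Accept only proper subclasses of Dialectizer; reject everything else,
    # including missing names and the base class itself.
    is_dialect = (
        isinstance(candidate, type)
        and issubclass(candidate, Dialectizer)
        and candidate is not Dialectizer
    )
    if not is_dialect:
        raise ValueError("unknown dialect: %r" % dialect_name)
    return candidate


parser_class = find_dialectizer("chef")
```

An unknown name such as 'klingon' raises ValueError instead of a bare KeyError, which is easier to turn into a sensible error page.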

Finally, imagine a Dialectizer framework with a plug-in architecture. You could put each Dialectizer class in a separate file, leaving only the translate function in dialect.py. Assuming a consistent naming scheme, the translate function could dynamically import the appropriate class from the appropriate file, given nothing but the dialect name. (You haven't seen dynamic importing yet, but I promise to cover it in a later chapter.) To add a new dialect, you would simply add an appropriately-named file in the plug-ins directory (like foodialect.py which contains the FooDialectizer class). Calling the translate function with the dialect name 'foo' would find the module foodialect.py, import the class FooDialectizer, and away you go.
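
For the curious, here is a rough Python 3 sketch of what such a plug-in lookup might look like. It uses `importlib.import_module` from the standard library, which is my choice for this illustration and not necessarily the technique the later chapter covers. The helper `load_dialectizer` and the temporary plug-ins directory are assumptions made so the example runs on its own.

```python
import importlib
import os
import sys
import tempfile


def load_dialectizer(dialect_name):
    # Naming scheme taken from the text: dialect "foo" lives in the module
    # "foodialect", which defines the class "FooDialectizer".
    module = importlib.import_module("%sdialect" % dialect_name.lower())
    return getattr(module, "%sDialectizer" % dialect_name.capitalize())


# Demo only: create a throwaway plug-ins directory holding foodialect.py.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "foodialect.py"), "w") as plugin_file:
    plugin_file.write("class FooDialectizer:\n    pass\n")

sys.path.insert(0, plugin_dir)   # make the plug-ins directory importable
importlib.invalidate_caches()    # make sure the new file is noticed

parser_class = load_dialectizer("foo")
```

Adding a new dialect then means dropping one more file into the directory; `load_dialectizer` itself never changes.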

Example 8.22. The translate function, part 3

        parser.feed(htmlSource)
        parser.close()
        return parser.output()

After all that imagining, this is going to seem pretty boring, but the feed function is what does the entire transformation. You had the entire HTML source in a single string, so you only had to call feed once. However, you can call feed as often as you want, and the parser will just keep parsing. So if you were worried about memory usage (or you knew you were going to be dealing with very large HTML pages), you could set this up in a loop, where you read a few bytes of HTML and fed it to the parser. The result would be the same.

Because feed maintains an internal buffer, you should always call the parser's close method when you're done (even if you fed it all at once, like you did). Otherwise you may find that your output is missing the last few bytes.

Remember, output is the function you defined on BaseHTMLProcessor that joins all the pieces of output you've buffered and returns them in a single string.
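
The feed, close, and output steps can be sketched end to end in Python 3. Since sgmllib is not available there, the example assumes the standard `html.parser.HTMLParser` as a substitute. `TextCollector` is a hypothetical, much simplified stand-in for BaseHTMLProcessor that buffers only text, and `parse_in_chunks` is a helper invented for this illustration.

```python
import io
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Simplified stand-in for BaseHTMLProcessor: buffers text pieces."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

    def output(self):
        # Join all buffered pieces into a single string.
        return "".join(self.pieces)


def parse_in_chunks(stream, chunk_size=8):
    parser = TextCollector()
    while True:
        chunk = stream.read(chunk_size)  # read a few characters at a time
        if not chunk:
            break
        parser.feed(chunk)               # the parser just keeps parsing
    parser.close()                       # flush the internal buffer
    return parser.output()


html = "<html><body><p>Hello, <b>world</b>!</p></body></html>"
chunked = parse_in_chunks(io.StringIO(html))

# Feeding everything at once gives the same result.
whole = TextCollector()
whole.feed(html)
whole.close()
```

Tags are routinely split across chunk boundaries here, yet both runs collect the same text, because the parser holds incomplete input in its buffer until the rest arrives.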

And just like that, you've “translated” a web page, given nothing but a URL and the name of a dialect.

Putting it all together: wrapped up and ready to go.
