A Python-based crawler (bigluo, ChinaUnix blog)

Components

Scrapy Engine

The engine is responsible for controlling the data flow between all components of the system, and for triggering events when certain actions occur. See the Data Flow section of the Scrapy documentation for more details.

Scheduler

The Scheduler receives requests from the engine and enqueues them, then feeds them back to the engine when the engine asks for them.
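The enqueue-now, hand-back-later behaviour can be sketched with a plain FIFO queue. This is a minimal illustration, not Scrapy's own scheduler: the class name `ToyScheduler`, its method names, and the use of bare URL strings as requests are all assumptions made for the example.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical FIFO scheduler; an illustration, not Scrapy's class."""

    def __init__(self):
        self._queue = deque()

    def enqueue_request(self, request):
        """Called by the engine to store a request for later."""
        self._queue.append(request)

    def next_request(self):
        """Called by the engine when it wants work; None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


scheduler = ToyScheduler()
scheduler.enqueue_request("http://example.com/a")
scheduler.enqueue_request("http://example.com/b")

first = scheduler.next_request()    # oldest request comes out first
second = scheduler.next_request()
empty = scheduler.next_request()    # nothing left to hand back
```

A FIFO queue gives breadth-first crawling; swapping `popleft()` for `pop()` would make the same sketch depth-first.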

Downloader

The Downloader is responsible for fetching web pages and feeding them to the engine, which in turn feeds them to the spiders.
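The hand-offs between engine, scheduler, downloader and spiders can be sketched as a single synchronous loop. This is only a toy model under stated assumptions: Scrapy itself is asynchronous, and the in-memory `PAGES` table, the `download` and `spider_parse` helpers, and the `seen` set (added so the loop terminates on circular links) are all hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (body, outgoing links).
PAGES = {
    "http://example.com/": ("home", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/"]),
    "http://example.com/b": ("page b", []),
}


def download(url):
    """Stand-in downloader: look the page up instead of using the network."""
    return PAGES[url]


def spider_parse(url, page):
    """Stand-in spider: return one item plus the links to follow."""
    body, links = page
    return {"url": url, "body": body}, links


def run_engine(start_url):
    """Toy engine loop: scheduler -> downloader -> spider -> back to scheduler."""
    scheduler = deque([start_url])
    seen = {start_url}  # assumption: skip URLs that were already scheduled
    items = []
    while scheduler:
        url = scheduler.popleft()               # engine asks the scheduler for a request
        page = download(url)                    # engine hands it to the downloader
        item, links = spider_parse(url, page)   # the response goes to the spider
        items.append(item)                      # items would continue to the item pipeline
        for link in links:                      # new requests go back to the scheduler
            if link not in seen:
                seen.add(link)
                scheduler.append(link)
    return items


crawled = [item["url"] for item in run_engine("http://example.com/")]
```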

Spiders

Spiders are custom classes written by Scrapy users to parse responses and extract from them either items (aka scraped items) or additional URLs (requests) to follow. Each spider handles a specific domain (or group of domains). For more information, see the Spiders section of the Scrapy documentation.
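To make the "items or additional URLs" idea concrete, here is a standard-library sketch of what a spider's parse step does. It deliberately avoids importing Scrapy so that it runs anywhere: `ToySpider`, its `parse(url, body)` signature, and the dictionaries tagged `"item"` and `"request"` are assumptions for illustration, not Scrapy's real spider API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every <a href> value from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class ToySpider:
    """Hypothetical spider that only follows links on its own domain."""

    allowed_domain = "example.com"

    def parse(self, url, body):
        parser = _LinkAndTitleParser()
        parser.feed(body)
        # One scraped item per page.
        yield {"type": "item", "url": url, "title": parser.title.strip()}
        # Follow-up requests, restricted to the spider's own domain.
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).hostname == self.allowed_domain:
                yield {"type": "request", "url": absolute}


html = (
    "<html><head><title> Home </title></head><body>"
    '<a href="/about">About</a><a href="http://other.org/x">Elsewhere</a>'
    "</body></html>"
)
results = list(ToySpider().parse("http://example.com/", html))
```

The off-domain link is dropped, which mirrors the idea that each spider is responsible for one domain or group of domains.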

Item Pipeline

The Item Pipeline is responsible for processing the items once they have been extracted (or scraped) by the spiders. Typical tasks include cleansing, validation and persistence (such as storing the item in a database). For more information, see the Item Pipeline section of the Scrapy documentation.
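The three typical tasks can be sketched as two small pipeline stages. The `process_item(item, spider)`, `open_spider` and `close_spider` method names follow Scrapy's pipeline interface, but everything else is an assumption so the example runs without Scrapy installed: the local `DropItem` class stands in for `scrapy.exceptions.DropItem`, `run_pipelines` is a hypothetical driver playing the engine's role, and the in-memory SQLite table is chosen only for illustration.

```python
import sqlite3


class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class CleanAndValidatePipeline:
    def process_item(self, item, spider):
        title = (item.get("title") or "").strip()   # cleansing
        if not title:                               # validation
            raise DropItem("missing title")
        item["title"] = title
        return item


class SQLitePipeline:
    def open_spider(self, spider):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")

    def process_item(self, item, spider):
        # Persistence: store the cleaned item in the database.
        self.conn.execute(
            "INSERT INTO pages VALUES (?, ?)", (item["url"], item["title"])
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()


def run_pipelines(items, pipelines, spider=None):
    """Hypothetical driver: pass each item through every stage in order."""
    kept = []
    for item in items:
        try:
            for stage in pipelines:
                item = stage.process_item(item, spider)
        except DropItem:
            continue  # a dropped item never reaches the later stages
        kept.append(item)
    return kept


store = SQLitePipeline()
store.open_spider(None)
scraped = [
    {"url": "http://example.com/", "title": "  Home "},
    {"url": "http://example.com/x", "title": ""},
]
kept = run_pipelines(scraped, [CleanAndValidatePipeline(), store])
rows = store.conn.execute("SELECT url, title FROM pages").fetchall()
```

Putting validation ahead of storage means an invalid item is dropped before it can reach the database.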

Downloader middlewares

Downloader middlewares are hooks that sit between the Engine and the Downloader. They process requests as they pass from the Engine to the Downloader, and responses as they pass from the Downloader back to the Engine. They provide a convenient mechanism for extending Scrapy functionality by plugging in custom code. For more information, see the Downloader Middleware section of the Scrapy documentation.
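A hook of this kind might look like the sketch below, which sets a default header on outgoing requests and counts incoming responses by status code. The `process_request` and `process_response` method names follow Scrapy's downloader middleware interface; the `Request` and `Response` dataclasses, the `fake_download` helper and the user-agent string are simplified stand-ins assumed for the example, not Scrapy's own classes.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Minimal stand-in for a request object (not scrapy.Request)."""
    url: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    """Minimal stand-in for a response object."""
    url: str
    status: int


class UserAgentAndStatsMiddleware:
    def __init__(self, user_agent):
        self.user_agent = user_agent
        self.status_counts = {}

    def process_request(self, request, spider):
        # Engine -> Downloader: fill in a header if the request lacks one.
        request.headers.setdefault("User-Agent", self.user_agent)
        return None  # None means: continue handling the request normally

    def process_response(self, request, response, spider):
        # Downloader -> Engine: record the status, then pass the response on.
        count = self.status_counts.get(response.status, 0)
        self.status_counts[response.status] = count + 1
        return response


def fake_download(request):
    """Hypothetical downloader that never touches the network."""
    return Response(url=request.url, status=200)


mw = UserAgentAndStatsMiddleware("toy-crawler/0.1")
req = Request("http://example.com/")
mw.process_request(req, spider=None)
resp = mw.process_response(req, fake_download(req), spider=None)
```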

Spider middlewares

Spider middlewares are hooks that sit between the Engine and the Spiders and are able to process spider input (responses) and output (items and requests). They provide a convenient mechanism for extending Scrapy functionality by plugging in custom code. For more information, see the Spider Middleware section of the Scrapy documentation.
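As a rough illustration of filtering spider output, the sketch below drops follow-up requests whose URL is too long while letting items through untouched. The `process_spider_input` and `process_spider_output` method names follow Scrapy's spider middleware interface; the class name, the length limit, and the dictionaries tagged `"item"` and `"request"` are assumptions made so the example runs without Scrapy.

```python
class MaxUrlLengthMiddleware:
    """Hypothetical middleware that discards requests with overly long URLs."""

    def __init__(self, max_length):
        self.max_length = max_length

    def process_spider_input(self, response, spider):
        # Spider input: accept every response unchanged.
        return None

    def process_spider_output(self, response, result, spider):
        # Spider output: pass items through, filter the requests.
        for entry in result:
            is_request = entry.get("type") == "request"
            if is_request and len(entry.get("url", "")) > self.max_length:
                continue
            yield entry


spider_output = [
    {"type": "item", "title": "Home"},
    {"type": "request", "url": "http://example.com/a"},
    {"type": "request", "url": "http://example.com/" + "x" * 100},
]
mw = MaxUrlLengthMiddleware(max_length=40)
filtered = list(mw.process_spider_output(None, spider_output, spider=None))
```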

Scheduler middlewares

Scheduler middlewares are hooks that sit between the Engine and the Scheduler and process requests as they pass from the Engine to the Scheduler and vice versa. They provide a convenient mechanism for extending Scrapy functionality by plugging in custom code.
