Blog category: HADOOP

2013-03-28 14:26:40

Pattern Name

Inverted Index Summarizations

Category

Summarization Patterns

Intent

Generate an index from a data set to allow for faster searches or data enrichment capabilities.

Motivation

It is often convenient to index large data sets on keywords, so that searches can trace terms back to records that contain specific values. While building an inverted index does require extra processing up front, taking the time to do so can greatly reduce the amount of time it takes to find something.

Applicability

Inverted indexes should be used when quick search query responses are required. The results of such a query can be preprocessed and ingested into a database.
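To make the trade-off concrete, here is a minimal in-memory sketch that compares answering a query by scanning every record with answering it from a prebuilt index. The records and the function names (scan, build_index, lookup) are hypothetical and only illustrate the idea; a real job would build the index in MapReduce and load the result into a database.

```python
from collections import defaultdict

# Hypothetical records: (unique id, text). Not taken from any real data set.
records = [
    (1, "hadoop mapreduce patterns"),
    (2, "inverted index with hadoop"),
    (3, "summarization patterns"),
]

def scan(term):
    """Answer a query without an index: inspect every record."""
    return [rid for rid, text in records if term in text.split()]

def build_index(recs):
    """One up-front pass: map each term to the ids of records containing it."""
    index = defaultdict(set)
    for rid, text in recs:
        for term in text.split():
            index[term].add(rid)
    return index

index = build_index(records)

def lookup(term):
    """Answer a query with the index: a single dictionary access."""
    return sorted(index.get(term, ()))
```

Both functions return the same ids, but scan touches every record on every query, while lookup pays the cost once in build_index.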

Structure

•   The mapper outputs the desired fields for the index as the key and the unique identifier as the value.

•   The combiner can be omitted if you are just using the identity reducer.

•   The partitioner is responsible for determining where values with the same key will eventually be copied by a reducer for final output. It can be customized for more efficient load balancing if the intermediate keys are not evenly distributed.

•   The reducer receives the set of unique record identifiers to map back to the input key.

•   The final output is a set of part files, each containing a mapping from a field value to the set of unique IDs of the records that contain that field value.
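The stages above can be sketched as a small single-process simulation. This is not the Hadoop API: the names mapper, partition, reducer, run_job, and the NUM_REDUCERS value are assumptions made for illustration, and whitespace-separated terms stand in for "the desired fields for the index".

```python
from collections import defaultdict
import zlib

NUM_REDUCERS = 2  # assumed value, chosen only for illustration

def mapper(record_id, text):
    """Emit (index field value, record id) pairs for one input record."""
    for term in set(text.lower().split()):
        yield term, record_id

def partition(key, num_reducers=NUM_REDUCERS):
    """Hash partitioner: all values with the same key go to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer(key, record_ids):
    """Collect the identifiers for one key into a single output record."""
    return key, sorted(set(record_ids))

def run_job(records):
    # Shuffle: group mapper output by target reducer, then by key.
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for record_id, text in records:
        for key, value in mapper(record_id, text):
            buckets[partition(key)][key].append(value)
    # Each bucket stands in for the part file written by one reducer.
    return [dict(reducer(k, v) for k, v in sorted(b.items())) for b in buckets]
```

Calling run_job on a list of (id, text) pairs returns one dictionary per simulated reducer. A custom partition function could replace the hash when a few keys carry most of the values.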

Consequences

The output of the job will be a set of part files containing a single record per reducer input group. Each record will consist of the key and all aggregate values.
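As a rough illustration of that output shape, the sketch below renders one reducer's groups as the lines of a part file. The function name format_part and the space-separated id list are assumptions; the tab between key and value mirrors the default separator of Hadoop's text output.

```python
def format_part(groups):
    """Render one reducer's groups as the lines of a part file.

    One line per reducer input group: the key, a tab, then the
    aggregated values (here, the record ids separated by spaces).
    """
    lines = []
    for key in sorted(groups):
        ids = " ".join(str(i) for i in sorted(groups[key]))
        lines.append(f"{key}\t{ids}")
    return lines
```

A group with many identifiers therefore becomes one long line, which is worth keeping in mind when downstream tools read the part files.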

Known uses

 

Resemblances

 

Performance analysis

The performance of building an inverted index depends mostly on the computational cost of parsing the content in the mapper, the cardinality of the index keys, and the number of content identifiers per key.
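Two of these factors, key cardinality and identifiers per key, can be measured on a sample before running the full job. The following is a minimal sketch; index_stats is a hypothetical helper, and it assumes the sample index is a dictionary from key to a collection of record ids.

```python
def index_stats(index):
    """Summarize key cardinality and the spread of ids per key."""
    sizes = [len(ids) for ids in index.values()]
    return {
        "distinct_keys": len(sizes),
        "max_ids_per_key": max(sizes, default=0),
        "mean_ids_per_key": sum(sizes) / len(sizes) if sizes else 0.0,
    }
```

A maximum far above the mean suggests a few hot keys, which is the situation where a custom partitioner may help balance the reducers.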

Inverted Index Examples

Wikipedia reference inverted index
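The post names this example without giving details, so the following is only a guess at its shape: an index from each Wikipedia URL to the ids of the records that reference it. The record layout (id, body text), the URL pattern, and the function names are all assumptions.

```python
import re
from collections import defaultdict

# Assumed pattern for Wikipedia article links embedded in text or HTML.
WIKI_URL = re.compile(r"https?://[a-z\-]+\.wikipedia\.org/wiki/[^\s\"'<>)]+")

def map_record(record_id, body):
    """Map step: emit (wikipedia url, record id) for each distinct link."""
    for url in set(WIKI_URL.findall(body)):
        yield url, record_id

def build_reference_index(records):
    """Group step: collect the ids of all records that cite each URL."""
    index = defaultdict(list)
    for record_id, body in records:
        for url, rid in map_record(record_id, body):
            index[url].append(rid)
    return {url: sorted(ids) for url, ids in index.items()}
```

Given the resulting dictionary, finding every record that cites a particular article is a single key lookup.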
