Blog category: HADOOP

2013-04-10 08:48:14

Pattern Name

External Source Output

Category

Input and Output Patterns

Description

The external source output pattern writes data to a system outside of Hadoop and HDFS.

Intent

You want to write MapReduce output to a nonnative location.

Motivation

The pattern skips storing data in a file system entirely and sends output key/value pairs directly where they belong. MapReduce rarely hosts an application as-is, so using MapReduce to bulk load data into an external source in parallel has its uses.

In a MapReduce approach, the data is written out in parallel. As with using an external source for input, you need to be sure the destination system can handle the parallel ingest, and all the open connections that come with it.

Applicability

 

Structure

>The OutputFormat verifies the output specification of the job configuration prior to job submission. It is also responsible for creating and initializing a RecordWriter implementation.

>The RecordWriter writes all key/value pairs to the external source. During construction of the object, establish any needed connections using the external source’s API. These connections are then used to write out all the data from each map or reduce task.
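
The two roles above can be sketched in plain Java. This is only an illustration under stated assumptions, not Hadoop's actual API: `ExternalOutputFormat` and `ExternalRecordWriter` are simplified stand-ins for the real `OutputFormat` and `RecordWriter` classes, `ExternalClient` is a hypothetical client interface for the external system, `InMemoryClient` is a fake used so the sketch runs on its own, and the configuration key `example.external.target` is made up.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical client API of the external system; not a real library. */
interface ExternalClient extends Closeable {
    void put(String key, String value) throws IOException;
}

/** In-memory fake of the external system so the sketch runs on its own. */
final class InMemoryClient implements ExternalClient {
    final Map<String, String> stored = new ConcurrentHashMap<>();
    boolean closed = false;

    @Override
    public void put(String key, String value) throws IOException {
        if (closed) {
            throw new IOException("connection already closed");
        }
        stored.put(key, value);
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** RecordWriter role: connect on construction, write every pair, close at the end. */
final class ExternalRecordWriter implements Closeable {
    private final ExternalClient client;

    ExternalRecordWriter(String target, Function<String, ExternalClient> connector) {
        // Establish the connection once, when the writer is constructed.
        this.client = connector.apply(target);
    }

    void write(String key, String value) throws IOException {
        client.put(key, value);
    }

    @Override
    public void close() throws IOException {
        client.close();
    }
}

/** OutputFormat role: validate the configuration, then create the writer. */
final class ExternalOutputFormat {
    // Hypothetical configuration key naming the external destination.
    static final String TARGET_KEY = "example.external.target";

    /** Fails fast, before any work is done, if the destination is not configured. */
    void checkOutputSpecs(Map<String, String> conf) throws IOException {
        String target = conf.get(TARGET_KEY);
        if (target == null || target.isEmpty()) {
            throw new IOException("missing required setting: " + TARGET_KEY);
        }
    }

    /** Creates and initializes the writer that one task uses for all its output. */
    ExternalRecordWriter getRecordWriter(Map<String, String> conf,
                                         Function<String, ExternalClient> connector)
            throws IOException {
        checkOutputSpecs(conf);
        return new ExternalRecordWriter(conf.get(TARGET_KEY), connector);
    }
}
```

In a real job, each map or reduce task would get its own writer, so each task holds its own connection for its lifetime. Passing the connector in as a function is a choice made here only so that the fake client can be swapped for a real one.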

Consequences

The output data has been sent to the external source and that external source has loaded it successfully.

Known uses

 

Resemblances

 

Performance analysis

From a MapReduce perspective, there isn’t much to worry about since the map and reduce phases are generic. However, you do have to be very careful that the receiver of the data can handle the parallel connections.
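
A rough way to reason about the parallel connections is simple arithmetic: if every running task holds its own connections, the peak count is about the number of concurrently running tasks times the connections each task opens. The helper below is a minimal sketch of that estimate; the class name `IngestBudget` and the numbers in the usage note are hypothetical, and the real limit depends on the destination system.

```java
/** Back-of-the-envelope connection budget for a parallel bulk load. */
final class IngestBudget {
    private IngestBudget() {}

    /** Peak open connections if every concurrent task holds its own connections. */
    static long peakConnections(int concurrentTasks, int connectionsPerTask) {
        if (concurrentTasks < 0 || connectionsPerTask < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // Widen to long before multiplying to avoid int overflow.
        return (long) concurrentTasks * connectionsPerTask;
    }

    /** Largest number of concurrent tasks that stays within the destination's limit. */
    static int maxConcurrentTasks(int connectionLimit, int connectionsPerTask) {
        if (connectionLimit < 0 || connectionsPerTask <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and per-task count > 0");
        }
        return connectionLimit / connectionsPerTask;
    }
}
```

For instance, 200 concurrent tasks with 3 connections each would need 600 connections, while a destination that accepts 500 connections could serve at most 166 such tasks at once. When the writes happen in the reduce phase, the number of reduce tasks (set with `Job.setNumReduceTasks`) is one way to bound this count.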

Examples

Writing to Redis instances
