Category: HADOOP
2013-04-10 08:48:14
Pattern Name |
External Source Output |
Category |
Input and Output Patterns |
Description |
The external source output pattern writes data to a system outside of Hadoop and HDFS. |
Intent |
You want to write MapReduce output to a nonnative location. |
Motivation |
The pattern skips storing data in a file system entirely and sends output key/value pairs directly where they belong. MapReduce rarely hosts an application as-is, so using it to bulk load data into an external source has its uses. Because every map or reduce task writes its own output, the data is written out in parallel. As with using an external source for input, you need to be sure the destination system can handle the parallel ingest it is bound to endure with all the open connections. |
Applicability |
|
Structure |
>The OutputFormat verifies the output specification of the job configuration prior to job submission. It is also responsible for creating and initializing a RecordWriter implementation.
>The RecordWriter writes all key/value pairs to the external source. During construction of the object, establish any needed connections using the external source’s API. These connections are then used to write out all the data from each map or reduce task. |
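The two roles above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hadoop code: `ExternalStoreClient`, `ExternalRecordWriter`, `ExternalOutputFormat`, and the `external.host` setting are hypothetical stand-ins, and an in-memory map plays the part of the external system. In a real job the same responsibilities would sit in subclasses of Hadoop's `OutputFormat` (`checkOutputSpecs`, `getRecordWriter`) and `RecordWriter` (`write`, `close`).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client for an external system; an in-memory map stands in
// for the remote store so the sketch runs without any dependencies.
final class ExternalStoreClient implements AutoCloseable {
    static final Map<String, String> STORE = new ConcurrentHashMap<>();
    private boolean open = true;

    ExternalStoreClient(String host) {
        // A real client would open a network connection to 'host' here.
    }

    // Stand-in reachability check: only verifies that a host was configured.
    static boolean isReachable(String host) {
        return host != null && !host.isEmpty();
    }

    void put(String key, String value) {
        if (!open) {
            throw new IllegalStateException("connection is closed");
        }
        STORE.put(key, value);
    }

    @Override
    public void close() {
        open = false;
    }
}

// Plays the role of the RecordWriter: the connection is established in the
// constructor, every key/value pair goes through it, and close() releases it.
final class ExternalRecordWriter implements AutoCloseable {
    private final ExternalStoreClient client;

    ExternalRecordWriter(String host) {
        this.client = new ExternalStoreClient(host);
    }

    void write(String key, String value) {
        client.put(key, value);
    }

    @Override
    public void close() {
        client.close();
    }
}

// Plays the role of the OutputFormat: validates the configuration before any
// task runs, then hands each task its own writer.
final class ExternalOutputFormat {
    static final String HOST_KEY = "external.host"; // hypothetical setting name

    void checkOutputSpecs(Map<String, String> conf) {
        String host = conf.get(HOST_KEY);
        if (!ExternalStoreClient.isReachable(host)) {
            throw new IllegalArgumentException("external source is not configured: " + HOST_KEY);
        }
    }

    ExternalRecordWriter getRecordWriter(Map<String, String> conf) {
        return new ExternalRecordWriter(conf.get(HOST_KEY));
    }
}
```

Each task gets its own writer and therefore its own connection, which is why the number of concurrently running tasks translates directly into load on the destination.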
Consequences |
The output data has been sent to the external source and that external source has loaded it successfully. |
Known uses |
|
Resemblances |
|
Performance analysis |
From a MapReduce perspective, there isn’t much to worry about since the map and reduce phases are generic. However, you do have to be very careful that the receiver of the data can handle the parallel connections. |
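One way to reason about the connection load is a simple budget: divide the connections the destination can spare by the connections each task opens. The helper below is a rough sketch of that arithmetic; the class name, method name, and all three parameters are assumptions made for illustration, and the real limits have to come from the destination system's own configuration.

```java
// Hypothetical helper: estimates how many tasks may write at the same time
// without exceeding the destination's connection limit.
final class ConnectionBudget {
    private ConnectionBudget() {
    }

    static int maxParallelWriters(int destinationMaxConnections,
                                  int connectionsPerTask,
                                  int reservedConnections) {
        if (connectionsPerTask <= 0) {
            throw new IllegalArgumentException("connectionsPerTask must be positive");
        }
        // Connections left over after setting aside those reserved for other clients.
        int available = destinationMaxConnections - reservedConnections;
        // Never report a negative budget.
        return Math.max(0, available / connectionsPerTask);
    }
}
```

When the writes happen in the reduce phase, the result can serve as an upper bound for the value passed to Hadoop's `Job.setNumReduceTasks(int)`. For map-only jobs the number of tasks follows from the input splits, so the bound has to be enforced some other way, for example by limiting how many tasks the cluster runs concurrently.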
Examples |
Writing to Redis instances |