Category: HADOOP
2013-03-28 14:28:08
Pattern Name |
Counting with Counters |
Category |
Summarization Patterns |
Description |
This pattern utilizes the MapReduce framework’s counters utility to calculate a global sum entirely on the map side without producing any output. |
Intent |
An efficient means to retrieve count summarizations of large data sets. |
Motivation |
This pattern describes how to use custom counters to gather count or summation metrics from your data sets. The major benefit of using counters is that all the counting can be done during the map phase. |
Applicability |
Counting with counters should be used when:
• You want to gather counts or summations over large data sets.
• The number of counters you are going to create is small, in the double digits. |
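One way to keep the number of counters small is to cap the distinct counter names and fold everything else into a single catch-all. The sketch below is plain Java, not the Hadoop API; the class `BoundedCountersSketch`, the `Other` bucket, and the cap of 2 used in `main` are all assumptions made for illustration. Hadoop also enforces its own configurable limit on counters per job, which this sketch does not model.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: keeps at most maxNames distinct counters, folding the rest into "Other". */
final class BoundedCountersSketch {
    static final String OVERFLOW = "Other"; // assumed name for the catch-all bucket

    private final int maxNames;
    private final Map<String, Long> counts = new TreeMap<>();
    private long overflow = 0;

    BoundedCountersSketch(int maxNames) {
        this.maxNames = maxNames;
    }

    /** Increment an existing counter, create it if there is room, otherwise count it as overflow. */
    void increment(String name) {
        if (counts.containsKey(name) || counts.size() < maxNames) {
            counts.merge(name, 1L, Long::sum);
        } else {
            overflow++;
        }
    }

    /** Copy of the current totals, with the overflow bucket included only when it is non-zero. */
    Map<String, Long> snapshot() {
        Map<String, Long> copy = new TreeMap<>(counts);
        if (overflow > 0) {
            copy.merge(OVERFLOW, overflow, Long::sum);
        }
        return copy;
    }

    public static void main(String[] args) {
        BoundedCountersSketch counters = new BoundedCountersSketch(2);
        for (String name : new String[] {"CA", "NY", "CA", "TX", "WA"}) {
            counters.increment(name);
        }
        System.out.println(counters.snapshot()); // prints {CA=2, NY=1, Other=2}
    }
}
```

The catch-all keeps the total exact even when individual names are dropped, so the sum of all counters still equals the number of records seen.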
Structure |
• The mapper processes one input record at a time, incrementing counters based on certain criteria. The TaskTrackers running the tasks aggregate these counters and report them incrementally to the JobTracker, which performs the overall aggregation once the job succeeds. The JobTracker disregards counters from any failed tasks in the final summation.
• As this job is map-only, no combiner, partitioner, or reducer is required. |
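The aggregation flow in this row can be modelled in a few lines of plain Java. This is a simplified sketch rather than Hadoop code: `TaskResult`, `runTask`, `aggregate`, and the two counter names are hypothetical, and the "task" is just a loop over an in-memory list. It shows each task keeping local counters, and the job-level total summing only the tasks that succeeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of counter aggregation; not the Hadoop API. */
final class CounterAggregationSketch {

    /** Outcome of one hypothetical map task: its local counters and whether it succeeded. */
    static final class TaskResult {
        final Map<String, Long> counters;
        final boolean succeeded;

        TaskResult(Map<String, Long> counters, boolean succeeded) {
            this.counters = counters;
            this.succeeded = succeeded;
        }
    }

    /** Map side: one pass over the records, incrementing a counter per record and emitting nothing. */
    static TaskResult runTask(List<String> records, boolean succeeded) {
        Map<String, Long> local = new TreeMap<>();
        for (String record : records) {
            String name = record.isEmpty() ? "EMPTY_RECORDS" : "NON_EMPTY_RECORDS";
            local.merge(name, 1L, Long::sum);
        }
        return new TaskResult(local, succeeded);
    }

    /** Job side: sum the counters of successful tasks and disregard failed ones. */
    static Map<String, Long> aggregate(List<TaskResult> results) {
        Map<String, Long> totals = new TreeMap<>();
        for (TaskResult result : results) {
            if (!result.succeeded) {
                continue; // counters from failed tasks do not reach the final sum
            }
            result.counters.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<TaskResult> results = new ArrayList<>();
        results.add(runTask(Arrays.asList("a", "", "b"), true));
        results.add(runTask(Arrays.asList("c", "d"), false)); // failed task, ignored
        results.add(runTask(Arrays.asList("", "e"), true));
        System.out.println(aggregate(results)); // prints {EMPTY_RECORDS=2, NON_EMPTY_RECORDS=3}
    }
}
```

In a real job the per-record increment would be a call such as `context.getCounter(groupName, counterName).increment(1)` inside the mapper, and the framework would perform the aggregation step.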
Consequences |
The final output is a set of counters retrieved from the job framework. The analytic itself produces no actual output. However, the job still requires an output directory to execute. After the job runs, this directory will contain as many empty part files as there were map tasks, and it should be deleted on job completion. |
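The cleanup step can be sketched as a recursive delete. The code below uses `java.nio.file` on the local file system as a stand-in for the job's real output location; the class `OutputCleanupSketch`, the helper `createSampleOutput`, and the `part-m-0000N` file names are assumptions for illustration. In a Hadoop driver the analogous call would be `FileSystem.delete(path, true)` on the output path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem stand-in for removing a job's empty output directory. */
final class OutputCleanupSketch {

    /** Create a temp directory holding the given number of empty part files (illustration only). */
    static Path createSampleOutput(int mapTasks) {
        try {
            Path dir = Files.createTempDirectory("counter-job-output");
            for (int i = 0; i < mapTasks; i++) {
                Files.createFile(dir.resolve("part-m-0000" + i));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Delete the directory and everything in it; returns how many entries were removed. */
    static int deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return 0;
        }
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order lists children before their parent directory.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path path : paths) {
                Files.delete(path);
            }
            return paths.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path output = createSampleOutput(3); // three map tasks, three empty part files
        int removed = deleteRecursively(output);
        System.out.println("removed " + removed + " entries"); // prints "removed 4 entries"
    }
}
```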
Known uses |
• Count the number of records
• Count a small number of unique instances
• Summations |
Resemblances |
|
Performance analysis |
Using counters is very fast, as data is simply read in through the mapper and no output is written. Performance depends largely on the number of map tasks being executed and how much time it takes to process each record. |
Examples |
Number of users per state |
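A minimal sketch of this example, written in plain Java so it runs without a cluster. The record format is an assumption: each input is taken to be a free-text user location string, and a record counts toward a state when one of its tokens matches a known state code. The class `UsersPerStateSketch`, the abbreviated `STATES` set, and the `Unknown` counter name are hypothetical. In a Hadoop mapper the `merge` call would instead be `context.getCounter(groupName, state).increment(1)`, and the driver would read the totals with `job.getCounters()` after the job finishes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java model of the "number of users per state" counter job. */
final class UsersPerStateSketch {
    static final String UNKNOWN = "Unknown";

    // Abbreviated for illustration; a real job would list every state of interest.
    static final Set<String> STATES = new HashSet<>(Arrays.asList("CA", "NY", "TX", "WA"));

    /** Return the first token of the location that is a known state code, else UNKNOWN. */
    static String stateOf(String location) {
        if (location != null) {
            for (String token : location.toUpperCase(Locale.ROOT).split("[^A-Z]+")) {
                if (STATES.contains(token)) {
                    return token;
                }
            }
        }
        return UNKNOWN;
    }

    /** The "map" logic: one counter increment per record, no output records. */
    static Map<String, Long> countUsersPerState(List<String> locations) {
        Map<String, Long> counters = new TreeMap<>();
        for (String location : locations) {
            counters.merge(stateOf(location), 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList(
                "San Francisco, CA", "Austin, tx", "Brooklyn NY",
                "somewhere", null, "Los Angeles, CA");
        System.out.println(countUsersPerState(locations));
        // prints {CA=2, NY=1, TX=1, Unknown=2}
    }
}
```

Routing unmatched records to a single `Unknown` counter keeps the counter set bounded by the number of states plus one, which fits the pattern's requirement that the number of counters stay small.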