Category: HADOOP

2013-03-28 14:28:08

 

Pattern Name

Counting with Counters

Category

Summarization Patterns

Description

This pattern utilizes the MapReduce framework’s counters utility to calculate a global sum entirely on the map side without producing any output.

Intent

An efficient means to retrieve count summarizations of large data sets.

Motivation

This pattern describes how to use custom counters to gather count or summarization metrics from your data sets. The major benefit of using counters is that all the counting can be done during the map phase.

Applicability

Counting with counters should be used when:

•   You want to gather counts or summations over large data sets.

•   The number of counters you are going to create is small, in the double digits.

Structure

•   The mapper processes one input record at a time, incrementing counters based on certain criteria. The TaskTrackers running the tasks aggregate these counters and report them incrementally to the JobTracker, which performs the overall aggregation upon job success. The JobTracker disregards the counters from any failed tasks in the final summation.

•   As this job is map only, no combiner, partitioner, or reducer is required.
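
The flow described above can be sketched in plain Java. This is a simulation under stated assumptions, not Hadoop code: `TaskResult`, `runMapTask`, and `aggregate` are hypothetical names, and the empty-versus-non-empty criterion is only an illustration. In a real mapper the increment would typically go through `context.getCounter(group, name).increment(1)` instead of a local map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the counter flow; none of these types are Hadoop classes.
class CounterAggregationSketch {

    // Hypothetical stand-in for one map task: its local counters and its outcome.
    static final class TaskResult {
        final Map<String, Long> counters;
        final boolean succeeded;

        TaskResult(Map<String, Long> counters, boolean succeeded) {
            this.counters = counters;
            this.succeeded = succeeded;
        }
    }

    // Map side: visit one record at a time and increment a counter; emit no output.
    static TaskResult runMapTask(List<String> records, boolean succeeded) {
        Map<String, Long> counters = new HashMap<>();
        for (String record : records) {
            // Illustrative criterion only: classify records as empty or non-empty.
            String name = record.trim().isEmpty() ? "EMPTY_RECORDS" : "NON_EMPTY_RECORDS";
            counters.merge(name, 1L, Long::sum);
        }
        return new TaskResult(counters, succeeded);
    }

    // Job side: sum the counters of successful tasks and disregard failed ones.
    static Map<String, Long> aggregate(List<TaskResult> results) {
        Map<String, Long> totals = new TreeMap<>();
        for (TaskResult result : results) {
            if (!result.succeeded) {
                continue; // counters from failed tasks are not part of the final sum
            }
            result.counters.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<TaskResult> results = Arrays.asList(
                runMapTask(Arrays.asList("a", "", "b"), true),
                runMapTask(Arrays.asList("c", "d"), false), // simulated failed task
                runMapTask(Arrays.asList("e", " "), true));
        // prints {EMPTY_RECORDS=2, NON_EMPTY_RECORDS=3}
        System.out.println(aggregate(results));
    }
}
```

The two records of the failed task never reach the totals, which mirrors how counters from failed tasks are left out of the final summation.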

Consequences

The final output is a set of counters retrieved from the job framework. The analytic itself produces no actual output. However, the job still requires an output directory to execute. After the job runs, this directory exists and contains a number of empty part files equal to the number of map tasks. It should be deleted on job completion.
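
The cleanup step can be illustrated with a small local-filesystem sketch. It is an assumption-laden stand-in: it uses `java.nio.file` on a temporary directory rather than HDFS, the `part-m-00000` style file names are illustrative, and `createEmptyOutput` and `deleteRecursively` are hypothetical helpers. On a real cluster the equivalent step would normally be a recursive delete through Hadoop's `FileSystem` API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Local-filesystem simulation of the empty output directory and its cleanup.
class OutputDirCleanupSketch {

    // Simulates what the job leaves behind: one empty part file per map task.
    static Path createEmptyOutput(int mapTasks) {
        try {
            Path dir = Files.createTempDirectory("counter-job-output");
            for (int i = 0; i < mapTasks; i++) {
                Files.createFile(dir.resolve(String.format("part-m-%05d", i)));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the directory and everything in it, children before parents.
    static void deleteRecursively(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path outputDir = createEmptyOutput(3);
        System.out.println("exists before cleanup: " + Files.exists(outputDir));
        deleteRecursively(outputDir);
        System.out.println("exists after cleanup: " + Files.exists(outputDir));
    }
}
```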

Known uses

Count number of records

Count a small number of unique instances

Summations

Resemblances

 

Performance analysis

Using counters is very fast, as data is simply read in through the mapper and no output is written. Performance depends largely on the number of map tasks being executed and how much time it takes to process each record.

Examples

Number of users per state
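
A minimal sketch of this example follows, again in plain Java rather than against the Hadoop API. The location strings, the abbreviated `STATES` set, and the `Unknown` counter name are assumptions made for illustration. Checking each token against a fixed set keeps the number of counters bounded (one per state plus one catch-all), in line with the applicability note about creating only a small number of counters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation of counting users per state with counters.
class UsersPerStateSketch {

    // Abbreviated on purpose; a real job would list every state abbreviation.
    static final Set<String> STATES = new HashSet<>(Arrays.asList("CA", "MD", "NY", "TX"));

    // Hypothetical catch-all counter for locations that match no known state.
    static final String UNKNOWN = "Unknown";

    // Map step: inspect one user's location and increment exactly one counter.
    static void map(String location, Map<String, Long> counters) {
        String counterName = UNKNOWN;
        if (location != null) {
            for (String token : location.toUpperCase(Locale.ROOT).split("[\\s,]+")) {
                if (STATES.contains(token)) {
                    counterName = token;
                    break;
                }
            }
        }
        counters.merge(counterName, 1L, Long::sum);
    }

    // Runs the map step over every record; nothing is emitted besides the counters.
    static Map<String, Long> countUsers(List<String> locations) {
        Map<String, Long> counters = new TreeMap<>();
        for (String location : locations) {
            map(location, counters);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList(
                "Baltimore, MD", "san francisco ca", "Austin, TX",
                "somewhere", "Annapolis MD", null);
        // prints {CA=1, MD=2, TX=1, Unknown=2}
        System.out.println(countUsers(locations));
    }
}
```

In a Hadoop mapper, the `counters.merge(...)` call would correspond to incrementing a named counter in a counter group, and the driver would read the totals from the job's counters after completion.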

 
