Hadoop in Practice: Word Count - zzywolf - ChinaUnix Blog


2012-09-25 10:47:25

Original post: "Hadoop in Practice: Word Count", by liurhyme


1. Running the simple word count program

First, prepare two text files by running the following commands in a shell:

echo "hello hadoop word count">/tmp/test_file1.txt

echo "hello hadoop,I'm a vegetable bird">/tmp/test_file2.txt

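Copy the two files into HDFS with the following commands:

bin/hadoop dfs -mkdir test-in (creates the directory test-in)

bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in (copies the two files into test-in)

bin/hadoop dfs -ls test-in (checks that the copy succeeded) and prints the following listing: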

Found 2 items
-rw-r--r-- 1 hadoop supergroup 24 2011-01-21 18:40 /user/hadoop/test-in/test_file1.txt
-rw-r--r-- 1 hadoop supergroup 34 2011-01-21 18:40 /user/hadoop/test-in/test_file2.txt
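The sizes in the listing (24 and 34 bytes) follow from the two echo commands: each file holds the quoted text plus the trailing newline that echo appends. A minimal Python check, assuming the text is plain ASCII so that one character occupies one byte:

```python
# Contents written by the two echo commands; echo appends a newline.
file1 = "hello hadoop word count" + "\n"
file2 = "hello hadoop,I'm a vegetable bird" + "\n"

# Encode to bytes to measure the size each file has on disk.
size1 = len(file1.encode("utf-8"))
size2 = len(file2.encode("utf-8"))

print(size1, size2)  # 24 34
```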

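Note: test-in here is a directory in HDFS, not on the local disk. A relative path is resolved against the user's HDFS home directory, so its absolute path is "hdfs://localhost:9000/user/hadoop/test-in".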
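The path resolution described in the note can be sketched as follows. The function resolve_hdfs_path is a hypothetical helper written for illustration, not part of Hadoop; the user name "hadoop" and the default file system "hdfs://localhost:9000" are taken from the listing and the note above and will differ on other installations.

```python
import posixpath

def resolve_hdfs_path(path, user="hadoop", default_fs="hdfs://localhost:9000"):
    """Illustrative helper: expand a path the way the note describes."""
    if "://" in path:
        # Already a full URI; leave it untouched.
        return path
    if not path.startswith("/"):
        # Relative paths are resolved against the home directory /user/<user>.
        path = posixpath.join("/user", user, path)
    return default_fs + path

print(resolve_hdfs_path("test-in"))
# hdfs://localhost:9000/user/hadoop/test-in
```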

Run the example with the following command:

bin/hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount test-in test-out (writes the results to test-out) and prints the following to the screen:

11/01/21 18:50:16 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/01/21 18:50:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/01/21 18:50:17 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/01/21 18:50:17 INFO input.FileInputFormat: Total input paths to process : 2
11/01/21 18:50:17 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/01/21 18:50:17 INFO mapreduce.JobSubmitter: number of splits:2
11/01/21 18:50:18 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/01/21 18:50:18 INFO mapreduce.Job: Running job: job_201101211705_0001
11/01/21 18:50:19 INFO mapreduce.Job: map 0% reduce 0%
11/01/21 18:50:35 INFO mapreduce.Job: map 100% reduce 0%
11/01/21 18:50:44 INFO mapreduce.Job: map 100% reduce 100%
11/01/21 18:50:47 INFO mapreduce.Job: Job complete: job_201101211705_0001
11/01/21 18:50:47 INFO mapreduce.Job: Counters: 33
  FileInputFormatCounters
    BYTES_READ=58
  FileSystemCounters
    FILE_BYTES_READ=118
    FILE_BYTES_WRITTEN=306
    HDFS_BYTES_READ=300
    HDFS_BYTES_WRITTEN=68
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  Job Counters
    Data-local map tasks=2
    Total time spent by all maps waiting after reserving slots (ms)=0
    Total time spent by all reduces waiting after reserving slots (ms)=0
    SLOTS_MILLIS_MAPS=22290
    SLOTS_MILLIS_REDUCES=6539
    Launched map tasks=2
    Launched reduce tasks=1
  Map-Reduce Framework
    Combine input records=9
    Combine output records=9
    Failed Shuffles=0
    GC time elapsed (ms)=642
    Map input records=2
    Map output bytes=94
    Map output records=9
    Merged Map outputs=2
    Reduce input groups=8
    Reduce input records=9
    Reduce output records=8
    Reduce shuffle bytes=124
    Shuffled Maps =2
    SPLIT_RAW_BYTES=242
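Several of the Map-Reduce Framework counters can be reproduced by hand from the two input files. The following is a plain Python simulation of the map, combine, and reduce stages, not Hadoop code. It assumes that words are split on whitespace, that each file forms one split (matching "number of splits:2"), and that each map output record occupies one length byte, the key bytes, and a 4-byte integer value; the last assumption is a guess at the serialized record size that happens to agree with the reported Map output bytes.

```python
from collections import Counter

# One input split per file, as reported by "number of splits:2".
splits = [
    "hello hadoop word count\n",
    "hello hadoop,I'm a vegetable bird\n",
]

def map_phase(text):
    """Emit (word, 1) for every whitespace-separated token."""
    return [(word, 1) for word in text.split()]

def combine(pairs):
    """Sum the counts per word within a single map task's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

map_outputs = [map_phase(text) for text in splits]
combined = [combine(pairs) for pairs in map_outputs]

# Reduce: merge the combined outputs of all map tasks and sum per word.
reduced = Counter()
for pairs in combined:
    for word, count in pairs:
        reduced[word] += count

counters = {
    "BYTES_READ": sum(len(t.encode("utf-8")) for t in splits),
    "Map input records": sum(len(t.splitlines()) for t in splits),
    "Map output records": sum(len(p) for p in map_outputs),
    # Assumed record size: 1 length byte + key bytes + 4-byte int value.
    "Map output bytes": sum(
        1 + len(w.encode("utf-8")) + 4 for p in map_outputs for w, _ in p
    ),
    "Combine output records": sum(len(p) for p in combined),
    "Reduce input records": sum(len(p) for p in combined),
    "Reduce input groups": len(reduced),
    "Reduce output records": len(reduced),
}
for name, value in counters.items():
    print(f"{name}={value}")
```

The combiner removes nothing here because no word repeats within a single file, which is why Combine output records equals Combine input records. Only the reducer merges the two occurrences of "hello", so 9 input records become 8 groups.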

Check the output of the job:

bin/hadoop dfs -ls test-out prints:

Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2011-01-21 18:50 /user/hadoop/test-out/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 68 2011-01-21 18:50 /user/hadoop/test-out/part-r-00000

View the final counts by running:

bin/hadoop dfs -cat test-out/part-r-00000 prints the result, the number of times each word occurs in the input files:

a 1
bird 1
count 1
hadoop 1
hadoop,I'm 1
hello 2
vegetable 1
word 1
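The result can be cross-checked locally without Hadoop. The sketch below assumes that the example splits text on whitespace only, which is why "hadoop,I'm" is counted as a single word separate from "hadoop". It also assumes that each output line has the form word, tab, count, with keys in byte order. Under those assumptions the output is 68 bytes long, matching the size of part-r-00000 in the listing.

```python
from collections import Counter

lines = [
    "hello hadoop word count",
    "hello hadoop,I'm a vegetable bird",
]

# Split on whitespace only: punctuation stays attached to its token.
counts = Counter(word for line in lines for word in line.split())

# One "word<TAB>count" line per key, keys in sorted order (assumed format).
output = "".join(f"{word}\t{count}\n" for word, count in sorted(counts.items()))

print(output, end="")
print(len(output.encode("utf-8")))  # 68
```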
