1. Running the simple word-count example
First, prepare two text files by running the following commands at the command line:
echo "hello hadoop word count" > /tmp/test_file1.txt
echo "hello hadoop,I'm a vegetable bird" > /tmp/test_file2.txt
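As a quick sanity check, the byte sizes of the two local files can be compared against the sizes HDFS will report after the copy (24 and 34 bytes; echo appends a trailing newline, which accounts for one byte each). A minimal sketch:

```shell
# Recreate the two test files and check their sizes locally.
# echo appends a newline, so each size includes one extra byte.
echo "hello hadoop word count" > /tmp/test_file1.txt
echo "hello hadoop,I'm a vegetable bird" > /tmp/test_file2.txt
wc -c < /tmp/test_file1.txt   # 24 bytes, matching the HDFS listing below
wc -c < /tmp/test_file2.txt   # 34 bytes
```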
Next, copy the two files into HDFS with the following commands:
bin/hadoop dfs -mkdir test-in (create the directory test-in)
bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in (copy the two files into test-in)
bin/hadoop dfs -ls test-in (check that the copy succeeded); this prints a listing like:
- Found 2 items
- -rw-r--r-- 1 hadoop supergroup 24 2011-01-21 18:40 /user/hadoop/test-in/test_file1.txt
- -rw-r--r-- 1 hadoop supergroup 34 2011-01-21 18:40 /user/hadoop/test-in/test_file2.txt
Note: test-in here is actually a directory under the HDFS user path; its absolute path is "hdfs://localhost:9000/user/hadoop/test-in".
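The relative and absolute forms are interchangeable: a bare path like test-in resolves against the user's HDFS home directory. A sketch of the equivalence, assuming the default namenode URI hdfs://localhost:9000 and the hadoop user from this walkthrough:

```shell
# Assumption: namenode URI and user as configured in this walkthrough.
HDFS_BASE="hdfs://localhost:9000/user/hadoop"
# These two commands list the same directory -- the relative path
# resolves against the user's HDFS home directory:
#   bin/hadoop dfs -ls test-in
#   bin/hadoop dfs -ls "$HDFS_BASE/test-in"
echo "$HDFS_BASE/test-in"
```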
Now run the example with the following command:
bin/hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount test-in test-out (write the results to test-out); the screen shows:
- 11/01/21 18:50:16 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
- 11/01/21 18:50:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
- 11/01/21 18:50:17 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
- 11/01/21 18:50:17 INFO input.FileInputFormat: Total input paths to process : 2
- 11/01/21 18:50:17 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
- 11/01/21 18:50:17 INFO mapreduce.JobSubmitter: number of splits:2
- 11/01/21 18:50:18 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
- 11/01/21 18:50:18 INFO mapreduce.Job: Running job: job_201101211705_0001
- 11/01/21 18:50:19 INFO mapreduce.Job: map 0% reduce 0%
- 11/01/21 18:50:35 INFO mapreduce.Job: map 100% reduce 0%
- 11/01/21 18:50:44 INFO mapreduce.Job: map 100% reduce 100%
- 11/01/21 18:50:47 INFO mapreduce.Job: Job complete: job_201101211705_0001
- 11/01/21 18:50:47 INFO mapreduce.Job: Counters: 33
- FileInputFormatCounters
- BYTES_READ=58
- FileSystemCounters
- FILE_BYTES_READ=118
- FILE_BYTES_WRITTEN=306
- HDFS_BYTES_READ=300
- HDFS_BYTES_WRITTEN=68
- Shuffle Errors
- BAD_ID=0
- CONNECTION=0
- IO_ERROR=0
- WRONG_LENGTH=0
- WRONG_MAP=0
- WRONG_REDUCE=0
- Job Counters
- Data-local map tasks=2
- Total time spent by all maps waiting after reserving slots (ms)=0
- Total time spent by all reduces waiting after reserving slots (ms)=0
- SLOTS_MILLIS_MAPS=22290
- SLOTS_MILLIS_REDUCES=6539
- Launched map tasks=2
- Launched reduce tasks=1
- Map-Reduce Framework
- Combine input records=9
- Combine output records=9
- Failed Shuffles=0
- GC time elapsed (ms)=642
- Map input records=2
- Map output bytes=94
- Map output records=9
- Merged Map outputs=2
- Reduce input groups=8
- Reduce input records=9
- Reduce output records=8
- Reduce shuffle bytes=124
- Shuffled Maps =2
- Spilled Records=18
- SPLIT_RAW_BYTES=242
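Some of these counters can be cross-checked against the file listings by hand: FileInputFormatCounters BYTES_READ=58 is simply the sum of the two input files (24 + 34 bytes), and HDFS_BYTES_WRITTEN=68 matches the size of the result file shown below. A quick arithmetic check:

```shell
# BYTES_READ should equal the combined size of the two input files
# (24 and 34 bytes, per the earlier test-in listing).
bytes_read=$((24 + 34))
echo "BYTES_READ=$bytes_read"   # 58, matching FileInputFormatCounters
```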
Check the job output:
bin/hadoop dfs -ls test-out prints:
- Found 2 items
- -rw-r--r-- 1 hadoop supergroup 0 2011-01-21 18:50 /user/hadoop/test-out/_SUCCESS
- -rw-r--r-- 1 hadoop supergroup 68 2011-01-21 18:50 /user/hadoop/test-out/part-r-00000
Finally, view the word counts by running:
bin/hadoop dfs -cat test-out/part-r-00000, which prints the number of times each word appears across the input files:
- a 1
- bird 1
- count 1
- hadoop 1
- hadoop,I'm 1
- hello 2
- vegetable 1
- word 1
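Note that wordcount tokenizes on whitespace only, which is why "hadoop,I'm" is counted as a single word. The same result can be reproduced locally with standard tools; a sketch, assuming the two /tmp test files created at the start:

```shell
# Recreate the inputs and count words locally, splitting on spaces
# just as the wordcount example's tokenizer does.
echo "hello hadoop word count" > /tmp/test_file1.txt
echo "hello hadoop,I'm a vegetable bird" > /tmp/test_file2.txt
cat /tmp/test_file1.txt /tmp/test_file2.txt \
  | tr ' ' '\n' | LC_ALL=C sort | uniq -c
# prints 8 lines such as "2 hello" and "1 vegetable",
# matching part-r-00000 above
```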