Running hadoop-examples-1.2.1.jar (wordcount) on Hadoop
1 Preface
For Hadoop installation, refer to the blog post linked below:
http://blog.chinaunix.net/uid-31429544-id-5759400.html
2 Running an example provided with Hadoop
2.1 Starting Hadoop
$start-all.sh
The startup process is shown in the figure below:
Note: the $jps command shows which processes have started; make sure that NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker are all running.
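For reference, a healthy single-node setup typically shows one JVM per daemon in the jps listing. The listing below is only an illustration: the process IDs are made up and will differ on your machine.
$jps
2365 DataNode
2481 NameNode
2601 SecondaryNameNode
2712 JobTracker
2830 TaskTracker
2950 Jps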
2.2 Preparing the data
Create a local directory named input.
Inside input, create four files: em1.txt, em2.txt, em3.txt, and em4.txt.
As shown in the figure below:
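If you are following along without the original screenshot, one minimal way to prepare this data is shown below. The file contents are arbitrary sample sentences chosen here for illustration; any text files will do.
$mkdir input
$echo "hello hadoop" > input/em1.txt
$echo "hello world" > input/em2.txt
$echo "hadoop mapreduce example" > input/em3.txt
$echo "hello mapreduce world" > input/em4.txt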
2.3 Copying the files into HDFS
$hadoop dfs
This lists the shell commands that Hadoop supports.
$hadoop dfs -mkdir input
Creates the directory input in HDFS.
$hadoop dfs -ls input
Lists the files under input.
$hadoop dfs -put input/* input
Copies the files in the local input directory from Linux into HDFS.
The process is shown in the figure below:
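As a quick sanity check (not part of the original steps), you can print one of the copied files back out of HDFS and compare it with the local copy:
$hadoop dfs -cat input/em1.txt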
2.4 Running wordcount
The Hadoop installation directory contains hadoop-examples-1.2.1.jar, a jar that packages several example programs runnable on Hadoop; Hadoop can execute a class contained in a jar. The command to run the wordcount class from hadoop-examples-1.2.1.jar is:
$hadoop jar hadoop-examples-1.2.1.jar wordcount input output
wordcount is the name of the class inside the jar that we want to run.
input is the input directory.
output is the output directory. It must not exist beforehand: the program creates it automatically, and the job fails with an error if output already exists.
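If a previous run already created output, you can delete it before submitting the job again; in Hadoop 1.x the recursive delete option of the dfs shell is -rmr:
$hadoop dfs -rmr output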
The execution process is shown in the figure below:
We can list the contents of the output directory to confirm that the program created it, and check the result of the job by viewing the part-r-00000 file inside output.
The result is shown in the figure below:
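The same check can also be done from the command line; the exact words and counts will of course depend on your input files:
$hadoop dfs -ls output
$hadoop dfs -cat output/part-r-00000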
3 wordcount source code
The directory src/examples/org/apache/hadoop/examples under the Hadoop installation directory contains many of the example classes that Hadoop ships; WordCount.java can be found there. Its source code is as follows:
/**
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
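If you want to modify WordCount and run your own build instead of the bundled example jar, a rough sketch of compiling and packaging it against Hadoop 1.2.1 is given below. The jar name hadoop-core-1.2.1.jar and the use of $HADOOP_HOME are assumptions based on a default installation and may need adjusting; output2 is simply a new, not-yet-existing output directory.
$mkdir wordcount_classes
$javac -classpath $HADOOP_HOME/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java
$jar -cvf wordcount.jar -C wordcount_classes/ .
$hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input output2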