
Category: HADOOP

2015-03-29 16:13:02

 

Hadoop 2.6.0 cluster setup

Reference: http://blog.csdn.net/stark_summer/article/details/42424279

一、Layout and environment


1. VM configuration: 4 CPU cores, 512 MB RAM, 20 GB virtual disk

2. Linux version:

[root@slave4 ~]# cat /etc/issue

CentOS release 6.5 (Final)

3. Hadoop version: hadoop-2.6.0 (hadoop-2.6.0.tar.gz)

4. JDK version: jdk-7u9-linux-x64

二、Preparation

(1) Disable iptables: chkconfig iptables off

    Disable SELinux (vim /etc/selinux/config: set SELINUX=disabled, then reboot). A non-interactive sketch follows below.

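    A minimal non-interactive sketch of step (1), assuming CentOS 6 where chkconfig, service and setenforce are available:

    chkconfig iptables off && service iptables stop
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # permanent, takes full effect after reboot
    setenforce 0                                                   # switch SELinux to permissive for the current session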

(2) Add the hadoop group and user, and set a password:

groupadd hadoop; useradd -g hadoop hadoop; echo 123456 |passwd hadoop --stdin
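The daemons on the slaves will also run as hadoop, so the same user must exist there. A minimal sketch that creates it on every slave over SSH (it assumes the slave names resolve, see step (5), or substitute the IP addresses; you will be asked for the root password unless the key-based login from step (3) is already set up):

    for h in slave1 slave2 slave3 slave4; do
        ssh $h 'groupadd hadoop; useradd -g hadoop hadoop; echo 123456 | passwd hadoop --stdin'
    done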

(3) Passwordless SSH login from master to each slave

   ssh-keygen   (press Enter at every prompt)

   ssh-copy-id -i .ssh/id_rsa.pub 192.168.1.60   (repeat for the other hosts)

   Run the above as root; it saves typing passwords when copying files later.

   Then switch to the hadoop user and set up passwordless SSH again:

   [root@master ~]# su - hadoop

   [hadoop@master ~]$ ssh-keygen   (press Enter at every prompt)

   [hadoop@master ~]$  ssh-copy-id -i .ssh/id_rsa.pub master

   [hadoop@master ~]$  ssh-copy-id -i .ssh/id_rsa.pub slave1

   [hadoop@master ~]$  ssh-copy-id -i .ssh/id_rsa.pub slave2

   [hadoop@master ~]$  ssh-copy-id -i .ssh/id_rsa.pub slave3

   [hadoop@master ~]$  ssh-copy-id -i .ssh/id_rsa.pub slave4
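   A quick way to confirm that key-based login works for the hadoop user on every node; each line should print the remote hostname without asking for a password (a small sketch, assuming the hostnames already resolve):

   for h in master slave1 slave2 slave3 slave4; do ssh $h hostname; done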

   (4) Set the hostname on the namenode and every datanode, otherwise the cluster cannot identify its nodes

       vim /etc/sysconfig/network

    HOSTNAME=master   (on the other nodes set HOSTNAME=slave1, slave2, ... accordingly)

  (5) Configure hostname resolution

     vim /etc/hosts

      192.168.1.70    master
      192.168.1.80    slave1
      192.168.1.90    slave2
      192.168.1.50    slave3
      192.168.1.60    slave4

    After editing, scp the file to every host (see the sketch after step (7)).

  (6) Hadoop and the JDK are installed under /opt.

  (7) vim /etc/profile and add the following 3 lines:

    export JAVA_HOME=/opt/java/jdk
    export HADOOP_HOME=/opt/hadoop
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

    Without these entries in /etc/profile, running jps later fails with

    -bash: jps: command not found

    After editing, scp the file to every host; a sketch follows below.
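    A minimal sketch for pushing the edited /etc/hosts and /etc/profile to the slaves in one pass (run as root on master, using the IP addresses from step (5) so it works even before the names resolve):

    for ip in 192.168.1.80 192.168.1.90 192.168.1.50 192.168.1.60; do    # slave1-slave4
        scp /etc/hosts /etc/profile $ip:/etc/
    done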

  (8) Synchronize the clocks: set the time on master and all slaves to the same value.

     Example: date -s "2015-03-29 12:04:00", or synchronize via NTP (see the sketch below).
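     If the hosts have network access, the same can be done over NTP; a sketch assuming the ntpdate package is installed everywhere and pool.ntp.org is reachable:

     for h in master slave1 slave2 slave3 slave4; do
         ssh $h 'ntpdate -u pool.ntp.org'
     done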

三、Install Hadoop

1. Unpack jdk-7u9-linux-x64.gz

  (1) mkdir -pv /opt/java                          create the Java directory
  (2) tar -zxvf jdk-7u9-linux-x64.gz -C /opt/java
  (3) mv /opt/java/jdk1.7.0_09/ /opt/java/jdk
  (4) source /etc/profile                          load the Java environment variables
  (5) env |grep -E "JAVA_HOME|jdk"                 confirm the Java variables and path are loaded
  (6) java -version                                (expected output: java version "1.7.0_09")

2. Unpack Hadoop: tar -xzvf hadoop-2.6.0.tar.gz -C /opt

   Rename the directory: mv /opt/hadoop-2.6.0/ /opt/hadoop

   [root@master hadoop]# ls

bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

3. Create the Hadoop working directories:

  mkdir -pv /opt/hadoop/{dfs,{dfs/name,dfs/data},logs,tmp}

mkdir: created directory `/opt/hadoop/dfs'

mkdir: created directory `/opt/hadoop/dfs/name'

mkdir: created directory `/opt/hadoop/dfs/data'

mkdir: created directory `/opt/hadoop/logs'

mkdir: created directory `/opt/hadoop/tmp'


4. Seven configuration files are involved; they all live under /opt/hadoop/etc/hadoop and can be edited with vim:

/opt/hadoop/etc/hadoop/hadoop-env.sh
/opt/hadoop/etc/hadoop/yarn-env.sh
/opt/hadoop/etc/hadoop/slaves
/opt/hadoop/etc/hadoop/core-site.xml
/opt/hadoop/etc/hadoop/hdfs-site.xml
/opt/hadoop/etc/hadoop/mapred-site.xml
/opt/hadoop/etc/hadoop/yarn-site.xml

5. Edit the Hadoop configuration files:

 (1) hadoop-env.sh --> set JAVA_HOME

  vim /opt/hadoop/etc/hadoop/hadoop-env.sh

  #export JAVA_HOME=${JAVA_HOME}       comment out the variable reference
  export JAVA_HOME=/opt/java/jdk       use the absolute path; ${JAVA_HOME} is often not set in the non-interactive shells the start scripts open over SSH, so the reference can come up empty

 (2) yarn-env.sh --> set JAVA_HOME

  vim /opt/hadoop/etc/hadoop/yarn-env.sh

  # export JAVA_HOME=/home/y/libexec/jdk1.6.0/    add the next line right below this one
  export JAVA_HOME=/opt/java/jdk                  again, use the absolute path
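  Both edits can also be scripted. A small sketch that simply appends the absolute JAVA_HOME to the two files (they are sourced as shell scripts, so the last assignment wins):

  echo 'export JAVA_HOME=/opt/java/jdk' >> /opt/hadoop/etc/hadoop/hadoop-env.sh
  echo 'export JAVA_HOME=/opt/java/jdk' >> /opt/hadoop/etc/hadoop/yarn-env.sh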

 (3) slaves file

  vim /opt/hadoop/etc/hadoop/slaves

   slave1
   slave2
   slave3
   slave4

 (4) Configure core-site.xml

  vim /opt/hadoop/etc/hadoop/core-site.xml

 

   

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.spark.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.spark.groups</name>
        <value>*</value>
    </property>
</configuration>

 (5) Configure hdfs-site.xml

   

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

(6) Configure mapred-site.xml (if only mapred-site.xml.template exists in the directory, copy it to mapred-site.xml first)

   

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>


 

(7) Configure yarn-site.xml --> enable YARN

        

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>


6. Change the owner and group of the java and hadoop directories under /opt to hadoop:

   [root@master ~]# chown -R hadoop.hadoop /opt/*

7. scp the prepared directories to slave1 through slave4, and change the owner and group of java and hadoop to hadoop on each slave as well (a loop version follows the commands below):

scp -r /opt/{java,hadoop} 192.168.1.60:/opt;ssh 192.168.1.60 "chown -R hadoop.hadoop /opt/{java,hadoop}"

scp -r /opt/{java,hadoop} 192.168.1.70:/opt;ssh 192.168.1.70 "chown -R hadoop.hadoop /opt/{java,hadoop}"

scp -r /opt/{java,hadoop} 192.168.1.80:/opt;ssh 192.168.1.80 "chown -R hadoop.hadoop /opt/{java,hadoop}"

scp -r /opt/{java,hadoop} 192.168.1.90:/opt;ssh 192.168.1.90 "chown -R hadoop.hadoop /opt/{java,hadoop}"
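The same can be written as a loop over the hostnames from /etc/hosts, which avoids mixing up the IP addresses; a sketch:

for h in slave1 slave2 slave3 slave4; do
    scp -r /opt/{java,hadoop} $h:/opt
    ssh $h 'chown -R hadoop.hadoop /opt/{java,hadoop}'
done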


四、Format HDFS and start Hadoop

1. Do the following as the hadoop user:

   [root@master ~]# su - hadoop

2. Format the namenode:
[hadoop@master ~]$ /opt/hadoop/bin/hdfs namenode -format

Seeing "15/03/29 12:07:41 INFO util.ExitUtil: Exiting with status 0" means the format succeeded.

3. Start HDFS:

[hadoop@master ~]$ /opt/hadoop/sbin/start-dfs.sh

15/03/29 12:21:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [master]

master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master.out

slave1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave1.out

slave4: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave4.out

slave2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave2.out

slave3: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave3.out

Starting secondary namenodes [master]

master: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out

15/03/29 12:22:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

4. Check the Java process status (jps, the Java Virtual Machine Process Status Tool):

[hadoop@master ~]$ jps

4081 Jps

3389 NameNode

3545 SecondaryNameNode


5. Start YARN:

[hadoop@master ~]$ /opt/hadoop/sbin/start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master.out

slave3: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave3.out

slave2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave2.out

slave4: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave4.out

slave1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave1.out

[hadoop@master ~]$ jps

5247 Jps

4511 ResourceManager

4880 NameNode

5062 SecondaryNameNode

The processes on slave1 through slave4 look like this:

[hadoop@slave3 ~]$ jps

2898 NodeManager

2797 DataNode

3040 Jps


6. Stop YARN:

[hadoop@master ~]$ /opt/hadoop/sbin/stop-yarn.sh

stopping yarn daemons

stopping resourcemanager

slave1: stopping nodemanager

slave3: stopping nodemanager

slave4: stopping nodemanager

slave2: stopping nodemanager

no proxyserver to stop


[hadoop@master ~]$ jps

5455 Jps

4880 NameNode

5062 SecondaryNameNode


7. Stop HDFS:

[hadoop@master ~]$ /opt/hadoop/sbin/stop-dfs.sh

15/03/29 12:38:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Stopping namenodes on [master]

master: stopping namenode

slave2: stopping datanode

slave4: stopping datanode

slave3: stopping datanode

slave1: stopping datanode

Stopping secondary namenodes [master]

master: stopping secondarynamenode

15/03/29 12:39:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[hadoop@master ~]$ jps

5759 Jps


8. Check the cluster status (start dfs and yarn again first):

[hadoop@master ~]$ /opt/hadoop/bin/hdfs dfsadmin -report

15/03/29 12:54:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Configured Capacity: 75677286400 (70.48 GB)

Present Capacity: 55484186624 (51.67 GB)

DFS Remaining: 55484088320 (51.67 GB)

DFS Used: 98304 (96 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0


-------------------------------------------------

Live datanodes (4):


Name: 192.168.1.70:50010 (slave2)

Hostname: slave2

Decommission Status : Normal

Configured Capacity: 18919321600 (17.62 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 5047750656 (4.70 GB)

DFS Remaining: 13871546368 (12.92 GB)

DFS Used%: 0.00%

DFS Remaining%: 73.32%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Mar 29 12:54:11 CST 2015


Name: 192.168.1.80:50010 (slave3)

Hostname: slave3

Decommission Status : Normal

Configured Capacity: 18919321600 (17.62 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 5047767040 (4.70 GB)

DFS Remaining: 13871529984 (12.92 GB)

DFS Used%: 0.00%

DFS Remaining%: 73.32%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Mar 29 12:54:11 CST 2015


Name: 192.168.1.60:50010 (slave1)

Hostname: slave1

Decommission Status : Normal

Configured Capacity: 18919321600 (17.62 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 5047799808 (4.70 GB)

DFS Remaining: 13871497216 (12.92 GB)

DFS Used%: 0.00%

DFS Remaining%: 73.32%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Mar 29 12:54:11 CST 2015


Name: 192.168.1.90:50010 (slave4)

Hostname: slave4

Decommission Status : Normal

Configured Capacity: 18919321600 (17.62 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 5049782272 (4.70 GB)

DFS Remaining: 13869514752 (12.92 GB)

DFS Used%: 0.00%

DFS Remaining%: 73.31%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sun Mar 29 12:54:09 CST 2015


9. Check HDFS in the NameNode web UI (by default at http://master:50070).



10. Check the ResourceManager web UI (http://master:8088, as configured in yarn-site.xml).


五、Run the wordcount example

1. Create a local input directory:

[hadoop@master ~]$ mkdir /opt/hadoop/input

2. Create f1 and f2 under input and write some content into them:

[hadoop@master ~]$ echo "Hello world  bye jj" >/opt/hadoop/input/f1

[hadoop@master ~]$ echo "Hello Hadoop  bye Hadoop" >/opt/hadoop/input/f2


3. Create the /tmp/input directory on HDFS:

[hadoop@master ~]$ /opt/hadoop/bin/hadoop fs -mkdir /tmp

15/03/29 13:20:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


[hadoop@master ~]$ /opt/hadoop/bin/hadoop fs -mkdir /tmp/input

15/03/29 13:25:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
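The two mkdir calls can be combined by letting HDFS create the parent directory as well; a small sketch:

/opt/hadoop/bin/hadoop fs -mkdir -p /tmp/input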


4. Copy f1 and f2 into /tmp/input on HDFS:

[hadoop@master hadoop]$ hadoop fs -put input/ /tmp

15/03/29 13:30:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


5. Check that f1 and f2 are present on HDFS:

[hadoop@master hadoop]$  ./bin/hadoop fs -ls /tmp/input/

15/03/29 13:35:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

-rw-r--r--   3 hadoop supergroup         20 2015-03-29 13:30 /tmp/input/f1

-rw-r--r--   3 hadoop supergroup         25 2015-03-29 13:30 /tmp/input/f2


6. Run the wordcount example:


[hadoop@master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output

15/03/29 13:37:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/03/29 13:37:03 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.50:8032

15/03/29 13:37:08 INFO input.FileInputFormat: Total input paths to process : 2

15/03/29 13:37:08 INFO mapreduce.JobSubmitter: number of splits:2

15/03/29 13:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427604785944_0001

15/03/29 13:37:12 INFO impl.YarnClientImpl: Submitted application application_1427604785944_0001

15/03/29 13:37:15 INFO mapreduce.Job: The url to track the job:

15/03/29 13:37:15 INFO mapreduce.Job: Running job: job_1427604785944_0001

15/03/29 13:37:40 INFO mapreduce.Job: Job job_1427604785944_0001 running in uber mode : false

15/03/29 13:37:40 INFO mapreduce.Job:  map 0% reduce 0%

15/03/29 13:39:29 INFO mapreduce.Job:  map 100% reduce 0%

15/03/29 13:40:08 INFO mapreduce.Job:  map 100% reduce 100%

15/03/29 13:40:17 INFO mapreduce.Job: Job job_1427604785944_0001 completed successfully

15/03/29 13:40:17 INFO mapreduce.Job: Counters: 49

       File System Counters

              FILE: Number of bytes read=84

              FILE: Number of bytes written=317647

              FILE: Number of read operations=0

              FILE: Number of large read operations=0

              FILE: Number of write operations=0

              HDFS: Number of bytes read=237

              HDFS: Number of bytes written=36

              HDFS: Number of read operations=9

              HDFS: Number of large read operations=0

              HDFS: Number of write operations=2

       Job Counters

              Launched map tasks=2

              Launched reduce tasks=1

              Data-local map tasks=2

              Total time spent by all maps in occupied slots (ms)=232524

              Total time spent by all reduces in occupied slots (ms)=24928

              Total time spent by all map tasks (ms)=232524

              Total time spent by all reduce tasks (ms)=24928

              Total vcore-seconds taken by all map tasks=232524

              Total vcore-seconds taken by all reduce tasks=24928

              Total megabyte-seconds taken by all map tasks=238104576

              Total megabyte-seconds taken by all reduce tasks=25526272

       Map-Reduce Framework

              Map input records=2

              Map output records=8

              Map output bytes=75

              Map output materialized bytes=90

              Input split bytes=192

              Combine input records=8

              Combine output records=7

              Reduce input groups=5

              Reduce shuffle bytes=90

              Reduce input records=7

              Reduce output records=5

              Spilled Records=14

              Shuffled Maps =2

              Failed Shuffles=0

              Merged Map outputs=2

              GC time elapsed (ms)=5794

              CPU time spent (ms)=16700

              Physical memory (bytes) snapshot=390656000

              Virtual memory (bytes) snapshot=2511228928

              Total committed heap usage (bytes)=241037312

       Shuffle Errors

              BAD_ID=0

              CONNECTION=0

              IO_ERROR=0

              WRONG_LENGTH=0

              WRONG_MAP=0

              WRONG_REDUCE=0

       File Input Format Counters

              Bytes Read=45

       File Output Format Counters

              Bytes Written=36

7. Check the results:

[hadoop@master ~]$ hadoop fs -ls /

15/03/29 13:52:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-29 13:40 /output

drwxr-xr-x   - hadoop supergroup          0 2015-03-29 13:37 /tmp


[hadoop@master ~]$ hadoop fs -ls /output

15/03/29 13:54:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

-rw-r--r--   3 hadoop supergroup          0 2015-03-29 13:40 /output/_SUCCESS

-rw-r--r--   3 hadoop supergroup         36 2015-03-29 13:40 /output/part-r-00000


[hadoop@master ~]$ hadoop fs -cat /output/part-r-00000

15/03/29 13:58:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Hadoop  2

Hello      2

bye 2

jj      1

world      1

8. Delete the output directory so wordcount can be run again (a combined sketch follows the command output below):

[hadoop@master ~]$ hadoop fs -rm -r /output

15/03/29 13:59:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/03/29 13:59:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /output
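To rerun the job, the removal and the submission from step 6 can be chained; a sketch (wordcount refuses to start if /output already exists, which is why it is deleted first):

/opt/hadoop/bin/hadoop fs -rm -r -f /output
/opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output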

