2015-03-29 16:13:02
Reference: http://blog.csdn.net/stark_summer/article/details/42424279
I. Architecture and Environment
1. VM configuration: 4 CPU cores, 512 MB RAM, 20 GB virtual disk
2. Linux version:
[root@slave4 ~]# cat /etc/issue
CentOS release 6.5 (Final)
3. Hadoop version: hadoop-2.6.0.tar.gz
4. JDK version: jdk-7u9-linux-x64
II. Preparation
(1) Disable iptables: chkconfig iptables off (and stop the running service: service iptables stop)
(2) Disable SELinux: vim /etc/selinux/config, set SELINUX=disabled, then reboot
(3) Add the hadoop group and user, and set the user's password:
groupadd hadoop; useradd -g hadoop hadoop; echo 123456 | passwd hadoop --stdin
(4) Set up passwordless SSH login from master to each slave host:
ssh-keygen (press Enter at every prompt)
ssh-copy-id -i .ssh/id_rsa.pub 192.168.1.60 (repeat for the other hosts)
The steps above are run as root so that files can later be copied between hosts without entering a password.
Then switch to the hadoop user and set up passwordless logins once more:
[root@master ~]# su - hadoop
[hadoop@master ~]$ ssh-keygen (press Enter at every prompt)
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub master
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub slave1
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub slave2
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub slave3
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub slave4
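A quick way to confirm the passwordless setup (a sketch; it assumes the /etc/hosts entries from step (6) below are already in place) is to have every host print its hostname without a password prompt:
[hadoop@master ~]$ for h in master slave1 slave2 slave3 slave4; do ssh $h hostname; done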
(5) Set the hostname on every node, namenode and datanodes alike; otherwise the Hadoop cluster cannot identify its nodes:
vim /etc/sysconfig/network
HOSTNAME=master (adjust accordingly on the other nodes, e.g. HOSTNAME=slave1)
(6) Configure hostname resolution:
vim /etc/hosts
192.168.1.50 master
192.168.1.60 slave1
192.168.1.70 slave2
192.168.1.80 slave3
192.168.1.90 slave4
After editing, scp the file to every host.
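For example, a small loop pushes the updated file to the four slaves (run as root from master; the passwordless SSH from step (4) makes it non-interactive):
for h in slave1 slave2 slave3 slave4; do scp /etc/hosts $h:/etc/hosts; done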
(7) Hadoop and Java are installed under the /opt directory.
(8) vim /etc/profile and append the following 3 lines:
export JAVA_HOME=/opt/java/jdk
export HADOOP_HOME=/opt/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
If /etc/profile is not updated (or not reloaded with source /etc/profile), running jps fails with:
-bash: jps: command not found
After editing, scp /etc/profile to every host in the same way.
(9) Synchronize time across hosts: set master and all slaves to the same time.
Example: date -s "2015-03-29 12:04:00", or synchronize over NTP as sketched below.
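A minimal NTP-based sketch, assuming the hosts can reach the public pool servers (CentOS 6 package names):
yum install -y ntpdate            # on every host
/usr/sbin/ntpdate pool.ntp.org    # one-shot sync against the public NTP pool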
III. Installing Hadoop
1. Unpack jdk-7u9-linux-x64.gz:
(1) mkdir -pv /opt/java (create the Java directory)
(2) tar -zxvf jdk-7u9-linux-x64.gz -C /opt/java
(3) mv /opt/java/jdk1.7.0_09/ /opt/java/jdk
(4) source /etc/profile (load the Java variables)
(5) env | grep -E "JAVA_HOME|jdk" (confirm the Java variables and paths are loaded)
(6) java -version (expected output: java version "1.7.0_09")
2. Unpack Hadoop: tar -xzvf hadoop-2.6.0.tar.gz -C /opt
Rename the directory: mv /opt/hadoop-2.6.0/ /opt/hadoop
[root@master hadoop]# ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
3. Create the Hadoop subdirectories:
mkdir -pv /opt/hadoop/{dfs/{name,data},logs,tmp}
mkdir: created directory `/opt/hadoop/dfs'
mkdir: created directory `/opt/hadoop/dfs/name'
mkdir: created directory `/opt/hadoop/dfs/data'
mkdir: created directory `/opt/hadoop/logs'
mkdir: created directory `/opt/hadoop/tmp'
4. Seven configuration files are involved, all under /opt/hadoop/etc/hadoop; edit them with vim:
/opt/hadoop/etc/hadoop/hadoop-env.sh
/opt/hadoop/etc/hadoop/yarn-env.sh
/opt/hadoop/etc/hadoop/slaves
/opt/hadoop/etc/hadoop/core-site.xml
/opt/hadoop/etc/hadoop/hdfs-site.xml
/opt/hadoop/etc/hadoop/mapred-site.xml
/opt/hadoop/etc/hadoop/yarn-site.xml
5. Edit the Hadoop configuration files:
(1) Configure hadoop-env.sh --> set JAVA_HOME
vim /opt/hadoop/etc/hadoop/hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME} (comment out the variable reference)
export JAVA_HOME=/opt/java/jdk (use the absolute path; the variable form fails, most likely because the daemons are started over non-interactive SSH sessions in which JAVA_HOME is not set)
(2) Configure yarn-env.sh --> set JAVA_HOME
vim /opt/hadoop/etc/hadoop/yarn-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/ (add the next line below this one)
export JAVA_HOME=/opt/java/jdk (absolute path, for the same reason as above)
(3) Configure the slaves file:
vim /opt/hadoop/etc/hadoop/slaves
slave1
slave2
slave3
slave4
(4) Configure core-site.xml:
vim /opt/hadoop/etc/hadoop/core-site.xml
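The post does not show the file's contents; below is a minimal sketch, assuming the commonly used hdfs://master:9000 NameNode address (the port is an assumption) and the tmp directory created in step 3:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>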
(5) Configure hdfs-site.xml:
vim /opt/hadoop/etc/hadoop/hdfs-site.xml
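The contents are likewise not shown; a minimal sketch, pointing at the dfs/name and dfs/data directories created in step 3 (a replication factor of 3 matches the file listings further below):
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>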
(6) Configure mapred-site.xml:
vim /opt/hadoop/etc/hadoop/mapred-site.xml
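Hadoop 2.6 ships only a template for this file, so copy it first; the one property that matters here routes MapReduce jobs onto YARN:
cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>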
(7) Configure yarn-site.xml --> enable YARN:
vim /opt/hadoop/etc/hadoop/yarn-site.xml
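A minimal sketch: the shuffle auxiliary service is required for MapReduce on YARN, and the ResourceManager host is assumed to be master (the job log in Part V connects to master:8032, YARN's default RM port):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>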
6. Change the owner and group of the java and hadoop directories under /opt to hadoop:
[root@master ~]# chown -R hadoop.hadoop /opt/*
7. scp the configured directories to slave1 through slave4, and change the owner and group of java and hadoop to hadoop on each:
scp -r /opt/{java,hadoop} 192.168.1.60:/opt;ssh 192.168.1.60 "chown -R hadoop.hadoop /opt/{java,hadoop}"
scp -r /opt/{java,hadoop} 192.168.1.70:/opt;ssh 192.168.1.70 "chown -R hadoop.hadoop /opt/{java,hadoop}"
scp -r /opt/{java,hadoop} 192.168.1.80:/opt;ssh 192.168.1.80 "chown -R hadoop.hadoop /opt/{java,hadoop}"
scp -r /opt/{java,hadoop} 192.168.1.90:/opt;ssh 192.168.1.90 "chown -R hadoop.hadoop /opt/{java,hadoop}"
IV. Formatting HDFS and Starting Hadoop
1. Switch to the hadoop user:
[root@master ~]# su - hadoop
2. Format the namenode:
[hadoop@master ~]$ /opt/hadoop/bin/hdfs namenode -format
看到 "15/03/29 12:07:41 INFO util.ExitUtil: Exiting with status 0" 表示正常
3. 启动hdfs:
[hadoop@master ~]$ /opt/hadoop/sbin/start-dfs.sh
15/03/29 12:21:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave1.out
slave4: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave4.out
slave2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave2.out
slave3: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave3.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
15/03/29 12:22:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
4. Check the Java process status (jps - Java Virtual Machine Process Status Tool):
[hadoop@master ~]$ jps
4081 Jps
3389 NameNode
3545 SecondaryNameNode
5. Start YARN:
[hadoop@master ~]$ /opt/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave3: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave3.out
slave2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
slave4: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave4.out
slave1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master ~]$ jps
5247 Jps
4511 ResourceManager
4880 NameNode
5062 SecondaryNameNode
The processes on each of slave1 through slave4:
[hadoop@slave3 ~]$ jps
2898 NodeManager
2797 DataNode
3040 Jps
6. Stop YARN:
[hadoop@master ~]$ /opt/hadoop/sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
slave1: stopping nodemanager
slave3: stopping nodemanager
slave4: stopping nodemanager
slave2: stopping nodemanager
no proxyserver to stop
[hadoop@master ~]$ jps
5455 Jps
4880 NameNode
5062 SecondaryNameNode
7. Stop HDFS:
[hadoop@master ~]$ /opt/hadoop/sbin/stop-dfs.sh
15/03/29 12:38:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [master]
master: stopping namenode
slave2: stopping datanode
slave4: stopping datanode
slave3: stopping datanode
slave1: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
15/03/29 12:39:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@master ~]$ jps
5759 Jps
8. Check the cluster status (restart dfs and yarn first):
[hadoop@master ~]$ /opt/hadoop/bin/hdfs dfsadmin -report
15/03/29 12:54:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 75677286400 (70.48 GB)
Present Capacity: 55484186624 (51.67 GB)
DFS Remaining: 55484088320 (51.67 GB)
DFS Used: 98304 (96 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (4):
Name: 192.168.1.70:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 18919321600 (17.62 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5047750656 (4.70 GB)
DFS Remaining: 13871546368 (12.92 GB)
DFS Used%: 0.00%
DFS Remaining%: 73.32%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Mar 29 12:54:11 CST 2015
Name: 192.168.1.80:50010 (slave3)
Hostname: slave3
Decommission Status : Normal
Configured Capacity: 18919321600 (17.62 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5047767040 (4.70 GB)
DFS Remaining: 13871529984 (12.92 GB)
DFS Used%: 0.00%
DFS Remaining%: 73.32%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Mar 29 12:54:11 CST 2015
Name: 192.168.1.60:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 18919321600 (17.62 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5047799808 (4.70 GB)
DFS Remaining: 13871497216 (12.92 GB)
DFS Used%: 0.00%
DFS Remaining%: 73.32%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Mar 29 12:54:11 CST 2015
Name: 192.168.1.90:50010 (slave4)
Hostname: slave4
Decommission Status : Normal
Configured Capacity: 18919321600 (17.62 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5049782272 (4.70 GB)
DFS Remaining: 13869514752 (12.92 GB)
DFS Used%: 0.00%
DFS Remaining%: 73.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Mar 29 12:54:09 CST 2015
9. View HDFS in the browser: the NameNode web UI is at http://master:50070 by default.
10. View the ResourceManager: the YARN web UI is at http://master:8088 by default.
V. Running the wordcount Example
1. Create the input directory:
[hadoop@master ~]$ mkdir /opt/hadoop/input
2. Create f1 and f2 under input and write some content into them:
[hadoop@master ~]$ echo "Hello world bye jj" >/opt/hadoop/input/f1
[hadoop@master ~]$ echo "Hello Hadoop bye Hadoop" >/opt/hadoop/input/f2
3. Create the /tmp/input directory on HDFS:
[hadoop@master ~]$ /opt/hadoop/bin/hadoop fs -mkdir /tmp
15/03/29 13:20:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@master ~]$ /opt/hadoop/bin/hadoop fs -mkdir /tmp/input
15/03/29 13:25:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
4. Copy the f1 and f2 files to the HDFS /tmp/input directory:
[hadoop@master hadoop]$ hadoop fs -put input/ /tmp
15/03/29 13:30:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
5. Verify that f1 and f2 are on HDFS:
[hadoop@master hadoop]$ ./bin/hadoop fs -ls /tmp/input/
15/03/29 13:35:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 hadoop supergroup 20 2015-03-29 13:30 /tmp/input/f1
-rw-r--r-- 3 hadoop supergroup 25 2015-03-29 13:30 /tmp/input/f2
6. Run the wordcount program:
[hadoop@master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output
15/03/29 13:37:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/29 13:37:03 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.50:8032
15/03/29 13:37:08 INFO input.FileInputFormat: Total input paths to process : 2
15/03/29 13:37:08 INFO mapreduce.JobSubmitter: number of splits:2
15/03/29 13:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427604785944_0001
15/03/29 13:37:12 INFO impl.YarnClientImpl: Submitted application application_1427604785944_0001
15/03/29 13:37:15 INFO mapreduce.Job: The url to track the job:
15/03/29 13:37:15 INFO mapreduce.Job: Running job: job_1427604785944_0001
15/03/29 13:37:40 INFO mapreduce.Job: Job job_1427604785944_0001 running in uber mode : false
15/03/29 13:37:40 INFO mapreduce.Job: map 0% reduce 0%
15/03/29 13:39:29 INFO mapreduce.Job: map 100% reduce 0%
15/03/29 13:40:08 INFO mapreduce.Job: map 100% reduce 100%
15/03/29 13:40:17 INFO mapreduce.Job: Job job_1427604785944_0001 completed successfully
15/03/29 13:40:17 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=84
FILE: Number of bytes written=317647
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=237
HDFS: Number of bytes written=36
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=232524
Total time spent by all reduces in occupied slots (ms)=24928
Total time spent by all map tasks (ms)=232524
Total time spent by all reduce tasks (ms)=24928
Total vcore-seconds taken by all map tasks=232524
Total vcore-seconds taken by all reduce tasks=24928
Total megabyte-seconds taken by all map tasks=238104576
Total megabyte-seconds taken by all reduce tasks=25526272
Map-Reduce Framework
Map input records=2
Map output records=8
Map output bytes=75
Map output materialized bytes=90
Input split bytes=192
Combine input records=8
Combine output records=7
Reduce input groups=5
Reduce shuffle bytes=90
Reduce input records=7
Reduce output records=5
Spilled Records=14
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=5794
CPU time spent (ms)=16700
Physical memory (bytes) snapshot=390656000
Virtual memory (bytes) snapshot=2511228928
Total committed heap usage (bytes)=241037312
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=45
File Output Format Counters
Bytes Written=36
7. View the results:
[hadoop@master ~]$ hadoop fs -ls /
15/03/29 13:52:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-03-29 13:40 /output
drwxr-xr-x - hadoop supergroup 0 2015-03-29 13:37 /tmp
[hadoop@master ~]$ hadoop fs -ls /output
15/03/29 13:54:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2015-03-29 13:40 /output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 36 2015-03-29 13:40 /output/part-r-00000
[hadoop@master ~]$ hadoop fs -cat /output/part-r-00000
15/03/29 13:58:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hadoop 2
Hello 2
bye 2
jj 1
world 1
8. Delete the output directory so that wordcount can be rerun:
[hadoop@master ~]$ hadoop fs -rm -r /output
15/03/29 13:59:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/29 13:59:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
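With /output gone, the job from step 6 can be submitted again with the exact same command:
[hadoop@master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output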