We will deploy a five-node Hadoop cluster. The node layout is as follows:
1. The NameNode master and the DataNode slaves are distributed like this:

NameNode master           DataNode slaves
192.168.10.161            192.168.10.162
(no SecondaryNameNode)    192.168.10.163
                          192.168.10.164
                          192.168.10.165
2. Building the cluster environment
Everything below is done on Ubuntu 14.04.
1) Prepare the cluster environment
Configure passwordless SSH login between the machines.
Use the ssh-keygen command to generate the key pair, as follows:
Note: after entering the command, just press Enter at every prompt; do not type anything.
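The key-generation command itself is just ssh-keygen. A sketch of a non-interactive equivalent of pressing Enter at every prompt (run here in a scratch directory so no existing keys are clobbered; in practice you would let the keys land in ~/.ssh):

```shell
# Demo in a throwaway directory; for the real setup just run
# "ssh-keygen -t rsa" and press Enter at each prompt so the keys
# end up in ~/.ssh/.
keydir=$(mktemp -d)
# -q: quiet, -N "": empty passphrase, -f: output file
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"
ls "$keydir"   # id_rsa (private key) and id_rsa.pub (public key)
```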
Finally, copy the public key file id_rsa.pub into the .ssh directory on every slave machine, renaming it authorized_keys, as follows:
bob@bob-virtual-machine:~$ scp .ssh/id_rsa.pub bob@192.168.10.162:/home/bob/.ssh/authorized_keys
bob@bob-virtual-machine:~$ scp .ssh/id_rsa.pub bob@192.168.10.163:/home/bob/.ssh/authorized_keys
bob@bob-virtual-machine:~$ scp .ssh/id_rsa.pub bob@192.168.10.164:/home/bob/.ssh/authorized_keys
bob@bob-virtual-machine:~$ scp .ssh/id_rsa.pub bob@192.168.10.165:/home/bob/.ssh/authorized_keys
2) Install the required software
The Hadoop NameNode talks to each DataNode through Java processes, so in addition to unpacking the Hadoop release we must also install a JDK to provide a stable runtime environment.
bob@bob-virtual-machine:~$ tar xzvf jdk-8u73-linux-x64.tar.gz
bob@bob-virtual-machine:~$ tar xzvf hadoop-2.6.4-src.tar.gz
Next, configure the environment variables:
export JAVA_HOME=/home/bob/jdk1.8.0_73   # JDK home directory
export PATH=${JAVA_HOME}/bin:${PATH}
3) Configure the cluster
a. Add Java to the Hadoop runtime environment
On the NameNode master, edit the configuration file etc/hadoop/hadoop-env.sh as follows:
....
# The java implementation to use.
export JAVA_HOME=/home/bob/jdk1.8.0_73
....
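If you prefer not to open an editor, the same change can be made with a sed one-liner. A sketch on a stand-in file (in practice you would run the sed command on the real hadoop-env.sh inside hadoop-2.6.4/etc/hadoop, which ships with an "export JAVA_HOME=..." line):

```shell
# Stand-in for the stock hadoop-env.sh, which contains this line.
printf 'export JAVA_HOME=${JAVA_HOME}\n' > hadoop-env.sh.demo
# Replace the JAVA_HOME line with the tutorial's JDK path, in place
# (GNU sed -i; fine on Ubuntu).
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/home/bob/jdk1.8.0_73|' hadoop-env.sh.demo
```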
b. Configure the NameNode address
On the NameNode master, edit etc/hadoop/core-site.xml as follows (in Hadoop 2.x the non-deprecated name for this property is fs.defaultFS; the old name still works):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.10.161:9000</value>
  </property>
</configuration>
c. Configure the replication factor
On the NameNode master, edit etc/hadoop/hdfs-site.xml as follows. With four DataNodes and dfs.replication set to 2, each block is stored on two of them:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
d. Configure the JobTracker
On the NameNode master, edit etc/hadoop/mapred-site.xml as follows:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.10.161:9001</value>
  </property>
</configuration>
e. Configure masters
On the NameNode master, edit the etc/hadoop/masters file as follows:
bob@bob-virtual-machine:~/hadoop-2.6.4/etc/hadoop$ cat masters
192.168.10.161
bob@bob-virtual-machine:~/hadoop-2.6.4/etc/hadoop$
f. Configure slaves
On the NameNode master, edit the etc/hadoop/slaves file as follows:
bob@bob-virtual-machine:~/hadoop-2.6.4/etc/hadoop$ cat slaves
192.168.10.162
192.168.10.163
192.168.10.164
192.168.10.165
bob@bob-virtual-machine:~/hadoop-2.6.4/etc/hadoop$
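The slaves file can also be written in one step with a heredoc instead of an editor (a sketch; run it from hadoop-2.6.4/etc/hadoop, or run it anywhere and copy the result over):

```shell
# Write the four DataNode addresses into the slaves file, one per line.
cat > slaves <<'EOF'
192.168.10.162
192.168.10.163
192.168.10.164
192.168.10.165
EOF
```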
g. Distribute the JDK and the configured Hadoop directory to the DataNodes:
bob@bob-virtual-machine:~$ scp -r jdk1.8.0_73 hadoop-2.6.4 bob@192.168.10.162:/home/bob/
bob@bob-virtual-machine:~$ scp -r jdk1.8.0_73 hadoop-2.6.4 bob@192.168.10.163:/home/bob/
bob@bob-virtual-machine:~$ scp -r jdk1.8.0_73 hadoop-2.6.4 bob@192.168.10.164:/home/bob/
bob@bob-virtual-machine:~$ scp -r jdk1.8.0_73 hadoop-2.6.4 bob@192.168.10.165:/home/bob/
h. Format the distributed file system, as follows (in Hadoop 2.x, ./bin/hdfs namenode -format is the preferred spelling; the command below still works, with a deprecation warning):
bob@bob-virtual-machine:~/hadoop-2.6.4$ ls
aaa bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share
bob@bob-virtual-machine:~/hadoop-2.6.4$ ./bin/hadoop namenode -format
i. Once formatting completes, start the cluster with ./sbin/start-dfs.sh. The output looks like this:
bob@bob-virtual-machine:~/hadoop-2.6.4$ ./sbin/start-dfs.sh
Starting namenodes on [bob-virtual-machine.lan]
bob-virtual-machine.lan: starting namenode, logging to /home/bob/hadoop-2.6.4/logs/hadoop-bob-namenode-bob-virtual-machine.out
192.168.10.162: starting datanode, logging to /home/bob/hadoop-2.6.4/logs/hadoop-bob-datanode-bob-virtual-machine.out
192.168.10.163: starting datanode, logging to /home/bob/hadoop-2.6.4/logs/hadoop-bob-datanode-bob-virtual-machine.out
192.168.10.164: starting datanode, logging to /home/bob/hadoop-2.6.4/logs/hadoop-bob-datanode-bob-virtual-machine.out
192.168.10.165: starting datanode, logging to /home/bob/hadoop-2.6.4/logs/hadoop-bob-datanode-bob-virtual-machine.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/bob/hadoop-2.6.4/logs/hadoop-bob-secondarynamenode-bob-virtual-machine.out
bob@bob-virtual-machine:~/hadoop-2.6.4$
You can run the jps command on the master and on each slave to check that the expected daemons are up (NameNode and SecondaryNameNode on the master, DataNode on the slaves). Note that although our layout has no dedicated SecondaryNameNode host, start-dfs.sh launches one on the master (0.0.0.0) by default.
Testing
Create a file.
First create the directory (-p also creates the missing /user parent):
bob@bob-virtual-machine:~/hadoop-2.6.4$ ./bin/hadoop fs -mkdir -p /user/bob
Then upload a file:
bob@bob-virtual-machine:~/hadoop-2.6.4$ ./bin/hadoop fs -put xxxfilenamexxx /user
Then list it:
bob@bob-virtual-machine:~/hadoop-2.6.4$ ./bin/hadoop fs -ls /user