Category: System Operations
2015-03-09 15:49:00
In Hadoop, the NameNode is like the heart in a human body: it is critically important and must never stop working. In the Hadoop 1 era there was only a single NameNode; if its data was lost or it stopped working, the whole cluster could not be recovered. This was the single point of failure in Hadoop 1 and one of the reasons Hadoop 1 was considered unreliable (see Figure 1). Hadoop 2 solves this problem.
HDFS high availability in Hadoop 2 means that two NameNodes can be started at the same time: one is in the active state, the other stands by. When the server hosting the active NameNode goes down, the cluster can fail over to the other NameNode, manually or automatically, without losing data.
The NameNodes keep their state consistent by sharing data. The shared storage between them can be provided either by a Network File System (NFS) or by Quorum Journal Nodes. The former is a Linux shared filesystem and is configured at the operating-system level; the latter is part of Hadoop itself and is configured in the software.
This article describes the Quorum Journal Node setup. Failover can be triggered manually with hdfs haadmin, or automatically by the ZooKeeper Failover Controller (zkfc), which is also configured below.
When the cluster starts, both NameNodes can be brought up, but only one of them is active; the other is in standby. Active means it is serving requests; standby means it only synchronizes data and stays ready to take over at any moment.
Environment:
Operating system: CentOS 6.2 x64
Hadoop version: hadoop-2.5.2.tar.gz
ZooKeeper version: zookeeper-3.4.6.tar.gz
Add hosts entries (/etc/hosts):
10.0.2.54 hnn01
10.0.2.55 hnn02
10.0.2.62 yarnzk
10.0.2.63 hdn01zk
10.0.2.64 hdn02zk
Role assignment:
IP address | Hostname | Role
10.0.2.54 | hnn01 | NameNode 01 + journal
10.0.2.55 | hnn02 | NameNode 02 + journal
10.0.2.62 | yarnzk | YARN + zookeeper + journal
10.0.2.63 | hdn01zk | DataNode + zookeeper
10.0.2.64 | hdn02zk | DataNode + zookeeper
1. Set up passwordless SSH
Create the .ssh directory on the slave machines: mkdir -m 700 /root/.ssh
Copy the public key to the slave machines: scp authorized_keys hnn02:/root/.ssh/
Note: to fully simulate a production environment the roles are kept as separate as possible, so mutual trust has to be set up three times; the hnn01, hnn02, and yarnzk machines all need it.
Test that passwordless login works!
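The key-generation step itself is not shown above; a minimal sketch on hnn01 (assuming the root user and RSA keys) looks like this:
[root@hnn01 ~]# ssh-keygen -t rsa                          # accept the defaults, empty passphrase
[root@hnn01 ~]# cd /root/.ssh
[root@hnn01 .ssh]# cat id_rsa.pub >> authorized_keys
[root@hnn01 .ssh]# scp authorized_keys hnn02:/root/.ssh/   # repeat for the other trusted hosts
[root@hnn01 .ssh]# ssh hnn02 hostname                      # should print hnn02 without asking for a password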
2. Install ZooKeeper
[root@yarnzk src]# tar zxf zookeeper-3.4.6.tar.gz
[root@yarnzk src]# mv zookeeper-3.4.6 ../zookeeper
[root@yarnzk local]# cd /usr/local/zookeeper/conf/
[root@yarnzk conf]# vi zoo.cfg
# the first four settings below are the standard values required for a replicated setup
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/usr/local/zookeeper/zk_data/
server.0=yarnzk:2888:3888
server.1=hdn01zk:2888:3888
server.2=hdn02zk:2888:3888
Create the data directory: mkdir /usr/local/zookeeper/zk_data
[root@yarnzk conf]# echo '0' > /usr/local/zookeeper/zk_data/myid
Copy the zookeeper directory to hdn01zk and hdn02zk, and change the value in each node's myid file to match its server number:
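A minimal sketch of this step, assuming the same /usr/local layout on all nodes (enter the password when prompted if ssh trust to the DataNode hosts has not been set up):
[root@yarnzk local]# scp -r /usr/local/zookeeper hdn01zk:/usr/local/
[root@yarnzk local]# scp -r /usr/local/zookeeper hdn02zk:/usr/local/
[root@yarnzk local]# ssh hdn01zk "echo '1' > /usr/local/zookeeper/zk_data/myid"   # matches server.1
[root@yarnzk local]# ssh hdn02zk "echo '2' > /usr/local/zookeeper/zk_data/myid"   # matches server.2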
Start: run the command zkServer.sh start on each of the three nodes.
Verify: run the command zkServer.sh status on each of the three nodes; one node should report Mode: leader and the other two Mode: follower.
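To check all three nodes from one shell, a small loop like the following can be used (a sketch; it assumes ssh access from yarnzk to the other two ZooKeeper hosts):
[root@yarnzk ~]# for h in yarnzk hdn01zk hdn02zk; do ssh $h /usr/local/zookeeper/bin/zkServer.sh status; done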
3. Install the Hadoop cluster
[root@hnn01 src]# tar zxf hadoop-2.5.2.tar.gz
[root@hnn01 src]# mv hadoop-2.5.2 ../hadoop
[root@hnn01 src]# cd ../hadoop/etc/hadoop/
[root@hnn01 src]# mkdir /usr/local/hadoop/journal
[root@hnn01 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_51
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/"
export HADOOP_COMMON_LIB_NATIVE_DIR="/usr/local/hadoop/lib/native/"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/local/hadoop/etc/hadoop"}
[root@hnn01 hadoop]# vim mapred-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_51
[root@hnn01 hadoop]# vim yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_51
[root@hnn01 hadoop]# more core-site.xml
[root@hnn01 hadoop]# more hdfs-site.xml
[root@hnn01 hadoop]# more mapred-site.xml
[root@hnn01 hadoop]# more yarn-site.xml
[root@hnn01 hadoop]# more slaves
hdn01zk
hdn02zk
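The contents of these four configuration files are not reproduced in this article. As a rough sketch only, the HA-related properties, pieced together from the names that appear in the logs further down (nameservice myhdfs, NameNodes nn1/nn2, RPC port 54310, JournalNodes on hnn01/hnn02/yarnzk writing to /usr/local/hadoop/journal, ZooKeeper on yarnzk/hdn01zk/hdn02zk), would look roughly like the following inside each file's <configuration> element; the ports 8485 and 2181 and the fencing settings are assumptions based on Hadoop defaults, not taken from the original article:
core-site.xml (key HA properties):
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhdfs</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>yarnzk:2181,hdn01zk:2181,hdn02zk:2181</value>
  </property>
hdfs-site.xml (key HA properties):
  <property>
    <name>dfs.nameservices</name>
    <value>myhdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhdfs.nn1</name>
    <value>hnn01:54310</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhdfs.nn2</name>
    <value>hnn02:54310</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hnn01:8485;hnn02:8485;yarnzk:8485/myhdfs</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.myhdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>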
Copy the configured hadoop directory to each of the other nodes:
[root@hnn01 local]# scp -r hadoop yarnzk:/usr/local/
Create the data directories for the DataNode nodes:
[root@hnn01 sbin]# ./slaves.sh mkdir /data{0,1,2}/dfs/
Copy the hosts file and environment variables (/etc/hosts, /etc/profile) to each server:
[root@hnn01 ~]# scp /etc/hosts /etc/profile hnn02:/etc/
Initialize the HA state znode in ZooKeeper:
[root@hnn01 ~]# hdfs zkfc -formatZK
Run the following on each JournalNode (otherwise the NameNode format in the next step will report errors):
/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
First-time startup: format HDFS (run on hnn01 only; hnn02 will get its metadata later via bootstrapStandby):
hdfs namenode -format
Then run the following on each JournalNode again (otherwise start-dfs.sh will report errors because the journalnodes are already running):
/usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
Start the HDFS services:
[root@hnn01 sbin]# ./start-dfs.sh
Starting namenodes on [hnn01 hnn02]
hnn01: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hnn01.out
hnn02: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hnn02.out
hdn01zk: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hdn01zk.out
hdn02zk: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hdn02zk.out
Starting journal nodes [hnn01 hnn02 yarnzk]
hnn02: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-hnn02.out
yarnzk: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-yarnzk.out
hnn01: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-hnn01.out
Starting ZK Failover Controllers on NN hosts [hnn01 hnn02]
hnn02: starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-hnn02.out
hnn01: starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-hnn01.out
[root@hnn01 sbin]# jps
20422 Jps
17813 JournalNode
18174 DFSZKFailoverController
17595 NameNode
Open hnn01's NameNode web UI and you will see that this node has become active.
Next, synchronize the metadata once.
Run on hnn02:
hdfs namenode -bootstrapStandby
15/03/09 15:38:37 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/03/09 15:38:37 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
15/03/09 15:38:37 WARN common.Util: Path /usr/local/hadoop/hadoop-tmp-root/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
15/03/09 15:38:37 WARN common.Util: Path /data/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
15/03/09 15:38:37 WARN common.Util: Path /usr/local/hadoop/hadoop-tmp-root/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
15/03/09 15:38:37 WARN common.Util: Path /data/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
=====================================================
About to bootstrap Standby ID nn1 from:
Nameservice ID: myhdfs
Other Namenode ID: nn2
Other NN's HTTP address:
Other NN's IPC address: hnn02/10.0.2.55:54310
Namespace ID: 108854670
Block pool ID: BP-1908663152-10.0.2.54-1425550312574
Cluster ID: CID-687b459e-7b9e-4162-9c25-3061545b1ebb
Layout version: -57
=====================================================
Re-format filesystem in Storage Directory /usr/local/hadoop/hadoop-tmp-root/dfs/name ? (Y or N) Y
Re-format filesystem in Storage Directory /data/dfs/name ? (Y or N) Y
15/03/09 15:39:28 INFO common.Storage: Storage directory /usr/local/hadoop/hadoop-tmp-root/dfs/name has been successfully formatted.
15/03/09 15:39:28 INFO common.Storage: Storage directory /data/dfs/name has been successfully formatted.
Open hnn02's NameNode web UI and you will see that this node is in the standby state.
Then kill the active NameNode process on hnn01; the standby NameNode becomes active.
Note: the manual switch below prints a warning; with zkfc running there is normally no need to perform a manual switch anyway.
hdfs haadmin -transitionToActive nn1
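To inspect or force the state by hand, the haadmin subcommands below can be used (a sketch; --forcemanual overrides the coordination done by zkfc, so use it only when you really have to):
[root@hnn01 ~]# hdfs haadmin -getServiceState nn1      # prints active or standby
[root@hnn01 ~]# hdfs haadmin -getServiceState nn2
[root@hnn01 ~]# hdfs haadmin -transitionToActive --forcemanual nn1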
Start the YARN services:
[root@yarnzk sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-yarnzk.out
hdn02zk: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hdn02zk.out
hdn01zk: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hdn01zk.out
[root@yarnzk sbin]# jps
6994 QuorumPeerMain
13657 ResourceManager
13537 JournalNode
14174 Jps
Open the ResourceManager web UI on yarnzk to confirm that YARN is running.