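To meet the runtime requirements of our company's programs, we upgraded our Hadoop cluster from version 1.0 to CDH5. Having gone through another cluster installation, I am sharing the steps here for anyone who needs them.
I. Machine preparation
Linux version: CentOS 5.8, x86_64. If you run Linux 6.x, you can follow the same steps;
I prepared five machines for this installation:
192.168.32.70 (master),
192.168.32.71 (slave1), 192.168.32.72 (slave2), 192.168.32.73 (slave3), 192.168.32.79 (slave4);
Set HOSTNAME in /etc/sysconfig/network to a name that is easy to remember. This is optional; keep the existing name if that suits you;
Edit /etc/hosts (on all five machines):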
192.168.32.70 master
192.168.32.71 slave1
192.168.32.72 slave2
192.168.32.73 slave3
192.168.32.79 slave4
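Since the same five entries have to be present on every machine, a quick check can catch a typo before it shows up later as a connection error. The following is only a sketch: lookup_host, check_hosts and the HOSTS_FILE override are hypothetical names introduced here, not part of any tool.

```shell
#!/bin/bash
# HOSTS_FILE is a hypothetical override so the check can be tried on a copy.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# Print the IP that the hosts file maps to the given name (empty if unmapped).
lookup_host() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$HOSTS_FILE"
}

# Compare the file against the cluster layout used in this article.
check_hosts() {
    local ip name status=0
    while read -r ip name; do
        if [ "$(lookup_host "$name")" != "$ip" ]; then
            echo "MISMATCH: $name should be $ip"
            status=1
        fi
    done <<'EOF'
192.168.32.70 master
192.168.32.71 slave1
192.168.32.72 slave2
192.168.32.73 slave3
192.168.32.79 slave4
EOF
    return $status
}

check_hosts || echo "fix $HOSTS_FILE before continuing"
```

Run it on each machine after editing /etc/hosts; it prints nothing when all five names map to the expected addresses.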
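II. Environment preparation
1. Set up passwordless SSH
> On all machines, run ssh-keygen -t rsa and press Enter at every prompt;
> On master, run: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys;
> scp the file to the other machines: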
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
......
scp ~/.ssh/authorized_keys root@slave4:~/.ssh/
> Verify that passwordless login works:
[root@master hadoop-conf]# ssh slave1
Last login: Wed Sep 24 16:07:12 2014 from master
[root@slave1 ~]#
No password prompt means it worked;
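The per-host scp commands above can also be written as a loop. This is a sketch, not the exact procedure I ran: the run helper and the DRY_RUN switch are hypothetical additions so the commands can be reviewed before anything is copied.

```shell
#!/bin/bash
# Print the command when DRY_RUN=1 (the default), execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

SLAVES="slave1 slave2 slave3 slave4"

# Copy master's authorized_keys to every slave, one scp per host.
push_authorized_keys() {
    local host
    for host in $SLAVES; do
        run scp ~/.ssh/authorized_keys "root@${host}:~/.ssh/"
    done
}

push_authorized_keys
```

With the default setting it only prints four scp commands; run it with DRY_RUN=0 on master to perform the copy. Each scp still asks for the root password once, because the key is not in place yet.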
2. Install JDK 7
> Download the jdk-7u51-linux-x64.rpm package from the official site;
> rpm -ivh jdk-7u51-linux-x64.rpm
> Add the environment variables;
vi /etc/profile
Add:
JAVA_HOME=/usr/java/latest
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export JAVA_HOME CLASSPATH
> Run source to apply the changes;
source /etc/profile
3. Create the hadoop user
groupadd hdfs
useradd hadoop -g hdfs
III. Install CDH5
1. Download the rpm package
> Change to /data/tools/ (my usual directory for downloaded software; pick any directory you like);
wget "
5/x86_64/cloudera-cdh-5-0.x86_64.rpm"  (if your Linux version is 6.x, change the 5 to 6 here; the same applies below)
yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
> Import the Cloudera repository GPG key;
rpm --import
5/x86_64/cdh/RPM-GPG-KEY-cloudera
2. Install
> master: install NN (NameNode), NM (NodeManager), DN (DataNode), MR (MapReduce), hadoop-client
yum clean all; yum install hadoop-hdfs-namenode
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> slave1: install RM (ResourceManager), NM, DN, MR, hadoop-client
yum clean all; yum install hadoop-yarn-resourcemanager
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> slave2, slave3, slave4: install NM, DN, MR, hadoop-client
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
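The role layout above boils down to a small host-to-packages table. The sketch below prints the install command for the machine it runs on; packages_for is a hypothetical helper name, and the script assumes the short hostname matches the names used in /etc/hosts.

```shell
#!/bin/bash
# Map each host to the CDH5 packages listed above.
packages_for() {
    local common="hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-client"
    case "$1" in
        master) echo "hadoop-hdfs-namenode $common" ;;
        slave1) echo "hadoop-yarn-resourcemanager $common" ;;
        slave2|slave3|slave4) echo "$common" ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the install command for this machine, or for the host
# given as the first argument.
host="${1:-$(hostname -s)}"
if pkgs="$(packages_for "$host")"; then
    echo "yum clean all; yum install $pkgs"
fi
```

It only prints the command, so you can check the package list before running it as root.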
3. Create directories (my machines have a single data disk, cache1; if you have several, create the directories on each)
DN:
mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
chown -R mapred:hadoop /data/cache1/dfs/mapred/local
NN:
mkdir -p /data/cache1/dfs/nn
chown -R hdfs:hadoop /data/cache1/dfs/nn
chmod 700 /data/cache1/dfs/nn
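For machines with more than one data disk, the same DN commands repeat per disk, and dfs.datanode.data.dir accepts a comma-separated list of directories. A sketch follows; cache2 is a hypothetical second disk, and print_dn_setup and dn_dir_list are names made up for this example.

```shell
#!/bin/bash
# Print the DataNode directory setup commands for each data disk.
# Review the output, then run it as root.
print_dn_setup() {
    local disk
    for disk in "$@"; do
        echo "mkdir -p /data/$disk/dfs/dn /data/$disk/dfs/mapred/local"
        echo "chown -R hdfs:hadoop /data/$disk/dfs/dn"
        echo "chown -R mapred:hadoop /data/$disk/dfs/mapred/local"
    done
}

# Build the comma-separated value for dfs.datanode.data.dir.
dn_dir_list() {
    local list="" disk
    for disk in "$@"; do
        list="${list:+$list,}/data/$disk/dfs/dn"
    done
    echo "$list"
}

print_dn_setup cache1 cache2
echo "dfs.datanode.data.dir = $(dn_dir_list cache1 cache2)"
```

With a single disk, pass only cache1 and the result matches the commands above.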
4. Edit the configuration files
Edit the configuration files on master, then scp them to every slave;
1) /etc/hadoop/conf/core-site.xml: the IP in fs.defaultFS is the NN address;
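[root@master conf]# cat core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.32.70:8020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>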
2) /etc/hadoop/conf/hdfs-site.xml
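[root@master conf]# cat /etc/hadoop/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/cache1/dfs/dn/</value>
  </property>
</configuration>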
3) /etc/hadoop/conf/yarn-site.xml: the IP in the yarn.resourcemanager.* addresses is the machine running the RM, 192.168.32.71 in this example;
[root@master conf]# cat yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.32.71:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.32.71:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.32.71:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.32.71:8033</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>

  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
</configuration>
4)/etc/hadoop/conf/hadoop-env.sh
[root@master conf]# cat hadoop-env.sh
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
HADOOP_JOBTRACKER_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
HADOOP_TASKTRACKER_OPTS="-Dsecurity.audit.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}"
HADOOP_DATANODE_OPTS="-Dsecurity.audit.logger=ERROR,DRFAS ${HADOOP_DATANODE_OPTS}"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_SECONDARYNAMENODE_OPTS}"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData ${HADOOP_JAVA_PLATFORM_OPTS}"

# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=hdfs

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/var/local/hadoop/logs

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR

# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/local/hadoop/pid
export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

export JAVA_HOME=/usr/java/latest
5) Edit /etc/hadoop/conf/slaves;
Add the slaves:
slave1
slave2
slave3
slave4
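6) scp the files to every slave (repeat for slave2, slave3 and slave4); /etc/hadoop/conf is a directory, so scp needs -r:
scp -r /etc/hadoop/conf/* root@slave1:/etc/hadoop/conf/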
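To push the configuration to all four slaves in one go, the copy command can be generated per host. This is a sketch: conf_sync_cmds is a hypothetical helper, and it uses scp -r on the directory contents because /etc/hadoop/conf is a directory, which plain scp refuses to copy.

```shell
#!/bin/bash
CONF_DIR=/etc/hadoop/conf

# Print one recursive copy command per slave.
# The glob is left unexpanded here; the shell that runs the command expands it.
conf_sync_cmds() {
    local host
    for host in "$@"; do
        echo "scp -r $CONF_DIR/* root@$host:$CONF_DIR/"
    done
}

conf_sync_cmds slave1 slave2 slave3 slave4
```

The script only prints the commands. Once they look right, pipe the output to sh on master to execute them.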
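IV. Startup
1) Start the NN (master); init formats the NameNode, so run it only on the first start
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
2) Start the DN on slave1 (which also runs the RM)
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
/etc/init.d/hadoop-yarn-resourcemanager start
3) Start the DN on slave2/slave3/slave4
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start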
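Because each node has a different set of services installed, it can be easier to start whatever init scripts are present rather than remember the list per host. A sketch, assuming the scripts are named after their packages (hadoop-hdfs-*, hadoop-yarn-*); hadoop_services and the INIT_DIR override are hypothetical names for this example.

```shell
#!/bin/bash
# INIT_DIR is a hypothetical override, useful for trying the script elsewhere.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# List the Hadoop init scripts present on this node, HDFS before YARN.
hadoop_services() {
    local f
    for f in "$INIT_DIR"/hadoop-hdfs-* "$INIT_DIR"/hadoop-yarn-*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Print the start commands; review them, then run them as root.
for svc in $(hadoop_services); do
    echo "service $svc start"
done
```

This does not run the one-time NameNode init step; that still has to be done by hand on master before the first start.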
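V. Check
Open the ResourceManager web UI on the RM machine; it listens on port 8088, as set by yarn.resourcemanager.webapp.address (similar to the JobTracker address in Hadoop 1.0, i.e. port 50030)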
VI. Problems encountered during installation and their fixes
Starting the NN reports the error: log4j:ERROR Could not find value for key log4j.appender.DRFAAUDIT;
Fix: add the following to /etc/hadoop/conf/log4j.properties
log4j.appender.DRFAAUDIT=org.apache.log4j.ConsoleAppender
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
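If the configuration gets re-copied or the fix has to be applied on several machines, appending the two lines only when they are missing avoids duplicates. A sketch: ensure_prop and the LOG4J_FILE override are hypothetical names for this example.

```shell
#!/bin/bash
LOG4J_FILE="${LOG4J_FILE:-/etc/hadoop/conf/log4j.properties}"

# Append "key=value" unless an uncommented line already starts with "key=".
ensure_prop() {
    local key="$1" value="$2"
    if ! grep -q "^${key}=" "$LOG4J_FILE" 2>/dev/null; then
        echo "${key}=${value}" >> "$LOG4J_FILE"
    fi
}

# Only touch the file when it exists and is writable (i.e. run as root).
if [ -w "$LOG4J_FILE" ]; then
    ensure_prop log4j.appender.DRFAAUDIT org.apache.log4j.ConsoleAppender
    ensure_prop log4j.appender.DRFAAUDIT.layout org.apache.log4j.PatternLayout
else
    echo "skipping: $LOG4J_FILE is not writable"
fi
```

Running it a second time leaves the file unchanged.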
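VII. Summary
Installing CDH5 differs quite a lot from installing Hadoop 1.0, but it is not difficult. Take it one step at a time and ask for help when you hit a problem;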