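I. Environment preparation:
Assume the cluster you need to configure has 5 machines: master, slave1, slave2, slave3, slave4;
1. Create a hadoop account on every machine;
2. Set the hostname of every machine in /etc/sysconfig/network
For example, on the master machine: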
NETWORKING=yes
HOSTNAME=master (the name is arbitrary; pick one that is easy to remember)
slave1:
NETWORKING=yes
HOSTNAME=slave1
slave2:
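..... and so on;
After editing the file, remember to run hostname with the new name on the corresponding machine (hostname master on master, hostname slave1 on slave1, and so on) so that the change takes effect without a reboot;
3. Edit /etc/hosts on every machine so that all machines can resolve each other by name. Both the master and every slave must be edited, and the hosts file must be identical on all machines;
For example: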
192.168.30.60 master
192.168.30.61 slave1
192.168.30.62 slave2
192.168.30.63 slave3
192.168.30.64 slave4
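To confirm that a hosts file really contains every node name, a small check like the one below can be run on each machine. This is only a sketch: the function name missing_hosts is hypothetical, and it assumes the plain "address name" layout shown above.

```shell
#!/bin/sh
# missing_hosts FILE NAME...
# Prints every NAME that has no entry in FILE (hosts-file layout:
# an address followed by one or more names). Prints nothing if all
# names are present. The function name is hypothetical.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for NAME in any column after the address.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example use on a real machine:
#   missing_hosts /etc/hosts master slave1 slave2 slave3 slave4
```

If the command prints nothing, every name listed has an entry in the file.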
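4. Set up passwordless SSH login
Hadoop uses the ssh service to log in to the nodes and run services on them, so the network between all nodes where Hadoop is installed must be working;
Make sure ssh is installed on every machine
(1) Log in to the master machine as the hadoop user:
(2) Run: ssh-keygen -t rsa and press Enter at every prompt (do not type anything, so that the passphrase stays empty). This creates the private key id_rsa and the public key id_rsa.pub under /home/hadoop/.ssh
id_rsa.pub will look something like this: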
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3XYLxqxNfltkbKuCpJJDTuQekVJ0L3XA6dLoLQpPLbZxJNQ7DsogcMYM9opg+R1baTMvm1Cbj/cfIwELHPSRLFjN7E6x9S7PWnS2tObXosBNZ/eo6+eZiAF0h0LL+1Rsfsne2cP3amhdztbudSzm1ezLRPBLNUh0FKwDjbgnK2ZZy49h6vCvOZRKJPQf+B3xTSTbix/omalecCdYc1bCFvifOy1pgWVchKSQsynN0V901dA7CAfIjsAKc4DfyGcdoFNFp+POz6+q4AiYUmO+QTh7wPRa2vTg6FRlaaqvTUfnep6prFSVPe/Jh6dt6yyH0k7sIPDIl/kca6cZX0YgNw== hadoop@master
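(3) Append the contents of the public key id_rsa.pub to authorized_keys
cat /home/hadoop/.ssh/id_rsa.pub >>/home/hadoop/.ssh/authorized_keys
(4) Copy authorized_keys to each of the slave machines. First make sure the .ssh directory exists on every slave; create it by hand if it does not:
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.30.61:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.30.62:/home/hadoop/.ssh/
...... and so on;
(5) Set the directory permissions (on all machines)
chmod 750 /home/hadoop
chmod 750 /home/hadoop/.ssh
chmod 644 /home/hadoop/.ssh/authorized_keys
(6) Verify that ssh works
On the master machine run ssh slave1
If you are logged in without being asked for a password, the setup works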
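Rather than logging in to each slave by hand, the check can be repeated for all slaves in one pass. The following is a rough sketch: check_passwordless and the SSH override variable are hypothetical names, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# SSH is the command used to connect; it can be overridden for a dry run.
# (The variable and the function name below are hypothetical.)
SSH=${SSH:-ssh}

# check_passwordless HOST...
# Tries a key-only login to each HOST and reports the result.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# Returns non-zero if any host asks for a password or is unreachable.
check_passwordless() {
    failed=0
    for host in "$@"; do
        if "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example use, run as the hadoop user on master:
#   check_passwordless slave1 slave2 slave3 slave4
```

A host reported as FAILED usually means the key was not copied or the permissions from step (5) are too loose.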
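5. Install the JDK
This is the same as any ordinary JDK installation;
Download a recent JDK, run the installer, set the environment variables, and so on;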
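"Set the environment variables" usually means pointing JAVA_HOME at the JDK directory and putting its bin directory on PATH. Here is a minimal sketch of that step; the function name use_jdk and the example path are assumptions, so substitute the directory your JDK was actually installed into.

```shell
#!/bin/sh
# use_jdk DIR
# Exports JAVA_HOME and prepends DIR/bin to PATH, but only if DIR really
# contains an executable bin/java. Returns non-zero otherwise.
# (Hypothetical helper, not part of any JDK or Hadoop package.)
use_jdk() {
    if [ ! -x "$1/bin/java" ]; then
        echo "no executable java under $1/bin" >&2
        return 1
    fi
    JAVA_HOME=$1
    PATH=$JAVA_HOME/bin:$PATH
    export JAVA_HOME PATH
}

# Example (the path is hypothetical; use your real JDK directory):
#   use_jdk /usr/java/default && java -version
```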
II. Installing Hadoop
1. Get the cdh3 yum repository and install Hadoop
(1)wget -c
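(2)yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm // this installs the cloudera-cdh3.repo file
(3)rpm --import // import the rpm key
(4)yum install hadoop-0.20
(5)yum install hadoop-0.20-namenode (install on the machine that will be the namenode; its address is configured in /etc/hadoop/conf/core-site.xml, covered later)
yum install hadoop-0.20-datanode (install on all slave machines; it can also be installed on the namenode machine so that the namenode doubles as a datanode)
yum install hadoop-0.20-jobtracker (install on the machine that will be the jobtracker; its address is configured in /etc/hadoop/conf/mapred-site.xml)
yum install hadoop-0.20-tasktracker
Each role installs a different service; every machine that runs a datanode also needs the tasktracker, and the namenode machine can also serve as a datanode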
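The role-to-package mapping above can be written down once and reused on every machine. This is just a sketch: packages_for is a hypothetical helper, and the package names are the ones listed in step (5).

```shell
#!/bin/sh
# packages_for ROLE...
# Prints the base package followed by the yum package for each ROLE.
# Roles: namenode, datanode, jobtracker, tasktracker.
# (Hypothetical helper; package names are taken from the steps above.)
packages_for() {
    echo hadoop-0.20
    for role in "$@"; do
        case $role in
            namenode|datanode|jobtracker|tasktracker)
                echo "hadoop-0.20-$role" ;;
            *)
                echo "unknown role: $role" >&2
                return 1 ;;
        esac
    done
}

# Example use on a slave (run as root):
#   yum install $(packages_for datanode tasktracker)
```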
2. Edit the configuration files (HDFS side)
// the slaves file only needs to be configured on the namenode
cat /etc/hadoop/conf/slaves
192.168.30.61
192.168.30.62
192.168.30.63
192.168.30.64
cat /etc/hadoop/conf/masters
192.168.30.60
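Because the slaves file and /etc/hosts are both maintained by hand, it is easy for an address to end up in one but not the other. A quick cross-check might look like the sketch below; unknown_slaves is a hypothetical name, and the check only compares text and does not test connectivity.

```shell
#!/bin/sh
# unknown_slaves SLAVES_FILE HOSTS_FILE
# Prints each non-empty entry of SLAVES_FILE that appears neither as an
# address nor as a name in HOSTS_FILE. Comment lines are ignored.
# (Hypothetical helper; a pure text comparison.)
unknown_slaves() {
    awk '
        /^[ \t]*#/ { next }
        FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) known[$i] = 1; next }
        NF > 0 && !($1 in known) { print $1 }
    ' "$2" "$1"
}

# Example use on the namenode:
#   unknown_slaves /etc/hadoop/conf/slaves /etc/hosts
```

Any address it prints is listed as a slave but has no matching line in the hosts file.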
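3. Edit the /etc/hadoop/conf/hdfs-site.xml configuration file
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.safemode.extension</name>
    <value>0</value>
  </property>
  <property>
    <name>dfs.safemode.min.datanodes</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/dfs/data</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/dfs/tmp</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>200000</value>
  </property>
</configuration>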
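4. Edit the /etc/hadoop/conf/core-site.xml configuration file
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>
The host name in this value (namenode here) must resolve on every node, so it needs an entry in /etc/hosts.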
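5. Edit /etc/hadoop/conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.30.61:9001</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m -XX:+UseConcMarkSweepGC</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data1/hdfs/</value>
    <description>The local directory where MapReduce stores intermediate
    data files. May be a comma-separated list of
    directories on different devices in order to spread disk i/o.
    Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mapred/system</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>256</value>
    <description>The total amount of buffer memory to use while sorting
    files, in megabytes. By default, gives each merge stream 1MB, which
    should minimize seeks.</description>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>64</value>
  </property>
  <property>
    <name>mapred.max.map.failures.percent</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    <description>JVM reuse task count. Default is 1. If it is -1, there is no limit.</description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>64</value>
  </property>
</configuration>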
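After editing the three *-site.xml files it helps to read a value back and confirm that the file on each node says what you expect. The helper below is a sketch under a stated assumption: every name and value element sits on its own line, as in hand-written Hadoop configuration files. get_property is a hypothetical name and this is not a general XML parser.

```shell
#!/bin/sh
# get_property FILE NAME
# Prints the <value> that follows <name>NAME</name> inside the same
# <property> block of a Hadoop *-site.xml file.
# Assumes <name> and <value> are each on a single, separate line.
# Returns non-zero if no value was found. (Hypothetical helper.)
get_property() {
    awk -v want="<name>$2</name>" '
        index($0, want) { armed = 1; next }
        armed && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            found = 1
            exit
        }
        /<\/property>/ { armed = 0 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example use:
#   get_property /etc/hadoop/conf/mapred-site.xml mapred.job.tracker
#   get_property /etc/hadoop/conf/core-site.xml fs.default.name
```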
6. Start the corresponding Hadoop processes
[root@namenode ~]# /etc/init.d/hadoop-0.20-namenode start (1 namenode)
[root@slave1 /]# /etc/init.d/hadoop-0.20-datanode start (4 datanodes)
[root@slave2 /]# /etc/init.d/hadoop-0.20-datanode start
[root@slave1 /]# /etc/init.d/hadoop-0.20-tasktracker start (4 tasktrackers, one on each datanode)
[root@slave1 /]# /etc/init.d/hadoop-0.20-jobtracker start (1 jobtracker)
Start the appropriate services on the appropriate machines;
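Since each machine only starts the services for its own roles, the start commands can be wrapped in a small function. This is a sketch only: start_roles and the INIT_DIR override are hypothetical, and the script names follow the /etc/init.d/hadoop-0.20-<role> pattern used above.

```shell
#!/bin/sh
# Directory holding the init scripts; overridable for a dry run.
# (INIT_DIR and start_roles are hypothetical names.)
INIT_DIR=${INIT_DIR:-/etc/init.d}

# start_roles ROLE...
# Runs "<INIT_DIR>/hadoop-0.20-ROLE start" for each ROLE and stops at
# the first failure. Run as root on the machine that owns the role.
start_roles() {
    for role in "$@"; do
        script=$INIT_DIR/hadoop-0.20-$role
        if [ ! -x "$script" ]; then
            echo "not installed here: $script" >&2
            return 1
        fi
        "$script" start || return 1
    done
}

# Example use:
#   on the namenode:  start_roles namenode
#   on each slave:    start_roles datanode tasktracker
```

Checking for the script first gives a clear message when a role's package was never installed on that machine.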
7. OK, the installation is complete
(namenode)
(jobtracker)