Hadoop 2.3.0: Single-Node Pseudo-Distributed and Multi-Node Distributed Configuration - LaoLiulaoliu - ChinaUnix Blog


Category: HADOOP

2014-01-07 00:49:35

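My machine is a MacBook running VirtualBox 4.3.6, with Ubuntu 13.10 installed as the guest. For the multi-node setup I configured one VM completely and then cloned it twice, giving three machines in total.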

1. Configure the Environment

Bash:

sudo ./debian_set_java.sh jdk-7u51-linux-x64.tar.gz

sudo apt-get install -y openssh-server

sudo addgroup hadoop

sudo adduser --ingroup hadoop hadoop # set a password when prompted

sudo visudo # then add the following line so that the hadoop user can use sudo:

hadoop  ALL=(ALL) ALL

su - hadoop # asks for the password set above

ssh-keygen -t rsa -P "" # accept the default file (/home/hadoop/.ssh/id_rsa)

cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys

wget 

tar zxvf hadoop-2.3.0.tar.gz

sudo cp -pr hadoop-2.3.0/ /opt

cd /opt

sudo ln -s hadoop-2.3.0 hadoop

sudo chown -R hadoop:hadoop hadoop-2.3.0

sed -i 's/${JAVA_HOME}/\/usr\/local\/java\/jdk1.7.0_51/' hadoop/etc/hadoop/hadoop-env.sh
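The sed substitution above fails silently if the pattern does not match, so it is worth checking the result. The following is a minimal sketch, not part of the original setup; the function name java_home_of is made up for illustration, and it assumes hadoop-env.sh sets JAVA_HOME on a single `export JAVA_HOME=...` line.

```shell
#!/bin/bash
# Print the JAVA_HOME value exported by a hadoop-env.sh style file.
# Commented-out lines are ignored; if several export lines exist, the last one wins.
# java_home_of is a hypothetical helper name, not a Hadoop tool.
java_home_of() {
  sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=//p' "$1" | tail -n 1
}
```

Calling `java_home_of /opt/hadoop/etc/hadoop/hadoop-env.sh` should print /usr/local/java/jdk1.7.0_51; if it still prints ${JAVA_HOME}, the substitution did not apply.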

debian_set_java.sh

#!/bin/bash
# Unpack a JDK tarball into /usr/local/java and register it with update-alternatives.
# Usage: sudo ./debian_set_java.sh jdk-7u51-linux-x64.tar.gz

if [ "$(whoami)" != "root" ]; then
  echo "Use root to set java environment."
  exit 1
fi

# update-alternatives is Debian-specific, so refuse to run on Mac OS X.
if [ "$(uname)" == 'Darwin' ]; then
  echo "Darwin system."
  exit 1
fi

function unpack_java()
{
  # -p keeps the script re-runnable when the directory already exists
  mkdir -p /usr/local/java
  if ! tar -pzxf "$1" -C /usr/local/java/; then
    echo "untar file failed."
    exit 1
  fi
  echo "unpack $1 done"
}

# Not called below; kept for a JDK 8 install.
function set_j8_env()
{
  cp -p /etc/profile /tmp/profile.bak_`date "+%Y%m%d%H%M%S"`
  # the space before /etc/profile is required, otherwise sed has no input file
  sed -i '$a \\nJAVA_HOME=/usr/local/java/jdk1.8.0\nPATH=$PATH:$JAVA_HOME/bin\nexport JAVA_HOME\nexport PATH' /etc/profile
  source /etc/profile
  update-alternatives --install /usr/bin/javac javac /usr/local/java/jdk1.8.0/bin/javac 95
  update-alternatives --install /usr/bin/java java /usr/local/java/jdk1.8.0/bin/java 95
  update-alternatives --config java
  echo "set java 8 environment done"
}

function set_j7_env()
{
  update-alternatives --install /usr/bin/javac javac /usr/local/java/jdk1.7.0_51/bin/javac 95
  update-alternatives --install /usr/bin/java java /usr/local/java/jdk1.7.0_51/bin/java 95
  update-alternatives --config java
  echo "set java 7 environment done"
}

unpack_java "$1"
set_j7_env

2. Configure the Hadoop single-node environment
cd /opt/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

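<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>

<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>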

vi yarn-site.xml

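<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>127.0.0.1:8021</value>
  <description>host is the hostname of the ResourceManager and port is the port on which the NodeManagers contact the ResourceManager.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>127.0.0.1:8022</value>
  <description>host is the hostname of the ResourceManager and port is the port on which the applications in the cluster talk to the ResourceManager.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>

<property>
  <name>yarn.resourcemanager.address</name>
  <value>127.0.0.1:8023</value>
  <description>host is the hostname of the ResourceManager and port is the port on which the clients can talk to the ResourceManager.</description>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the NodeManager</description>
</property>

<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
  <description>the NodeManagers bind to this port</description>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on HDFS where the application logs are moved to</description>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by NodeManagers as log directories</description>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for MapReduce to run</description>
</property>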

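Additional configuration:
mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>

hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>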
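With settings spread over four files, it helps to read a value back from the file before starting any daemon. The helper below is a rough sketch rather than a real XML parser: site_value is a name invented here, and it assumes each <name> and <value> element sits on a single line with the <value> following its <name>.

```shell
#!/bin/bash
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Usage: site_value FILE PROPERTY_NAME
# site_value is a hypothetical helper; it does line-based matching only.
site_value() {
  awk -v key="$2" '
    /<name>/ {
      line = $0
      sub(/.*<name>[ \t]*/, "", line)
      sub(/[ \t]*<\/name>.*/, "", line)
      current = line            # remember the most recent property name
    }
    /<value>/ && current == key {
      line = $0
      sub(/.*<value>[ \t]*/, "", line)
      sub(/[ \t]*<\/value>.*/, "", line)
      print line
      exit
    }' "$1"
}
```

For example, `site_value /opt/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print hdfs://127.0.0.1:9000 for the single-node setup; an empty result means the property was not found.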

Bash:

cd /opt/hadoop

bin/hdfs namenode -format

sbin/hadoop-daemon.sh start namenode

sbin/hadoop-daemon.sh start datanode

sbin/yarn-daemon.sh start resourcemanager

sbin/yarn-daemon.sh start nodemanager

jps

# Run a job on this node

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 5 10
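Reading the jps listing by eye is easy to get wrong when one daemon died on startup. As a small sketch (missing_daemons is a name made up here), the function below reads jps output on stdin and prints the expected daemons that are absent; it assumes the usual "<pid> <ClassName>" format of jps.

```shell
#!/bin/bash
# Read `jps` output on stdin and print the expected daemons that are not running.
# Returns 0 when nothing is missing, 1 otherwise.
# missing_daemons is a hypothetical helper, not part of Hadoop.
missing_daemons() {
  local listing name missing=""
  listing=$(cat)
  for name in "$@"; do
    # jps prints "<pid> <ClassName>", so compare against the second column
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
  [ -z "$missing" ]
}
```

On the single node this would be used as `jps | missing_daemons NameNode DataNode ResourceManager NodeManager`; any name it prints has a log under /opt/hadoop/logs worth reading before submitting a job.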

3. Runtime Problem
14/01/04 05:38:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8023. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

netstat -atnp # the listening sockets show up as tcp6
Solution:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6 # 0 means IPv6 is on, 1 means off
cat /proc/sys/net/ipv6/conf/lo/disable_ipv6
cat /proc/sys/net/ipv6/conf/default/disable_ipv6
ip a | grep inet6 # any output means IPv6 is on

sudo vi /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1

sudo sysctl -p # applies the settings immediately, no reboot needed
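After sysctl -p, the three flags can be checked in one go instead of with three separate cat commands. This is a minimal sketch; ipv6_disabled is an illustrative name, and the optional directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# Report whether IPv6 is disabled for the all, default and lo entries.
# The optional first argument overrides the base directory (useful for testing).
# Returns 0 only if all three are disabled.
ipv6_disabled() {
  local base="${1:-/proc/sys/net/ipv6/conf}" iface rc=0
  for iface in all default lo; do
    if [ "$(cat "$base/$iface/disable_ipv6" 2>/dev/null)" = "1" ]; then
      echo "$iface: disabled"
    else
      echo "$iface: enabled"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `ipv6_disabled` with no argument reads the live values under /proc; the daemons need a restart afterwards so that they rebind on IPv4.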

4. Cluster setup

Set JAVA_HOME in /opt/hadoop/etc/hadoop/hadoop-env.sh and /opt/hadoop/etc/hadoop/yarn-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.7.0_51

cd /opt/hadoop

mkdir -p /opt/hadoop/tmp

mkdir -p /opt/hadoop/{data,name} # on every node; name is used on the NameNode, data on the DataNodes

sudo vi /etc/hosts # on every node; also change each node's hostname to match

192.168.1.110 cloud1

192.168.1.112 cloud2

192.168.1.114 cloud3

vi /opt/hadoop/etc/hadoop/slaves

cloud2

cloud3
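Every name in the slaves file has to resolve on every node, so a quick cross-check of the two files can save a failed startup. The sketch below only looks at the hosts file (it does not query DNS), and unresolved_hosts is a name invented for this example.

```shell
#!/bin/bash
# Print each given hostname that has no entry in the hosts file.
# Usage: unresolved_hosts HOSTS_FILE NAME...
# unresolved_hosts is a hypothetical helper; it ignores comments and DNS.
unresolved_hosts() {
  local hosts_file="$1" name
  shift
  for name in "$@"; do
    awk -v n="$name" '
      /^[ \t]*#/ { next }                 # skip comment lines
      {
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^#/) break            # stop at a trailing comment
          if ($i == n) found = 1
        }
      }
      END { exit !found }' "$hosts_file" || echo "$name"
  done
}
```

On each node, `unresolved_hosts /etc/hosts cloud1 $(cat /opt/hadoop/etc/hadoop/slaves)` should print nothing.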

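core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cloud1:9000</value>
</property>

<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>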

Reportedly the directory named by dfs.datanode.data.dir has to be emptied first, otherwise the DataNode will not start.
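hdfs-site.xml

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/hadoop/name</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/data</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>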

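yarn-site.xml

<property>
  <name>yarn.resourcemanager.address</name>
  <value>cloud1:8032</value>
  <description>ResourceManager host:port for clients to submit jobs.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>cloud1:8030</value>
  <description>ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources.</description>
</property>

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>cloud1:8031</value>
  <description>ResourceManager host:port for NodeManagers.</description>
</property>

<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>cloud1:8033</value>
  <description>ResourceManager host:port for administrative commands.</description>
</property>

<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>cloud1:8088</value>
  <description>ResourceManager web-ui host:port.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the NodeManager</description>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by NodeManagers as log directories</description>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on HDFS where the application logs are moved to</description>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for MapReduce to run</description>
</property>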

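mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cloud1:10020</value>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>cloud1:19888</value>
</property>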

cd /opt/hadoop/
bin/hdfs namenode -format
sbin/start-dfs.sh # cloud1 NameNode SecondaryNameNode, cloud2 and cloud3 DataNode
sbin/start-yarn.sh # cloud1 ResourceManager, cloud2 and cloud3 NodeManager
jps
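Running jps on cloud1 only shows the daemons on cloud1. To see the whole cluster from one terminal, something like the following sketch could be used. cluster_jps and the REMOTE_SH variable are names invented here; it assumes passwordless ssh between the nodes (the clones share the same key pair) and that jps is on the PATH of a non-interactive ssh session, which may not hold on every setup.

```shell
#!/bin/bash
# Run `jps` on each given host and list the Java daemons found there.
# REMOTE_SH names the command used to reach a node and defaults to ssh;
# both cluster_jps and REMOTE_SH are hypothetical names for this sketch.
cluster_jps() {
  local host
  for host in "$@"; do
    echo "== $host"
    # drop the Jps process itself and keep only the class names
    ${REMOTE_SH:-ssh} "$host" jps | awk '$2 != "Jps" { print "  " $2 }'
  done
}
```

With the layout above, `cluster_jps cloud1 cloud2 cloud3` should list NameNode, SecondaryNameNode and ResourceManager under cloud1, and DataNode and NodeManager under cloud2 and cloud3.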

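Check the cluster status: bin/hdfs dfsadmin -report
Inspect how files are split into blocks: bin/hdfs fsck / -files -blocks
Browse HDFS through the NameNode web UI
Check the ResourceManager (RM) web UI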
bin/hdfs dfs -mkdir /input
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar randomwriter input

5. Questions:
14/01/05 23:59:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The shared libraries under /opt/hadoop/lib/native/ are 32-bit builds; they have to be replaced with 64-bit ones.
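Whether a library is a 32-bit or 64-bit build can be read from its ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The sketch below relies only on od; elf_bits is an illustrative name, and the libhadoop.so path in the usage note is an assumption about the file layout of the tarball.

```shell
#!/bin/bash
# Print 32 or 64 according to the EI_CLASS byte (offset 4) of an ELF file.
# Returns 1 and prints "unknown" for anything else.
# elf_bits is a hypothetical helper name.
elf_bits() {
  case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
    1) echo 32 ;;
    2) echo 64 ;;
    *) echo unknown; return 1 ;;
  esac
}
```

Comparing `elf_bits /opt/hadoop/lib/native/libhadoop.so` with `uname -m` shows the mismatch: a result of 32 on an x86_64 system explains the NativeCodeLoader warning.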

ssh asks "Are you sure you want to continue connecting (yes/no)?" on login. Fix:
In /etc/ssh/ssh_config, change # StrictHostKeyChecking ask to StrictHostKeyChecking no

The DataNodes on the two slaves could not join the cluster. Deleting the lines in /etc/hosts that contain 127.0.1.1 or localhost fixed it.
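Ubuntu installs typically map the machine's own hostname to 127.0.1.1, which is a likely source of such lines. A short sketch for spotting them before editing the file follows; loopback_hostname_lines is a made-up name, and it only covers the 127.0.1.1 case.

```shell
#!/bin/bash
# List, with line numbers, the hosts-file entries that start with 127.0.1.1.
# Defaults to /etc/hosts; returns 1 when there is no such line.
# loopback_hostname_lines is a hypothetical helper name.
loopback_hostname_lines() {
  grep -nE '^[[:space:]]*127\.0\.1\.1[[:space:]]' "${1:-/etc/hosts}"
}
```

Running `loopback_hostname_lines` on cloud2 and cloud3 shows which lines to remove; the DataNodes need a restart after the change.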
