I ran into some problems installing hadoop-2.2.0 yesterday; luckily they were solved this morning, so I'm writing down the configuration steps right away to save trouble later.
Environment:
NameNode:
CPU: 24 cores, Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
Memory: 48G
OS: Ubuntu 12.04 (LTS)
HostName(IP): cre-bj(10.240.192.51, 192.168.42.1)
uname: Linux cre-bj 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
DataNode:
CPU: 32 cores, Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Memory: 64G
OS: Ubuntu 12.04.2 LTS
HostName(IP)
192.168.42.101 cre-s1
192.168.42.102 cre-s2
192.168.42.103 cre-s3
uname: Linux cre-s1 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Preparation:
1. Install the JDK
apt-get install openjdk-7-jdk
java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
2. Create a hadoop account on every node and generate an ssh key for each of them. The name node's public key has to be appended to every data node's authorized_keys, and every data node's public key to the name node's authorized_keys, so that the name node and the data nodes can log in to each other without a password. For example, to pull a key over and append it (a fuller sketch follows the two commands below):
scp cre-xxx:.ssh/id_rsa.pub .
cat id_rsa.pub >> authorized_keys
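A minimal sketch of the whole round trip, assuming RSA keys, the hadoop user, and the default ~/.ssh layout (the temporary file name id_rsa.cre-s1.pub is just an example):
# on every node, generate a key pair (empty passphrase)
ssh-keygen -t rsa
# on the name node, fetch and append each data node's public key
scp cre-s1:.ssh/id_rsa.pub id_rsa.cre-s1.pub
cat id_rsa.cre-s1.pub >> ~/.ssh/authorized_keys
# repeat for cre-s2 and cre-s3, then do the mirror-image steps on each data node with the name node's key
# finally verify that password-less login works in both directions
ssh cre-s1 hostname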
3. Create the directories used by HDFS. On every node, create:
/home/hadoop/store/2.2.0/dfs/name
/home/hadoop/store/2.2.0/dfs/data
/home/hadoop/store/2.2.0/tmp
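On each node this can be done in one call:
mkdir -p /home/hadoop/store/2.2.0/dfs/name /home/hadoop/store/2.2.0/dfs/data /home/hadoop/store/2.2.0/tmp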
Installing hadoop-2.2.0
1. Download the hadoop-2.2.0 release.
2. Unpack it to /home/hadoop/hadoop-2.2.0
3. Edit the configuration files under ~/hadoop-2.2.0/etc/hadoop:
----- slaves -----
cre-s1
cre-s2
cre-s3
----- core-site.xml ------
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://cre-bj:9000</value></property>
</configuration>
------ hdfs-site.xml -----
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/home/hadoop/store/2.2.0/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/home/hadoop/store/2.2.0/dfs/data</value></property>
  <property><name>dfs.blocksize</name><value>268435456</value></property>
</configuration>
----- yarn-site.xml -----
<configuration>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>cre-bj:8031</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>cre-bj:8030</value></property>
  <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.address</name><value>cre-bj:8032</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>cre-bj:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>cre-bj:8088</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>9000</value></property>
  <property><name>yarn.nodemanager.resource.cpu-cores</name><value>8</value></property>
  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/dev/shm/nm-local-dir</value>
  </property>
</configuration>
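Besides the files above, JAVA_HOME usually also has to be set so the daemons can find the JVM; a minimal sketch, assuming the OpenJDK 7 path that shows up in the process listings further below:
----- hadoop-env.sh -----
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64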
Then copy the whole /home/hadoop/hadoop-2.2.0 directory to the same location on cre-s1, cre-s2 and cre-s3.
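One way to do the copy (rsync would work just as well):
scp -r /home/hadoop/hadoop-2.2.0 cre-s1:/home/hadoop/
scp -r /home/hadoop/hadoop-2.2.0 cre-s2:/home/hadoop/
scp -r /home/hadoop/hadoop-2.2.0 cre-s3:/home/hadoop/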
4. Start hadoop on the NameNode (cre-bj); the corresponding datanodes will be started along with it.
cd ~/hadoop-2.2.0/sbin
./start-all.sh (or ./start-dfs.sh)
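Note: if the cluster is being started for the very first time, the NameNode usually has to be formatted once before running the start script; a hedged sketch:
cd ~/hadoop-2.2.0/bin
./hdfs namenode -format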
On the name node:
hadoop  21435  1   2 18:47 ?      00:00:04 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop  21738  1   2 18:47 ?      00:00:03 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
hadoop  21928  1   2 18:47 pts/2  00:00:04 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
On the data nodes:
hadoop  40157  1   7 10:52 ?      00:00:03 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.hdfs.server.datanode.DataNode
hadoop  40363  1  12 10:53 ?      00:00:04 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.yarn.server.nodemanager.NodeManager
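A quicker way to verify the same processes is jps, which ships with the JDK and lists running Java processes by class name:
jps    # on cre-bj, expect NameNode, SecondaryNameNode and ResourceManager (plus Jps)
jps    # on cre-s1/s2/s3, expect DataNode and NodeManager (plus Jps)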
5. Check the startup status by opening the NameNode web UI in a browser (by default it listens on port 50070 on cre-bj). If the figures in the summary table look reasonable, the startup succeeded.
Configured Capacity                              : 2.38 TB
DFS Used                                         : 72.24 KB
Non DFS Used                                     : 318.13 GB
DFS Remaining                                    : 2.07 TB
DFS Used%                                        : 0.00%
DFS Remaining%                                   : 86.97%
Block Pool Used                                  : 72.24 KB
Block Pool Used%                                 : 0.00%
DataNodes usages (Min / Median / Max / stdev)    : 0.00% / 0.00% / 0.00% / 0.00%
Live Nodes                                       : 3 (Decommissioned: 0)
Dead Nodes                                       : 0 (Decommissioned: 0)
Decommissioning Nodes                            : 0
Number of Under-Replicated Blocks                : 0
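The same summary is also available from the command line, which helps when the web UI cannot be reached (see note 2 at the end):
cd ~/hadoop-2.2.0/bin
./hdfs dfsadmin -report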
6. Test copying a file
cd ~/hadoop-2.2.0/bin
./hadoop dfs -mkdir /harry
./hadoop dfs -copyFromLocal kmeans_data.txt /harry
./hadoop dfs -ls /
./hadoop dfs -ls /harry
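Reading the file back should print the contents of the local kmeans_data.txt and confirms the copy actually worked:
./hadoop dfs -cat /harry/kmeans_data.txt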
7. Check the log files
The startup logs are written under ~/hadoop-2.2.0/logs. If startup fails, look through the logs for "exception" or "unexpected" messages.
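For example:
grep -iE "exception|unexpected" ~/hadoop-2.2.0/logs/*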
8. Access a file on HDFS from Java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Test {
    // HDFS entry point, same address as fs.defaultFS in core-site.xml
    public static final String HDFS_URL = "hdfs://cre-bj:9000/";

    public static void main(String[] args) throws Exception {
        try {
            readFromHdfs();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // note: this runs even if an exception was caught above
            System.out.println("SUCCESS");
        }
    }

    // Copy hdfs://cre-bj:9000/harry/kmeans_data.txt to ./kmeans_data.txt
    private static void readFromHdfs() throws FileNotFoundException, IOException {
        String dst = HDFS_URL + "harry/kmeans_data.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        FSDataInputStream hdfsInStream = fs.open(new Path(dst));
        OutputStream out = new FileOutputStream("./kmeans_data.txt");
        byte[] ioBuffer = new byte[1024];
        int readLen = hdfsInStream.read(ioBuffer);
        while (-1 != readLen) {
            out.write(ioBuffer, 0, readLen);
            readLen = hdfsInStream.read(ioBuffer);
        }
        out.close();
        hdfsInStream.close();
        fs.close();
    }
}
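To compile the class, javac needs the Hadoop jars on the classpath; a minimal sketch (the full jar list from the run command below also works):
javac -classpath ./hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar:./hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar Test.java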
hadoop@cre-bj:~$ java -classpath ./hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:./hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-logging-1.1.1.jar:./hadoop-2.2.0/share/hadoop/common/lib/guava-11.0.2.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-configuration-1.6.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-lang-2.5.jar:./hadoop-2.2.0/share/hadoop/common/lib/hadoop-auth-2.2.0.jar:./hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-cli-1.2.jar:./hadoop-2.2.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:. Test
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Feb 11, 2014 11:12:36 PM org.apache.hadoop.util.NativeCodeLoader
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SUCCESS
A few things to note:
1. The firewall on every node has to be turned off (see the command sketch after this list).
2. When accessing HDFS through the browser, the browser must also be able to reach the data nodes. For example, if the name node has two IPs but the data nodes only have internal IPs, then when the name node is accessed from the outside, the data node information cannot be loaded.
3. If a data node cannot log in to the name node, starting HDFS does not report an error, but there will in fact be problems.
4. If host names are used in the configuration files, the host names should all resolve to the internal IPs.
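For note 1, on these Ubuntu hosts the firewall is most likely managed by ufw (an assumption; locally added iptables rules would have to be checked separately):
sudo ufw status
sudo ufw disable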