2014-02-12 11:07:05


I ran into a few problems installing hadoop-2.2.0 yesterday; fortunately they were sorted out this morning, so I'm writing down the configuration steps right away to save trouble later.

Environment:
NameNode:
     CPU: 24 cores, Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
     Memory: 48G
     OS: Ubuntu 12.04 LTS
     HostName(IP): cre-bj (10.240.192.51, 192.168.42.1)
     uname: Linux cre-bj 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
     
DataNode:
     CPU: 32 cores, Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
     Memory: 64G
     OS: Ubuntu 12.04.2 LTS
     HostName(IP):
            192.168.42.101  cre-s1
            192.168.42.102  cre-s2
            192.168.42.103  cre-s3
     uname: Linux cre-s1 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Preparation:
1. Install the JDK
apt-get install openjdk-7-jdk

java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

2. Create a hadoop account on every node and generate an ssh key for each node. Then append the NameNode's public key to each DataNode's authorized_keys, and append each DataNode's public key to the NameNode's authorized_keys, so that the NameNode and the DataNodes can log in to each other (a fuller sketch follows the two commands below).
scp cre-xxx:.ssh/id_rsa.pub .
cat id_rsa.pub >> authorized_keys

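A minimal sketch of the whole exchange, assuming the hadoop account already exists on every node and password-based ssh still works for the first copy:

# On every node, as the hadoop user, generate a key pair if one does not exist yet
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# On the NameNode (cre-bj): collect each DataNode's public key ...
for h in cre-s1 cre-s2 cre-s3; do
    ssh $h cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
done
# ... and append the NameNode's public key to each DataNode
for h in cre-s1 cre-s2 cre-s3; do
    cat ~/.ssh/id_rsa.pub | ssh $h 'cat >> .ssh/authorized_keys'
done
# authorized_keys must not be group/world writable, or sshd will ignore it
chmod 600 ~/.ssh/authorized_keys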

3. Create the directories used by HDFS. On every node, create (a loop sketch follows the list):
/home/hadoop/store/2.2.0/dfs/name
/home/hadoop/store/2.2.0/dfs/data
/home/hadoop/store/2.2.0/tmp
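
A small loop saves doing this by hand on every machine; a sketch, assuming the passwordless ssh from step 2 is already set up:

for h in cre-bj cre-s1 cre-s2 cre-s3; do
    ssh $h "mkdir -p /home/hadoop/store/2.2.0/dfs/name /home/hadoop/store/2.2.0/dfs/data /home/hadoop/store/2.2.0/tmp"
done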

Install hadoop-2.2.0
1. Download the hadoop-2.2.0 release.
2. Extract it to /home/hadoop/hadoop-2.2.0
3. Edit the configuration files under ~/hadoop-2.2.0/etc/hadoop
----- slaves  -----
cre-s1
cre-s2
cre-s3

----- core-site.xml ------

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cre-bj:9000</value>
  </property>
</configuration>

------ hdfs-site.xml -----

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/store/2.2.0/dfs/name</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/store/2.2.0/dfs/data</value>
  </property>

  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
</configuration>

----- yarn-site.xml -----

<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cre-bj:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cre-bj:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cre-bj:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cre-bj:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>cre-bj:8088</value>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>9000</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>

  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
    </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/dev/shm/nm-local-dir</value>
  </property>
</configuration>

Then copy the directory /home/hadoop/hadoop-2.2.0 to the same path on cre-s1, cre-s2 and cre-s3, e.g. with the loop below.
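
A sketch of one way to push the tree out, assuming the passwordless ssh set up in step 2 of the preparation:

for h in cre-s1 cre-s2 cre-s3; do
    scp -r /home/hadoop/hadoop-2.2.0 $h:/home/hadoop/
done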

4. Start hadoop on the NameNode (cre-bj); the corresponding DataNode processes will be started as well.
cd ~/hadoop-2.2.0/sbin
./start-all.sh (or ./start-dfs.sh followed by ./start-yarn.sh)

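Note: on a brand-new cluster the NameNode normally needs to be formatted once before the very first start; a minimal sketch (this initializes the directory configured in dfs.namenode.name.dir):

cd ~/hadoop-2.2.0/bin
./hdfs namenode -format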

After starting, check the processes:
On the NameNode:
hadoop   21435     1  2 18:47 ?        00:00:04 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop   21738     1  2 18:47 ?        00:00:03 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
hadoop   21928     1  2 18:47 pts/2    00:00:04 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

On a DataNode:
hadoop    40157      1  7 10:52 ?        00:00:03 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.hdfs.server.datanode.DataNode
hadoop    40363      1 12 10:53 ?        00:00:04 /usr/lib/jvm/java-7-openjdk-amd64/bin/java ... org.apache.hadoop.yarn.server.nodemanager.NodeManager

5. Check the startup status: open the NameNode's web interface in a browser; if the numbers in the summary table look reasonable, the cluster started successfully.
Configured Capacity : 2.38 TB
DFS Used : 72.24 KB
Non DFS Used : 318.13 GB
DFS Remaining : 2.07 TB
DFS Used% : 0.00%
DFS Remaining% : 86.97%
Block Pool Used : 72.24 KB
Block Pool Used% : 0.00%
DataNodes usages : Min 0.00% / Median 0.00% / Max 0.00% / stdev 0.00%
Live Nodes : 3 (Decommissioned: 0)
Dead Nodes : 0 (Decommissioned: 0)
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 0

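Besides the web UI, the same information can be checked from the command line, for example:

cd ~/hadoop-2.2.0/bin
./hdfs dfsadmin -report    # prints configured capacity, DFS used/remaining and the state of each DataNode
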
6. Test copying a file
cd ~/hadoop-2.2.0/bin
./hadoop dfs -mkdir /harry
./hadoop dfs -copyFromLocal kmeans_data.txt /harry
./hadoop dfs -ls /
./hadoop dfs -ls /harry

7. Check the log files
The startup logs are under ~/hadoop-2.2.0/logs. If startup fails, look through the logs for "exception" or "unexpected" messages.
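
A quick way to scan for those messages, assuming the default log location:

grep -iE "exception|unexpected" ~/hadoop-2.2.0/logs/*.log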

8. Access files on HDFS from Java

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Test {

    public static final String HDFS_URL = "hdfs://cre-bj:9000/";

    public static void main(String[] args) {
        try {
            readFromHdfs();
            System.out.println("SUCCESS");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void readFromHdfs() throws FileNotFoundException, IOException {
        String dst = HDFS_URL + "harry/kmeans_data.txt";

        // Open the file on HDFS
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        FSDataInputStream hdfsInStream = fs.open(new Path(dst));

        // Copy it to a local file, 1 KB at a time
        OutputStream out = new FileOutputStream("./kmeans_data.txt");
        byte[] ioBuffer = new byte[1024];
        int readLen = hdfsInStream.read(ioBuffer);
        while (-1 != readLen) {
            out.write(ioBuffer, 0, readLen);
            readLen = hdfsInStream.read(ioBuffer);
        }

        out.close();
        hdfsInStream.close();
        fs.close();
    }
}
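
The class has to be compiled against the hadoop jars first; a minimal sketch, assuming hadoop-common is enough at compile time:

javac -classpath ./hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar Test.java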

hadoop@cre-bj:~$ java -classpath ./hadoop-2.2.0/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar:./hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-logging-1.1.1.jar:./hadoop-2.2.0/share/hadoop/common/lib/guava-11.0.2.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-configuration-1.6.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-lang-2.5.jar:./hadoop-2.2.0/share/hadoop/common/lib/hadoop-auth-2.2.0.jar:./hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:./hadoop-2.2.0/share/hadoop/common/lib/commons-cli-1.2.jar:./hadoop-2.2.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:. Test
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See for further details.
Feb 11, 2014 11:12:36 PM org.apache.hadoop.util.NativeCodeLoader
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SUCCESS


A few points to note:
1. The firewall on every node has to be turned off.
2. When browsing HDFS through the web UI, the browser must also be able to reach the DataNodes. For example, if the NameNode has two IPs but the DataNodes only have internal IPs, then when the NameNode is accessed from outside, the DataNode information cannot be loaded.
3. If the DataNodes cannot log in to the NameNode, starting HDFS does not report an error, but the cluster will in fact have problems.
4. If host names are used in the configuration files, they should all resolve to the internal IPs (see the /etc/hosts sketch below).
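
For point 4, a minimal /etc/hosts sketch for this cluster (identical on every node), using the internal addresses listed above:

----- /etc/hosts -----
192.168.42.1    cre-bj
192.168.42.101  cre-s1
192.168.42.102  cre-s2
192.168.42.103  cre-s3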

