
Category: Cloud Computing

2011-08-05 12:34:18

Skipping the introduction to Hadoop, this goes straight to the installation steps; by following them you can clone this setup and stand up an instance of your own.
Role list:
namenode & jobtracker    192.168.237.13
datanode & tasktracker   192.168.237.74
datanode & tasktracker   192.168.239.128

#useradd hadoop
#mkdir /data/hadoop
Download hadoop-0.20.2.tar.gz into /data/hadoop, then:
#tar -zxvf hadoop-0.20.2.tar.gz
#chown -R hadoop:hadoop /data/hadoop
Set up passwordless SSH login:
#./ssh_nopasswd.sh client && ./ssh_nopasswd.sh server    (adjust the user and paths in the script as needed)
Attachment: ssh_nopasswd.zip
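The attached script isn't reproduced here; a minimal sketch of what a client/server passwordless-SSH setup typically boils down to (the hadoop user and default key path are assumptions, adjust to your environment):

# As the hadoop user on the machine that will initiate logins (the NameNode here):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa      # key pair with an empty passphrase
ssh-copy-id hadoop@192.168.237.74             # appends the public key to that node's
ssh-copy-id hadoop@192.168.239.128            #   ~/.ssh/authorized_keys
ssh hadoop@192.168.237.74 hostname            # should now log in without a password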
----------------------------
Edit the following four files on one machine, then copy them to the other machines so you don't have to edit each one repeatedly (a copy sketch appears after the hadoop-env.sh step below).
Configuration files:
core-site.xml: basic NameNode and JobTracker settings.
The main properties:
fs.default.name: URI of the NameNode
mapred.job.tracker: JobTracker IP and port
hadoop.tmp.dir: base for Hadoop's temporary directories
dfs.name.dir: where the NameNode stores the name table
dfs.data.dir: where a DataNode stores its data blocks
dfs.replication: number of block replicas

PS:
My /etc/hosts on every machine contains:
192.168.237.13  hadoop-237-13.pconline.ctc      hadoop-237-13
192.168.237.74  hadoop-237-74.pconline.ctc      hadoop-237-74
192.168.239.128  hadoop-239-128.pconline.ctc      hadoop-239-128

Example:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-237-13:9000</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.237.13:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/filesystem/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/filesystem/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
</property>

mapred-site.xml
Finer-grained MapReduce settings; each property's description tells you how to set it.

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.237.13:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job. Ignored when mapred.job.tracker is "local".</description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".</description>
</property>

<property>
  <name>mapred.userlog.retain.hours</name>
  <value>2</value>
  <description>The maximum time, in hours, for which the user-logs are to be retained.</description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx700M -server</value>
</property>

<property>
  <name>mapred.map.max.attempts</name>
  <value>800</value>
  <description>Expert: The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.</description>
</property>

<property>
  <name>mapred.reduce.max.attempts</name>
  <value>800</value>
  <description>Expert: The maximum number of attempts per reduce task. In other words, the framework will try to execute a reduce task this many times before giving up on it.</description>
</property>

<property>
  <name>mapred.max.tracker.failures</name>
  <value>800</value>
  <description>The number of task-failures on a tasktracker of a given job after which new tasks of that job aren't assigned to it.</description>
</property>

<property>
  <name>mapred.task.timeout</name>
  <value>60000000</value>
  <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
</property>

masters (SecondaryNameNode): for this test it simply runs on the NameNode host.
The file contains: 192.168.237.13

slaves:
The file contains:
192.168.237.74
192.168.239.128

JAVA_HOME is not set in my machines' environment, so set it in hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.6.0_22
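With the files edited once, a minimal sketch for pushing them to the other two nodes (the conf/ layout under /data/hadoop is an assumption; point it at wherever you unpacked the tarball):

# Run from the Hadoop home on the NameNode, as the hadoop user:
for h in hadoop-237-74 hadoop-239-128; do
    scp conf/core-site.xml conf/mapred-site.xml conf/masters conf/slaves \
        conf/hadoop-env.sh hadoop@$h:/data/hadoop/conf/
done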

----------------------------
#cd /data/hadoop && su hadoop
$ bin/hadoop namenode -format
$ bin/start-all.sh
$ bin/hadoop dfsadmin -report
If it prints something like the following, the setup succeeded:
Configured Capacity: 107981234176 (100.57 GB)
Present Capacity: 101694681088 (94.71 GB)
DFS Remaining: 101694607360 (94.71 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 192.168.239.128:50010
Decommission Status : Normal
Configured Capacity: 53558603776 (49.88 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 3143274496 (2.93 GB)
DFS Remaining: 50415292416(46.95 GB)
DFS Used%: 0%
DFS Remaining%: 94.13%
Last contact: Fri Aug 05 12:19:33 CST 2011


Name: 192.168.237.74:50010
Decommission Status : Normal
Configured Capacity: 54422630400 (50.69 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 3143278592 (2.93 GB)
DFS Remaining: 51279314944(47.76 GB)
DFS Used%: 0%
DFS Remaining%: 94.22%
Last contact: Fri Aug 05 12:19:33 CST 2011



Some errors hit during installation:
1. /data/hadoop had not been chown'ed to the hadoop user, so Hadoop reported permission denied.
2. The tmp/data directories under /data/hadoop were created by hand, which caused:
2011-08-05 09:40:34,559 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://hadoop-237-13:9000/data/hadoop/tmp/mapred/system
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /data/hadoop/tmp/mapred/system. Name node is in safe mode.
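For error 2, the NameNode normally leaves safe mode on its own once enough DataNode block reports arrive; if it stays stuck, the dfsadmin safemode subcommands can check and force it:

$ bin/hadoop dfsadmin -safemode get       # prints whether safe mode is ON or OFF
$ bin/hadoop dfsadmin -safemode leave     # force the NameNode out of safe mode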

Whenever you hit an error, look through the logs under $HADOOP_HOME/logs for details.
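For example, to follow the NameNode log (the file name embeds the user and hostname, so this exact name is illustrative):

$ tail -f logs/hadoop-hadoop-namenode-hadoop-237-13.log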


skybin090804  2011-08-19 10:08:24

The article misses one prerequisite: install the JDK before installing Hadoop.