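The introduction to Hadoop is skipped here; this post goes straight to the installation steps. Following them, you can reproduce a working instance.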
Roles:
namenode & jobtracker 192.168.237.13
datanode & tasktracker 192.168.237.74
datanode & tasktracker 192.168.239.128
#useradd hadoop
download hadoop-0.20.2.tar.gz
#mkdir /data/hadoop
#tar -zxvf hadoop-0.20.2.tar.gz
#chown -R hadoop:hadoop hadoop-0.20.2 hadoop
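The install steps above can be collected into one function. This is a minimal sketch under two assumptions not stated in the post: hadoop-0.20.2.tar.gz has already been downloaded into /data (which the relative paths in the chown line suggest), and it runs as root. `PARENT`, `DRY_RUN`, `run` and `install_hadoop` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: replays the install commands above in order.
# Assumption: the tarball is already in $PARENT (default /data); run as root.
# With DRY_RUN=1 the commands are printed instead of executed.
PARENT=${PARENT:-/data}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_hadoop() {
    # stop at the first failing step
    run cd "$PARENT" &&
        run useradd hadoop &&
        run mkdir -p "$PARENT/hadoop" &&
        run tar -zxvf hadoop-0.20.2.tar.gz &&
        run chown -R hadoop:hadoop hadoop-0.20.2 hadoop
}
```

Source the file, preview with `DRY_RUN=1 install_hadoop`, then run `install_hadoop` as root.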
Set up passwordless SSH login:
#./ssh_nopasswd.sh client && ./ssh_nopasswd.sh server   (adjust the user and paths as needed)
The script is in the attachment ssh_nopasswd.zip.
----------------------------
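The author's ssh_nopasswd.sh is not reproduced here. As an alternative, a hedged sketch using the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id`: run as the hadoop user on the namenode, it creates an RSA key once (with an empty passphrase, assumed because start-all.sh logs in non-interactively) and installs the public key on each host given. `setup_nopasswd` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Alternative sketch, NOT the author's ssh_nopasswd.sh.
# Run as the hadoop user on the namenode. DRY_RUN=1 only prints commands.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_nopasswd USER HOST... : create a key once, then copy it to every host
setup_nopasswd() {
    user=$1
    shift
    key="$HOME/.ssh/id_rsa"
    if [ ! -f "$key" ]; then
        run mkdir -p "$HOME/.ssh" || return 1
        run chmod 700 "$HOME/.ssh" || return 1
        # empty passphrase so the start scripts can log in unattended
        run ssh-keygen -t rsa -N "" -f "$key" || return 1
    fi
    for host in "$@"; do
        run ssh-copy-id -i "$key.pub" "$user@$host" || return 1
    done
}
```

Usage: `setup_nopasswd hadoop 192.168.237.13 192.168.237.74 192.168.239.128`. The namenode's own address is included because start-all.sh also connects over SSH to the host listed in masters, which here is the namenode itself.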
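The following four files are configured on one machine and then copied to the other machines, so they do not have to be edited again on each one.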
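Once the files below are edited, the copy step can be scripted. A minimal sketch, assuming the installation lives at /data/hadoop on every node with its configuration in /data/hadoop/conf, that passwordless SSH already works, and that hadoop-env.sh should be copied too; the `HADOOP_HOME` default, `push_conf` and `DRY_RUN` are assumptions, not taken from the post.

```shell
#!/bin/sh
# Sketch: push the edited config files to every host listed in conf/slaves.
# Assumption: same HADOOP_HOME path on all nodes (default /data/hadoop).
# DRY_RUN=1 only prints the scp commands.
HADOOP_HOME=${HADOOP_HOME:-/data/hadoop}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

push_conf() {
    conf="$HADOOP_HOME/conf"
    # word splitting drops blank lines in the slaves file
    for host in $(cat "$conf/slaves"); do
        run scp "$conf/core-site.xml" "$conf/mapred-site.xml" \
            "$conf/masters" "$conf/slaves" "$conf/hadoop-env.sh" \
            "hadoop@$host:$conf/" || return 1
    done
}
```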
Configuration files:
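core-site.xml: basic NameNode and JobTracker settings
Main settings:
fs.default.name: URI of the NameNode
mapred.job.tracker: JobTracker IP and port
hadoop.tmp.dir: Hadoop temporary directory
dfs.name.dir: where the NameNode stores the name table
dfs.data.dir: where each DataNode stores its data blocks
dfs.replication: number of replicas per block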
PS: my /etc/hosts contains the following entries:
192.168.237.13 hadoop-237-13.pconline.ctc hadoop-237-13
192.168.237.74 hadoop-237-74.pconline.ctc hadoop-237-74
192.168.239.128 hadoop-239-128.pconline.ctc hadoop-239-128
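A sketch for adding these entries on each node. The hosts file is an argument so the function can be tried on a copy before pointing it at /etc/hosts as root. `add_hosts` is a hypothetical helper, and it treats a host as already present if its short name appears as a hostname field on any line.

```shell
#!/bin/sh
# Sketch: append the three entries above to a hosts file, skipping any host
# whose short name is already listed (so running it twice is harmless).

add_hosts() {
    file=$1
    while read -r ip fqdn short; do
        [ -n "$ip" ] || continue
        # exact match on hostname fields (column 2 onward)
        if awk -v h="$short" '{ for (i = 2; i <= NF; i++) if ($i == h) found = 1 } END { exit !found }' "$file"; then
            continue
        fi
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
    done <<'EOF'
192.168.237.13 hadoop-237-13.pconline.ctc hadoop-237-13
192.168.237.74 hadoop-237-74.pconline.ctc hadoop-237-74
192.168.239.128 hadoop-239-128.pconline.ctc hadoop-239-128
EOF
}
```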
Example:
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-237-13:9000</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.237.13:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/filesystem/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/filesystem/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
</configuration>
mapred-site.xml
Configures some details of MapReduce.
Set each value by following its description:
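<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.237.13:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job. Ignored when mapred.job.tracker is "local".</description>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".</description>
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>2</value>
  <description>The maximum time, in hours, for which the user-logs are to be retained.</description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx700M -server</value>
</property>
<property>
  <name>mapred.map.max.attempts</name>
  <value>800</value>
  <description>Expert: The maximum number of attempts per map task. In other words, framework will try to execute a map task these many number of times before giving up on it.</description>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>800</value>
  <description>Expert: The maximum number of attempts per reduce task. In other words, framework will try to execute a reduce task these many number of times before giving up on it.</description>
</property>
<property>
  <name>mapred.max.tracker.failures</name>
  <value>800</value>
  <description>The number of task-failures on a tasktracker of a given job after which new tasks of that job aren't assigned to it.</description>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>60000000</value>
  <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
</property>
</configuration>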
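Some of these values are easier to judge as numbers. A small worked example in shell integer arithmetic: with 2 tasktrackers and 2 reduce slots each, the cluster has 4 reduce slots, so the "99% of the cluster's reduce capacity" guideline rounds down to 3, while the value 2 used here equals the reduce slots of a single tasktracker. The timeout of 60000000 ms is 1000 minutes. The helper names are hypothetical.

```shell
#!/bin/sh
# Worked arithmetic for the mapred-site.xml values above (integer math).

# total slots = number of tasktrackers * slots per tasktracker
cluster_slots() {
    echo $(( $1 * $2 ))
}

# "99% of the cluster's reduce capacity", rounded down
suggested_reduces() {
    echo $(( $1 * $2 * 99 / 100 ))
}

# mapred.task.timeout is given in milliseconds
ms_to_minutes() {
    echo $(( $1 / 1000 / 60 ))
}

cluster_slots 2 2          # 2 tasktrackers x 2 slots -> 4
suggested_reduces 2 2      # 4 * 99 / 100 -> 3
ms_to_minutes 60000000     # -> 1000
```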
masters (SecondaryNameNode host): for this test it simply runs on the NameNode machine.
Contents: 192.168.237.13
slaves:
Contents:
192.168.237.74
192.168.239.128
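The masters and slaves files are plain lists with one address per line. A small sketch that writes both; the conf directory argument (for example /data/hadoop/conf) and the name `write_role_files` are assumptions.

```shell
#!/bin/sh
# Sketch: write_role_files CONF_DIR MASTER SLAVE...
# masters gets the SecondaryNameNode host, slaves gets one line per slave.

write_role_files() {
    conf=$1 master=$2
    shift 2
    printf '%s\n' "$master" > "$conf/masters"
    printf '%s\n' "$@" > "$conf/slaves"
}
```

Usage: `write_role_files /data/hadoop/conf 192.168.237.13 192.168.237.74 192.168.239.128`.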
My machines do not have the JAVA_HOME environment variable set, so I set it in hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.6.0_22
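A sketch that adds this line only when hadoop-env.sh does not already export JAVA_HOME. It assumes commented-out lines such as `# export JAVA_HOME=...` should be ignored, and it only warns if `$JAVA_HOME/bin/java` is missing; `set_java_home` is a hypothetical helper. An existing uncommented line is left unchanged.

```shell
#!/bin/sh
# Sketch: set_java_home ENV_FILE JAVA_HOME_PATH
# Appends "export JAVA_HOME=..." unless an uncommented export is present.

set_java_home() {
    env_file=$1 java_home=$2
    if [ ! -x "$java_home/bin/java" ]; then
        echo "warning: $java_home/bin/java not found" >&2
    fi
    if grep -q '^[[:space:]]*export JAVA_HOME=' "$env_file"; then
        return 0    # already set; do not touch the existing line
    fi
    echo "export JAVA_HOME=$java_home" >> "$env_file"
}
```

Usage: `set_java_home /data/hadoop/conf/hadoop-env.sh /usr/java/jdk1.6.0_22`.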
----------------------------
#cd /data/hadoop && su hadoop
$bin/hadoop namenode -format
$bin/start-all.sh
$bin/hadoop dfsadmin -report
Output like the following means the cluster is up:
Configured Capacity: 107981234176 (100.57 GB)
Present Capacity: 101694681088 (94.71 GB)
DFS Remaining: 101694607360 (94.71 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.239.128:50010
Decommission Status : Normal
Configured Capacity: 53558603776 (49.88 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 3143274496 (2.93 GB)
DFS Remaining: 50415292416(46.95 GB)
DFS Used%: 0%
DFS Remaining%: 94.13%
Last contact: Fri Aug 05 12:19:33 CST 2011
Name: 192.168.237.74:50010
Decommission Status : Normal
Configured Capacity: 54422630400 (50.69 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 3143278592 (2.93 GB)
DFS Remaining: 51279314944(47.76 GB)
DFS Used%: 0%
DFS Remaining%: 94.22%
Last contact: Fri Aug 05 12:19:33 CST 2011
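A minimal sketch for checking this automatically: it reads the report on stdin and succeeds only if the `Datanodes available:` line shows the expected number of live nodes and 0 dead. The line format is taken from the sample output above; `check_datanodes` is a hypothetical helper and may need adjusting for other Hadoop versions.

```shell
#!/bin/sh
# Sketch: bin/hadoop dfsadmin -report | check_datanodes EXPECTED

check_datanodes() {
    expected=$1
    line=$(grep '^Datanodes available:' | head -n 1)
    # e.g. "Datanodes available: 2 (2 total, 0 dead)"
    live=$(echo "$line" | sed -n 's/^Datanodes available: \([0-9]*\) .*/\1/p')
    dead=$(echo "$line" | sed -n 's/.*, \([0-9]*\) dead).*/\1/p')
    if [ "$live" = "$expected" ] && [ "$dead" = "0" ]; then
        echo "OK: $live datanodes live, 0 dead"
        return 0
    fi
    echo "FAIL: live=${live:-?} dead=${dead:-?} (expected $expected live)" >&2
    return 1
}
```

Usage: `bin/hadoop dfsadmin -report | check_datanodes 2`.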
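Some errors encountered during installation:
1. /data/hadoop had not been chown'ed to the hadoop user, so a permission-denied error was reported.
2. The tmp and data directories were created manually under /data/hadoop, which produced: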
2011-08-05 09:40:34,559 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://hadoop-237-13:9000/data/hadoop/tmp/mapred/system
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /data/hadoop/tmp/mapred/system. Name node is in safe mode.
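Error 1 can be caught before starting the daemons. A hedged sketch that reports any of the given paths not owned by the expected user; `check_owner` is a hypothetical helper, it relies on the owner being the third column of `ls -ld`, and paths that do not exist yet are skipped.

```shell
#!/bin/sh
# Sketch: check_owner USER PATH...
# Prints every existing path not owned by USER; returns 1 if any were found.

check_owner() {
    want=$1
    shift
    rc=0
    for dir in "$@"; do
        [ -e "$dir" ] || continue       # not created yet, nothing to check
        owner=$(ls -ld "$dir" | awk '{print $3}')
        if [ "$owner" != "$want" ]; then
            echo "$dir is owned by $owner, not $want" >&2
            rc=1
        fi
    done
    return $rc
}
```

Usage: `check_owner hadoop /data/hadoop /data/hadoop/tmp || echo 'fix with: chown -R hadoop:hadoop ...'`.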
If you run into errors, check the relevant logs under hadoop_home/logs.
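A small sketch for that log check: it prints every line containing ` ERROR `, ` FATAL ` or `Exception` from the *.log files in a directory, prefixed with the file name and line number. `scan_logs` is a hypothetical helper and the patterns are an assumption about what is worth flagging. For a SafeModeException like the one in item 2, `bin/hadoop dfsadmin -safemode get` shows whether the NameNode is still in safe mode.

```shell
#!/bin/sh
# Sketch: scan_logs LOG_DIR
# Assumption: the daemon logs are the *.log files directly under LOG_DIR.

scan_logs() {
    logdir=$1
    for f in "$logdir"/*.log; do
        [ -f "$f" ] || continue      # glob matched nothing
        grep -nE ' (ERROR|FATAL) |Exception' "$f" | sed "s|^|$(basename "$f"):|"
    done
}
```

Usage: `scan_logs /data/hadoop/logs`.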