I. Create a Hadoop User
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
The newly created user does not yet have admin privileges, so they must be granted:
sudo gedit /etc/sudoers
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
The hadoop user now has admin privileges. All of the following steps are performed as the hadoop user.
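Note: editing /etc/sudoers directly with gedit is risky, since a syntax error can lock you out of sudo entirely. As an alternative sketch (standard Ubuntu commands, not part of the original steps):
sudo visudo    # validates the sudoers syntax before saving
sudo adduser hadoop sudo    # or simply add the user to the existing sudo group
Either approach achieves the same result as the manual edit above.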
II. Install and Configure SSH
Note: disable the firewall first: sudo ufw disable
Then install SSH: sudo apt-get install ssh
Once the installation finishes, the ssh command is available.
Run $ netstat -nat to check whether port 22 is open.
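For example (an illustrative check; the exact netstat columns vary by system):
~$ netstat -nat | grep :22
A line for port 22 in the LISTEN state means sshd is up and listening.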
Test: ssh localhost
Enter the current user's password and press Enter. A successful login means the installation worked; note that ssh still requires a password at this point.
(With this default installation, the configuration files are under /etc/ssh/; the sshd configuration file is /etc/ssh/sshd_config.)
After installation, run the following commands:
~$ cd /home/hadoop
~$ ssh-keygen -t rsa
Press Enter at every prompt. When it finishes, a hidden .ssh folder will have been created in the home directory.
~$ cd .ssh
~$ ls
~$ cp id_rsa.pub authorized_keys
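If passwordless login later still prompts for a password, sshd is usually strict about key file permissions. A commonly needed safeguard (an extra step, not part of the original instructions):
~$ chmod 700 /home/hadoop/.ssh
~$ chmod 600 /home/hadoop/.ssh/authorized_keys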
Test:
~$ ssh localhost
The first ssh connection prints a prompt like:
The authenticity of host 'node1 (10.64.56.76)' can't be established.
RSA key fingerprint is 03:e0:30:cb:6e:13:a8:70:c9:7e:cf:ff:33:2a:67:30.
Are you sure you want to continue connecting (yes/no)?
Type yes to continue. This adds the server to your list of known hosts.
The connection now succeeds, and no password is required.
III. Install Hadoop
Download a release package from the Apache Hadoop site; I downloaded hadoop-1.2.1.tar.gz.
Extract it into the /home/hadoop folder:
tar -zxvf hadoop-1.2.1.tar.gz
This produces a hadoop-1.2.1 folder; rename it to hadoop:
mv hadoop-1.2.1 hadoop
IV. Configure Hadoop
1. Write the Java and Hadoop environment variables into the hadoop user's .bashrc file:
HADOOP_HOME=/home/hadoop/hadoop
JAVA_HOME=/usr/local/jdk1.7.0_51
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/lib:$CLASSPATH
export HADOOP_HOME
export JAVA_HOME
export PATH
export CLASSPATH
Adjust HADOOP_HOME and JAVA_HOME to your actual paths (since we renamed hadoop-1.2.1 to hadoop above, HADOOP_HOME is /home/hadoop/hadoop). Once written, apply the changes and verify them.
Verify Java using the method from the previous article; for Hadoop, use:
echo $HADOOP_HOME
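To apply the new variables in the current shell and confirm Hadoop is on the PATH (hadoop version is a standard Hadoop 1.x command):
~$ source ~/.bashrc
~$ echo $HADOOP_HOME
~$ hadoop version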
2. Add the environment variable to the /home/hadoop/hadoop/conf/hadoop-env.sh file.
Find the line: # The java implementation to use. Required.
Uncomment the export line below it and change it to:
export JAVA_HOME=/usr/local/jdk1.7.0_51
adjusting JAVA_HOME to your actual path.
If an error like Error: JAVA_HOME is not set appears during startup later, this step is the cause.
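To confirm the edit took effect, a quick check:
~$ grep JAVA_HOME /home/hadoop/hadoop/conf/hadoop-env.sh
The output should show the uncommented export line with your JDK path.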
3. Configure core-site.xml, mapred-site.xml, and hdfs-site.xml.
First create a directory to hold the data: mkdir /home/hadoop/hadoop-datastore
1) Open /home/hadoop/hadoop/conf/core-site.xml and configure it as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-datastore/</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <!-- fs.default.name specifies the NameNode's IP address and port -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
2) Open /home/hadoop/hadoop/conf/mapred-site.xml and configure it as follows:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
3) Open /home/hadoop/hadoop/conf/hdfs-site.xml and configure it as follows:
<configuration>
  <property>
    <!-- Block replication factor; the default is 3. Setting it to 1 keeps only one copy of each block. -->
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
That completes the configuration.
V. Start Hadoop
1. Format HDFS:
Go into the hadoop/bin directory and run: ~$ hadoop namenode -format
On success, the output will include a message like: has been successfully formatted
Note: formatting should only be done once, at initial installation; never repeat it during normal use unless you intend to rebuild the HDFS filesystem.
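If you really do need to rebuild HDFS later, a minimal sketch (this assumes the hadoop.tmp.dir configured above, and it wipes all HDFS data):
~$ cd /home/hadoop/hadoop/bin
~$ ./stop-all.sh
~$ rm -rf /home/hadoop/hadoop-datastore/*
~$ hadoop namenode -format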
2. Start Hadoop:
cd /home/hadoop/hadoop/bin
Run the start script: ./start-all.sh
The corresponding stop script is: ./stop-all.sh
After starting, use jps to check whether the processes came up: ~/hadoop/bin$ jps
7180 TaskTracker
7029 JobTracker
6615 NameNode
7236 Jps
6791 DataNode
6939 SecondaryNameNode
Output like the above means Hadoop started successfully. All five Hadoop daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) must be present; Jps is just the listing tool itself.
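A quick way to script this check (a one-line sketch using the daemon names listed above):
~$ for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do jps | grep -q $d && echo "$d running" || echo "$d NOT running"; done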
netstat -at|grep 50030
netstat -at|grep 50070
Check that these ports are listening.
Note: startup sometimes fails; check the log files under /home/hadoop/hadoop/logs/ to diagnose the problem.
Visit http://localhost:50070/ to see the status of the NameNode and of the whole distributed filesystem, and to browse its files and logs.
Visit http://localhost:50030/ to check the JobTracker's running status.
50070 is the DFS (HDFS) web port; 50030 is the MapReduce (JobTracker) web port.
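You can also probe the two web UIs from the command line (assuming curl is installed; a 200 status means the page is being served):
~$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
~$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/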
VI. Test Hadoop by Running the wordcount Example
1) ~$ hadoop fs -mkdir input
If an error like mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hadoop/input. Name node is in safe mode. appears,
the Hadoop NameNode is in safe mode.
When the distributed filesystem starts up, it begins in safe mode; while it is in safe mode, the filesystem's contents may not be modified or deleted until safe mode ends. Safe mode exists so that, at startup, the system can check the validity of the data blocks on each DataNode and replicate or delete blocks as its policy requires. Safe mode can also be entered at runtime via a command. In practice, modifying or deleting files right after startup can trigger this "not allowed in safe mode" error; usually you only need to wait a moment.
To leave safe mode immediately, just run: ~/hadoop/bin$ hadoop dfsadmin -safemode leave
The message Safe mode is OFF indicates HDFS has left safe mode and you can proceed to the next step.
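You can also check or wait on safe mode explicitly; both subcommands exist in Hadoop 1.x's dfsadmin:
~$ hadoop dfsadmin -safemode get     # reports whether safe mode is ON or OFF
~$ hadoop dfsadmin -safemode wait    # blocks until the NameNode leaves safe mode on its own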
2) ~/hadoop/bin$ hadoop fs -ls
Output: Found 1 items
drwxr-xr-x - hadoop supergroup 0 2014-03-29 15:20 /user/hadoop/input
3) cd .. (go up one level)
~/hadoop$ hadoop fs -put NOTICE.txt README.txt input
Then run: ~/hadoop$ hadoop fs -ls input
Output: Found 2 items
-rw-r--r-- 1 hadoop supergroup 101 2014-03-29 15:22 /user/hadoop/input/NOTICE.txt
-rw-r--r-- 1 hadoop supergroup 1366 2014-03-29 15:22 /user/hadoop/input/README.txt
This confirms the two files whose words we want to count are now in the input directory.
4) Run the program: ~/hadoop$ hadoop jar hadoop-examples-1.2.1.jar wordcount input output
The program's output is as follows:
14/03/29 15:23:42 INFO input.FileInputFormat: Total input paths to process : 2
14/03/29 15:23:42 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/03/29 15:23:42 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/29 15:23:43 INFO mapred.JobClient: Running job: job_201403291516_0001
14/03/29 15:23:44 INFO mapred.JobClient: map 0% reduce 0%
14/03/29 15:23:53 INFO mapred.JobClient: map 50% reduce 0%
14/03/29 15:23:54 INFO mapred.JobClient: map 100% reduce 0%
14/03/29 15:24:02 INFO mapred.JobClient: map 100% reduce 33%
14/03/29 15:24:04 INFO mapred.JobClient: map 100% reduce 100%
14/03/29 15:24:06 INFO mapred.JobClient: Job complete: job_201403291516_0001
14/03/29 15:24:06 INFO mapred.JobClient: Counters: 29
14/03/29 15:24:06 INFO mapred.JobClient: Job Counters
14/03/29 15:24:06 INFO mapred.JobClient: Launched reduce tasks=1
14/03/29 15:24:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=14627
14/03/29 15:24:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/29 15:24:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/29 15:24:06 INFO mapred.JobClient: Launched map tasks=2
14/03/29 15:24:06 INFO mapred.JobClient: Data-local map tasks=2
14/03/29 15:24:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10325
14/03/29 15:24:06 INFO mapred.JobClient: File Output Format Counters
14/03/29 15:24:06 INFO mapred.JobClient: Bytes Written=1356
14/03/29 15:24:06 INFO mapred.JobClient: FileSystemCounters
14/03/29 15:24:06 INFO mapred.JobClient: FILE_BYTES_READ=2003
14/03/29 15:24:06 INFO mapred.JobClient: HDFS_BYTES_READ=1699
14/03/29 15:24:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=180692
14/03/29 15:24:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1356
14/03/29 15:24:06 INFO mapred.JobClient: File Input Format Counters
14/03/29 15:24:06 INFO mapred.JobClient: Bytes Read=1467
14/03/29 15:24:06 INFO mapred.JobClient: Map-Reduce Framework
14/03/29 15:24:06 INFO mapred.JobClient: Map output materialized bytes=2009
14/03/29 15:24:06 INFO mapred.JobClient: Map input records=33
14/03/29 15:24:06 INFO mapred.JobClient: Reduce shuffle bytes=2009
14/03/29 15:24:06 INFO mapred.JobClient: Spilled Records=284
14/03/29 15:24:06 INFO mapred.JobClient: Map output bytes=2200
14/03/29 15:24:06 INFO mapred.JobClient: Total committed heap usage (bytes)=321388544
14/03/29 15:24:06 INFO mapred.JobClient: CPU time spent (ms)=2520
14/03/29 15:24:06 INFO mapred.JobClient: Combine input records=190
14/03/29 15:24:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=232
14/03/29 15:24:06 INFO mapred.JobClient: Reduce input records=142
14/03/29 15:24:06 INFO mapred.JobClient: Reduce input groups=134
14/03/29 15:24:06 INFO mapred.JobClient: Combine output records=142
14/03/29 15:24:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=441929728
14/03/29 15:24:06 INFO mapred.JobClient: Reduce output records=134
14/03/29 15:24:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2018881536
14/03/29 15:24:06 INFO mapred.JobClient: Map output records=190
5) When the job completes, list the output directory: ~/hadoop$ hadoop fs -ls output
Output: Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-03-29 15:24 /user/hadoop/output/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-03-29 15:23 /user/hadoop/output/_logs
-rw-r--r-- 1 hadoop supergroup 1356 2014-03-29 15:24 /user/hadoop/output/part-r-00000
This confirms the results were indeed generated.
View the actual results: ~/hadoop$ hadoop fs -cat output/part-r-00000
The word-count statistics are printed:
(BIS), 1
(ECCN) 1
(TSU) 1
(http://www.apache.org/). 1
(see 1
5D002.C.1, 1
740.13) 1
<http://www.wassenaar.org/> 1
Administration 1
Apache 2
BEFORE 1
BIS 1
Bureau 1
Commerce, 1
Commodity 1
Control 1
Core 1
Department 1
ENC 1
Exception 1
Export 2
For 1
Foundation 2
Government 1
Hadoop 1
Hadoop, 1
Industry 1
Jetty 1
License 1
Number 1
Regulations, 1
SSL 1
Section 1
Security 1
See 1
Software 3
Technology 1
The 5
This 2
U.S. 1
Unrestricted 1
about 1
algorithms. 1
and 6
and/or 1
another 1
any 1
as 1
asymmetric 1
at: 2
both 1
by 2
check 1
classified 1
code 1
code. 1
concerning 1
country 1
country's 1
country, 1
cryptographic 3
currently 1
details 1
developed 1
distribution 2
eligible 1
encryption 3
exception 1
export 1
following 1
for 3
form 1
from 1
functions 1
has 1
have 1
http://hadoop.apache.org/core/ 1
http://wiki.apache.org/hadoop/ 1
if 1
import, 2
in 1
included 1
includes 3
information 2
information. 1
is 1
it 1
latest 1
laws, 1
libraries 1
makes 1
manner 1
may 1
more 2
mortbay.org. 1
object 1
of 5
on 2
or 2
our 2
performing 1
permitted. 1
please 2
policies 1
possession, 2
product 1
project 1
provides 1
re-export 2
regulations 1
reside 1
restrictions 1
security 1
see 1
software 3
software, 2
software. 2
software: 1
source 1
the 8
this 3
to 2
under 1
use, 2
uses 1
using 2
visit 1
website 1
which 2
wiki, 1
with 1
written 1
you 1
your 1
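To copy the results back to the local filesystem, the standard HDFS shell commands work (the local file names here are just examples):
~/hadoop$ hadoop fs -get output/part-r-00000 ./wordcount-result.txt
~/hadoop$ hadoop fs -getmerge output ./wordcount-merged.txt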
This shows Hadoop is installed correctly and runs successfully.
The next article will cover setting up a Hadoop development environment under Eclipse.