While installing and testing Mahout on a freshly installed Hadoop CDH4.1.1 cluster, I ran into a problem, which I am writing up below for reference.
Because MRv2 still showed some instability in testing, I reinstalled and reconfigured the local cluster as MRv1. Since the installation was done from tarballs rather than through apt-get or yum as the official documentation describes, a few extra problems came up along the way.
After unpacking Mahout and adding its path to the environment variables, I ran the following command to test whether the FP-Growth (fpg) algorithm would run correctly on the cluster, and hit the error below:
- # command to run the fpg algorithm
- mahout fpg -i retail.dat -o patterns -k 50 -method mapreduce -regex '[\ ]' -s 2
- Running on hadoop, using /opt/cloudera/cdh4.1.1/hadoop-2.0.0-cdh4.1.1/bin/hadoop and HADOOP_CONF_DIR=
- MAHOUT-JOB: /opt/cloudera/cdh4.1.1/mahout-0.7-cdh4.1.1/mahout-examples-0.7-cdh4.1.1-job.jar
- 12/11/19 10:13:36 INFO common.AbstractJob: Command line arguments: {--encoding=[UTF-8], --endPhase=[2147483647], --input=[retail.dat], --maxHeapSize=[50], --method=[mapreduce], --minSupport=[2], --numGroups=[1000], --numTreeCacheEntries=[5], --output=[patterns], --splitterPattern=[[\ ]], --startPhase=[0], --tempDir=[temp]}
- 12/11/19 10:13:36 WARN conf.Configuration: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
- 12/11/19 10:13:36 WARN conf.Configuration: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
- 12/11/19 10:13:36 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "hadoop-01:9001"
- 12/11/19 10:13:36 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
- Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
- at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
- at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:83)
- at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:76)
- at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1186)
- at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1182)
- at java.security.AccessController.doPrivileged(Native Method)
- at javax.security.auth.Subject.doAs(Subject.java:396)
- at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
- at org.apache.hadoop.mapreduce.Job.connect(Job.java:1181)
- at org.apache.hadoop.mapreduce.Job.submit(Job.java:1210)
- at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1234)
- at org.apache.mahout.fpm.pfpgrowth.PFPGrowth.startParallelCounting(PFPGrowth.java:313)
- at org.apache.mahout.fpm.pfpgrowth.PFPGrowth.runPFPGrowth(PFPGrowth.java:230)
- at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.run(FPGrowthDriver.java:136)
- at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
- at org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver.main(FPGrowthDriver.java:56)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
- at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
- at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Two things in the error log above are worth noting. First:
- Running on hadoop, using /opt/cloudera/cdh4.1.1/hadoop-2.0.0-cdh4.1.1/bin/hadoop and HADOOP_CONF_DIR=
Second:
- 12/11/19 10:13:36 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "hadoop-01:9001"
- 12/11/19 10:13:36 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
- Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
I then made two attempts. First, I copied the three MR1 configuration files core-site.xml, hdfs-site.xml, and mapred-site.xml into the conf directory under the Mahout installation and reran the command, but the same error appeared. Running Mahout's fpg algorithm in mapreduce mode naturally submits an MR job, yet the first point above shows that it invoked the hadoop binary under the plain Hadoop installation rather than the one under MRv1, which could well be the cause of the failure.
Furthermore, the mapreduce.framework.name property only needs to be configured under MRv2, not under version 1, so the root cause became clear: MRv2 is bundled together with Hadoop itself, while MRv1 is still a separate installation. A look at /etc/profile showed no MRv1 environment variables, so the second attempt was to edit /etc/profile (vim /etc/profile) and add the following two lines:
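For reference, a minimal MRv1-style mapred-site.xml sketch is shown below. The JobTracker address hadoop-01:9001 is taken from the log above; the rest is an assumption about this cluster's setup, not a copy of its actual config. The point is that MRv1 locates the JobTracker via mapred.job.tracker, while mapreduce.framework.name is only meaningful under MRv2/YARN:

```
<!-- Hypothetical MRv1 mapred-site.xml sketch (values assumed, address from the log).
     Under MRv2 you would instead set mapreduce.framework.name, e.g. to "yarn",
     a property MRv1 does not use at all. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-01:9001</value>
  </property>
</configuration>
```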
- export HADOOP_HOME=/opt/cloudera/cdh4.1.1/hadoop-2.0.0-mr1-cdh4.1.1
- export PATH=$HADOOP_HOME/bin:$PATH
Then I ran source /etc/profile and reran the original command. This time it ran through successfully and produced the expected result.
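The reason this fix works is simple PATH precedence: prepending $HADOOP_HOME/bin makes the shell resolve hadoop to the MRv1 wrapper before the MRv2 one. A minimal sketch of that (the path is the one from this post's install; adjust for your layout):

```shell
# Prepend the MRv1 bin directory so its hadoop wins the PATH lookup.
HADOOP_HOME=/opt/cloudera/cdh4.1.1/hadoop-2.0.0-mr1-cdh4.1.1
PATH="$HADOOP_HOME/bin:$PATH"

# The shell searches PATH entries left to right, so the first entry decides
# which hadoop binary `which hadoop` (and the mahout script) will pick up.
first_entry="${PATH%%:*}"
echo "$first_entry"
```

After sourcing the profile, `which hadoop` is a quick way to confirm the MRv1 binary is now the one being used.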