Category: HADOOP

2012-10-27 21:53:07

Problems encountered while learning Hadoop

1: 12/10/27 21:11:00 INFO datanode.DataNode: Failed to start datanode org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.AvatarProtocol version mismatch. (client = 42, server = 41)

        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:452)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:419)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:471)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.handshake(AvatarDataNode.java:321)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.startDataNode(AvatarDataNode.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:233)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.<init>(AvatarDataNode.java:119)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.makeInstance(AvatarDataNode.java:691)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.instantiateDataNode(AvatarDataNode.java:715)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.createDataNode(AvatarDataNode.java:720)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.main(AvatarDataNode.java:728)

12/10/27 21:11:01 ERROR datanode.AvatarDataNode: java.lang.IllegalArgumentException: not a proxy instance
        at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637)
        at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:481)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:655)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.shutdown(AvatarDataNode.java:576)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:236)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.<init>(AvatarDataNode.java:119)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.makeInstance(AvatarDataNode.java:691)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.instantiateDataNode(AvatarDataNode.java:715)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.createDataNode(AvatarDataNode.java:720)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.main(AvatarDataNode.java:728)

12/10/27 21:11:01 INFO datanode.AvatarDataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down AvatarDataNode at ungeo12/127.0.0.1
************************************************************/
Cause: I carelessly started hadoop-0.20.3-dev on the namenode but hadoop-0.20.1-dev on the datanode, which caused this error.

Solution: run the same Hadoop version on every node.
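
For example, before starting the daemons you can confirm that every node runs the same build (the path below matches this cluster's layout; adjust it to your own installation):

[root@ungeo8 ~]# /usr/local/hadoop/bin/hadoop version    ## on the namenode
[root@ungeo11 ~]# /usr/local/hadoop/bin/hadoop version   ## and on every datanode; the reported versions must match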

2: [root@ungeo11 ~]# /usr/local/hadoop/bin/hadoop org.apache.hadoop.hdfs.server.datanode.AvatarDataNode
12/10/27 21:29:59 INFO datanode.AvatarDataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting AvatarDataNode
STARTUP_MSG:   host = ungeo11/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1-dev
STARTUP_MSG:   build =  -r ; compiled by 'root' on Wed Oct 17 12:40:23 CST 2012
************************************************************/
12/10/27 21:30:00 INFO datanode.DataNode: Failed to start datanode java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode namespaceID = 1036222900; datanode namespaceID = 523061301
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.startDataNode(AvatarDataNode.java:202)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:233)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.<init>(AvatarDataNode.java:119)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.makeInstance(AvatarDataNode.java:691)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.instantiateDataNode(AvatarDataNode.java:715)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.createDataNode(AvatarDataNode.java:720)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.main(AvatarDataNode.java:728)

12/10/27 21:30:00 ERROR datanode.AvatarDataNode: java.lang.IllegalArgumentException: not a proxy instance
        at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637)
        at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:481)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:655)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.shutdown(AvatarDataNode.java:576)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:236)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.<init>(AvatarDataNode.java:119)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.makeInstance(AvatarDataNode.java:691)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.instantiateDataNode(AvatarDataNode.java:715)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.createDataNode(AvatarDataNode.java:720)
        at org.apache.hadoop.hdfs.server.datanode.AvatarDataNode.main(AvatarDataNode.java:728)

12/10/27 21:30:00 INFO datanode.AvatarDataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down AvatarDataNode at ungeo11/127.0.0.1
************************************************************/

Cause: the namenode was reformatted, but the datanode still kept the namespaceID from the original format. Solution: delete the /tmp/hadoop-root directory on the datanode, then restart the datanode.
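
A minimal sketch of that fix on the affected datanode (this assumes the default dfs.data.dir under /tmp/hadoop-root, as in the log above; note that it wipes all block data this datanode holds):

[root@ungeo11 ~]# rm -rf /tmp/hadoop-root
[root@ungeo11 ~]# /usr/local/hadoop/bin/hadoop org.apache.hadoop.hdfs.server.datanode.AvatarDataNode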





3: master node:

12/09/24 22:13:57 WARN mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2012-09-24 22:13:57 but new checkpoint was created using editlog with timestamp 2012-09-24 22:09:22. Checkpoint Aborted.

at org.apache.hadoop.hdfs.server.namenode.FSImage.validateCheckpointUpload(FSImage.java:1764)

at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:63)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)

backup node:

12/09/24 22:14:41 ERROR namenode.Checkpointer: Exception in doCheckpoint:

java.io.FileNotFoundException: http://ungeo8:50070/getimage?putimage=1&port=50105&machine=192.168.1.9&token=-24:1598601954:0:1348495762000:1348489955191

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

at sun.net.(HttpURLConnection.java:1368)

at java.security.AccessController.doPrivileged(Native Method)

at sun.net.(HttpURLConnection.java:1362)

at sun.net.(HttpURLConnection.java:1016)

at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:173)

at org.apache.hadoop.hdfs.server.namenode.Checkpointer.uploadCheckpoint(Checkpointer.java:206)

at org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:248)

at org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)

Caused by: java.io.FileNotFoundException: http://ungeo8:50070/getimage?putimage=1&port=50105&machine=192.168.1.9&token=-24:1598601954:0:1348495762000:1348489955191

at sun.net.(HttpURLConnection.java:1311)

at sun.net.(HttpURLConnection.java:2173)

at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:165)


Cause: the clocks on the two machines are out of sync.

Solution: synchronize both machines against an Internet time source with ntpdate.
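
For example, run the following on both the master and the backup node (0.pool.ntp.org is just one public NTP server; any server reachable from both machines will do):

[root@ungeo8 ~]# ntpdate 0.pool.ntp.org
[root@ungeo9 ~]# ntpdate 0.pool.ntp.org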


4: The datanode log shows the following error:

2012-10-02 11:28:27,641 INFO org.apache.hadoop.security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

2012-10-02 11:28:27,933 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.

at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:214)

at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)

at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)

at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)

at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)

at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)

Cause: core-site.xml and hdfs-site.xml under the Hadoop conf directory had not been configured; setting them correctly (in particular fs.defaultFS) fixes the error.
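
A minimal core-site.xml sketch (the hdfs:// address must point at your own namenode; hdfs://192.168.1.88:9000 below is borrowed from the logs of problem 5, so treat it as a placeholder):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.88:9000</value>   <!-- placeholder: your namenode address -->
  </property>
</configuration>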


5: The datanode log shows the following error:

2012-10-02 11:16:41,258 INFO org.apache.hadoop.security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

2012-10-02 11:16:53,695 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 0 time(s).

2012-10-02 11:16:57,810 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 1 time(s).

2012-10-02 11:17:01,816 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 2 time(s).

2012-10-02 11:17:05,819 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 3 time(s).

2012-10-02 11:17:09,821 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 4 time(s).

2012-10-02 11:17:13,824 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 5 time(s).

2012-10-02 11:17:17,827 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 6 time(s).

2012-10-02 11:17:21,830 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 7 time(s).

2012-10-02 11:17:25,835 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 8 time(s).

2012-10-02 11:17:29,838 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 9 time(s).

2012-10-02 11:17:32,869 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to /192.168.1.88:9000 failed on local exception: java.net.NoRouteToHostException: No route to host

Cause: a firewall problem. The firewall was enabled on the datanode; disabling it fixes the error. (I had already disabled the firewall on the namenode; had I not, the namenode's firewall could produce the same symptom.)
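
On RHEL/CentOS systems of this era the following disables it (assuming iptables is the firewall in question):

[root@ungeo11 ~]# service iptables stop    ## stop the firewall immediately
[root@ungeo11 ~]# chkconfig iptables off   ## and keep it from starting at boot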


6: [root@ungeo8 bin]# hadoop dfs -rmr helloabc

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

12/10/03 18:19:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

12/10/03 18:19:52 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

rmr: Cannot delete /user/root/helloabc. Name node is in safe mode.

[root@ungeo8 bin]#

Solution:

[root@ungeo8 hadoop]# bin/hadoop dfsadmin -safemode leave

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

12/10/03 18:25:39 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

12/10/03 18:25:39 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

Safe mode is OFF

[root@ungeo8 hadoop]# bin/hadoop dfs -rmr helloabc

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

12/10/03 18:26:49 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

12/10/03 18:26:50 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

Deleted hdfs://0.0.0.0:9000/user/root/helloabc

[root@ungeo8 hadoop]#

bin/hadoop dfsadmin -safemode leave   ## definitely works; I have tested it!

bin/hadoop fsck /               ## some people say this works, but after running it I still could not delete the directory; noting it here for now.
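
You can also query the current state without changing anything:

[root@ungeo8 hadoop]# bin/hadoop dfsadmin -safemode get   ## prints "Safe mode is ON" or "Safe mode is OFF"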


7: The namenode log shows the following error (the beginning of the first line was truncated in the original):

86365816, null, null) from 192.168.1.8:60093: error: java.io.IOException: File /user/root/testdir/jdk-6u21-linux-i586.bin could only be replicated to 0 nodes, instead of 1

java.io.IOException: File /user/root/testdir/jdk-6u21-linux-i586.bin could only be replicated to 0 nodes, instead of 1

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)

at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)

Cause: the datanodes had never been started from the namenode; not a single datanode was running.

Solution: start the datanodes.
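
A sketch of one way to do it (this cluster starts AvatarDataNode by hand, as shown in problem 2; with the stock scripts you would run something like the following and then verify from the namenode):

[root@ungeo11 ~]# /usr/local/hadoop/bin/hadoop-daemon.sh start datanode   ## on each datanode
[root@ungeo8 hadoop]# bin/hadoop dfsadmin -report                         ## confirm at least one live datanode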


8: [root@ungeo8 common]# /usr/local/hadoop/bin/hadoop dfs -put /home/xliu/mysql-5.1.65.tar.gz /user/root/testdir/

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

12/10/03 19:10:05 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

12/10/03 19:10:06 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

[root@ungeo8 common]#

Check the files in that directory from another machine:

[root@ungeo9 bin]# ./hadoop dfs -D fs.default.name=hdfs://192.168.1.9:50100 -lsr / ## 192.168.1.9 is the backup node

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

12/10/03 19:10:42 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS

12/10/03 19:10:44 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

12/10/03 19:10:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

drwxr-xr-x - root supergroup 0 2012-10-03 18:52 /user

drwxr-xr-x - root supergroup 0 2012-10-03 18:52 /user/root

drwxr-xr-x - root supergroup 0 2012-10-03 19:10 /user/root/testdir

-rw-r--r-- 3 root supergroup 101 2012-10-02 13:08 /user/root/testdir/NOTICE.txt

-rw-r--r-- 3 root supergroup 1366 2012-10-02 13:09 /user/root/testdir/README.txt

-rw-r--r-- 3 root supergroup 83854743 2012-10-03 18:52 /user/root/testdir/jdk-6u21-linux-i586.bin

-rw-r--r-- 3 root supergroup 0 2012-10-03 19:10 /user/root/testdir/mysql-5.1.65.tar.gz

Verify on the datanode as follows:


[root@ungeo11 ~]# cd /usr/local/hadoop

[root@ungeo11 hadoop]# ls
bin                                  hadoop-mapred-examples-0.21.0.jar
block                                hadoop-mapred-test-0.21.0.jar
c++                                  hadoop-mapred-tools-0.21.0.jar
common                               hdfs
conf                                 lib
hadoop-common-0.21.0.jar             LICENSE.txt
hadoop-common-test-0.21.0.jar        local
hadoop-hdfs-0.21.0.jar               logs
hadoop-hdfs-0.21.0-sources.jar       mapred
hadoop-hdfs-ant-0.21.0.jar           NOTICE.txt
hadoop-hdfs-test-0.21.0.jar          README.txt
hadoop-hdfs-test-0.21.0-sources.jar  tmp
hadoop-mapred-0.21.0.jar             webapps
hadoop-mapred-0.21.0-sources.jar
[root@ungeo11 hadoop]# du  -shc * 
92K     bin
81M     block
1.5M    c++
62M     common
76K     conf
1.3M    hadoop-common-0.21.0.jar
612K    hadoop-common-test-0.21.0.jar
920K    hadoop-hdfs-0.21.0.jar
604K    hadoop-hdfs-0.21.0-sources.jar
8.0K    hadoop-hdfs-ant-0.21.0.jar
676K    hadoop-hdfs-test-0.21.0.jar
416K    hadoop-hdfs-test-0.21.0-sources.jar
1.7M    hadoop-mapred-0.21.0.jar
1.2M    hadoop-mapred-0.21.0-sources.jar
252K    hadoop-mapred-examples-0.21.0.jar
1.5M    hadoop-mapred-test-0.21.0.jar
296K    hadoop-mapred-tools-0.21.0.jar
26M     hdfs
45M     lib
16K     LICENSE.txt
12K     local
472K    logs
57M     mapred
4.0K    NOTICE.txt
4.0K    README.txt
4.0K    tmp
116K    webapps
281M    total
[root@ungeo11 hadoop]# du  -shc * 
92K     bin
105M    block
1.5M    c++
62M     common
76K     conf
1.3M    hadoop-common-0.21.0.jar
612K    hadoop-common-test-0.21.0.jar
920K    hadoop-hdfs-0.21.0.jar
604K    hadoop-hdfs-0.21.0-sources.jar
8.0K    hadoop-hdfs-ant-0.21.0.jar
676K    hadoop-hdfs-test-0.21.0.jar
416K    hadoop-hdfs-test-0.21.0-sources.jar
1.7M    hadoop-mapred-0.21.0.jar
1.2M    hadoop-mapred-0.21.0-sources.jar
252K    hadoop-mapred-examples-0.21.0.jar
1.5M    hadoop-mapred-test-0.21.0.jar
296K    hadoop-mapred-tools-0.21.0.jar
26M     hdfs
45M     lib
16K     LICENSE.txt
12K     local
472K    logs
57M     mapred
4.0K    NOTICE.txt
4.0K    README.txt
4.0K    tmp
116K    webapps
304M    total

Note the difference between the two du runs above (highlighted in red in the original): the block directory grew from 81M to 105M and the total from 281M to 304M as the blocks of the newly uploaded file arrived on this datanode.

9: How AvatarNode works

·       The AvatarNode scheme

While the online node serves requests, a shadow node stands by as a hot backup. The online namenode replicates/synchronizes every operation to the shadow node, and an operation is acknowledged to the user only after both nodes have completed it. While the online node is healthy, the shadow node stays in safemode: it serves no client traffic and only synchronizes data from the online node. When the online node fails, the VIP can simply be floated over to the shadow node.
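
A hedged sketch of the VIP float itself (eth0 and 192.168.1.100 are made-up placeholders; in practice the failover is normally driven by a tool such as heartbeat or keepalived rather than typed by hand):

[root@ungeo9 ~]# ip addr add 192.168.1.100/24 dev eth0   ## bring the VIP up on the shadow node
[root@ungeo9 ~]# arping -c 3 -U -I eth0 192.168.1.100    ## gratuitous ARP so clients learn the new MAC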

The advantage of this scheme is that it is essentially a hot standby, so failover time is very short.

The drawback is roughly the time cost of the synchronous replication, i.e. its impact on the response time of user operations.


AvatarDataNode is based on the Hadoop 0.20 DataNode. Each AvatarDataNode must send its block reports and block-received reports to both AvatarNodes. AvatarDataNodes do not use the VIP to connect to the AvatarNodes; only HDFS clients connect to the AvatarNode through the VIP.




