Category: HADOOP
2012-10-27 21:53:07
Problems encountered while learning Hadoop
1:12/10/27 21:11:00 INFO datanode.DataNode: Failed to start datanode org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.AvatarProtocol version mismatch. (client = 42, server = 41)
Cause: the namenode was reformatted, but the datanode still holds the namespaceID from the previous format. Delete the /tmp/hadoop-root directory on the datanode and restart the datanode.
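Before deleting anything, you can confirm this diagnosis by comparing the namespaceID recorded on the two sides. A minimal sketch (the VERSION file paths in the comments are the defaults derived from hadoop.tmp.dir and are assumptions; adjust them to your dfs.name.dir / dfs.data.dir settings):

```shell
# Compare the namespaceID stored in a namenode VERSION file against a
# datanode VERSION file; a mismatch means the datanode kept the ID from
# a previous format of the namenode.
check_namespace_id() {
  nn_id=$(grep '^namespaceID=' "$1" | cut -d= -f2)
  dn_id=$(grep '^namespaceID=' "$2" | cut -d= -f2)
  if [ "$nn_id" != "$dn_id" ]; then
    echo "namespaceID mismatch: namenode=$nn_id datanode=$dn_id"
    return 1
  fi
  echo "namespaceID matches: $nn_id"
}

# Typical default locations (assumptions, based on hadoop.tmp.dir):
#   namenode: /tmp/hadoop-root/dfs/name/current/VERSION
#   datanode: /tmp/hadoop-root/dfs/data/current/VERSION
```

If the IDs differ, wiping the datanode's storage directory as described above lets it re-register under the new namespaceID; note this destroys the block replicas stored on that datanode.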
3: On the master node:
12/09/24 22:13:57 WARN mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2012-09-24 22:13:57 but new checkpoint was created using editlog with timestamp 2012-09-24 22:09:22. Checkpoint Aborted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.validateCheckpointUpload(FSImage.java:1764)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:63)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
On the backup node:
12/09/24 22:14:41 ERROR namenode.Checkpointer: Exception in doCheckpoint:
java.io.FileNotFoundException: http://ungeo8:50070/getimage?putimage=1&port=50105&machine=192.168.1.9&token=-24:1598601954:0:1348495762000:1348489955191
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.(HttpURLConnection.java:1368)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.(HttpURLConnection.java:1362)
at sun.net.(HttpURLConnection.java:1016)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:173)
at org.apache.hadoop.hdfs.server.namenode.Checkpointer.uploadCheckpoint(Checkpointer.java:206)
at org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:248)
at org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)
Caused by: java.io.FileNotFoundException: http://ungeo8:50070/getimage?putimage=1&port=50105&machine=192.168.1.9&token=-24:1598601954:0:1348495762000:1348489955191
at sun.net.(HttpURLConnection.java:1311)
at sun.net.(HttpURLConnection.java:2173)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:165)
Cause: the clocks on the two machines are out of sync.
Fix: synchronize both machines against an Internet time source with ntpdate.
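The checkpoint was rejected because the namenode and backup node disagreed by several minutes (22:13:57 vs 22:09:22 in the log above). A small sketch for quantifying the skew and then fixing it; the remote hostname and NTP server below are examples, not values from this cluster's config:

```shell
# clock_skew LOCAL_EPOCH REMOTE_EPOCH -> prints the absolute skew in seconds
clock_skew() {
  d=$(( $1 - $2 ))
  [ "$d" -lt 0 ] && d=$(( -d ))
  echo "$d"
}

# Typical use (remote hostname is an example; assumes passwordless ssh):
#   clock_skew "$(date +%s)" "$(ssh ungeo9 date +%s)"
# Then sync both nodes against an NTP source, e.g.:
#   ntpdate pool.ntp.org    # server name is an example; use your own source
```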
4: The datanode reports the following error:
2012-10-02 11:28:27,641 INFO org.apache.hadoop.security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
2012-10-02 11:28:27,933 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:214)
at org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)
Cause: core-site.xml and hdfs-site.xml under the hadoop configuration directory were not set up (as the error suggests, fs.defaultFS was missing, so it fell back to file:///). Configure them and restart.
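The key property is fs.defaultFS in core-site.xml, which must name the namenode's RPC address. A minimal fragment, using the namenode address that appears in the logs of the next problem (192.168.1.88:9000); substitute your own host and port:

```xml
<!-- core-site.xml: point clients and datanodes at the namenode RPC address. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.88:9000</value>
  </property>
</configuration>
```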
5: The datanode shows the following error:
2012-10-02 11:16:41,258 INFO org.apache.hadoop.security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
2012-10-02 11:16:53,695 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 0 time(s).
2012-10-02 11:16:57,810 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 1 time(s).
2012-10-02 11:17:01,816 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 2 time(s).
2012-10-02 11:17:05,819 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 3 time(s).
2012-10-02 11:17:09,821 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 4 time(s).
2012-10-02 11:17:13,824 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 5 time(s).
2012-10-02 11:17:17,827 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 6 time(s).
2012-10-02 11:17:21,830 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 7 time(s).
2012-10-02 11:17:25,835 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 8 time(s).
2012-10-02 11:17:29,838 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /192.168.1.88:9000. Already tried 9 time(s).
2012-10-02 11:17:32,869 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to /192.168.1.88:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
Cause: the firewall. I had already disabled the firewall on the namenode (if you have not, the namenode's firewall can produce the same symptom), but the firewall was still running on the datanode; disabling it resolves the "No route to host" error.
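A quick way to tell a firewall problem from a namenode that simply is not running is to probe the RPC port from the datanode. A sketch using bash's /dev/tcp pseudo-device (a bash feature, not POSIX sh); the host and port are this cluster's values from the log above:

```shell
# probe_port HOST PORT -> prints whether a TCP connection succeeds.
# Relies on bash's /dev/tcp redirection; under other shells it will
# simply report "blocked or down".
probe_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "reachable"
  else
    echo "blocked or down"
  fi
}

# Typical use, from the datanode:
#   probe_port 192.168.1.88 9000
```

If the port is unreachable while the namenode process is up, stop the firewall on the datanode (and double-check the namenode), e.g. `service iptables stop` plus `chkconfig iptables off` on RHEL/CentOS of this era.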
6:[root@ungeo8 bin]# hadoop dfs -rmr helloabc
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
12/10/03 18:19:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/10/03 18:19:52 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
rmr: Cannot delete /user/root/helloabc. Name node is in safe mode.
[root@ungeo8 bin]#
Fix:
[root@ungeo8 hadoop]# bin/hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
12/10/03 18:25:39 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/10/03 18:25:39 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Safe mode is OFF
[root@ungeo8 hadoop]# bin/hadoop dfs -rmr helloabc
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
12/10/03 18:26:49 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/10/03 18:26:50 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Deleted hdfs://0.0.0.0:9000/user/root/helloabc
[root@ungeo8 hadoop]#
bin/hadoop dfsadmin -safemode leave ## definitely works; I have tested it!
or
bin/hadoop fsck / ## some people say this command works, but after running it I still could not delete the directory; noting it here for now!
8:6365816, null, null) from 192.168.1.8:60093: error: java.io.IOException: File /user/root/testdir/jdk-6u21-linux-i586.bin could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /user/root/testdir/jdk-6u21-linux-i586.bin could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
Cause: no datanode had been started, so the namenode had no running datanode on which to place the replica.
Fix: start the datanodes.
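Before retrying the write, you can confirm that the namenode actually sees live datanodes. `hadoop dfsadmin -report` prints a summary line of the form `Datanodes available: N (N total, M dead)` (format as I recall it from the 0.20/1.x era; treat it as an assumption). A tiny parser for that line:

```shell
# Reads `hadoop dfsadmin -report` output on stdin and prints the number
# of live datanodes from the "Datanodes available: N (...)" summary line.
count_live_datanodes() {
  grep -o 'Datanodes available: [0-9]*' | grep -o '[0-9]*$'
}

# Typical use:
#   bin/hadoop dfsadmin -report | count_live_datanodes
# If it prints 0, start the datanodes, e.g. on each datanode:
#   bin/hadoop-daemon.sh start datanode
```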
9:[root@ungeo8 common]# /usr/local/hadoop/bin/hadoop dfs -put /home/xliu/mysql-5.1.65.tar.gz /user/root/testdir/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
12/10/03 19:10:05 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/10/03 19:10:06 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
[root@ungeo8 common]#
View the files in that directory from another machine:
[root@ungeo9 bin]# ./hadoop dfs -D fs.default.name=hdfs://192.168.1.9:50100 -lsr / ## 192.168.1.9 is the backup node
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
12/10/03 19:10:42 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/10/03 19:10:44 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/10/03 19:10:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
drwxr-xr-x - root supergroup 0 2012-10-03 18:52 /user
drwxr-xr-x - root supergroup 0 2012-10-03 18:52 /user/root
drwxr-xr-x - root supergroup 0 2012-10-03 19:10 /user/root/testdir
-rw-r--r-- 3 root supergroup 101 2012-10-02 13:08 /user/root/testdir/NOTICE.txt
-rw-r--r-- 3 root supergroup 1366 2012-10-02 13:09 /user/root/testdir/README.txt
-rw-r--r-- 3 root supergroup 83854743 2012-10-03 18:52 /user/root/testdir/jdk-6u21-linux-i586.bin
-rw-r--r-- 3 root supergroup 0 2012-10-03 19:10 /user/root/testdir/mysql-5.1.65.tar.gz
Verify on the datanode as follows:
[root@ungeo11 ~]# cd /usr/local/hadoop
· The AvatarNode approach
While the online node is serving requests, a shadow node acts as a hot standby. The online namenode replicates/synchronizes every operation to the shadow node, and returns success to the user only after both nodes have completed it. While the online node is healthy, the shadow node stays in safemode: it does not serve clients and only synchronizes data with the online node. When the online node fails, the VIP can be floated directly over to the shadow node.
The advantage of this approach is that it is essentially a hot standby, so failover is very fast.
The drawback is roughly the cost of that synchronization: its impact on the response time of user operations.
AvatarDataNode is based on the DataNode in Hadoop 0.20. An AvatarDataNode must send block reports and block-received reports to both AvatarNodes. AvatarDataNodes do not use the VIP to connect to the AvatarNodes (HDFS clients connect to the AvatarNode through the VIP).