Category: HADOOP

2015-10-11 15:26:29

Today the DataNode service on one of our servers stopped on its own. The DataNode log showed the following error:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 1, volumes configured: 2, volumes failed: 1, volume failures tolerated: 0

This means a storage volume has failed. hdfs-site.xml contains these settings:
<property>
<name>dfs.datanode.data.dir</name>
<value>/diskb/hadoop/hdfs/data,/diskc/hadoop/hdfs/data,/diskd/hadoop/hdfs/data</value>
</property>

<property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>0</value>
</property>


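With dfs.datanode.failed.volumes.tolerated set to 0, the DataNode shuts down as soon as any one of the data disks listed in dfs.datanode.data.dir (diskb, diskc, diskd) fails.
If it is set to 1, the DataNode keeps running with one failed disk.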
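
The rule described above can be sketched as a small shell function. This is a simplified model for illustration, not Hadoop's own code: the function name datanode_state is made up here, and it only compares the two numbers that appear in the log message (volumes failed, volume failures tolerated).

```shell
#!/usr/bin/env bash
# Simplified model of the DataNode volume-failure rule (illustrative, not Hadoop source).
# Usage: datanode_state FAILED_VOLUMES TOLERATED_FAILURES
datanode_state() {
    local failed=$1 tolerated=$2
    if [ "$failed" -gt "$tolerated" ]; then
        echo "shutdown"   # more failures than tolerated: DataNode stops
    else
        echo "running"    # failures within the tolerated limit
    fi
}

datanode_state 1 0   # the case from the log: 1 failed, 0 tolerated -> shutdown
datanode_state 1 1   # same failure with the setting raised to 1 -> running
```

With the values from the log (1 failed, 0 tolerated) the model prints shutdown, which matches what happened on this server.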


# dmesg
The output shows a large number of I/O errors:
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 51 00 01 2a 00 00 00 30 00 00
__ratelimit: 8 callbacks suppressed
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 51 00 01 32 00 00 00 28 00 00
EXT4-fs error (device sdd1): __ext4_get_inode_loc: unable to read inode block - inode=2760773, block=706740260
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 51 00 01 2a 00 00 00 30 00 00
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 51 00 01 32 00 00 00 28 00 00
EXT4-fs error (device sdd1): __ext4_get_inode_loc: unable to read inode block - inode=2760802, block=706740262
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 51 00 01 2a 00 00 00 30 00 00
EXT4-fs error (device sdd1): __ext4_get_inode_loc: unable to read inode block - inode=2760734, block=706740257
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 11 80 01 2a 00 00 08 00
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 11 80 01 3a 00 00 10 00
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 11 80 01 52 00 00 08 00
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 11 80 01 62 00 00 08 00
sd 3:0:0:0: [sdd] Unhandled error code
sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 11 80 01 42 00 00 08 00
end_request: I/O error, dev sdd, sector 293601570
EXT4-fs error (device sdd1): __ext4_get_inode_loc: unable to read inode block - inode=143361, block=36700192
end_request: I/O error, dev sdd, sector 5750391074
EXT4-fs error (device sdd1): __ext4_get_inode_loc: unable to read inode block - inode=2807809, block=718798880
end_request: I/O error, dev sdd, sector 5653922090
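
When the kernel log is this noisy, it can help to count the errors per device. The following is a minimal sketch that assumes the "I/O error, dev NAME" wording seen in the output above; count_io_errors is a hypothetical helper name, and other kernel versions may word the message differently.

```shell
#!/usr/bin/env bash
# Count "I/O error, dev NAME" lines per device in kernel log text read from stdin.
# Prints one "device count" pair per line, sorted by device name.
count_io_errors() {
    grep -o 'I/O error, dev [a-z0-9]*' \
        | awk '{ print $NF }' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Usage on a live system:
#   dmesg | count_io_errors
```

A single device dominating the counts, as sdd does here, points at one disk (or its cable or controller port) rather than a system-wide problem.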

Trying to create a new file on the disk fails:
# touch 111
touch: cannot touch `111': Read-only file system
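
The "Read-only file system" message suggests the file system has been switched to read-only, which ext4 can do after errors depending on its mount options. One way to confirm this is to look for the ro option in /proc/mounts. The sketch below assumes the usual /proc/mounts layout (device, mount point, type, options, ...); list_readonly_mounts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the mount point of every entry whose option list contains "ro".
# Reads /proc/mounts-format text on stdin: device mountpoint fstype options dump pass
list_readonly_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2; break }
        }
    }'
}

# Usage on a live system:
#   list_readonly_mounts < /proc/mounts
```

Some pseudo file systems are normally mounted read-only, so only a data disk showing up in this list is of interest.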

Check the disk's health status:
smartctl -H /dev/sdd

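Note:
If the value after "result" is PASSED, SMART considers the disk healthy.
If it reports a failure instead, it is best to replace the server's disk immediately.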
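
For monitoring scripts, the check above can be turned into a one-word verdict. This is a sketch under assumptions: it matches the "test result: PASSED" / "test result: FAILED" line that smartctl -H prints for ATA disks and the "SMART Health Status: OK" line it prints for SCSI/SAS disks; anything else is reported as unknown. The name smart_verdict is made up for this example.

```shell
#!/usr/bin/env bash
# Classify `smartctl -H` output read from stdin as healthy, failing, or unknown.
smart_verdict() {
    local out
    out=$(cat)
    case "$out" in
        *"test result: PASSED"*|*"SMART Health Status: OK"*) echo "healthy" ;;
        *"test result: FAILED"*) echo "failing" ;;
        *) echo "unknown" ;;   # unexpected format, or the disk did not respond
    esac
}

# Usage on a live system:
#   smartctl -H /dev/sdd | smart_verdict
```

A disk that no longer answers commands at all may produce neither line, which is why the unknown case should be treated as a warning rather than as good news.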


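The sdd disk is clearly the faulty one. Take this node out of the Hadoop cluster,
umount the disk, replace it with a new one, format and mount the new disk, then add the server back into the Hadoop cluster.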
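
The replacement steps could look roughly like the dry run below, which only prints commands and executes nothing. Several details are assumptions that the post does not state: that partition /dev/sdd1 is mounted on /diskd, that the DataNode runs as user hdfs, and that Hadoop 2.x's hadoop-daemon.sh is on the PATH. Adjust all of them to the real environment before running anything.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) one plausible command sequence for swapping a data disk.
# All four arguments are site-specific assumptions, not values confirmed by the post.
# Usage: print_disk_swap_plan PARTITION MOUNTPOINT DATADIR HDFS_USER
print_disk_swap_plan() {
    local part=$1 mnt=$2 datadir=$3 user=$4
    cat <<EOF
hadoop-daemon.sh stop datanode
umount $mnt
# ...physically replace the disk and create the partition, then:
mkfs.ext4 $part
mount $part $mnt
mkdir -p $datadir
chown -R $user $datadir
hadoop-daemon.sh start datanode
EOF
}

print_disk_swap_plan /dev/sdd1 /diskd /diskd/hadoop/hdfs/data hdfs
```

Printing the plan first makes it easy to review the device name before the destructive mkfs.ext4 step is run by hand.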

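Some people online suggest booting into Linux rescue mode and running fsck on the disk, but to avoid the problem coming back I simply replaced it with a new one.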


