Smartd Error: 1 Currently unreadable (pending) sectors - x-fish - ChinaUnix blog


Category: Linux

2012-12-18 10:08:05

From: http://blog.secaserver.com/2011/08/smartd-error-1-unreadable-pending-sectors/

I encountered the following error in /var/log/messages:

Aug 15 03:55:42 hostname smartd[2366]: Device: /dev/sda, 1 Currently unreadable (pending) sectors

This caused the / partition to be remounted read-only. The server is still accessible, but you can't do much on it. Let's troubleshoot this.

Collecting Information/Troubleshooting

The file system turns out to be mounted read-only when I try to create a test file in the /root directory:

$ touch /root/testfile
touch: cannot touch `/root/testfile': Read-only file system
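
The same condition can be confirmed without trying to write anything. Below is a minimal sketch, assuming a Linux system where /proc/mounts lists the mounted file systems. The function name is_mounted_ro is made up for this example, and it reads the mount table on standard input so it can also be tried on sample data:

```shell
#!/bin/sh
# is_mounted_ro MOUNTPOINT
# Hypothetical helper (not part of any standard tool): reads lines in
# /proc/mounts format on stdin (device mountpoint fstype options ...)
# and looks at the last entry for MOUNTPOINT.
# Exit status: 0 = mounted read-only, 1 = not read-only, 2 = not listed.
is_mounted_ro() {
    awk -v mp="$1" '
        $2 == mp { opts = $4; seen = 1 }   # later entries override earlier ones
        END {
            if (!seen) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro") exit 0
            exit 1
        }'
}

# Usage:
#   is_mounted_ro / < /proc/mounts && echo "/ is mounted read-only"
```

Taking the last matching entry matters because / can appear twice in /proc/mounts, once as rootfs and once as the real device.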

What is the SMART daemon (smartd)?

Self-Monitoring, Analysis and Reporting Technology (SMART) is a system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. Its purpose is to monitor the reliability of the hard drive, predict drive failures, and carry out different types of drive self-tests. smartd is the daemon from smartmontools that periodically polls this data and logs problems such as the message above. We will use the smartctl command to find out what is wrong with the disk.

Let's check the overall health of disk /dev/sda:

$ smartctl -H /dev/sda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

It passed, but this is only a general assessment. We need to dig deeper by looking at the self-test log and the error log of the disk:

$ smartctl -q errorsonly -H -l selftest -l error /dev/sda
ATA Error Count: 2
Error 2 occurred at disk power-on lifetime: 36795 hours (1533 days + 3 hours)
Error 1 occurred at disk power-on lifetime: 31542 hours (1314 days + 6 hours)

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     39255         -

When I Googled the errors above, it seemed that the hard disk might have a hardware problem. fsck alone might not help much, since it only fixes logical errors in the file system, not hardware errors.
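
The number in the smartd message can also be read straight from the drive. By default smartd takes its pending-sector count from SMART attribute 197, which smartctl -A lists in its attribute table. Below is a minimal sketch, assuming the usual ten-column ATA attribute layout with the raw value in the last column. The function name pending_sectors is made up for this example:

```shell
#!/bin/sh
# pending_sectors
# Hypothetical helper: reads the attribute table printed by
# "smartctl -A DEVICE" on stdin and prints the raw value (column 10)
# of attribute 197, the pending-sector count.
# Assumption: the drive reports attribute 197 in the standard layout
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
pending_sectors() {
    awk '$1 == 197 { print $10 }'
}

# Usage (needs root):
#   smartctl -A /dev/sda | pending_sectors
```

A non-zero result means the drive has sectors it could not read and is waiting to remap, which is a hardware-level condition that fsck cannot repair.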

The errors in the log above are timestamped with the drive's power-on lifetime attribute, which is explained as follows (reference):

Count of hours in power-on state. The raw value of this attribute shows total count of hours (or minutes, or seconds, depending on manufacturer) in power-on state. A decrease of this attribute value to the critical level (threshold) indicates a decrease of the MTBF (Mean Time Between Failures).
However, in reality, even if the MTBF value falls to zero, it does not mean that the MTBF resource is completely exhausted and the drive will not function normally.
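
The day counts that smartctl prints next to each error are plain arithmetic on the hour counter. Below is a minimal sketch, assuming the raw value counts hours on this drive (as the error log suggests). The function name hours_to_days is made up for this example:

```shell
#!/bin/sh
# hours_to_days HOURS
# Splits a power-on hour count into whole days and leftover hours,
# the same breakdown smartctl shows next to each logged error.
hours_to_days() {
    echo "$(( $1 / 24 )) days + $(( $1 % 24 )) hours"
}

hours_to_days 36795   # 1533 days + 3 hours  (Error 2)
hours_to_days 31542   # 1314 days + 6 hours  (Error 1)
hours_to_days 39255   # 1635 days + 15 hours (lifetime at the failed self-test)
```

The last figure shows the drive had been powered on for roughly four and a half years when the short self-test failed.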

Backup

Since the file system is mounted read-only, we had better take a backup before proceeding with any fix. In this case, copying to another server with scp is a good idea because we cannot write to the local disk at the moment. For me, /home is the most important directory to save:

$ scp -r /home user1@remoteserver:/home/user1/home_backup

Problem Solving Process

1. Remount the / partition:

$ mount -n -o remount /
mount: block device /dev/sda2 is write-protected, mounting read-only

2. Run the e2fsck command to check the ext3 file system while it is still mounted (read-only):

$ e2fsck /dev/sda2
e2fsck 1.39 (29-May-2006)
/: recovering journal
Clearing orphaned inode 31672817 (uid=0, gid=0, mode=0100755, size=157913)
Clearing orphaned inode 31672803 (uid=0, gid=0, mode=0100755, size=3532999)
Clearing orphaned inode 31666625 (uid=0, gid=0, mode=0100755, size=150604)
Clearing orphaned inode 31666619 (uid=0, gid=0, mode=0100755, size=383872)
Clearing orphaned inode 27885882 (uid=0, gid=0, mode=0100755, size=1011760)
Clearing orphaned inode 31666617 (uid=0, gid=0, mode=0100755, size=1141532)
Clearing orphaned inode 31665420 (uid=0, gid=0, mode=0100755, size=398180)
Clearing orphaned inode 31665416 (uid=0, gid=0, mode=0100755, size=71852)
Clearing orphaned inode 31671503 (uid=0, gid=0, mode=0100755, size=1250176)
/: clean, 80179/38273024 files, 2990728/38258797 blocks

I tried remounting the partition again as in step 1, but the same error occurred. Proceed to the next step.

3. Run a full file system check with fsck from a rescue environment:

$ fsck -f -y /dev/sda2

Even though the box remounted correctly after that, the smartd warning kept haunting me. This forced me to make a final decision as my next step.

4. To avoid a sudden breakdown (the disk has already been running for more than 1000 days), I decided to replace the hard disk and reinstall the box. It's better for me to do this as a planned maintenance task, so I won't have to worry about 'urgent' maintenance when the disk breaks down over the weekend or while I'm asleep!
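
If the check in step 3 is scripted rather than run by hand, the exit status of fsck tells you whether anything was repaired. Below is a minimal sketch based on the status values listed in the fsck(8) manual page. The function name explain_fsck_status is made up for this example:

```shell
#!/bin/sh
# explain_fsck_status STATUS
# Hypothetical helper: decodes the exit status of fsck, which is the
# sum of the bit flags documented in fsck(8), and prints one line per
# condition that is set.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $(( status & 1 ))   -ne 0 ] && echo "file system errors corrected"
    [ $(( status & 2 ))   -ne 0 ] && echo "system should be rebooted"
    [ $(( status & 4 ))   -ne 0 ] && echo "file system errors left uncorrected"
    [ $(( status & 8 ))   -ne 0 ] && echo "operational error"
    [ $(( status & 16 ))  -ne 0 ] && echo "usage or syntax error"
    [ $(( status & 32 ))  -ne 0 ] && echo "checking canceled by user request"
    [ $(( status & 128 )) -ne 0 ] && echo "shared-library error"
    return 0
}

# Usage (from the rescue environment, file system unmounted):
#   fsck -f -y /dev/sda2; explain_fsck_status $?
```

A clean fsck result only describes the file system. It says nothing about the pending sector that smartd reports, which is why replacing the disk was still the safer choice here.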
