2010-08-23 17:43:56

Smartmontools for SCSI devices
Douglas Gilbert



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

For an online copy of the license see .

2006-11-21

Revision History
Revision 1.6  2006-11-21  dpg
auto '-d sat', background scan, windows device names
Revision 1.5  2006-06-24  dpg
device type 'sat'
Revision 1.4  2006-05-08  dpg
5.38 update, SATA, SAS
Revision 1.3  2004-09-25  dpg
error counter descriptions, error events log page
Revision 1.2  2004-05-27  dpg
reorganise, details in appendix, version 5.31
Revision 1.1  2003-10-13  dpg
freebsd, timestamp
Revision 1.0  2003-05-26  dpg
first cut

Abstract

This article describes how smartmontools interacts with SCSI storage devices (mainly hard disks and tape drives). Smartmontools is a SMART utility toolset. SMART is an acronym for Self-Monitoring, Analysis and Reporting Technology. Smartmontools is available for these operating systems: Darwin (Mac OS X but with no SCSI support yet), FreeBSD, Linux, NetBSD, OpenBSD, OS/2 (no SCSI support), Solaris and Windows.

Introduction

Smartmontools controls and monitors storage devices using the Self-Monitoring, Analysis and Reporting Technology (SMART) system. This toolset was originally built for the Linux operating system and has been ported to Darwin for Mac OS X (no SCSI support yet), FreeBSD, NetBSD, OpenBSD, OS/2 (no SCSI support), Solaris and Windows. This article describes how smartmontools interacts with SCSI devices. Passing reference is also made to devices that use the SCSI command set such as USB mass storage devices and IEEE1394 devices that use the "sbp2" protocol. In many situations SATA disks are accessed using a (partial) SCSI command set.

The primary web site for smartmontools is at from which the latest versions (both source and binaries) can be obtained. Smartmontools grew out of the now dormant smartsuite project which is still available on its own sourceforge site. The smartmontools main page concentrates on ATA devices. This article supplies some SCSI specific information for those users of smartmontools that wish to monitor SCSI storage devices.

This document outlines the features found in smartmontools version 5.37 that are relevant to SCSI disks and tape drives. This document was last altered on 21st November 2006.

Overview of Smartmontools

Smartmontools is made up of two executable programs, a configuration file and online documentation (on Unix systems in the form of "man" pages). The two executable programs are:

  • smartctl: a command line utility

  • smartd: a daemon program providing a monitoring service

SCSI disks and tape drives allow self tests of their media, often monitor the temperature of the device, maintain error counters and report when various failure prediction thresholds are exceeded. To view the information available try a command like: smartctl -a /dev/sda. If SMART reporting has not been turned on for this disk then use this command first: smartctl -s on /dev/sda. [For operating systems other than Linux replace /dev/sda with a SCSI disk device name.]

The smartd daemon program is a service typically started when a machine boots up. It can monitor multiple disks (both ATA and SCSI). On Unix systems its configuration file can be found at /etc/smartd.conf. It sends alerts to the system logs and can be configured to email system administrators when pending failures are reported.

If smartmontools detects some "bad blocks" then the reader should look at this page: .

Operating Systems

Smartmontools was originally written for Linux. Since then it has been ported to various other Unix based systems and Windows. Note that the device names are based on the transport that an operating system sees. These days it is not uncommon for an operating system to see a transport that only conveys SCSI commands connected, via some command translation bridge, to an ATA disk. Examples are USB external disk enclosures and SATA disks behind a SCSI to ATA Translation Layer (SATL) in a SAS or FC domain.

The names of SCSI disk and tape devices vary with the operating system. Here is a summary:

Table 1. SCSI device names in various systems

          disks                 tapes                 Notes
Linux     /dev/sd[a-z]          /dev/[n]st[0-9]
FreeBSD   /dev/da[0-9]          /dev/[n|e]sa[0-9]
NetBSD    /dev/sd[0-9]+c        /dev/st[0-9]+c
OpenBSD   /dev/sd[0-9]+c        /dev/st[0-9]+c
Solaris   /dev/rdsk/c?t?d?s?    /dev/rmt/*
Windows   /dev/scsi[0-9][0-f]   /dev/scsi[0-9][0-f]   ASPI adapter:0-9, ID:0-15
          /dev/sd[a-z]                                for '\\.\PhysicalDrive[0-25]'
          /dev/pd[0-255]                              for '\\.\PhysicalDrive[0-255]'
                                /dev/tape[0-255]      for '\\.\Tape[0-255]'
Darwin                                                no support for SCSI devices
OS/2                                                  no support for SCSI devices


The above list is a simplification. In Linux there can be multiple drive letters followed by a partition number (1 to 15). Smartmontools will ignore the partition number if it is given and query the underlying device. In Linux the SCSI tape device name can be "nst" and a letter can be appended to the device name, both decorations are ignored by smartmontools as it accesses the underlying tape drive. Also in Linux, SCSI devices can be accessed via their generic name which is of the form /dev/sg[0-9].
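The Linux naming rules just described can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text, not the code smartmontools itself uses; the function name underlying_linux_device is hypothetical, and the two-digit limit on the partition number is an assumption drawn from the "1 to 15" range above.

```python
import re


def underlying_linux_device(name: str) -> str:
    """Map a Linux SCSI disk or tape node name to its whole-device name."""
    # Disk: /dev/sd, one or more drive letters, optional partition number.
    disk = re.fullmatch(r"(/dev/sd[a-z]+)\d{0,2}", name)
    if disk:
        return disk.group(1)
    # Tape: optional "n" prefix and optional single trailing letter.
    tape = re.fullmatch(r"/dev/n?(st[0-9]+)[a-z]?", name)
    if tape:
        return "/dev/" + tape.group(1)
    # Generic nodes (/dev/sg0 ...) and unrecognised names pass through.
    return name
```

For example, /dev/sdb3 and /dev/sdb both resolve to /dev/sdb, while a generic name such as /dev/sg2 is returned unchanged.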

Linux also has an optional Solaris-like naming scheme for SCSI devices (scsidev), devfs (mainly used in the lk 2.4 series) and udev (devfs's replacement in the lk 2.6 series). In short, device naming is a complex area and smartmontools does its best to find and identify (i.e. whether ATA or SCSI) a device depending on its name. In some cases smartmontools needs guidance from the user and this can be given by the '-d ata|scsi|sat|marvell|3ware,N' option in the smartctl utility and in the smartd daemon's configuration file.

Windows has several schemes for naming devices. The "scsi[0-9][0-f]" scheme uses the aspi dll from Adaptec. That dll is not distributed with Windows. The other schemes use the "SCSI Pass Through" interface which is native to Windows in NT and later. In all cases for Windows, the leading /dev/ is optional.

SCSI disks

What is a SCSI disk? A SCSI disk is a storage device that "talks" the SCSI command set. An ATA disk is a storage device that "talks" the ATA command set. That seems pretty clear. However the command set that a disk uses at its connector (and thus shown on its label) may not be the command set that the operating system needs to use due to command set translation between the OS and the disk.

The ATA command set is used over native ATA transports which are parallel ATA (PATA) up to 133 MB/sec and serial ATA (SATA) at link speeds of 1.5 Gbps (approximately 150 MB/sec) or 3 Gbps. In the past when ATA disks needed to use some other transport (e.g. USB and IEEE1394) the SCSI command set was sent over the foreign transport. So in this case the operating system sees a device "talking" the SCSI command set but the device is really an ATA disk. Many current external disk enclosures contain ATA disks yet, from the operating system's viewpoint, are USB mass storage devices talking the SCSI command set.

The SCSI command set is used over various transports: the SCSI Parallel Interface (SPI), Fibre Channel (FCP), Serial Attached SCSI (SAS), IEEE1394 (SBP), USB (mass storage) and iSCSI. Many of these transports can convey multiple command sets (i.e. not just the SCSI command set). The SAS transport is interesting as it can convey both the SCSI and ATA command sets. There is also the case of a RAID made up of ATA disks which communicates with the host operating system using the SCSI command set (e.g. 3ware RAID controller).

So what does all this mean for smartmontools? In most cases the answer is not good news. Devices such as USB external disk enclosures translate incoming (from the host) SCSI commands to their ATA equivalents and process responses as required. This translation is typically limited to a small number of SCSI commands (e.g. READ and WRITE) and does not cover the commands needed by smartmontools. The author does not know of any SCSI_over_USB devices that support smartmontools. The 3ware RAID (6000, 7000, 8000 and 9000 series Escalade) controllers are supported on several operating systems with special code.

There is an emerging SCSI to ATA Translation (SAT) standard at that may lead to improvements in this area. Apart from defining some of the facilities smartmontools needs, it defines two ATA PASS THROUGH SCSI commands. These pass through commands could be used in much the same way that the 3ware RAID tunnels ATA commands.

The device type '-d sat' instructs the smartctl command and the smartd daemon, to form SMART commands for the ATA command set and then package those commands within the ATA PASS THROUGH SCSI commands. The SCSI commands are then sent to the "SCSI" device that the operating system has been given. In version 5.37 of smartmontools it is no longer necessary to specify '-d sat' in this situation. All that is needed is a SATL that complies with the emerging SAT standard. If the automatic detection of an ATA disk behind a SATL is tricked, '-d scsi' (or some other device type) can be used to override.

It has been reported that many external USB enclosures use a "Cypress" chipset. This contains a proprietary ATACB pass through (for ATA commands passed through SCSI commands) for which some information is publicly available. Smartmontools has no ATACB specific code but may move in this direction in the future. Another approach is to hope USB and SBP2 external enclosures adopt the SAT standard in the near future. One interesting comment about ATACB is that it should not be used at the same time as other types of access to the disk (e.g. a mounted file system)! That implies that a disk should be taken offline before smartmontools is used on it. It also implies that the smartd background daemon should not be used.

SATA disks

SATA disks use a 1.5 or 3 Gbps serial transport which carries the ATA command set. The serial connection is point to point so each SATA disk needs its own cable and plug on the host adapter or motherboard. Many aspects of SATA are like SCSI and some operating systems use existing SCSI infrastructure to handle SATA hosts (e.g. Linux's libata).

Serial Attached SCSI (SAS) can be viewed as a superset of SATA. It can directly connect thousands of SAS disks to one or more controllers spread across multiple machines in one SAS "domain". Such a domain can also contain SATA disks, connected to intermediate fanout devices called expanders (similar to switches in networking). Most SAS host adapters can also have SATA disks connected directly to the adapter (which technically is not a usage of SAS but that is of little concern to the end user).

So a SATA disk may be connected

  • to a SATA host controller (on a motherboard or an adapter)

  • directly to a SAS host adapter

  • to a SAS expander which is connected to one or more SAS host adapters

  • or connected via a bridge which is connected to the host computer via some other transport (e.g. fibre channel)

Since all but the first item might have other disks connected which use the SCSI command set (e.g. SAS and FC disks) often the SATA disks have a SAT layer put in front of them so they look like SCSI disks. That SAT layer may be in:

  • the operating system kernel (e.g. libata in Linux)

  • in the host adapter firmware (or RAID controller)

  • or external to the host computer: within a disk enclosure (e.g. associated with a SAS expander)

For normal file system work, a SCSI to ATA Translation Layer (SATL) only needs to concern itself with around 6 commands. Unfortunately smartmontools uses other commands (both in the SCSI and ATA command sets). Probably the simplest way to handle SMART for SATA disks behind a SAT layer is to use the ATA PASS THROUGH SCSI commands.

smartmontools guesses the disk command set (i.e. ATA or SCSI) based on the device node it is given. For example in Linux, /dev/hda would be assumed to use the ATA command set while /dev/sda would be assumed to use the SCSI command set. By using either the '-d ata' or the '-d scsi' option, the command set guess made by smartmontools can be overridden. The '-d sat' device type causes smartmontools to generate ATA commands which are then packaged within the ATA PASS THROUGH SCSI commands (defined by the SAT standard) and then sent to the device via a SCSI pass through mechanism. As noted in the previous section, version 5.37 of smartmontools now automatically detects a SATA disk behind a SAT layer and acts as if '-d sat' has been given.
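The name-based guess and its override can be pictured with the following Python sketch. It only mirrors the Linux examples given in the text (/dev/hda and /dev/sda); the function name guess_device_type is hypothetical, and the automatic SAT detection done by smartctl 5.37 is deliberately not modelled because it requires querying the device.

```python
import re
from typing import Optional


def guess_device_type(device: str, override: Optional[str] = None) -> str:
    """Guess 'ata' or 'scsi' from a Linux device node name.

    A value passed as override (the equivalent of '-d TYPE') always wins.
    """
    if override is not None:
        return override
    if re.match(r"/dev/hd[a-z]", device):
        return "ata"
    # Assumption: sd, sg and (n)st nodes are all treated as SCSI.
    if re.match(r"/dev/(sd[a-z]|sg[0-9]|n?st[0-9])", device):
        return "scsi"
    raise ValueError(f"cannot guess command set for {device!r}; pass an override")
```

With this sketch a SATA disk that appears as /dev/sda is guessed to be "scsi" unless "sat" is passed as the override, which is the situation the '-d sat' option addresses.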

SMART

SMART never attained the status of "standard" and its original documents have been withdrawn. Its catchy name lives on, especially on vendors' web sites and obviously in the name of this toolset. Luckily the good ideas in SMART have been incorporated into the ATA and SCSI standards albeit in slightly different forms.

Initially SMART began on SCSI disks as vendor specific extensions. Gradually the SMART functionality has moved into the standards (often by other names) and vendors are improving their standards' compliance. [In the vendors' defence some of the "standards" are drafts and are yet to be ratified.] Some SCSI disk vendors have product manuals (available on the net) that cover the parts of the SCSI command set that their disks support. Some of these manuals fill in details that are left deliberately vague in the standards.

SCSI standards (found at ) only make one footnote reference to the term SMART. In its place the awkward term "Informational Exceptions" is used. For SCSI tapes the term "TapeAlert" is used.

smartctl command line utility

The smartctl command line utility gets SMART information from the nominated device. In some cases SMART information held by the nominated device can be modified by the smartctl command. The command has many options that can be viewed in the long usage message output by either of these invocations: smartctl -h or smartctl --help. Those options that are only available to ATA disks (i.e. not available to SCSI disks or tape drives) are marked with "(ATA)". Unix style "man" page documentation is also available.

The following options are currently available for SCSI disks and tape drives unless otherwise noted:

  • -a | --all: equivalent to the combination -i -H -A -l error -l selftest options invoked in that order.

  • -A | --attributes: outputs the current device temperature, trip temperature, the number of elements in the grown defect list (GLIST) and data from the start-stop log page. Outputs some vendor specific information if available.

  • -C | --captive: used in conjunction with -t short or -t long options to do short or long self tests in the foreground. [Has no effect on tape drives.]

  • -d TYPE | --device=TYPE where TYPE is "ata", "scsi", "sat", "marvell", "3ware,N", "hpt,L/N[,M]" or "cciss,N". Overrides utility's guess about the class of the device which is based on the form of the nominated device's name.

  • -h | --help: outputs lengthy usage message and exits without any other action.

  • -H | --health: outputs single device health metric determined by the device manufacturer. This will be "OK" or a failure message.

  • -i | --info: outputs device identification information (derived from a SCSI INQUIRY command) and whether the device supports SMART (and temperature warnings) and if those facilities are currently enabled. The type of transport (e.g. FC or SAS) is also reported, if available. Some users have reported disks that report the wrong transport.

  • -l TYPE | --log=TYPE where TYPE is either "background", "selftest" or "error". Decodes and outputs the requested log. Note that --all does not include --log=background.

  • -q TYPE | --quietmode=TYPE where TYPE is either "silent" or "errorsonly". When the type is silent then nothing is output to the console but the exit status is set (so it is suitable for scripts). For "errorsonly" only errors are output to the console. The exit status is always set. [See the smartctl man page.]

  • -r TYPE | --report=TYPE where TYPE is either "ioctl[,N]" or "scsiioctl[,N]". Turns on low level debugging of issued commands and responses. These commands are issued through a system call named "ioctl" in Unix. The debug can be for all issued commands (i.e. "ioctl") or only SCSI commands ("scsiioctl"). Optionally the TYPE can have a comma and a number N appended to increase the volume of debug. See this for more details.

  • -s VALUE | --smart=VALUE where VALUE is either "on" or "off". Enables or disables SMART monitoring (and temperature warnings).

  • -S VALUE | --saveauto=VALUE where VALUE is either "on" or "off". Controls whether the error log values are preserved across device power cycles.

  • -t TEST | --test=TEST where TEST is either "offline", "short" or "long". Despite its name "offline" is a short foreground test that all SCSI devices should support. A "short" self test typically takes 2 minutes or less. A "long" self test will take considerably longer than 2 minutes, depending on the size of the media. The estimated time that a "long" self test will take is printed after the "selftest" log (i.e. with '-l selftest' or '-a').

  • -V | --version: outputs the smartctl version number (including the cvs version of all its source files) and build information then exits without any other action.

  • -X | --abort: will terminate a background short or long self test. Usually the self test log notes that a self test has been aborted. [Has no effect on tape drives.]

After the options smartctl expects a device name. This device name is not required for the '--help' or '--version' options. If no options are given and a valid device name is given then the copyright notice is output and the program exits. If the device name is invalid then that is reported. Only one device name can be given.
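A script that drives smartctl has to respect the ordering just described: options first, then exactly one device name. The Python sketch below assembles such a command line using only options listed above. The helper names smartctl_argv and run_smartctl are hypothetical, and run_smartctl assumes a smartctl binary is on the PATH and that the caller has permission to open the device.

```python
import subprocess
from typing import List, Optional, Sequence


def smartctl_argv(device: str, options: Sequence[str],
                  dev_type: Optional[str] = None) -> List[str]:
    """Build a smartctl argument list: options first, one device name last."""
    argv = ["smartctl"]
    if dev_type is not None:
        # Equivalent to '-d TYPE', overriding the name-based guess.
        argv += ["-d", dev_type]
    argv += list(options)
    argv.append(device)
    return argv


def run_smartctl(device: str, options: Sequence[str],
                 dev_type: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run smartctl and capture its text output and exit status."""
    return subprocess.run(smartctl_argv(device, options, dev_type),
                          capture_output=True, text=True)
```

For instance, smartctl_argv("/dev/sdc", ["-i"]) yields the argument list for the first example that follows, and run_smartctl("/dev/sdd", ["-q", "silent", "-H"]).returncode gives a script the exit status without any console output.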

Examples of various invocations of smartctl on a SCSI disk follow:

# smartctl -i /dev/sdc
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is

Device: SEAGATE ST336754SS Version: 0003
Serial number: xxxxxxxx
Device type: disk
Transport protocol: SAS
Local Time is: Fri Apr 28 15:55:34 2006 EDT
Device supports SMART and is Enabled
Temperature Warning Enabled

# smartctl -H /dev/sdd
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is

SMART Health Status: OK

# smartctl -A /dev/sdc
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is

Current Drive Temperature: 42 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1666124337
Blocks received from initiator = 1517744621
Blocks read from cache and sent to initiator = 384030649
Number of read and write commands whose size <= segment size = 21193148
Number of read and write commands whose size > segment size = 1278317
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 277.08
number of minutes until next internal SMART test = 108
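Output like the '-A' listing above is easy to post-process. The following Python sketch pulls out the two temperatures and the grown defect count and computes how far the drive is from its trip temperature. It assumes the line wording shown in the example above (smartctl 5.37 on a SCSI disk); other versions or devices may word these lines differently, and the function names are hypothetical.

```python
import re

SAMPLE = """Current Drive Temperature: 42 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
"""

PATTERNS = {
    "current_temp_c": r"Current Drive Temperature:\s+(\d+) C",
    "trip_temp_c": r"Drive Trip Temperature:\s+(\d+) C",
    "grown_defects": r"Elements in grown defect list:\s+(\d+)",
}


def parse_attributes(text: str) -> dict:
    """Extract the integer fields that are present in 'smartctl -A' output."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[key] = int(match.group(1))
    return found


def temperature_margin(attrs: dict) -> int:
    """Degrees C between the current temperature and the trip temperature."""
    return attrs["trip_temp_c"] - attrs["current_temp_c"]
```

Applied to the sample, the margin is 26 degrees C (68 minus 42).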

Self Tests

Rather than wait for thresholds to be tripped, an administrator can request a self test. Alternatively a self test can be scheduled periodically (e.g. at 3 a.m. every night or perhaps weekly) with smartd. All SCSI disks and tape drives should support a default self test since it is mandatory. This can be invoked with the smartctl -t offline command. Despite the term "offline" this is actually a foreground test of less than 2 minutes. On completion the default self test reports any errors detected in its response. The default self test makes no entry into the self test log. Most SCSI devices perform a default self test when they are being powered up.

The other self tests that are optionally supported by the device are listed here with the smartctl invocation in brackets:

  • background short [smartctl -t short ]

  • background extended [smartctl -t long ]

  • foreground short [smartctl -C -t short ]

  • foreground extended [smartctl -C -t long ]
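The four optional self tests differ only in two choices: short or extended, and background or foreground. A minimal Python sketch of that mapping, using just the '-t' and '-C' options listed above (the function name selftest_options is hypothetical):

```python
from typing import List


def selftest_options(extended: bool, foreground: bool) -> List[str]:
    """Return the smartctl options for one of the four optional self tests."""
    options = []
    if foreground:
        # '-C' (captive) turns a background test into a foreground one.
        options.append("-C")
    options += ["-t", "long" if extended else "short"]
    return options
```

The device name still has to follow these options on the command line.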

Short self tests should take less than two minutes to complete. The extended self tests have been known to take more than one hour for disks that are over 100 GBytes in size. Care should be taken with foreground tests on disks with mounted file systems as the OS may not take kindly to an hour delay on a simple READ command.

Background self tests can be aborted with the smartctl -X command. The self test log will note that an abort was requested.

Self tests other than the default self test cause an entry to be placed in the self test results log page. The 20 most recent self tests are held. The self test results can be viewed with the smartctl -l selftest command. All tests output the accumulated power on hours when the test was performed and the success or otherwise (e.g. the self test was aborted by the user's request) of the test. Unsuccessful self tests output a self test segment number (vendor specific), the logical block address of the first failure (if appropriate) and a sense_key,asc,ascq triple (see appendix). Following the self test result table is the expected duration of an uninterrupted extended self test (when that figure is provided by the device).

Here is an example of a self test log:

# smartctl -l selftest /dev/sdd
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is


SMART Self-test log
Num  Test              Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                   number   (hours)
# 1  Background long   Completed   -        100       -             [- - -]
# 2  Background long   Completed   -        25        -             [- - -]
# 3  Background long   Completed   -        24        -             [- - -]
# 4  Background short  Completed   -        0         -             [- - -]

Long (extended) Self Test duration: 603 seconds [10.1 minutes]
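A self test log in the shape shown above can be turned into records with a regular expression. The Python sketch below is written against the sample lines in this section only; status texts other than "Completed", or test descriptions other than Background/Foreground short/long, are assumptions that a real script would need to check against the smartctl man page. The names ENTRY, DURATION and parse_selftest_log are hypothetical.

```python
import re

SAMPLE = """SMART Self-test log
# 1 Background long Completed - 100 - [- - -]
# 2 Background long Completed - 25 - [- - -]
# 3 Background long Completed - 24 - [- - -]
# 4 Background short Completed - 0 - [- - -]

Long (extended) Self Test duration: 603 seconds [10.1 minutes]
"""

# number, test description, status, segment, lifetime hours, LBA, sense triple
ENTRY = re.compile(
    r"^#\s*(\d+)\s+(Background|Foreground)\s+(short|long)\s+(.+?)\s+(\S+)\s+(\d+)"
    r"\s+(\S+)\s+\[(\S+)\s+(\S+)\s+(\S+)\]\s*$"
)
DURATION = re.compile(r"Self Test duration:\s+(\d+) seconds")


def parse_selftest_log(text: str):
    """Return (entries, extended_test_seconds) from 'smartctl -l selftest' text."""
    entries, duration = [], None
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            num, where, length, status, segment, hours, lba, sk, asc, ascq = match.groups()
            entries.append({
                "num": int(num),
                "test": f"{where} {length}",
                "status": status,
                "segment": segment,
                "lifetime_hours": int(hours),
                "lba_first_err": lba,
                "sense": (sk, asc, ascq),
            })
            continue
        found = DURATION.search(line)
        if found:
            duration = int(found.group(1))
    return entries, duration
```

Because the pattern splits on runs of whitespace, it does not depend on the exact column alignment of the log.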

Error Logs

The smartctl -l error command displays the error counters maintained in the device's log pages. Here is an example of an error log:

# smartctl -l error /dev/sdd
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is


Error counter log:
          Errors Corrected by        Total      Correction   Gigabytes     Total
          ECC              rereads/  errors     algorithm    processed     uncorrected
          fast |  delayed  rewrites  corrected  invocations  [10^9 bytes]  errors
read:     5805    0        0         5805       5805         121.451       0
write:    0       0        0         0          0            471.291       0

Non-medium error count: 0

The error logs (if available) are displayed on separate lines:

  • write error counters

  • read error counters

  • verify error counters (only displayed if non-zero)

  • non-medium error counter (only a single number displayed). This represents the number of recoverable events other than write, read or verify errors.

  • error events are held in the "Last n error events" log page. The number of error event records held (i.e. "n") is vendor specific (e.g. up to 23 records are held for Hitachi 10K300 model disks). The contents of each error event record are in ASCII and vendor specific. The parameter code associated with each error event record indicates the relative time at which the error event occurred. A higher parameter code indicates that the error event occurred later in time. If this log page is not supported by the device then "Error Events logging not supported" is output. If this log page is supported and there are error event records then each one is prefixed by "Error event", followed by its parameter code and a colon.

Each of the write, read and verify error counter logs has various parameter codes. They are itemized below with the smartctl column name followed, in brackets, by the SCSI standard's description and parameter code. A description taken from Seagate's SCSI manual (publication 77738479, Rev J) is then given.

  • Errors Corrected by ECC, fast [Errors corrected without substantial delay: 00h]. An error correction was applied to get perfect data (a.k.a. ECC on-the-fly). "Without substantial delay" means the correction did not postpone reading of later sectors (e.g. a revolution was not lost). The counter is incremented once for each logical block that requires correction. Two different blocks corrected during the same command are counted as two events.

  • Errors Corrected by ECC: delayed [Errors corrected with possible delays: 01h]. An error code or algorithm (e.g. ECC, checksum) is applied in order to get perfect data with substantial delay. "With possible delay" means the correction took longer than a sector time so that reading/writing of subsequent sectors was delayed (e.g. a lost revolution). The counter is incremented once for each logical block that requires correction. A block with a double error that is correctable counts as one event and two different blocks corrected during the same command count as two events.

  • Error corrected by rereads/rewrites [Total (e.g. rewrites and rereads): 02h]. This parameter code specifies the counter counting the number of errors that are corrected by applying retries. This counts errors recovered, not the number of retries. If five retries were required to recover one block of data, the counter increments by one, not five. The counter is incremented once for each logical block that is recovered using retries. If an error is not recoverable while applying retries and is recovered by ECC, it isn't counted by this counter; it will be counted by the counter specified by parameter code 01h - Errors Corrected With Possible Delays.

  • Total errors corrected [Total errors corrected: 03h]. This counter counts the total of parameter code errors 00h, 01h and 02h (i.e. error corrected by ECC: fast and delayed plus errors corrected by rereads and rewrites). There is no "double counting" of data errors among these three counters. The sum of all correctable errors can be reached by adding parameter code 01h and 02h errors, not by using this total. [The author does not understand the previous sentence from the Seagate manual.]

  • Correction algorithm invocations [Total times correction algorithm processed: 04h]. This parameter code specifies the counter that counts the total number of retries, or "times the retry algorithm is invoked". If after five attempts a counter 02h type error is recovered, then five is added to this counter. If three retries are required to get stable ECC syndrome before a counter 01h type error is corrected, then those three retries are also counted here. The number of retries applied to unsuccessfully recover an error (counter 06h type error) are also counted by this counter.

  • Gigabytes processed {10^9} [Total bytes processed: 05h]. This parameter code specifies the counter that counts the total number of bytes either successfully or unsuccessfully read, written or verified (depending on the log page) from the drive. If a transfer terminates early because of an unrecoverable error, only the logical blocks up to and including the one with the uncorrected data are counted. [smartmontools divides this counter by 10^9 before displaying it with three digits to the right of the decimal point. This makes this 64 bit counter easier to read.]

  • Total uncorrected errors [Total uncorrected errors: 06h]. This parameter code specifies the counter that contains the total number of blocks for which an uncorrected data error has occurred.
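The relationships among these counters can be checked mechanically. The Python sketch below restates the parameter codes from the list above, formats the 05h byte counter the way the text says smartmontools displays it (divided by 10^9, three decimal places), and tests whether the 03h total equals the sum of 00h, 01h and 02h. Since the counter definitions are vendor specific, treat the sum check as a sanity test rather than a rule; the names used here are hypothetical.

```python
# Parameter codes of the write, read and verify error counter log pages,
# keyed to the descriptions used in the list above.
PARAMETER_CODES = {
    0x00: "Errors corrected by ECC, fast",
    0x01: "Errors corrected by ECC, delayed",
    0x02: "Errors corrected by rereads/rewrites",
    0x03: "Total errors corrected",
    0x04: "Correction algorithm invocations",
    0x05: "Gigabytes processed",
    0x06: "Total uncorrected errors",
}


def gigabytes_processed(total_bytes: int) -> str:
    """Format the 05h byte counter in units of 10^9 bytes, three decimals."""
    return f"{total_bytes / 1e9:.3f}"


def total_is_consistent(counters: dict) -> bool:
    """True when counter 03h equals the sum of counters 00h, 01h and 02h."""
    return counters[0x03] == counters[0x00] + counters[0x01] + counters[0x02]
```

For the read line of the first example log (5805, 0, 0 and a total of 5805) the check passes.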

The SCSI standard (SPC-3) cautions that the exact definitions of the error counters are not part of the standard (i.e. they are vendor specific). As noted, the above list contains Seagate's explanation for its disk products (the last revision of that document was 1999). Seagate's disk product manuals imply that the disk firmware collects these counter values and periodically commits them to persistent storage (disk or non-volatile RAM). They also imply that the firmware monitors these error counters and, if they exceed some threshold (e.g. in a certain time interval), reports that a threshold has been exceeded.

The error counter logs for some disks (e.g. some Seagate models) can look worrying:

# smartctl -l error /dev/sdc
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is


Error counter log:
          Errors Corrected by        Total      Correction   Gigabytes     Total
          ECC              rereads/  errors     algorithm    processed     uncorrected
          fast |  delayed  rewrites  corrected  invocations  [10^9 bytes]  errors
read:     1111396 0        0         1111396    1113203      781.138       13
write:    0       0        0         0          92           822.450       4
verify:   341115  0        0         341115     341115       42.159        0

Non-medium error count: 1

The "fast" ECC corrected number is high. However the '-H' option reports the disk is in good health as does an extended (long) background self test. The uncorrected errors would be a problem had it not been for the fact that the author caused them on purpose (by writing a bad sector with the SCSI WRITE LONG command).

Background scan

Recent SCSI disks can perform what are termed as "background scans". These are reads of the whole media with recoverable errors acted on and unrecoverable errors noted. If a sector (block) is found with a recoverable error (i.e. the error correction codes (ECC) detect a problem but contain enough redundant information to fix the problem) it may be fixed with a re-write "in place". Alternatively the disk may decide to re-assign the recovered data to another physical sector which is assigned the same logical block address (and the original faulted sector is unmapped and placed on the grown defect list (GLIST)). Since unrecoverable errors potentially involve user data being lost, no automatic recovery action is undertaken by the disk. However logical block addresses that contain either recovered data or unrecoverable errors are noted in the Background Scan Results log page. The smartctl --log=background command decodes and outputs that log page.

Background scans may be performed periodically (e.g. every 24 hours) or every time the disk is powered up (or both). These parameters can be controlled via the Background Control mode page. The utility can be used to access and modify this mode page.

Here is an example of the output from the Background Scan Results log page. The first descriptor in that log page shows the status followed by up to 2048 entries for background scan "events". In this case a background scan is still in progress and 3 scans have been completed in the past. The "events" shown are all recoverable errors that the disk dealt with by rewriting the block.

# smartctl -l background /dev/sda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is

Background scan results log
Status: scan is active
Accumulated power on time, hours:minutes 618:01 [37081 minutes]
Number of background scans performed: 3, scan progress: 59.81%

  #  when    lba(hex)          [sk,asc,ascq]  reassign_status
  1  617:13  0000000001fbc5b2  [1,17,1]       Recovered via rewrite in-place
  2  617:13  00000000022756d2  [1,17,1]       Recovered via rewrite in-place
  3  617:14  000000000227727f  [1,17,1]       Recovered via rewrite in-place
  4  617:18  00000000023568e5  [1,17,1]       Recovered via rewrite in-place
  5  617:22  00000000024fab5f  [1,17,1]       Recovered via rewrite in-place
  6  617:23  00000000025aa29a  [1,17,1]       Recovered via rewrite in-place
  7  617:27  000000000275d0bc  [1,17,1]       Recovered via rewrite in-place

In this case the reassign_status shows that no user intervention is required. The other "don't worry (too much)" reassign_status is "Logical block successfully reassigned". Any other reassign_status will require user intervention to correct. There is a LOWIR ("log only when intervention required") bit in the Background Control mode page that the user can set (e.g. with the utility) to filter out "noisy" entries like those shown above.

The user can manually re-assign logical blocks with a utility like sg_reassign found in the package. The background scan output contains a "[sk,asc,ascq]" tuple of numbers. The one shown above translates to "recovered error, recovered data with retries". Unrecoverable errors would most likely have 3 ("medium error") or 4 ("hardware error") as the first number. A decoding of the latter two numbers can be found in the "Numeric Order Codes" annex of SPC-4 (see ) in the Additional Sense Codes section.
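A background scan event line can be decoded along the lines described in this section. The Python sketch below parses one line, converts the hours:minutes timestamp to minutes, names the sense key, and flags whether the reassign_status calls for user intervention. It is a sketch under assumptions: the numbers in the "[sk,asc,ascq]" tuple are taken to be hexadecimal, only the sense keys and the one asc/ascq pair mentioned in the text are named, and all identifiers are hypothetical.

```python
import re
from typing import Optional

# Sense keys named in the text; anything else is reported as "other".
SENSE_KEYS = {0x1: "recovered error", 0x3: "medium error", 0x4: "hardware error"}
# The one additional sense code pair decoded in the text.
ADDITIONAL_SENSE = {(0x17, 0x01): "recovered data with retries"}
# The two "don't worry (too much)" statuses from the text.
BENIGN_STATUSES = {
    "Recovered via rewrite in-place",
    "Logical block successfully reassigned",
}

EVENT = re.compile(
    r"^\s*(\d+)\s+(\d+):(\d+)\s+([0-9a-fA-F]+)\s+"
    r"\[([0-9a-fA-F]+),([0-9a-fA-F]+),([0-9a-fA-F]+)\]\s+(.+?)\s*$"
)


def parse_scan_event(line: str) -> Optional[dict]:
    """Decode one background scan event line, or return None if it is not one."""
    match = EVENT.match(line)
    if match is None:
        return None
    num, hours, minutes, lba, sk, asc, ascq, status = match.groups()
    pair = (int(asc, 16), int(ascq, 16))
    return {
        "num": int(num),
        "power_on_minutes": int(hours) * 60 + int(minutes),
        "lba": int(lba, 16),
        "sense_key": SENSE_KEYS.get(int(sk, 16), "other"),
        "additional_sense": ADDITIONAL_SENSE.get(pair, "see SPC-4 annex"),
        "reassign_status": status,
        "needs_intervention": status not in BENIGN_STATUSES,
    }
```

The same hours-to-minutes conversion reproduces the figure in the log header: 618:01 is 618 * 60 + 1 = 37081 minutes.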

smartd daemon

smartd is a daemon for monitoring disks (both ATA and SCSI). It is recommended that tape drives and medium changers are monitored in a more manual fashion with the smartctl command, as discussed in the TapeAlert section.

The configuration file for smartd is called /etc/smartd.conf and has a man page (as does the smartd command). The controlling daemon script is placed in the normal place for a distribution, typically /etc/rc.d/init.d/smartd.

smartd polls the devices that it recognized when it was started. By default it polls every 30 minutes. It reports any adverse findings and noteworthy occurrences (e.g. disk drive temperature changes) to a log file (/var/log/messages). smartd can be configured to take other actions, for example to send email to a system administrator.

SCSI disks can be discovered by smartd via a scan of device nodes (for Linux: /dev/sda through to /dev/sdz) by placing the word "DEVICESCAN" in the /etc/smartd.conf file. Alternatively the "DEVICESCAN" word can be removed (or commented out) and SCSI devices named explicitly:

/dev/sda -a -d scsi
/dev/sdb -a -d scsi

The "-d scsi" argument overrides what smartd would guess as the device class (i.e. "ata", "scsi", "sat", "marvell", "3ware,N", "hpt,L/N[,M]" or "cciss,N"). In smartmontools version 5.37 the smartd daemon guesses SCSI device nodes on the basis of their name (i.e. without querying the device beforehand). However, it does query the device after it has been placed in the SCSI group, and if it notices that the vendor name is "ATA " and that the device responds to SCSI ATA PASS THROUGH commands then an informational message is sent to the log suggesting that the user try adding '-d sat' (or perhaps that a '-d scsi' should be changed to '-d sat'). After such a message for node /dev/sdb, the lines in the /etc/smartd.conf file might be changed to:

/dev/sda -a -d scsi
/dev/sdb -a -d sat

This may be automated in a later version of smartmontools (the smartctl command does automatic detection in version 5.37).
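Until that happens, the change described above is easy to script. The following is a minimal Python sketch, assuming the simple "device directives" line layout shown in the examples; switch_to_sat is a hypothetical helper, not something shipped with smartmontools.

```python
def switch_to_sat(conf_text, device):
    """Return conf_text with '-d scsi' changed to '-d sat' for one device.

    Only uncommented lines whose first field equals `device` are touched.
    Note that a touched line is re-joined with single spaces.
    """
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            for i in range(len(fields) - 1):
                if fields[i] == "-d" and fields[i + 1] == "scsi":
                    fields[i + 1] = "sat"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)
```

For example, switch_to_sat applied to the two-line configuration shown earlier with device "/dev/sdb" leaves the /dev/sda line alone and yields "/dev/sdb -a -d sat". smartd would still need to be restarted (or told to re-read its configuration) for the change to take effect.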

TapeAlert

TapeAlert (or "tape alerts") is closely related to the SMART infrastructure provided for SCSI disks. TapeAlert is specialized for tape and medium changer devices. An example of a TapeAlert is an indication that the tape drive heads need to be cleaned.

Pending TapeAlert errors can be read from the TapeAlert log page (using smartctl). This can be done even when SMART monitoring is disabled (e.g. after smartctl -s off ). In fact, the best way to use the TapeAlert mechanism is to poll the flags (with smartctl) at relevant times when using the tape, for example:

  • when starting a new job using the tape drive

  • after an unrecoverable error

  • at the end of using each tape (and before it is unloaded)

The TapeAlert information is divided into three severity classes: Critical, Warning, and Information. The critical messages require urgent user intervention. Both critical and warning errors may lead to loss of data. Some of the errors are related to the medium and others to the tape drive itself. This is why the TapeAlert information should be checked when the tape is in use and not polled periodically (i.e. the smartd daemon with its periodic polling is not particularly useful for the TapeAlert mechanism).
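The polling points listed above can be sketched as a small wrapper around a tape job. This is a Python illustration only: read_tapealert is a hypothetical callable that returns the currently active flags (for example by running smartctl against the tape device and parsing its output, which is deliberately left out here), and report is whatever logging or notification function the caller supplies.

```python
def run_tape_job(job, read_tapealert, report):
    """Run job() and check TapeAlert at the points suggested in the text.

    job            -- callable doing the actual tape work
    read_tapealert -- hypothetical callable returning a list of active flags
    report         -- callable taking (stage, flags)
    """
    # When starting a new job using the tape drive.
    report("start", read_tapealert())
    try:
        job()
    except OSError:
        # After an unrecoverable error (modelled here as an OSError).
        report("error", read_tapealert())
        raise
    finally:
        # At the end of using the tape, before it is unloaded.
        report("end", read_tapealert())
```

Because each read of the TapeAlert log page clears the flags, every call to read_tapealert reports only what has been raised since the previous call, so the three stages do not repeat each other.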

Different sets of flags are defined for tape drives and medium changers. Most of the flags are optional and the set of flags supported depends on the device. TapeAlert is being included in the SCSI-3 standards. Many SCSI-2 drives support TapeAlert but the implementation may not fully conform to the SCSI-3 draft definition used by smartmontools.

It is important that only one application (or OS driver) is monitoring tape alerts since reading the TapeAlert log page deactivates all flags after they are read. [] Currently the Linux SCSI tape drivers (st and osst) do not check the TapeAlert log page. In Linux, a medium changer device (i.e. the robot in a tape jukebox) is accessed via its SCSI generic (sg) device name.

Code and information on the TapeAlert mechanism have been provided by Kai Mäkisara <>.

Examples

Here is some output from the smartctl command. Mostly it is for the '--all' option.

  • StorageTek LT20 tape 'jukebox': the tape drive and the medium changer (robot). Note the TapeAlert warnings in the medium changer output.

  • HP DDS-4 drive.

  • Generic ATAPI CD-RW is an example of a device that does not support SMART.

  • IBM DDRS 39130 manufactured in 1998.

  • Fujitsu MAM3184MP 18 GigaByte when all is well. Here is the output from the smartctl -H command after the IEC Test bit has been set (with the smartctl -s on -r ioctl,3 command) on the same Fujitsu .

  • Fujitsu MAP3735NP 73 GigaByte

  • Quantum ATLAS IV 36 WLS, 36 GigaByte

  • Seagate Cheetah ST336754 36 GigaByte .

RAID, JBOD and Enclosures

It is unlikely that a hardware RAID controller will directly support smartmontools. A SCSI RAID controller is a virtual target device that essentially remaps the SCSI commands it receives to the physical disks on its internal buses. The physical disks in a "SCSI" RAID could be ATA or SATA disks; in this case a SCSI bus is used between the host computer and an external RAID controller, since LVD SCSI buses (SPI-2, 3 and 4) can run up to 25 metres (plus other protocol related issues).

Some SCSI RAIDs equipped internally with SCSI disks allow access to the physical disks via logical unit numbers (LUNs) greater than 0. The SCSI RAID controller itself takes a LUN equal to 0. In this case smartmontools could be applied to the LUNs greater than 0 that refer to physical disks.

Some SCSI RAIDs equipped internally with ATA disks have a mechanism that allows ATA commands to be tunnelled to the ATA disks. The 3ware 6000 and 7000 series Escalade controllers are examples. In this case, special provision has been made in smartmontools (starting with release 5.1-16) to tunnel the required ATA commands through to the physical disks. This is done by using the -d 3ware,N option/Directive. See the smartctl and smartd man pages for details.

The approach that smartmontools takes is to communicate directly with physical storage devices (e.g. a disk). Another approach is to collectively monitor and manage a group of disks and/or tape drives (be they a RAID, "Just a Bunch Of Disks" JBOD or a collection of disks and tape drives) in an enclosure. The SCSI Enclosure Services SES (reference: SES-2 at ) is designed for this task. Both SCSI device and recent SATA disk enclosures are using SES. Amongst other things SES can monitor the state of individual devices within the enclosure, the temperature, power supplies and fans. A user can set thresholds, define alarm types and remotely administer the enclosure.

A. Details
Standards

One of the first surprises working with SCSI devices and smartmontools is that the SCSI standards (found at ) do not use the term SMART. In its place the awkward term "Informational Exceptions" (IE) is used.

The original SCSI standard (over 20 years old now) and the SCSI-2 standard were monolithic documents. In SCSI-3 and beyond the SCSI standards have been sub-divided and three categories of interest are the:

  • architectural model [SAM-4]

  • command sets [SPC-4, SBC-3, SSC-3, SMC-2, etc]

  • transports [SPI-4, SBP-2, FCP-3, SAS, etc]

The architectural model, while interesting, says nothing specific about Informational Exceptions or related topics. With respect to the transports, the term SCSI has often been treated as synonymous with one of the SCSI Parallel Interface transports (e.g. SPI-4, which is often known as "Ultra320"); however, this is unhelpful. For the purpose of smartmontools the SCSI command sets are more interesting. The main reference is the SCSI Primary Commands (SPC-4) document, specifically these sections:

  • self test operations; SEND DIAGNOSTIC command (which is the mechanism for requesting self tests)

  • MODE SENSE and MODE SELECT commands (both 6 and 10 byte variants); Mode parameters [the Informational Exceptions Control (IEC) mode page and the Control mode page]

  • LOG SENSE and LOG SELECT commands; Log parameters [these log pages: Informational exceptions, read/write/verify error counters, non medium error count, temperature, start-stop cycle counter and the self test results]

The SCSI Block Commands (SBC-3) document covers random access storage devices such as disks (but excluding CD/DVD readers and writers which are covered by MMC-4) while the SCSI Streaming Commands (SSC-3) document covers tape systems. The SBC-3 standard does not contain any additional information (compared with SPC-4) about Informational Exceptions. The SSC-3 standard covers TapeAlert (section 4.2.15), some extra facilities in the IEC mode page (see the mode parameters section) and some additional log pages. Medium changers, typically the "robots" in jukebox tape systems, often support the TapeAlert mechanism and are described in the SMC-2 standard.

Informational Exceptions

So what are Informational Exceptions in the SCSI context? They are a set of vendor specific parameters that the device firmware monitors and if a "failure prediction threshold" is exceeded then an exception is reported. A user is also able to set thresholds on error counters and have an exception reported if a condition is met. Additionally most modern disks monitor their temperature and will issue a warning if a temperature threshold is exceeded.

The "failure prediction threshold" exception reporting and the temperature warning are separately controlled (in byte 2 of the Informational Exceptions Control (IEC) mode page). [] In smartmontools the smartctl -s on command turns on IE. There are various reasons why this may not (fully) work (e.g. IEC mode page not available or not changeable) so this command queries the device again after it has attempted the change and reports the state. The smartctl -s off command turns off IE reporting. []
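To make the "byte 2" remark concrete, here is a minimal Python sketch that decodes bytes 2 and 3 of the IEC mode page. The bit positions are taken from the SPC definition of the page as the editor understands it (DEXCPT disables failure prediction reporting, EWASC enables the temperature warning); treat the layout as an assumption to be checked against the standard, and decode_iec_bytes as an illustrative name rather than anything in smartmontools.

```python
def decode_iec_bytes(byte2, byte3):
    """Decode bytes 2 and 3 of the Informational Exceptions Control mode page.

    Offsets are relative to the start of the page (byte 0 = page code).
    """
    fields = {
        "PERF":   bool(byte2 & 0x80),
        "EBF":    bool(byte2 & 0x20),
        "EWASC":  bool(byte2 & 0x10),  # enable warning (temperature) reporting
        "DEXCPT": bool(byte2 & 0x08),  # disable exception (failure prediction) reporting
        "TEST":   bool(byte2 & 0x04),
        "LOGERR": bool(byte2 & 0x01),
        "MRIE":   byte3 & 0x0F,        # method of reporting informational exceptions
    }
    # Convenience views of the two separately controlled features.
    fields["failure_prediction_enabled"] = not fields["DEXCPT"]
    fields["temperature_warning_enabled"] = fields["EWASC"]
    return fields
```

For instance, decode_iec_bytes(0x10, 0x06) describes a page with both failure prediction and temperature warnings enabled and MRIE set to 6, while decode_iec_bytes(0x08, 0x00) describes a page with failure prediction reporting disabled.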

IE reporting

Informational Exceptions are reported via the standard SCSI status reporting mechanism of an additional sense code (asc) and an additional sense code qualifier (ascq) pair. A selection of these pairs and the associated messages (there is a full list in the SPC-3 document) is listed here:

asc    ascq   message
-------------------------------------------------------
0xb    0x1    Warning - specified temperature exceeded
0x5d   0x0    Failure prediction threshold exceeded
0x5d   0x2    Media failure prediction threshold exceeded
0x5d   0x10   Hardware impending failure general hard drive failure
0x5d   0x11   Hardware impending failure drive error rate too high
0x5d   0x56   Spindle impending failure start unit times too high
0x5d   0xff   Failure prediction threshold exceeded (false)

The last entry in the above table results from setting the TEST bit and is for exercising the reporting mechanism rather than the indication of an actual error. See the footnote on testing the IE mechanism for more information.
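The selection above maps naturally onto a lookup table. The following Python sketch holds only the pairs listed in this article (the full list is in the standard) and singles out the reserved test pair. The names IE_MESSAGES, ie_message and is_test_exception are illustrative.

```python
# (asc, ascq) -> message, restricted to the pairs listed in this article.
IE_MESSAGES = {
    (0x0B, 0x01): "Warning - specified temperature exceeded",
    (0x5D, 0x00): "Failure prediction threshold exceeded",
    (0x5D, 0x02): "Media failure prediction threshold exceeded",
    (0x5D, 0x10): "Hardware impending failure general hard drive failure",
    (0x5D, 0x11): "Hardware impending failure drive error rate too high",
    (0x5D, 0x56): "Spindle impending failure start unit times too high",
    (0x5D, 0xFF): "Failure prediction threshold exceeded (false)",
}


def ie_message(asc, ascq):
    """Return the message for an asc/ascq pair, or a placeholder if unlisted."""
    return IE_MESSAGES.get(
        (asc, ascq), "unlisted asc/ascq pair 0x%02x,0x%02x" % (asc, ascq)
    )


def is_test_exception(asc, ascq):
    """True for the pair produced by setting the TEST bit."""
    return (asc, ascq) == (0x5D, 0xFF)
```

A monitoring script could use is_test_exception to avoid raising an alarm when an administrator is merely exercising the reporting path.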

One difficulty with IE is that the device firmware may detect these conditions independently of any command executing. Even if it detects an informational exception during a command it needs to be careful sending IE error notifications back with a command especially if that command succeeded (Linux will not handle this too well in the 2.4 kernel series). There is asynchronous event notification (AEN) in SCSI but it is not reliably supported across all transports. So smartmontools relies on a poll from the smartd daemon (the default is every 30 minutes) to detect informational exceptions.

The additional sense code and its qualifier are part of what is termed the sense buffer, which is the response to a REQUEST SENSE command. The sense key is also found in the sense buffer. Synchronous SCSI commands that fail return a single byte status code of CHECK CONDITION. An OS kernel would see this error/warning status and then check the sense buffer (by doing a REQUEST SENSE or by other means) and decide how to continue. From the point of view of smartmontools, its smartd daemon would like to process Informational Exceptions without interference from the OS. This is done by setting the IEC mode page's MRIE field to 6. This instructs the SCSI device to hold a pending exception until an unsolicited REQUEST SENSE is sent. If an exception is pending then the sense key will be "NO SENSE" and the asc,ascq pair will be set accordingly. In the case of no pending exception the asc and ascq will both be zero. The pending exception is also visible in the IE log page, if that is supported. So smartd can check the device during its normal polling cycle.
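The check described above can be sketched in a few lines of Python. This assumes the response to REQUEST SENSE is in the fixed sense data format (response code 0x70 or 0x71), where the sense key sits in the low four bits of byte 2 and the asc and ascq are bytes 12 and 13; descriptor format sense data is laid out differently and is not handled. The function name pending_ie is illustrative and this is not the code smartmontools itself uses.

```python
def pending_ie(sense):
    """Return the pending (asc, ascq) pair, or None if nothing is pending.

    `sense` is the raw fixed-format sense data returned by REQUEST SENSE.
    With MRIE=6 a pending exception shows up as sense key NO SENSE (0)
    together with a non-zero asc/ascq pair.
    """
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    if sense_key == 0 and (asc, ascq) != (0, 0):
        return (asc, ascq)
    return None
```

A sense buffer with sense key 0 and bytes 12 and 13 set to 0x5d and 0xff would therefore be reported as the pending pair (0x5d, 0xff), while an all-zero asc/ascq means no exception is waiting.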

Pending informational exceptions can also be checked by running smartctl -H . A message of "SMART Health Status: OK" indicates that there is no pending IE. []

smartctl debug

Debug information for smartctl is output when the -r ioctl or the -r scsiioctl option is used. More debug is output when the -r ioctl,n form is used (where "n" is a number greater than or equal to 1). Both -r ioctl and -r scsiioctl,1 select the same amount of SCSI debug information. The debug levels currently defined are:

  • 1 - output SCSI commands sent to the device and the status received from the device

  • 2 - additionally, output the first 64 bytes of data sent to or received from the device

  • 3 - additionally, set the IEC mode page TEST bit if accompanying the '-s on' option

See the footnote on testing the IE mechanism for more information about the use of the IEC mode page TEST bit.

One shortcoming of the Informational Exception data provided by SCSI devices (at least as defined in the current standard) is that no LOG SENSE page tells the user how many hours the device has been in use for. The device needs to track its "age" for applying timestamps to self test results (seen in the "Lifetime (hours)" column of the smartctl -l selftest command) if they are supported. So one way to circumvent this shortcoming is to do dummy self tests. Hence do a smartctl -t short command and then wait 2 minutes to see the result in the self test log in which the most recent self test row (i.e. the first) will have the current lifetime of the device.

Links

Here are some links to related projects and packages:

  • the primary reference site for SCSI architecture, command sets and transports is . The main documents of interest to smartmontools are the "Primary Commands" (SPC-4), the "Block Commands" (SBC-3) for disks and the "Streaming Commands" (SSC-3) for tape drives. This page contains a diagram showing the relationships of various SCSI standards. []

  • SCSI RAID monitoring tools plus a firmware update utility and other low level tools .

  • The sdparm utility allows mode page settings to be viewed and changed. It can decode Vital Product Data (VPD) pages. It implements a small number of commands to start and stop media, and to eject and load removable media. See this page . sdparm is available on Linux with ports to FreeBSD, Tru64 and Windows.

  • A package of SCSI low level tools for Linux called sg3_utils can be found on this page (the most recent version is sg3_utils-1.22). It allows command level access to SCSI devices and is available on Linux with ports to FreeBSD, Tru64 and Windows.

  • There is a HOWTO on the Linux SCSI subsystem in the 2.4 series here: .

CVS $Id: smartmontools_scsi.xml,v 1.16 2006/11/21 20:23:07 dpgilbert Exp $


[] The 3ware RAID solution tunnels the ATA commands needed for smartmontools (together with a disk number) through a vendor specific SCSI command.

[] There are SATA devices called port multipliers that allow up to 15 SATA drives to be connected to one host. SAS expanders seem to be a better approach to the problem of connecting a large number of disks to one or more hosts.

[] Even an approach of sending trial ATA and SCSI commands to see which one a device responds to could be tricked. ATAPI CD/DVD drives respond to both ATA commands (a few, for example IDENTIFY PACKET DEVICE) and SCSI commands (found in MMC).

[] For example: Seagate's "Cheetah 15K.3 Product Manual, Rev F" contains sections on SMART, thermal monitor, and drive self test (section 5.2.7 to 5.2.9). It also lists the supported mode pages with their default and changeable values.

[] Linux has an additional problem with the foreground extended self tests: it will attempt to time out the command after 10 seconds. This will appear in the self test log page as an aborted self test. This problem is fixed in lk 2.4.22 and the lk 2.6 series (by extending the timeout to 2 hours). To be on the safe side use the background extended test instead. Also some disks silently ignore foreground self tests (e.g. the Seagate Cheetah series).

[] This is why some models spring to life after minutes of inactivity and perform some operation even though there are no external commands pending.

[] In a multi initiator environment (e.g. several computers sharing the same tape jukebox) there should only be one application monitoring tape alerts per initiator.

[] Henceforth the term Informational Exceptions (or IE) will include both Informational Exceptions and the temperature (or "enclosure degraded") warnings.

[] IE have a (minor) performance impact on a disk. There are various other settings in the IEC mode page (e.g. PERF, EBF and LOGERR) that address this. The standard gives a lot of latitude to the vendor in implementing these additional flags. This finer level of control may be added to smartmontools if the need arises.

[] One might worry whether the smartd daemon is properly set up or if the device really will issue IE when the need arises. The mechanism can be tested by setting the TEST bit in the IEC mode page. That is done by this command: smartctl -r ioctl,3 -s on [ignore the extra debugging output that "-r ioctl,3" causes]. A special asc/ascq pair is reserved for testing (0x5d,0xff) and the standard associates with it this awkward message: "Failure prediction threshold exceeded (false)". A call to smartctl -H or waiting until the next smartd poll should produce that message if the mechanism is working. The IEC mode page TEST bit can be turned off (i.e. back to normal IE) with smartctl -s on . The output after the TEST bit has been activated is shown in the Examples section for the Fujitsu MAM3184 disk.

[] The documents found on the t10 site are actually draft standards. Once they are ratified they become available from ANSI for a fee. The t10 site maintains the last draft prior to ratification and the most recent draft of yet to be ratified standards.
