This article describes a procedure for troubleshooting the "failed" or "failed was" status reported by vxdisk.
Category: LINUX
2013-08-23 10:36:38
Example of a disk media record in the "failed was" state, as reported by vxdisk:
# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0       auto:cdsdisk    -            (vxfendg)    online
disk_1       auto:cdsdisk    -            (vxfendg)    online
disk_2       auto:cdsdisk    -            (vxfendg)    online
disk_3       auto:cdsdisk    datadg01     datadg       online
disk_4       auto            -            -            error
disk_5       auto:cdsdisk    datadg03     datadg       online
disk_6       auto:cdsdisk    datadg04     datadg       online
disk_7       auto:cdsdisk    -            (sambadg)    online
disk_8       auto:cdsdisk    -            -            online
disk_9       auto:cdsdisk    -            -            online
sda          auto:none       -            -            online invalid
-            -               datadg02     datadg       failed was:disk_4
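When many disks are listed, it can help to pick out the "failed" records programmatically. The Python sketch below is only an illustration, not a Veritas tool: it assumes the whitespace-separated column layout shown above (DEVICE TYPE DISK GROUP STATUS) and parses a pasted excerpt of that output rather than calling vxdisk itself. The function name failed_records is hypothetical.

```python
# Excerpt of the "vxdisk -o alldgs list" example output shown above.
SAMPLE = """\
DEVICE TYPE DISK GROUP STATUS
disk_3 auto:cdsdisk datadg01 datadg online
disk_4 auto - - error
sda auto:none - - online invalid
- - datadg02 datadg failed was:disk_4
"""


def failed_records(vxdisk_output):
    """Return the records whose STATUS begins with "failed".

    Assumes the column order DEVICE TYPE DISK GROUP STATUS; the STATUS
    column may itself contain spaces (for example "failed was:disk_4").
    A "failing" status does not match, because it is a different condition.
    """
    records = []
    for line in vxdisk_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)  # at most 5 fields; STATUS keeps its spaces
        if len(fields) < 5:
            continue
        _device, _dtype, disk, group, status = fields
        if not status.startswith("failed"):
            continue
        was = None
        if "was:" in status:
            # "failed was:disk_4" -> "disk_4", the device the record was last on
            was = status.split("was:", 1)[1].split()[0]
        records.append({"disk": disk, "group": group, "status": status, "was": was})
    return records


for rec in failed_records(SAMPLE):
    print(f"{rec['group']}/{rec['disk']} failed; last seen on {rec['was']}")
```

With this excerpt, the sketch prints a single line for datadg02 in datadg, last seen on disk_4.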
A "failed" status is a record of a disk that is no longer accessible.
This is often caused by sustained I/O errors that prevent the operating
system (OS) from reading the disk. It may also be the result of
corruption within the Veritas private region.
The private region is the portion of the disk where Veritas stores
records about the disk group, such as disks, volumes, subdisks and
plexes. The public region, by contrast, contains the actual volumes,
including user data.
The status of "failed" should not be confused with a status of
"failing." This article primarily discusses the "failed" status, as
reported by vxdisk. For information on troubleshooting the "failing" status, see .
Before making any further changes, use vxconfigbackup to create an emergency backup of the private region for the remaining disks in the affected disk group.
Vxconfigbackup does not back up the actual data that
is contained within the volumes. Instead, it backs up the Veritas
private region configuration database that resides on the disks, along
with some information about the disks themselves. The configuration
database stores information about which disks are contained by the disk
group, volume structures, plexes and subdisks.
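Because the backup should cover every disk group that is still imported, a small helper can list those groups from the vxdisk output before vxconfigbackup is run. This Python sketch assumes, as in the example at the top of this article, that a group name shown in parentheses is not imported on this host and that "-" means no disk group. It only prints the commands, in the plain form "vxconfigbackup <diskgroup>" without optional flags, for review instead of executing them. The helper names imported_groups and backup_commands are hypothetical.

```python
import shlex

# Excerpt of the "vxdisk -o alldgs list" example output from this article.
SAMPLE = """\
DEVICE TYPE DISK GROUP STATUS
disk_0 auto:cdsdisk - (vxfendg) online
disk_3 auto:cdsdisk datadg01 datadg online
disk_7 auto:cdsdisk - (sambadg) online
disk_8 auto:cdsdisk - - online
- - datadg02 datadg failed was:disk_4
"""


def imported_groups(vxdisk_output):
    """Return the sorted names of disk groups that appear to be imported."""
    groups = set()
    for line in vxdisk_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        group = fields[3]
        # "-" means no disk group; "(name)" means not imported on this host.
        if group == "-" or group.startswith("("):
            continue
        groups.add(group)
    return sorted(groups)


def backup_commands(groups):
    """Build one vxconfigbackup command (as an argument list) per disk group."""
    return [["vxconfigbackup", group] for group in groups]


for cmd in backup_commands(imported_groups(SAMPLE)):
    print(shlex.join(cmd))  # dry run: print instead of executing
```

Here only datadg is reported. After review, each argument list could be passed to subprocess.run(cmd, check=True) on the affected host.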
If vxconfigbackup is not available, vxprivutil can be used to dump a copy of the configuration database.
More details about vxconfigbackup and vxprivutil, including syntax and examples, can be found in this article:
"Using vxconfigbackup and vxprivutil to back up the disk group configuration of the Veritas private region"
If vxdisk shows that the disk type includes the word
"LVM" or "ZFS", the disk may have been overwritten by another
logical volume manager (LVM) solution. It is also possible that a
problem with the SAN zoning caused disks to be presented to the
wrong systems. Before making any further changes, confirm that the
disk is not supposed to be zoned to another system.
To bring a disk back into its original Veritas disk group, the disk
must first be removed from the control of the other LVM solution and
then initialized for Veritas using vxdisksetup. Refer to the
appropriate vendor's documentation for information about removing a
disk from the control of their LVM solution.
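A quick scan of the TYPE column can flag disks that another volume manager may have claimed. In this Python sketch, the disk_10 and disk_11 rows are hypothetical (they do not appear in this article's example output), and the check is a simple case-insensitive substring match on TYPE, which is an assumption about how such disks are labelled; the STATUS column is ignored. The function name foreign_disks is hypothetical.

```python
# Hypothetical listing: disk_10 and disk_11 are invented rows for illustration.
SAMPLE = """\
DEVICE TYPE DISK GROUP STATUS
disk_3 auto:cdsdisk datadg01 datadg online
disk_10 auto:LVM - - online
disk_11 auto:ZFS - - online
"""


def foreign_disks(vxdisk_output, markers=("LVM", "ZFS")):
    """Return (device, type) pairs whose TYPE mentions one of the markers."""
    hits = []
    for line in vxdisk_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        device, dtype = fields[0], fields[1]
        if any(marker in dtype.upper() for marker in markers):
            hits.append((device, dtype))
    return hits


for device, dtype in foreign_disks(SAMPLE):
    print(f"{device}: type {dtype}; check SAN zoning before changing anything")
```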
Vxreattach is used to restore the original disk
media name and reattach the disk to the disk group. It can
normally be used only if the status of the disk is "online" (see Figure
1).
Run vxreattach, using the "-c" argument, to determine if a disk can be reattached to the disk group.
Figure 1 - Using vxreattach, with the "-c" argument, to check if a reattach is possible
Syntax: vxreattach -c
Example, with typical output:
# vxreattach -c disk_4
datadg datadg02
In this case, "datadg" is the name of the disk group while "datadg02" is the disk media name, as shown by vxdisk.
# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0       auto:cdsdisk    -            (vxfendg)    online
disk_1       auto:cdsdisk    -            (vxfendg)    online
disk_2       auto:cdsdisk    -            (vxfendg)    online
disk_3       auto:cdsdisk    datadg01     datadg       online
disk_4       auto:cdsdisk    -            (datadg)     online
disk_5       auto:cdsdisk    datadg03     datadg       online
disk_6       auto:cdsdisk    datadg04     datadg       online
disk_7       auto:cdsdisk    -            (sambadg)    online
disk_8       auto:cdsdisk    -            -            online
disk_9       auto:cdsdisk    -            -            online
sda          auto:none       -            -            online invalid
-            -               datadg02     datadg       failed was:disk_4
If vxreattach -c returns a disk group and disk media
name, without returning any errors, proceed with reattaching the disk
(Figure 2). If a reattach is not possible, a V-5-2-238 error will
appear.
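The decision rule above (a disk group and disk media name means proceed, an error means stop) can be written down as a small function. This Python sketch assumes that successful output is exactly two words, as in Figure 1, and treats any text carrying a V-x-x-x style message ID, such as V-5-2-238, as an error. The function name reattach_check is hypothetical, and the error string in the usage line is a placeholder, not real vxreattach output.

```python
import re


def reattach_check(output):
    """Interpret the text printed by "vxreattach -c <device>".

    Returns (disk_group, disk_media_name) when a reattach looks possible,
    or None when the output is empty, carries a message ID such as
    V-5-2-238, or does not have the expected two-word shape.
    """
    text = output.strip()
    if not text or re.search(r"\bV-\d+-\d+-\d+\b", text):
        return None
    fields = text.split()
    if len(fields) != 2:
        return None
    return fields[0], fields[1]


print(reattach_check("datadg datadg02\n"))  # output shown in Figure 1
print(reattach_check("ERROR V-5-2-238 (placeholder message text)"))
```

The first call returns ("datadg", "datadg02") and the second returns None.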
Figure 2 - Using vxreattach to reattach a disk to the disk group
Syntax: vxreattach -br
Example, with typical output:
# vxreattach -br disk_4
Notice that vxdisk now shows a disk media name, "datadg02," for disk_4.
# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0       auto:cdsdisk    -            (vxfendg)    online
disk_1       auto:cdsdisk    -            (vxfendg)    online
disk_2       auto:cdsdisk    -            (vxfendg)    online
disk_3       auto:cdsdisk    datadg01     datadg       online
disk_4       auto:cdsdisk    datadg02     datadg       online
disk_5       auto:cdsdisk    datadg03     datadg       online
disk_6       auto:cdsdisk    datadg04     datadg       online
disk_7       auto:cdsdisk    -            (sambadg)    online
disk_8       auto:cdsdisk    -            -            online
disk_9       auto:cdsdisk    -            -            online
sda          auto:none       -            -            online invalid
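After vxreattach -br, the listing can be checked for the two changes shown in Figure 2: the device carries the expected disk media name, and no "failed was" record still points at it. This Python sketch parses an excerpt of the Figure 2 output and assumes the same column layout as the other vxdisk examples in this article. The function name reattach_succeeded is hypothetical.

```python
# Excerpt of the Figure 2 "vxdisk -o alldgs list" output, after the reattach.
AFTER = """\
DEVICE TYPE DISK GROUP STATUS
disk_3 auto:cdsdisk datadg01 datadg online
disk_4 auto:cdsdisk datadg02 datadg online
disk_5 auto:cdsdisk datadg03 datadg online
"""


def reattach_succeeded(vxdisk_output, device, disk_media_name):
    """True if device shows disk_media_name and no failed record still names it."""
    found = False
    for line in vxdisk_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        dev, _dtype, disk, _group, status = fields
        # A leftover "failed was:<device>" record means the reattach did not finish.
        if status.startswith("failed") and status.split("was:", 1)[-1].strip() == device:
            return False
        if dev == device and disk == disk_media_name:
            found = True
    return found


print(reattach_succeeded(AFTER, "disk_4", "datadg02"))
```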
Use native OS commands to confirm that the OS can read the disk, including the disk label.
Veritas depends on the OS device drivers to communicate with disks.
If the OS is unable to read a disk, Veritas will also fail to read it.
If a disk does not have a label, or the label has been corrupted,
Veritas will not recognize the disk. Completing these steps will help
identify the source of a disk outage.
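One simple OS-level check is whether the device node can be opened and read at all. The Python sketch below tries to read the first 512 bytes of a device; it does not interpret the disk label, which still needs the platform's own tools. The device path /dev/sdi is only an example taken from the path names in Figure 3, reading raw devices normally requires root privileges, and the function name can_read is hypothetical.

```python
import os


def can_read(path, nbytes=512):
    """Return True if the first nbytes of path can be read, False on any OS error."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return False
    try:
        return len(os.read(fd, nbytes)) == nbytes
    except OSError:  # for example EIO from a failing disk
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Example device path (assumption); run as root on the affected host.
    print(can_read("/dev/sdi"))
```

A False result points toward the OS layer (driver, path or hardware) rather than Veritas.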
Use vxdmpadm to determine the status of the paths to the disks (Figure 3).
Veritas will disable a path if serious or sustained I/O errors occur.
When all paths to a disk are disabled, the server will be unable to
read or write to the volume. If a path has been disabled, review the
syslog for I/O errors reported by "vxdmp" or "scsi".
Although a path can be re-enabled using "vxdmpadm enable", vxdmp should automatically evaluate the status of a path at five-minute
intervals using a SCSI inquiry. If the inquiry succeeds, the path is
automatically re-enabled. If a path remains disabled beyond this
interval, it is possible that I/O errors are still being detected,
warranting further investigation. Paths are not automatically re-enabled
if the disk group has been disabled, or if vxesd is stopped. The
behavior of vxdmp in response to disabled paths can be modified via the DMP tunables, which can be viewed using "vxdmpadm gettune".
Figure 3 - Example of a disabled path, as reported by vxdmpadm
Syntax: vxdmpadm getsubpaths
For example:
# vxdmpadm getsubpaths
NAME  STATE[A]     PATH-TYPE[M]  DMPNODENAME  ENCLR-NAME  CTLR
======================================================================
sdm   ENABLED(A)   -             disk_0       disk        c8
sdp   ENABLED(A)   -             disk_0       disk        c3
sdb   ENABLED(A)   -             disk_1       disk        c8
sdc   ENABLED(A)   -             disk_1       disk        c3
sdq   ENABLED(A)   -             disk_2       disk        c8
sdt   ENABLED(A)   -             disk_2       disk        c3
sdd   ENABLED(A)   -             disk_3       disk        c8
sdf   ENABLED(A)   -             disk_3       disk        c3
sdi   DISABLED     -             disk_4       disk        c8
sdl   DISABLED     -             disk_4       disk        c3
sde   ENABLED(A)   -             disk_5       disk        c8
sdh   ENABLED(A)   -             disk_5       disk        c3
sdk   ENABLED(A)   -             disk_6       disk        c8
sdn   ENABLED(A)   -             disk_6       disk        c3
sdr   ENABLED(A)   -             disk_7       disk        c8
sdu   ENABLED(A)   -             disk_7       disk        c3
sdg   ENABLED(A)   -             disk_8       disk        c8
sdj   ENABLED(A)   -             disk_8       disk        c3
sdo   ENABLED(A)   -             disk_9       disk        c8
sds   ENABLED(A)   -             disk_9       disk        c3
sda   ENABLED(A)   -             sda          disk        c2
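The condition described above, where every path to a disk is disabled, can be picked out of the getsubpaths listing. This Python sketch assumes the column order shown in Figure 3 (NAME, STATE[A], PATH-TYPE[M], DMPNODENAME, ENCLR-NAME, CTLR) and groups paths by DMP node name. The helper names paths_by_node and fully_disabled are hypothetical.

```python
from collections import defaultdict

# Excerpt of the Figure 3 "vxdmpadm getsubpaths" output.
SAMPLE = """\
NAME STATE[A] PATH-TYPE[M] DMPNODENAME ENCLR-NAME CTLR
======================================================================
sdd ENABLED(A) - disk_3 disk c8
sdf ENABLED(A) - disk_3 disk c3
sdi DISABLED - disk_4 disk c8
sdl DISABLED - disk_4 disk c3
"""


def paths_by_node(output):
    """Map each DMP node name to a list of (path name, state) pairs."""
    nodes = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the "=====" separator and anything malformed.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        name, state, _path_type, node = fields[:4]
        nodes[node].append((name, state))
    return nodes


def fully_disabled(output):
    """DMP nodes whose paths are all DISABLED: no I/O can reach these disks."""
    return sorted(
        node
        for node, paths in paths_by_node(output).items()
        if all(state.startswith("DISABLED") for _name, state in paths)
    )


for node in fully_disabled(SAMPLE):
    names = ", ".join(name for name, _state in paths_by_node(SAMPLE)[node])
    print(f"{node}: all paths disabled ({names}); check syslog for vxdmp/scsi errors")
```

For the excerpt, only disk_4 is reported, through paths sdi and sdl.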
If a vxreattach is not possible, use vxconfigrestore to recover the disk group.
Vxconfigrestore does not restore the actual data that
is contained within the volumes. It only restores the Veritas
configuration database that is located within the private region of the
disks. The configuration database stores information about which disks
are contained by the disk group, volume structures, plexes and
subdisks.
If using vxconfigrestore is not possible, another
method for recovering the disks is to compare the UDID or Disk ID
attributes of the disks with the records that are contained in the
private region configuration database.
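The comparison described above amounts to matching identifiers. In this Python sketch, the way the identifiers are collected is left out: device_ids (the UDID or disk ID read from each visible device) and config_ids (the ID recorded for each disk media name in the configuration database) are hypothetical inputs, and the ID strings and the datadg05 record are made up. A record is matched only when exactly one device carries its ID, so duplicated IDs, as can happen with cloned disks, are left for manual review. The function name match_disks is hypothetical.

```python
def match_disks(device_ids, config_ids):
    """Map each disk media name to the one device that carries its recorded ID.

    device_ids: {device name: ID read from the disk}          (hypothetical input)
    config_ids: {disk media name: ID recorded in the config}  (hypothetical input)
    A value of None means no device, or more than one device, carries the ID.
    """
    devices_for_id = {}
    for device, disk_id in device_ids.items():
        devices_for_id.setdefault(disk_id, []).append(device)
    result = {}
    for dm_name, disk_id in config_ids.items():
        candidates = devices_for_id.get(disk_id, [])
        result[dm_name] = candidates[0] if len(candidates) == 1 else None
    return result


# Made-up IDs for illustration only.
devices = {"disk_3": "ID-A", "disk_4": "ID-B", "disk_8": "ID-C"}
records = {"datadg01": "ID-A", "datadg02": "ID-B", "datadg05": "ID-Z"}
print(match_disks(devices, records))
```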
Once the original disk has been added back to its disk group, additional steps may be needed to recover the volume. Use vxprint to determine the current status (Figure 4).
WARNING: Do not simply
force-start a mirrored volume. This may cause a plex that contains old
or corrupt blocks to overwrite a plex that contains up-to-date data. A
procedure for manually determining the most up-to-date mirror plex can
be found in this article:
"Manually determining which mirror plex contains the most recent data and then resynchronizing"
Figure 4 - Using vxprint to determine the status of a volume
Syntax: vxprint -g
Example, with typical output:
In this case, vxprint shows that the volume "vol1" is disabled. The plex status is "IOFAIL," which indicates that a sustained I/O interruption to the volume has occurred. After the associated disk is added back to the disk group, the volume will need to be restarted manually using vxvol.
# vxprint -g datadg -ht
dg datadg       default      default   1000     1336573086.38.Server101
dm datadg01     disk_3       auto      65536    2027264  -
dm datadg02     disk_4       auto      65536    2027264  -
dm datadg03     disk_5       auto      65536    2027264  -
dm datadg04     disk_6       auto      65536    2027264  -
v  locks        -            ENABLED   ACTIVE   102400   SELECT    -        fsgen
pl locks-01     locks        ENABLED   ACTIVE   102400   CONCAT    -        RW
sd datadg04-01  locks-01     datadg04  0        102400   0         disk_6   ENA
v  vol1         -            DISABLED  ACTIVE   6010880  SELECT    -        fsgen
pl vol1-01      vol1         DISABLED  IOFAIL   6010880  CONCAT    -        RW
sd datadg01-01  vol1-01      datadg01  0        2027264  0         disk_3   ENA
sd datadg02-01  vol1-01      datadg02  0        2027264  2027264   disk_4   ENA
sd datadg03-01  vol1-01      datadg03  0        1956352  4054528   disk_5   ENA
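Before restarting anything, it can help to list the disabled volumes together with their plexes, so that a mirrored volume is not force-started by mistake. This Python sketch reads the "v" and "pl" records of vxprint -ht output, assuming the field order shown in Figure 4 (for v: name, RVG/VSET/CO, KSTATE, STATE, ...; for pl: name, volume, KSTATE, STATE, ...). It treats any volume with more than one plex as mirrored, which is a simplification: a log plex, for example, would also be counted. The helper names volume_report and disabled_volumes are hypothetical.

```python
# Excerpt of the Figure 4 "vxprint -g datadg -ht" output.
SAMPLE = """\
dg datadg default default 1000 1336573086.38.Server101
dm datadg02 disk_4 auto 65536 2027264 -
v locks - ENABLED ACTIVE 102400 SELECT - fsgen
pl locks-01 locks ENABLED ACTIVE 102400 CONCAT - RW
sd datadg04-01 locks-01 datadg04 0 102400 0 disk_6 ENA
v vol1 - DISABLED ACTIVE 6010880 SELECT - fsgen
pl vol1-01 vol1 DISABLED IOFAIL 6010880 CONCAT - RW
sd datadg02-01 vol1-01 datadg02 0 2027264 2027264 disk_4 ENA
"""


def volume_report(vxprint_ht):
    """Map volume name -> {"kstate", "state", "plexes": [(plex, kstate, state)]}."""
    volumes = {}
    for line in vxprint_ht.splitlines():
        f = line.split()
        if len(f) < 5:
            continue
        if f[0] == "v":  # v NAME RVG/VSET/CO KSTATE STATE ...
            volumes.setdefault(f[1], {"plexes": []}).update(kstate=f[3], state=f[4])
        elif f[0] == "pl":  # pl NAME VOLUME KSTATE STATE ...
            volumes.setdefault(f[2], {"plexes": []})["plexes"].append((f[1], f[3], f[4]))
    return volumes


def disabled_volumes(volumes):
    """Return (volume, is_mirrored) for each volume whose KSTATE is DISABLED."""
    return [
        (name, len(info["plexes"]) > 1)
        for name, info in sorted(volumes.items())
        if info.get("kstate") == "DISABLED"
    ]


report = volume_report(SAMPLE)
for name, mirrored in disabled_volumes(report):
    advice = "mirrored: do not simply force-start" if mirrored else "single plex"
    print(name, report[name]["plexes"], advice)
```

For the Figure 4 excerpt, only vol1 is reported, with its single plex vol1-01 in the IOFAIL state.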
Figure 5 - Using vxvol to start a volume and using vxprint to review any changes in the status of the volume
Syntax: vxvol -f
Example, with typical output:
# vxvol -g datadg -fa startall
Vxprint now shows that the volume has been started.
# vxprint -g datadg -ht
dg datadg       default      default   1000     1336573086.38.Server101
dm datadg01     disk_3       auto      65536    2027264  -
dm datadg02     disk_4       auto      65536    2027264  -
dm datadg03     disk_5       auto      65536    2027264  -
dm datadg04     disk_6       auto      65536    2027264  -
v  locks        -            ENABLED   ACTIVE   102400   SELECT    -        fsgen
pl locks-01     locks        ENABLED   ACTIVE   102400   CONCAT    -        RW
sd datadg04-01  locks-01     datadg04  0        102400   0         disk_6   ENA
v  vol1         -            ENABLED   ACTIVE   6010880  SELECT    -        fsgen
pl vol1-01      vol1         ENABLED   ACTIVE   6010880  CONCAT    -        RW
sd datadg01-01  vol1-01      datadg01  0        2027264  0         disk_3   ENA
sd datadg02-01  vol1-01      datadg02  0        2027264  2027264   disk_4   ENA
sd datadg03-01  vol1-01      datadg03  0        1956352  4054528   disk_5   ENA
Figure 6 - Using vxrecover to finish the recovery, or start the resynchronization, of a volume
Syntax: vxrecover -sb
Example, with typical output:
# vxrecover -sb vol1
Vxprint now shows that "vol1" is "ACTIVE."
# vxprint -g datadg -ht
dg datadg       default      default   10000    1336408747.34.Server101
dm datadg01     disk_3       auto      65536    2027264  -
dm datadg02     disk_4       auto      65536    2027264  -
dm datadg03     disk_5       auto      65536    2027264  -
dm datadg04     disk_6       auto      65536    2027264  -
v  locks        -            ENABLED   ACTIVE   102400   SELECT    -        fsgen
pl locks-01     locks        ENABLED   ACTIVE   102400   CONCAT    -        RW
sd datadg04-01  locks-01     datadg04  0        102400   0         disk_6   ENA
v  vol1         -            ENABLED   ACTIVE   102400   SELECT    -        fsgen
pl vol1-01      vol1         ENABLED   ACTIVE   102400   CONCAT    -        RW
sd datadg01-01  vol1-01      datadg01  0        102400   0         disk_3   ENA
pl vol1-02      vol1         ENABLED   ACTIVE   102400   CONCAT    -        RW
sd datadg04-02  vol1-02      datadg04  102400   102400   0         disk_6   ENA
pl vol1-03      vol1         DISABLED  ACTIVE   102400   CONCAT    -        RW
sd datadg03-01  vol1-03      datadg03  0        102400   0         disk_5   ENA
keywords: failed, failing, failed disk, failed disks, failing disk