Category: LINUX
2009-01-06 18:43:37
The connection from the server through the HBA to the storage controller is referred to as a path. When multiple paths exist to a storage device (LUN) on a storage subsystem, it is referred to as multipath connectivity. It is an enterprise-level storage capability. The main purpose of multipath connectivity is to provide redundant access to the storage devices, i.e. to retain access to a storage device when one or more of the components in a path fail. Another advantage of multipathing is increased throughput by way of load balancing.
Note: Multipathing protects against the failure of paths, not the failure of a specific storage device.
A common example of multipathing is a SAN-connected storage device. Usually one or more Fibre Channel HBAs from the host are connected to the fabric switch, and the storage controllers are connected to the same switch.
A simple example of multipath could be: 2 HBAs connected to a switch to which the storage controllers are also connected. In this case the storage controller can be accessed from either of the HBAs, and hence we have multipath connectivity.
In the following diagram each host has 2 HBAs and each storage subsystem has 2 controllers. With this configuration, each host has 4 paths to each of the LUNs in the storage.
In Linux, a SCSI device is configured for a LUN as seen on each path, i.e. if a LUN has 4 paths, then four SCSI devices will be configured for the same device. Doing I/O to a LUN in such an environment is unmanageable.
Device mapper is a block subsystem that provides a layering mechanism for block devices. One can write a device mapper target to provide a specific functionality on top of a block device.
Currently several functional layers (targets) are available, such as linear, striped, mirror, snapshot, and multipath.
Multiple device mapper modules can be stacked to get the combined functionality.
See the device mapper documentation for more information.
The object of this document is to provide details on device mapper multipathing (DM-MP). DM-MP resolves all the issues that arise in accessing a multipathed device in Linux. It also provides a consistent user interface for storage devices provided by multiple vendors. There is only one block device (/dev/mapper/XXX) for a LUN; this is the device created by device mapper.
Paths are grouped into path groups, and only one of the path groups will be used for I/O at any given time; it is called the active path group. A path selector selects a path in the active path group to be used for an I/O based on some load balancing algorithm (for example round-robin).
When an I/O fails on a path, that path gets disabled and the I/O is retried on a different path in the same path group. If all paths in a path group fail, a different path group, one which is in the enabled state, will be selected to send I/O.
DM-MP consists of 4 components:
dm-multipath kernel module - Kernel module that is responsible for making the multipathing decisions in normal and failure situations.
multipath command - User space tool that provides the user with initial configuration, listing and deletion of multipathed devices.
multipathd daemon - User space daemon that constantly monitors the paths. It marks a path as failed when it finds the path faulty, and if all the paths in a priority group are faulty then it switches to the next enabled path group.
It keeps checking the failed path; once the failed path comes alive, based on the failback policy, it can reactivate the path. It provides a CLI to monitor/manage individual paths. It automatically creates device mapper entries when new devices come into existence.
kpartx - User space command that creates device mapper entries for all the partitions in a multipathed disk/LUN. When the multipath command is invoked, this command gets invoked automatically. For DOS-based partitions this command needs to be run manually.
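For example, a hypothetical partitioned multipathed LUN /dev/mapper/mydev1 might be handled as follows (the device and partition names are illustrative; exact partition naming varies by distribution):

```
# kpartx -l /dev/mapper/mydev1      (list the partitions kpartx would map)
# kpartx -a /dev/mapper/mydev1      (create device mapper entries for them)
# ls /dev/mapper/
mydev1  mydev1p1  mydev1p2
```

After this, file systems can be mounted from /dev/mapper/mydev1p1 and so on, instead of the per-path /dev/sd* partition devices.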
Standard output of multipath command
# multipath -ll
mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815 FAStT
[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 29:0:0:1 sdf 8:80   [active][ready]
 \_ 28:0:1:1 sdl 8:176  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16   [active][ghost]
 \_ 29:0:1:1 sdq 65:0   [active][ghost]
Annotated output of multipath command
mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815 FAStT
  mydev1                             -> User defined alias name
  3600a0b800011a1ee0000040646828cc5  -> WWID of the device
  dm-1                               -> sysfs name
  IBM                                -> Vendor
  1815 FAStT                         -> Product

[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
  size=512M                    -> Size of the DM device
  features=1 queue_if_no_path  -> Features supported
  hwhandler=1 rdac             -> Hardware handler, if any

Path Group 1:
\_ round-robin 0 [prio=6][active]
  \_             -> Path group level
  round-robin 0  -> Path selector and repeat count
  prio=6         -> Path group priority
  active         -> Path group state

First path on Path Group 1:
\_ 29:0:0:1 sdf 8:80 [active][ready]
  29:0:0:1  -> SCSI information: host, channel, scsi_id and lun
  sdf       -> Linux device name
  8:80      -> Major, minor numbers
  active    -> DM path state
  ready     -> Physical path state

Second path on Path Group 1:
\_ 28:0:1:1 sdl 8:176 [active][ready]

Path Group 2:
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16  [active][ghost]
 \_ 29:0:1:1 sdq 65:0  [active][ghost]
Paths are grouped into path groups. At any point in time only one path group will be active. The path selector decides which path in the active path group gets to send the next I/O. I/O will be sent only to the active path group.
Each path has a specific priority. A priority callout program provides the priority for a given path. The user space commands use this priority value to choose the active path group. With the group_by_prio path grouping policy, path priority is used to group the paths together and to change their relative weight for the round-robin path selector.
multibus: One path group is formed with all paths to a LUN. Suitable for devices that are in active/active mode.
group_by_serial: One path group per storage controller (serial). All paths that connect to the LUN through a controller are assigned to one path group. Suitable for devices that are in active/passive mode.
Setting multibus as the path grouping policy for a storage device in active/passive mode will reduce the I/O performance.
readsector0: sends a read command for sector 0 at a regular interval. Produces a lot of error messages in active/passive mode. Hence, suitable only for active/active mode.
This refers to the physical state of a path. A path can be in one of the following states:
ready: Path is up and can handle I/O requests.
faulty: Path is down and cannot handle I/O requests.
ghost: Path is a passive path. This state is shown for the passive paths in active/passive mode.
The DM state of a path can be one of the following:
active: Last I/O sent to this path completed successfully. Analogous to the ready physical path state.
failed: Last I/O to this path failed. Analogous to the faulty physical path state.
A path group can be active, enabled or disabled:
enabled: If none of the paths in the active path group is in the ready state, I/O will be sent to these path groups. There can be one or more path groups in this state.
When all the paths in a path group are in the failed state, one of the enabled path groups (the one with the highest priority) that has paths in the ready state will be made active. If there are no paths in the ready state in any of the enabled path groups, then one of the disabled path groups (the one with the highest priority) will be made active. Making a new path group active is also referred to as switching of the path group. The original active path group's state will be changed to enabled.
A failed path can come back at any point in time. multipathd keeps checking the path. Once it finds the path is alive, it changes the state of the path to ready. If this action makes one of the path groups' priority higher than that of the current active path group, multipathd may choose to fail back to the highest priority path group.
Under such failback situations multipathd can do one of the following three things: fail back immediately, fail back only on manual intervention, or fail back after a configured number of seconds.
This policy selection can be set by the user through the failback configuration option.
A user friendly and/or user defined name for a DM device. By default, the WWID is used as the DM device name. This is the name that is listed in the /dev/disk/by-name directory. When the user_friendly_names configuration option is set, the alias of a DM device will have the form mpathN.
DM-Multipath allows many of its features to be configured through the configuration file /etc/multipath.conf. The multipath command and multipathd use the configuration information from this file. This file is consulted only during the configuration of multipath devices. In other words, if the user makes any changes to this file, then the multipath command needs to be rerun to reconfigure the multipath devices (i.e. the user has to do multipath -F followed by multipath).
Support for many devices (as listed below) is built into the user space component of DM-Multipath. Only if the support for a specific storage device is not built in, or if the user wants to override some of the values, does the user need to modify this file.
This file has 5 sections:
System level defaults ("defaults"): Where the user can specify system level default override.
Black listed devices ("blacklist"): User can specify the list of devices they do not want to be under the control of DM-Multipath. These devices will be excluded.
Black list exceptions ("blacklist_exceptions"): Specific devices to be treated as multipath candidates even if they exist in the blacklist.
Storage controller specific settings ("devices"): User specified configuration settings will be applied to devices with specified "Vendor" and "Product" information.
Device specific settings ("multipaths"): User can fine tune configuration settings for individual LUNs.
User can specify the values for the attributes in this file using regular expression syntax.
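A minimal /etc/multipath.conf illustrating the five sections might look like the sketch below. The WWIDs, vendor/product strings and attribute values are illustrative only, not recommendations for any particular storage device:

```
defaults {
        polling_interval        10
        path_grouping_policy    failover
}
blacklist {
        # exclude non-SAN device nodes by regular expression
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
}
blacklist_exceptions {
        # treat this device as a multipath candidate even if blacklisted
        wwid    3600a0b800011a1ee0000040646828cc5
}
devices {
        device {
                vendor                  "IBM"
                product                 "1815"
                path_grouping_policy    group_by_prio
        }
}
multipaths {
        multipath {
                wwid    3600a0b800011a1ee0000040646828cc5
                alias   mydev1
        }
}
```

Settings in the multipaths section apply to one specific LUN, settings in devices apply to all LUNs of a given vendor/product, and defaults applies system wide.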
For detailed explanation of the different attributes and allowed values for the attributes please refer to multipath.conf.annotated file.
In RHEL, this file is located in the directory /usr/share/doc/device-mapper-multipath-X.Y.Z/. In SuSE, this file is located in the directory /usr/share/doc/packages/multipath-tools/.
Attribute values are set at multiple levels (internally in the multipath tools and through the multipath.conf file). The following is the order in which the attribute values are overridden.
Note that this completely overwrites the configuration information defined in (2) above. So, even if you want to change/add only one attribute, you have to provide the whole list for a device.
The man pages of multipath/multipathd provide good details on the usage of the tools.
multipathd has an interactive mode option which can be used for querying and managing the paths, and also to check the configuration details that will be used.
While multipathd is running, one can invoke multipathd with the command line multipathd -k. multipathd will enter a command line mode where the user can invoke different commands. Check the man page for the different commands.
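An illustrative interactive session might look like this (the exact output format depends on your version and configuration; the device names continue the example above):

```
# multipathd -k
multipathd> show paths
hcil     dev dev_t pri dm_st  chk_st
29:0:0:1 sdf 8:80  6   active ready
28:0:1:1 sdl 8:176 6   active ready
...
multipathd> show maps
name   sysfs uuid
mydev1 dm-1  3600a0b800011a1ee0000040646828cc5
multipathd> exit
```

Commands such as show paths and show maps are useful for monitoring; the CLI also allows failing and reinstating individual paths.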
3. Supported Storage Devices
This is the list of devices that have configuration information built into the multipath tools. Not being in this list does not mean that the specific device is not supported; it just means that there is no built-in configuration for it in the multipath tools.
Some of the devices need a hardware handler, which needs to be compiled into the kernel. A device being in this list does not necessarily mean that the hardware handler is present in the kernel. It is also possible that the hardware handler is present in the kernel but the device is not added to the list of supported built-in devices.
Following is the list of storage devices that have configuration information built into the multipath tools.
Vendor       | Product          | Common Name | Mainline 0.4.7 | Mainline 0.4.8 | RHEL5 | RHEL5 U1 | SLES10 | SLES10 SP1
3PARdata     | VV               | -           | YES | YES | YES | YES | YES | YES
APPLE        | Xserve RAID      | -           | -   | YES | YES | YES | YES | YES
{COMPAQ, HP} | {MSA,HSV}1*      | -           | YES | -   | -   | -   | YES | -
(COMPAQ,HP)  | (MSA|HSV)1.0.*   | -           | -   | YES | -   | -   | -   | YES
(COMPAQ,HP)  | (MSA|HSV)1.1.*   | -           | -   | YES | -   | -   | -   | YES
(COMPAQ,HP)  | MSA1.*           | -           | -   | -   | YES | YES | -   | -
(COMPAQ,HP)  | HSV(1|2).*       | -           | -   | -   | YES | YES | -   | -
DDN          | SAN              | -           | YES | YES | YES | YES | YES | YES
DEC          | HSG80            | -           | YES | YES | YES | YES | YES | YES
DGC          | *                | -           | YES | YES | YES | YES | YES | YES
EMC          | SYMMETRIX        | -           | YES | YES | YES | YES | YES | YES
FSC          |                  | -           | YES | YES | YES | YES | YES | YES
GNBD         | GNBD             | -           | -   | -   | YES | YES | -   | -
HITACHI      | {A6189A,OPEN-}   | -           | YES | -   | -   | -   | -   | -
(HITACHI|HP) | OPEN-.*          | -           | -   | YES | YES | YES | YES | YES
HITACHI      | DF.*             | -           | -   | YES | YES | YES | YES | YES
HP           | A6189A           | -           | -   | YES | YES | YES | YES | YES
HP           | MSA VOLUME       | -           | -   | YES | -   | -   | -   | YES
HP           | HSV2*            | -           | YES | YES | -   | -   | YES | YES
HP           | LOGICAL VOLUME.* | -           | -   | YES | -   | -   | -   | -
HP           | DF[456]00        | -           | YES | -   | -   | -   | -   | -
IBM          | 4000R            | -           | YES | YES | YES | YES | YES | YES
IBM          | 1742             | -           | YES | YES | YES | YES | YES | YES
IBM          | 3526             | -           | -   | YES | YES | YES | YES | YES
IBM          | 3542             | -           | YES | YES | YES | YES | YES | YES
IBM          | 2105F20          | -           | YES | YES | YES | YES | YES | YES
IBM          | 2105800          | -           | -   | YES | YES | YES | YES | YES
IBM          | {1750500,2145}   | -           | YES | YES | YES | YES | YES | YES
IBM          | 2107900          | -           | YES | YES | YES | YES | YES | YES
IBM          | S/390 DASD ECKD  | -           | YES | YES | YES | YES | YES | YES
IBM          | Nseries.*        | -           | -   | YES | YES | YES | YES | YES
NETAPP       | LUN.*            | -           | YES | YES | YES | YES | YES | YES
Pillar       | Axiom 500        | -           | YES | YES | YES | YES | YES | YES
Pillar       | Axiom.*          | -           | -   | YES | -   | -   | -   | -
SGI          | TP9[13]00        | -           | YES | YES | YES | YES | YES | YES
SGI          | TP9[45]00        | -           | YES | YES | YES | YES | YES | YES
SGI          | IS.*             | -           | -   | YES | -   | -   | -   | YES
STK          | OPENstorage D280 | -           | YES | YES | YES | YES | YES | YES
SUN          | { 3510,T4}       | -           | YES | YES | YES | YES | YES | YES
Some storage devices need special handling for path failover/failback, which means they need a hardware handler present in the kernel. Following is the list of storage devices that have hardware handlers in the kernel.
Generic controller Name | Storage device Name                  | Mainline 2.6.22 | Mainline 2.6.23 | RHEL5 | RHEL5 U1 | SLES10 | SLES10 SP1
LSI Engenio             | IBM DS4000 Series, IBM DS3000 Series | -   | YES | -   | YES | -   | YES
EMC CLARiiON            | AX/CX-series                         | YES | YES | YES | YES | YES | YES
-                       | HP Storage Works and Fibrecat        | -   | -   | -   | -   | -   | YES
There are advantages in having your boot/root partitions on the SAN, like avoiding a single point of failure, the disk contents being accessible even if the server is down, etc. This section describes the steps to be taken in the two major distributions to successfully install and boot off a SAN/multipathed device.
4.1. Installation instructions for SLES10
Note: This is tested on SLES10 SP1. If you have any other version, your mileage may vary.
Install the OS in a device that has multiple paths.
Make sure the root device's "Mount by" option is set to "Device by-id" (this option is available under "expert partitioner" as "fstab options").
If you are installing on LVM, choose "Mount by" to be "by label".
Complete the installation. Let the system boot up in multiuser mode.
Make sure the root device and swap device are all referenced by their by-id device node entries instead of /dev/sd* style names. If they are not, fix them first.
If using LVM, make sure the devices are referenced by LABEL.
Once booted, update /etc/multipath.conf if you have to make any changes.
Note: the option "user_friendly_names" is not supported in the initrd. So, if you have user_friendly_names in your /etc/multipath.conf file, comment it out for now; you can uncomment it later.
chkconfig boot.multipath on
chkconfig multipathd on
Edit the file /etc/sysconfig/kernel and add "dm-multipath" to INITRD_MODULES.
Note: If your storage device needs a hardware handler, add the corresponding module to INITRD_MODULES in addition to "dm-multipath". For example, add "dm-rdac" and "dm-multipath" to support IBM's DS4K storage devices.
Run mkinitrd; if required, run lilo.
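Taken together, the SLES steps above might look like this on a console (a sketch only; adjust the handler module to your storage, and run lilo only if it is your boot loader):

```
# chkconfig boot.multipath on
# chkconfig multipathd on
# vi /etc/sysconfig/kernel        (set INITRD_MODULES="... dm-multipath dm-rdac")
# mkinitrd
# lilo                            (only if your boot loader is lilo)
```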
Note: You can uncomment user_friendly_names now if you commented it out above.
The system will come up with the root disk on a multipathed device.
Note: You can switch off multipathing to the root device by adding multipath=off to the kernel command line.
4.2. Installation instructions for RHEL5
Note: This is tested on RHEL5 U1. If you have any other version, your mileage may vary.
Start the installation with the kernel command line "linux mpath"
Reboot.
If your boot device does not need multipath.conf and does not have a special hardware handler, then you are done.
If you have either of these, follow the steps below.
Run mkinitrd; if you need a hardware handler, add it to the initrd with the --with option.
# mkinitrd /boot/initrd.final.img --with=dm-rdac
The system will come up with the root disk on a multipathed device.
Note: You can switch off multipathing to the root device by adding multipath=off to the kernel command line.
Note: By default, the installation disables dm-multipath by blacklisting all devices in /etc/multipath.conf; only your root device is excluded from the blacklist. If you do not see your other multipath devices through "multipath -ll", then check and fix the blacklist in /etc/multipath.conf.
4.3. Other Distributions
Syntax is:
multipaths {
    multipath {
        wwid    3600a0b800011a2be00001dfa46cf0620
        alias   mydev1
    }
}
Using the bindings file in a clustered environment: The bindings file holds the bindings between the device mapper names and the UIDs of the underlying devices. By default the file is /var/lib/multipath/bindings; this can be changed by the multipath command line option -b. In a clustered environment, this file can be created on one node and transferred to the other nodes to get the same names everywhere.
Note that the same effect can also be achieved by using alias and having the same multipath.conf file on all the nodes of the cluster.
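Assuming the bindings file holds one "name WWID" pair per line, a cluster-wide setup might look like this (WWIDs and hostname are illustrative):

```
# cat /var/lib/multipath/bindings
mpath0 3600a0b800011a2be00001dfa46cf0620
mpath1 3600a0b800011a1ee0000040646828cc5
# scp /var/lib/multipath/bindings node2:/var/lib/multipath/bindings
```

Copying the file before multipath devices are configured on the second node ensures both nodes use the same mpathN names for the same LUNs.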
When using LVM on dm-multipath devices, it is better to turn LVM scanning off on the underlying SCSI devices. This can be done by changing the filter parameter in /etc/lvm/lvm.conf to filter = [ "a|/dev/mapper/.*|", "r|/dev/sd.*|" ].
If your root device is also a multipathed LVM device, then make the above change before you create the new initrd image.
Mainline Documentation for multipath tools