
1. Overview

The connection from the server through the HBA to the storage controller is referred to as a path. When multiple paths exist to a storage device (LUN) on a storage subsystem, this is referred to as multipath connectivity. It is an enterprise-level storage capability. The main purpose of multipath connectivity is to provide redundant access to the storage devices, i.e. to retain access to a storage device when one or more of the components in a path fail. Another advantage of multipathing is increased throughput by way of load balancing.

  • /!\ Note: Multipathing protects against the failure of path(s) and not the failure of a specific storage device.

A common example of multipath is a SAN-connected storage device. Usually one or more fibre channel HBAs from the host are connected to the fabric switch, and the storage controllers are connected to the same switch.

A simple example of multipath: 2 HBAs connected to a switch to which the storage controllers are connected. In this case the storage controller can be accessed from either of the HBAs, and hence we have multipath connectivity.

In the following diagram each host has 2 HBAs and each storage array has 2 controllers. With the given configuration each host will have 4 paths to each of the LUNs in the storage.

[Figure fabric1.png: hosts with 2 HBAs each, connected through the fabric to storage arrays with 2 controllers each]

In Linux, a SCSI device is configured for each path on which a LUN is seen; i.e., if a LUN has 4 paths, then one will see four SCSI devices configured for the same device. Doing I/O to a LUN in such an environment is unmanageable, because:

  • applications/administrators do not know which SCSI device to use
  • there is no way to ensure that all applications consistently use the same device
  • in case of a path failure, there is no knowledge to retry the I/O on a different path
  • there is no way to always use the storage device's preferred path
  • there is no spreading of I/O between multiple valid paths
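
For example, two SCSI devices that are really two paths to the same LUN report the same WWID (a sketch; the scsi_id location and flags shown here are the RHEL5/SLES10-era ones and vary between distributions and udev versions):

# /sbin/scsi_id -g -u -s /block/sdf
3600a0b800011a1ee0000040646828cc5
# /sbin/scsi_id -g -u -s /block/sdl
3600a0b800011a1ee0000040646828cc5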

1.1. Device Mapper

Device mapper is a block subsystem that provides a layering mechanism for block devices. One can write a device mapper target to provide a specific functionality on top of a block device.

Currently the following functional layers are available:

  • concatenation
  • mirror
  • striping
  • encryption
  • flaky
  • delay
  • multipath

Multiple device mapper modules can be stacked to get the combined functionality.

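The functional layers (targets) registered with device mapper in the running kernel can be listed with dmsetup; the target names and versions below are only an illustrative sample:

# dmsetup targets
multipath        v1.0.5
striped          v1.0.0
linear           v1.0.1
error            v1.0.1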

1.2. Device Mapper Multipathing

The object of this document is to provide details on device mapper multipathing (DM-MP). DM-MP resolves all the issues that arise in accessing a multipathed device in Linux. It also provides a consistent user interface for storage devices provided by multiple vendors. There is only one block device (/dev/mapper/XXX) for a LUN; this is the device created by device mapper.

Paths are grouped into path groups, and only one of the path groups is used for I/O at a time; it is called the active path group. A path selector selects a path in the active path group to be used for an I/O, based on some load balancing algorithm (for example round-robin).

When an I/O fails on a path, that path gets disabled and the I/O is retried on a different path in the same path group. If all paths in a path group fail, a different path group which is enabled will be selected to send I/O.

DM-MP consists of 4 components:

  1. DM MP kernel module - Kernel module that is responsible for making the multipathing decisions in normal and failure situations.

  2. multipath command - User space tool that allows the user to perform initial configuration, listing and deletion of multipathed devices.

  3. multipathd daemon - User space daemon that constantly monitors the paths. It marks a path as failed when it finds the path faulty, and if all the paths in a priority group are faulty it switches to the next path group.

    It keeps checking the failed path; once the failed path comes alive, based on the failback policy, it can reactivate the path. It provides a CLI to monitor/manage individual paths. It automatically creates device mapper entries when new devices come into existence.

  4. kpartx - User space command that creates device mapper entries for all the partitions in a multipathed disk/LUN. When the multipath command is invoked, this command gets invoked automatically. For DOS-based partitions this command needs to be run manually, for example as shown below.
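
For example, to map the partitions of a multipathed LUN manually (a sketch; the device name mydev1 and the partition layout are hypothetical):

# kpartx -l /dev/mapper/mydev1      # list the partition mappings that would be created
mydev1p1 : 0 1040385 /dev/mapper/mydev1 63
# kpartx -a /dev/mapper/mydev1      # create /dev/mapper/mydev1p1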

2. Terminology, Concepts and Usage

2.1. Output of multipath command

Standard output of multipath command

# multipath -ll
mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815      FAStT
[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 29:0:0:1 sdf 8:80  [active][ready]
 \_ 28:0:1:1 sdl 8:176 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16  [active][ghost]
 \_ 29:0:1:1 sdq 65:0  [active][ghost]

Annotated output of multipath command

mydev1 (3600a0b800011a1ee0000040646828cc5) dm-1 IBM,1815      FAStT
------  ---------------------------------  ---- --- ---------------
   |               |                         |    |          |-------> Product
   |               |                         |    |------------------> Vendor
   |               |                         |-----------------------> sysfs name
   |               |-------------------------------------------------> WWID of the device
   |------------------------------------------------------------------> User defined Alias name

[size=512M][features=1 queue_if_no_path][hwhandler=1 rdac]
 ---------  ---------------------------  ----------------
     |                 |                        |--------------------> Hardware Handler, if any
     |                 |---------------------------------------------> Features supported
     |---------------------------------------------------------------> Size of the DM device

Path Group 1:
\_ round-robin 0 [prio=6][active]
-- -------------  ------  ------
 |    |              |      |----------------------------------------> Path group state
 |    |              |-----------------------------------------------> Path group priority
 |    |--------------------------------------------------------------> Path selector and repeat count
 |-------------------------------------------------------------------> Path group level

First path on Path Group 1:
 \_ 29:0:0:1 sdf 8:80  [active][ready]
    -------- --- ----   ------  -----
      |      |     |        |      |---------------------------------> Physical Path state
      |      |     |        |----------------------------------------> DM Path state
      |      |     |-------------------------------------------------> Major, minor numbers
      |      |-------------------------------------------------------> Linux device name
      |--------------------------------------------------------------> SCSI information: host, channel, scsi_id and lun

Second path on Path Group 1:
 \_ 28:0:1:1 sdl 8:176 [active][ready]

Path Group 2:
\_ round-robin 0 [prio=0][enabled]
 \_ 28:0:0:1 sdb 8:16  [active][ghost]
 \_ 29:0:1:1 sdq 65:0  [active][ghost]

2.2. Terminology

Path
Connection from the server through an HBA to a specific LUN. Without DM-MP, each path would appear as a separate device.

Path Group

Paths are grouped into path groups. At any point of time, only one path group is active. The path selector decides which path in the active path group gets to send the next I/O. I/O will be sent only to the active path group.

Path Priority

Each path has a specific priority. A priority callout program provides the priority for a given path. The user space commands use this priority value to choose an active path. In the group_by_prio path grouping policy, path priority is used to group the paths together and change their relative weight with the round-robin path selector.

Path Group Priority
Sum of priorities of all non-faulty paths in a path group. By default, the multipathd daemon tries to keep the path group with the highest priority active.

Path Grouping Policy
Determines how the path group(s) are formed using the available paths. There are five different policies:
  1. multibus: One path group is formed with all paths to a LUN. Suitable for devices that are in active/active mode.

  2. failover: Each path group will have only one path.
  3. group_by_serial: One path group per storage controller (serial number). All paths that connect to the LUN through a controller are assigned to one path group. Suitable for devices that are in active/passive mode.

  4. group_by_prio: Paths with same priority will be assigned to a path group.
  5. group_by_node_name: Paths with same target node name will be assigned to a path group.

/!\ Setting multibus as the path grouping policy for a storage device in active/passive mode will reduce the I/O performance.

Path Selector
A kernel multipath component that determines which path will be chosen for the next I/O. A path selector can implement an appropriate load balancing algorithm. Currently only one path selector exists, which is round-robin.

Path Checker
Functionality in the user space that is used to check the availability of a path. This is implemented as a library function that is used by both the multipath command and the multipathd daemon. Currently, there are 3 path checkers:
  1. readsector0: sends a read command to sector 0 at a regular interval. Produces a lot of error messages in active/passive mode. Hence, suitable only for devices in active/active mode.

  2. tur: sends a TEST UNIT READY command at a regular interval.
  3. rdac: specific to LSI RDAC devices. Sends an inquiry command and sets the status of the path appropriately.

Path States

This refers to the physical state of a path. A path can be in one of the following states:

  1. ready: Path is up and can handle I/O requests.

  2. faulty: Path is down and cannot handle I/O requests.

  3. ghost: Path is a passive path. This state is shown for the passive path in active/passive mode.

  4. shaky: Path is up, but temporarily not available for I/O requests.

DM Path States
This refers to the DM module(kernel)'s view of the path's state. It can be in one of the two states:
  1. active: Last I/O sent to this path successfully completed. Analogous to the ready path state.

  2. failed: Last I/O to this path failed. Analogous to the faulty path state.

Path Group State
Path Groups can be in one of the following three states:

  1. active: I/O sent to the multipath device will be directed to this path group. Only one path group will be in this state.
  2. enabled: If none of the paths in the active path group is in the ready state, I/O will be sent to these path groups. There can be one or more path groups in this state.

  3. disabled: If none of the paths in the active path group and the enabled path groups is in the ready state, I/O will be sent to these path groups. There can be one or more path groups in this state. This state is available only for certain storage devices.

UID Callout (or) WWID Callout
A standalone program that returns a globally unique identifier for a path. multipath/multipathd invokes this callout and uses the ID returned to coalesce multiple paths to a single multipath device.

Priority Callout
A standalone program that returns the priority for a path. multipath/multipathd invokes this callout and uses the priority value of the paths to determine the active path group.

Hardware Handler
Kernel personality module for storage devices that need special handling. This module is responsible for enabling a path (at the device level) during initialization, failover and failback. It is also responsible for handling device specific sense error codes.
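
Several of the terms above (UID callout, priority callout, path checker, hardware handler) appear as attributes of a device entry in /etc/multipath.conf. The following sketch is modeled on an RDAC-style (IBM FAStT/DS4K) array; the callout paths and values are distribution and version dependent, so treat them as illustrative only:

devices {
        device {
                vendor                  "IBM"
                product                 "1815"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                hardware_handler        "1 rdac"
                path_grouping_policy    group_by_prio
                path_checker            rdac
                failback                immediate
        }
}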

Failover

When all the paths in the active path group are in the failed state, one of the enabled path groups (the one with the highest priority) that has paths in the ready state will be made active. If there are no paths in the ready state in any of the enabled path groups, then one of the disabled path groups (the one with the highest priority) will be made active. Making a new path group active is also referred to as switching of the path group. The original active path group's state will be changed to enabled.

Failback

A failed path can come alive at any point of time. multipathd keeps checking the failed path. Once it finds the path is alive, it will change the state of the path to ready. If this action raises one of the path groups' priority above that of the current active path group, multipathd may choose to fail back to the highest priority path group.

Failback Policy

When a failback becomes possible, multipathd can do one of the following three things:

  1. immediate: Immediately fail back to the highest priority path group.
  2. # of seconds: Wait for the specified number of seconds, for I/O to stabilize, then fail back to the highest priority path group.
  3. do nothing: Do nothing; the user explicitly fails back to the highest priority path group.

This policy selection can be set by the user through the failback attribute in /etc/multipath.conf.
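
For example, in the defaults section of multipath.conf (a sketch; "manual" corresponds to the "do nothing" choice above):

defaults {
        # failback can be "immediate", a number of seconds, or "manual"
        failback        immediate
}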

Active/Active
Storage devices with two controllers can be configured in this mode. Active/Active means that both controllers can process I/Os.

Active/Passive
Storage devices with two controllers can be configured in this mode. Active/Passive means that one of the controllers (active) can process I/Os, while the other one (passive) is in a standby mode. I/Os to the passive controller will fail.

Alias

A user-friendly and/or user-defined name for a DM device. By default, the WWID is used for the DM device. This is the name that is listed in the /dev/disk/by-name directory. When the user_friendly_names configuration option is set, the alias of a DM device will have the form mpathN. The user also has the option of setting a unique alias for each multipath device.

2.3. Configuration File (/etc/multipath.conf)

DM-Multipath allows many of its features to be configured by the user through the configuration file /etc/multipath.conf. The multipath command and the multipathd daemon use the configuration information from this file. This file is consulted only during the configuration of multipath devices. In other words, if the user makes any changes to this file, then the multipath command needs to be rerun to configure the multipath devices (i.e. the user has to do multipath -F followed by multipath).

Support for many devices (as listed below) is built into the user space component of DM-Multipath. Only if the support for a specific storage device is not built in, or the user wants to override some of the values, does the user need to modify this file.

This file has 5 sections:

  1. System level defaults ("defaults"): Where the user can specify system level default override.

  2. Black listed devices ("blacklist"): User can specify the list of devices they do not want to be under the control of DM-Multipath. These devices will be excluded.

  3. Black list exceptions ("blacklist_exceptions"): Specific devices to be treated as multipath candidates even if they exist in the blacklist.

  4. Storage controller specific settings ("devices"): User specified configuration settings will be applied to devices with specified "Vendor" and "Product" information.

  5. Device specific settings ("multipaths"): User can fine tune configuration settings for individual LUNs.

The user can specify the values for the attributes in this file using regular expression syntax.
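
A minimal sketch showing all five sections together (the WWID, the blacklist regular expression and the attribute values are placeholders, not recommendations):

defaults {
        user_friendly_names     yes
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
}

blacklist_exceptions {
        wwid    "3600a0b800011a1ee0000040646828cc5"
}

devices {
        device {
                vendor                  "IBM"
                product                 "1815"
                path_grouping_policy    group_by_prio
        }
}

multipaths {
        multipath {
                wwid    3600a0b800011a1ee0000040646828cc5
                alias   mydev1
        }
}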

For detailed explanation of the different attributes and allowed values for the attributes please refer to multipath.conf.annotated file.

  • In Mainline, this file is located in the root directory of multipath-tools.

    In RHEL, this file is located in the directory /usr/share/doc/device-mapper-multipath-X.Y.Z/. In SuSE, this file is located in the directory /usr/share/doc/packages/multipath-tools/.

2.3.1. Attribute value overrides

Attribute values are set at multiple levels (internally in the multipath tools and through the multipath.conf file). Following is the order in which the attribute values are overridden:

  1. Global internal defaults, as specified in the man page of multipath.conf.
  2. Device specific internal defaults, as defined in libmultipath/hwtable.c.
  3. Items described in defaults section of /etc/multipath.conf.
  4. Items defined in device section of /etc/multipath.conf.
    • /!\ Note that this will completely overwrite the configuration information defined in (2) above. So, even if you want to change/add only one attribute, you have to provide the whole attribute list for that device.

  5. Items defined in multipaths section of /etc/multipath.conf.

2.4. multipath, multipathd command usage

The man pages of multipath/multipathd provide good details on the usage of the tools.

multipathd has an interactive mode option which can be used for querying and managing the paths, and also for checking the configuration details that will be used.

While multipathd is running, one can invoke multipathd -k to enter an interactive command line mode where the user can issue different commands. Check the man page for the available commands.
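
A short interactive session might look like the following sketch (the exact commands available and the output columns differ between versions):

# multipathd -k
multipathd> show paths
hcil     dev dev_t pri dm_st  chk_st
29:0:0:1 sdf 8:80  6   active ready
28:0:1:1 sdl 8:176 6   active ready
multipathd> show maps
name   sysfs uuid
mydev1 dm-1  3600a0b800011a1ee0000040646828cc5
multipathd> quit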

3. Supported Storage Devices

This is the list of devices that have configuration information built into the multipath tools. Not being in this list does not mean that a specific device is not supported; it just means that there is no built-in configuration for it in the multipath tools.

Some of the devices need a hardware handler, which has to be compiled into the kernel. A device being in this list does not mean that the hardware handler is present in the kernel. Conversely, it is possible that the hardware handler is present in the kernel but the device is not in the list of built-in supported devices.

3.1. Devices supported in multipath tools

Following is the list of storage devices that have configuration information built into the multipath tools.

Vendor        Product            Common Name  Mainline 0.4.7  Mainline 0.4.8  RHEL5  RHEL5 U1  SLES10  SLES10 SP1
------------  -----------------  -----------  --------------  --------------  -----  --------  ------  ----------
3PARdata      VV                 -            YES             YES             YES    YES       YES     YES
APPLE         Xserve RAID        -            -               YES             YES    YES       YES     YES
{COMPAQ, HP}  {MSA,HSV}1*        -            YES             -               -      -         YES     -
(COMPAQ,HP)   (MSA|HSV)1.0.*     -            -               YES             -      -         -       YES
(COMPAQ,HP)   (MSA|HSV)1.1.*     -            -               YES             -      -         -       YES
(COMPAQ,HP)   MSA1.*             -            -               -               YES    YES       -       -
(COMPAQ,HP)   HSV(1|2).*         -            -               -               YES    YES       -       -
DDN           SAN                -            YES             YES             YES    YES       YES     YES
DEC           HSG80              -            YES             YES             YES    YES       YES     YES
DGC           *                  -            YES             YES             YES    YES       YES     YES
EMC           SYMMETRIX          -            YES             YES             YES    YES       YES     YES
FSC                              -            YES             YES             YES    YES       YES     YES
GNBD          GNBD               -            -               -               YES    YES       -       -
HITACHI       {A6189A,OPEN-}     -            YES             -               -      -         -       -
(HITACHI|HP)  OPEN-.*            -            -               YES             YES    YES       YES     YES
HITACHI       DF.*               -            -               YES             YES    YES       YES     YES
HP            A6189A             -            -               YES             YES    YES       YES     YES
HP            MSA VOLUME         -            -               YES             -      -         -       YES
HP            HSV2*              -            YES             YES             -      -         YES     YES
HP            LOGICAL VOLUME.*   -            -               YES             -      -         -       -
HP            DF[456]00          -            YES             -               -      -         -       -
IBM           4000R              -            YES             YES             YES    YES       YES     YES
IBM           1742               -            YES             YES             YES    YES       YES     YES
IBM           3526               -            -               YES             YES    YES       YES     YES
IBM           3542               -            YES             YES             YES    YES       YES     YES
IBM           2105F20            -            YES             YES             YES    YES       YES     YES
IBM           2105800            -            -               YES             YES    YES       YES     YES
IBM           {1750500,2145}     -            YES             YES             YES    YES       YES     YES
IBM           2107900            -            YES             YES             YES    YES       YES     YES
IBM           S/390 DASD ECKD    -            YES             YES             YES    YES       YES     YES
IBM           Nseries.*          -            -               YES             YES    YES       YES     YES
NETAPP        LUN.*              -            YES             YES             YES    YES       YES     YES
Pillar        Axiom 500          -            YES             YES             YES    YES       YES     YES
Pillar        Axiom.*            -            -               YES             -      -         -       -
SGI           TP9[13]00          -            YES             YES             YES    YES       YES     YES
SGI           TP9[45]00          -            YES             YES             YES    YES       YES     YES
SGI           IS.*               -            -               YES             -      -         -       YES
STK           OPENstorage D280   -            YES             YES             YES    YES       YES     YES
SUN           {3510,T4}          -            YES             YES             YES    YES       YES     YES

3.2. Devices that have a hardware handler in the kernel

Some storage devices need special handling for path failover/failback, which means that they need a hardware handler present in the kernel. Following is the list of storage devices that have a hardware handler in the kernel.

Generic Controller Name  Storage Device Name                   Mainline 2.6.22  Mainline 2.6.23  RHEL5  RHEL5 U1  SLES10  SLES10 SP1
-----------------------  ------------------------------------  ---------------  ---------------  -----  --------  ------  ----------
LSI Engenio              IBM DS4000 Series, IBM DS3000 Series  -                YES              -      YES       -       YES
EMC CLARiiON             AX/CX-series                          YES              YES              YES    YES       YES     YES
-                        HP Storage Works and Fibrecat         -                -                -      -         -       YES

4. Install and Boot on a multipathed device

There are advantages in placing your boot/root partitions on a SAN, like avoiding a single point of failure, the disk content being accessible even when the server is down, etc. This section describes the steps to be taken in the two major distributions to successfully install and boot off a SAN/multipathed device.

4.1. Installation instructions for SLES10

Note: This is tested on SLES10 SP1. If you have any other version, your mileage may vary.

  1. Install the OS on a device that has multiple paths.
    Make sure the root device's "Mount by" option is set to "Device by-id" (this option is available under "expert partitioner" as "fstab options").
    If you are installing on LVM, choose "Mount by" to be "by label".

  2. Complete the installation. Let the system boot up in multiuser mode.
    Make sure the root device and swap device are referenced by their by-id device node entries instead of /dev/sd* type names. If they are not, fix them first.
    If using LVM, make sure the devices are referenced by LABEL.

  3. Once booted, update /etc/multipath.conf.
    If you have to make changes to /etc/multipath.conf, make them now.

    Note: the option "user_friendly_names" is not supported in the initrd. So, if you have user_friendly_names in your /etc/multipath.conf file, comment it out for now; you can uncomment it later (see step 6).

  4. Enable multipathing by running the following commands
    • chkconfig boot.multipath on

      chkconfig multipathd on

  5. Add multipath module to initrd

    Edit the file /etc/sysconfig/kernel and add "dm-multipath" to INITRD_MODULES.
    Note: If your storage device needs a hardware handler, add the corresponding module to INITRD_MODULES in addition to "dm-multipath". For example, add both "dm-rdac" and "dm-multipath" to support IBM's DS4K storage devices.

  6. Run mkinitrd; if required, run lilo.

    Note: You can uncomment user_friendly_names if you commented it out above.

  7. Reboot

The system will come up with the root disk on a multipathed device.
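
A condensed sketch of steps 4-6 (the INITRD_MODULES line is only an example; keep the modules your system already lists and append the multipath modules):

# chkconfig boot.multipath on
# chkconfig multipathd on
# grep INITRD_MODULES /etc/sysconfig/kernel
INITRD_MODULES="processor jbd ext3 dm-rdac dm-multipath"
# mkinitrd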

Note: You can switch off multipathing to the root device by adding multipath=off to the kernel command line.

4.2. Installation instructions for RHEL5

Note: This is tested on RHEL5 U1. If you have any other version, your mileage may vary.

  1. Start the installation with the kernel command line "linux mpath"

    • You will see multipathed devices (/dev/mapper/mpath*) as installation devices.
  2. Finish the installation.
  3. Reboot.

    • If your boot device does not need any multipath.conf changes and does not need a special hardware handler, then you are done.
      If you need either of these, follow the steps below.

  4. Once booted, update the multipath.conf file, if needed.
  5. Run mkinitrd; if you need a hardware handler, add it to the initrd with the --with option.

    • # mkinitrd /boot/initrd.final.img --with=dm-rdac

  6. Replace the initrd in your grub.conf/lilo.conf/yaboot.conf with the newly built initrd.
  7. Reboot.

The system will come up with the root disk on a multipathed device.

Note: You can switch off multipathing to the root device by adding multipath=off to the kernel command line.
Note: By default, RHEL5 disables dm-multipath by blacklisting all devices in /etc/multipath.conf; only your root device is exempted. If you do not see your other multipath devices through "multipath -ll", then check and fix the blacklist in /etc/multipath.conf.

4.3. Other Distributions

5. Tips and Tricks

  1. Using alias: By default, the multipathed devices are named with the UID of the device, which one accesses through /dev/mapper/${uid_name}. When one uses user_friendly_names, devices will be named mpath0, mpath1, etc., which may meet one's needs. The user also has the option to define an alias in multipath.conf for each device.

Syntax is:

multipaths {
        multipath {
                wwid    3600a0b800011a2be00001dfa46cf0620
                alias   mydev1
        }
}
  2. Persistent device names: The names (uid_names, mpath names or alias names) that appear in /dev/mapper are persistent across reboots, while the names dm-0, dm-1, etc. can change between reboots. So, it is advisable to use the device names that appear under /dev/mapper and to avoid the dm-? names.
  3. Restart of tools after changing the multipath.conf file: Once the multipath.conf file is changed, the multipath tools need to be rerun for the new configuration values to take effect. One has to stop multipathd, run multipath -F, and then restart multipathd and multipath (see the sketch after this list).
  4. Devices with partitions: Create device partitions before running multipath, as kpartx is configured to create multipathed partitions that way. Partitions on device mpath0 appear as /dev/mapper/mpath0p1, /dev/mapper/mpath0p2, etc.
  5. Using the bindings file in a clustered environment: The bindings file holds the bindings between the device mapper names and the UIDs of the underlying devices. By default the file is /var/lib/multipath/bindings; this can be changed with the multipath command line option -b. In a clustered environment, this file can be created on one node and transferred to the other nodes to get the same names.
    Note that the same effect can also be achieved by using aliases and having the same multipath.conf file on all the nodes of the cluster.

  6. Getting the multipath device name corresponding to a SCSI device: If one knows the name of a SCSI device and wants the device mapper name associated with it, one can use multipath -l /dev/sda, where sda is the SCSI device. Conversely, if one knows the device mapper name and wants the underlying device names, one can use the same command with the device mapper name, i.e. multipath -l mpath0, where mpath0 is the device mapper name.
  7. When using LVM on dm-multipath devices, it is better to turn LVM scanning off on the underlying SCSI devices. This can be done by changing the filter parameter in /etc/lvm/lvm.conf to filter = [ "a|/dev/mapper/.*|", "r|/dev/sd.*|" ].
    If your root device is also a multipathed LVM device, then make the above change before you create a new initrd image.
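
For tip 3 above, the restart sequence could look like the following sketch (init script names and locations vary by distribution):

# /etc/init.d/multipathd stop
# multipath -F                      # flush the existing multipath maps
# multipath                         # re-create the maps with the new configuration
# /etc/init.d/multipathd start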

6. References

6.1. General

Mainline Documentation for multipath tools

6.2. IBM

Linux on Power: SLES10 - Root on dm-multipath device
