What is a mailbox disk?
Mailbox disks store cluster-related data that must persist across reboots. Specifically, a clustered filer reads information about cluster state, mirror state, and ownership from them during the boot process. If the filer finds a reservation on its disks, it assumes that the partner has taken over. Instead of booting into normal operational mode, it prints "Waiting for giveback" and waits for cf giveback to be performed before continuing a normal boot. If there is no reservation in place, the filer boots normally.
The filer writes information to the mailbox to show its partner that it is still connected. It also records state information, such as mirror state, which is used to determine which plex of a mirror is more up to date. The mailbox is a secondary heartbeat path (besides the interconnect cable) that helps avoid a split-brain situation. It also avoids the unnecessary takeover that a disruption of the interconnect cable could otherwise cause.
How does the filer choose a mailbox disk?
In normal situations, the filer chooses the parity disk and the first data disk of the root volume as its two mailbox disks. If either of these disks fails, the mailbox role moves to the dparity disk of the root volume.
Data ONTAP makes this change itself. For example, if the parity disk of the root volume fails, Data ONTAP assigns the mailbox role to the dparity disk of the root volume. Once the replacement disk finishes reconstruction, the role moves back to the current parity disk and the first data disk of the root volume.
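The selection order above can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the Disk class, the role strings, and the function name are all hypothetical, and the real selection logic inside Data ONTAP is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    name: str          # hypothetical disk identifier
    role: str          # "parity", "dparity", or "data"
    healthy: bool = True


def choose_mailbox_disks(root_raid_group: list[Disk]) -> list[Disk]:
    """Return up to two mailbox disks, following the order given in the text."""
    parity = [d for d in root_raid_group if d.role == "parity"]
    dparity = [d for d in root_raid_group if d.role == "dparity"]
    data = [d for d in root_raid_group if d.role == "data"]

    # Preferred pair: the parity disk and the first data disk.
    preferred = parity[:1] + data[:1]
    chosen = [d for d in preferred if d.healthy]

    # If one of the preferred disks has failed, the dparity disk stands in.
    if len(chosen) < 2:
        chosen += [d for d in dparity if d.healthy][: 2 - len(chosen)]
    return chosen
```

Calling the function again after the failed disk has been rebuilt (healthy set back to True) returns the original parity and first data disk, which mirrors the role moving back after reconstruction.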
How does the filer access mailbox disks?
The filer writes information only to its own mailbox disks. It reads the information written by its partner from the partner's mailbox disks, but it never writes anything to the partner's mailbox. So even though a SCSI-3 reservation may be placed on the partner's disks, the local filer can still read data from them. In this way, the filer detects whether the partner head is still alive. The filer tries to access the mailbox disks every 3-5 seconds. If all the mailbox disks on the local side are removed at the same time, a "permanent error of accessing mailbox disk" error occurs and the system panics.
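The write-own, read-partner pattern can be illustrated with a small in-memory model. This sketch makes several assumptions that are not in the text: the heartbeat is modeled as a simple counter, the partner is considered dead after max_missed unchanged reads, and the Mailbox and Node classes are hypothetical. Each call to poll() stands for one of the 3-5 second access cycles.

```python
class Mailbox:
    """In-memory stand-in for one node's mailbox disks (hypothetical)."""

    def __init__(self) -> None:
        self.heartbeat = 0


class Node:
    def __init__(self, own: Mailbox, partner: Mailbox, max_missed: int = 3) -> None:
        self.own = own
        self.partner = partner
        self.max_missed = max_missed      # assumed threshold, not from the text
        self._last_seen = partner.heartbeat
        self._missed = 0

    def poll(self) -> bool:
        """Run one polling cycle; return True while the partner looks alive."""
        self.own.heartbeat += 1           # write only to our own mailbox
        seen = self.partner.heartbeat     # read-only access to the partner's mailbox
        if seen != self._last_seen:
            self._last_seen = seen
            self._missed = 0
        else:
            self._missed += 1
        return self._missed < self.max_missed
```

Because a node only ever reads from the partner's mailbox, the check keeps working even when the node has no write access to the partner's disks.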
How does Data ONTAP use mailbox disks to decide when to disable the cluster?
It uses a "majority/quorum" rule: as long as half of the members are available, the cluster remains stable. If fewer than half of the members are available, in other words, if more than half of the members fail, the cluster is disabled. If at any time one of the two mailbox disks is bad, broken, or not responding, a "mailbox uncertain" or "mailbox error detected" message appears, and the cluster is disabled for a while to see whether the situation recovers. If the mailbox role is then successfully moved to another good disk in the RAID group, the cluster is enabled again.
In a SyncMirror configuration, there can be four mailbox disks on one side, two for local and two for partner, if the aggregate or volume containing the mailbox disks is SyncMirrored. In that case, if one of them fails, the message does not appear because the failure threshold has not been reached. If two or more fail, the message is displayed.
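The examples above (one of two disks failing triggers the message, one of four does not, two of four does) are all consistent with requiring a strict majority of mailbox disks to stay reachable. The following Python sketch models that reading. The function name and the strict-majority threshold are assumptions inferred from those examples, not taken from Data ONTAP itself.

```python
def mailbox_quorum_ok(total: int, failed: int) -> bool:
    """Return True while the reachable mailbox disks form a strict majority."""
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    reachable = total - failed
    # Strict majority: more than half of the mailbox disks must be reachable.
    return reachable * 2 > total
```

With two mailbox disks, mailbox_quorum_ok(2, 1) is False, matching the "mailbox uncertain" case. With four, mailbox_quorum_ok(4, 1) is True and mailbox_quorum_ok(4, 2) is False.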