A painful lesson, learned the hard way!!!
Error description
Logical Volume Manager (LVM) commands may hang for a mirrored Volume Group (VG) under heavy I/O load. The VG may hang in hd_ca_clnup(); when this happens, most commands against the VG (e.g. lsvg) will hang as well. The hang shows the following stack trace through hd_ca_clnup():

(0)> f 132
pvthread+008400 STACK:
[00053450]e_block_thread+0004E0 ()
[00053B40]e_sleep_thread+00005C (??, ??, ??)
[040ACAB8]hd_ca_clnup+00009C (??)
[0409BFF0]hd_close+0001B0 (??, ??, ??)
[003E58B0]devcclose+0001E8 (??, ??)
[004B94FC]spec_close+0000C0 (??, ??, ??, ??)
[003FC608]vnop_close+000090 (??, ??, ??, ??)
[0043C35C]vno_close+00004C (??)
[004320E8]closef+000060 (??)
[003C80EC]closefd+0000EC (??, ??)
[003C83EC]closex+000268 (??, ??)
[003C85AC]close+000100 (??)
[00003810].svc_instr+000110 ()
Local fix
Once the hang happens, the only way to recover is to reboot the system. To prevent future occurrences, disable Mirror Write Consistency (MWC) on *ALL* the LVs of the VG.
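Applying the workaround to every LV by hand is easy to get wrong on a large VG, so a small helper can generate the commands for review. The sketch below is not part of the original record. It assumes the usual `lsvg -l <vg>` listing layout (a `<vg>:` banner, a column header, then one LV per line) and assumes that `chlv -w n <lv>` is the right way to turn MWC off on your AIX level; check the `chlv` man page before running anything. The names `datavg`, `datalv01`, and `loglv00` are made-up sample data. The script only prints commands and never executes them.

```python
from __future__ import annotations


def lv_names(lsvg_l_output: str) -> list[str]:
    """Extract LV names from text in the `lsvg -l <vg>` listing layout."""
    names = []
    for line in lsvg_l_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if line.rstrip().endswith(":"):
            continue  # "<vg>:" banner line
        if fields[:2] == ["LV", "NAME"]:
            continue  # column header
        names.append(fields[0])
    return names


def disable_mwc_commands(lsvg_l_output: str) -> list[str]:
    """Build one `chlv -w n` command per LV (assumed flag; verify locally)."""
    return [f"chlv -w n {name}" for name in lv_names(lsvg_l_output)]


# Hypothetical sample listing; paste real `lsvg -l` output here instead.
SAMPLE = """\
datavg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
datalv01            jfs2       100     200     2    open/syncd    /data01
loglv00             jfs2log    1       2       2    open/syncd    N/A
"""

if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for cmd in disable_mwc_commands(SAMPLE):
        print(cmd)
```

Keeping the helper as a dry run is deliberate: with MWC off, mirror consistency after a crash is no longer tracked automatically, so each generated command deserves a look before it is applied.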
Problem summary
In a Volume Group with mirrored LVs using active MWC (the default), an application that closes an LV may hang in hd_ca_clnup(), waiting for the completion of MWC I/Os that were never started. This can happen only after a specific sequence of events that might occur during high I/O load to one mirrored LV while another LV is being closed.
Problem conclusion
Add checks in the relevant LVM MWC code to prevent this hang scenario from occurring.