Recently I built a test server with the operating system installed entirely on two disks in RAID1, and the remaining three disks configured as RAID5 for data. It turned out that the system would sometimes boot normally and sometimes fail to boot.
I then found the following article on the web.
Purpose of the test: verify failure recovery when the root partition of a 3.0.5 system is installed on a software RAID1 partition, as a way around the original server's unsupported HostRAID controller.
Test environment: a VMware virtual machine whose guest OS has two 2048 MB disks. The 3.0.5 system is split into three partitions (/boot, /, swap), all placed on software RAID1 devices. The partition layout is as follows:
- Disk /dev/sda: 2147 MB, 2147483648 bytes
- 255 heads, 63 sectors/track, 261 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- /dev/sda1 1 13 104391 fd Linux raid autodetect
- /dev/sda2 14 78 522112+ fd Linux raid autodetect
- /dev/sda3 79 261 1469947+ fd Linux raid autodetect
- Disk /dev/sdb: 2147 MB, 2147483648 bytes
- 255 heads, 63 sectors/track, 261 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- /dev/sdb1 * 1 13 104391 fd Linux raid autodetect
- /dev/sdb2 14 78 522112+ fd Linux raid autodetect
- /dev/sdb3 79 261 1469947+ fd Linux raid autodetect
- Personalities : [raid1]
- md1 : active raid1 sdb2[1] sda2[0]
- 522048 blocks [2/2] [UU]
- md2 : active raid1 sdb3[1] sda3[0]
- 1469824 blocks [2/2] [UU]
- md0 : active raid1 sdb1[1] sda1[0]
- 104320 blocks [2/2] [UU]
- Filesystem Size Used Avail Use% Mounted on
- /dev/md2 1.4G 775M 651M 55% /
- /dev/md0 99M 7.3M 87M 8% /boot
- /dev/md1 is used as the swap partition
Simulating the failure: shut the virtual machine down, then remove the original scsi0:0 disk directly in the VM's configuration.
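As an aside, if you only want to exercise the degrade-and-rebuild path without powering the VM off, mdadm can mark a member faulty and pull it from the array. This is only a sketch, using /dev/sda3 in /dev/md2 as an example, and it does not reproduce the boot-from-the-second-disk part of the test below:
- root@localhost ~# mdadm /dev/md2 -f /dev/sda3
- root@localhost ~# mdadm /dev/md2 -r /dev/sda3
The -f flag marks the member faulty and -r removes it, leaving md2 degraded much as a real disk failure would.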
Failure recovery:
Because the boot disk is now the original system's /dev/sdb, and a default 3.0.5 installation never writes GRUB to the MBR of any disk other than the first one, the system cannot boot at this point. Of course, on a production server you would not shut the machine down just because one disk had failed, so this particular problem would not come up.
Given that a shutdown may be needed, the way to avoid this problem is to remember to install GRUB onto the second disk as soon as the system installation is finished:
- root@localhost ~# grub
- grub> install (hd0,0)/grub/stage1 d (hd1) (hd0,0)/grub/stage2 p (hd0,0)/grub/grub.conf
With that in place, the system will boot normally after a restart.
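An alternative that some guides prefer for the same purpose is to temporarily map the second disk to (hd0) inside the grub shell, so that the stage1 written to its MBR points at its own copy of /boot once it boots as the first disk. This is only a sketch, assuming the second disk is /dev/sdb and /boot is on its first partition:
- root@localhost ~# grub
- grub> device (hd0) /dev/sdb
- grub> root (hd0,0)
- grub> setup (hd0)
- grub> quit
Here device remaps (hd0) to /dev/sdb for this grub session only, root selects the /boot partition on that disk, and setup writes stage1 into its MBR.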
Partition the newly added disk according to the layout of /dev/sda (a one-step shortcut using sfdisk is sketched after the fdisk session below).
- Check the existing partition layout:
- root@localhost ~# fdisk -l /dev/sda
- Disk /dev/sda: 2147 MB, 2147483648 bytes
- 255 heads, 63 sectors/track, 261 cylinders
- Units = cylinders of 16065 * 512 = 8225280 bytes
- Device Boot Start End Blocks Id System
- /dev/sda1 * 1 13 104391 fd Linux raid autodetect
- /dev/sda2 14 78 522112+ fd Linux raid autodetect
- /dev/sda3 79 261 1469947+ fd Linux raid autodetect
- Partition the new disk and change the type of each new partition to Linux raid autodetect:
- root@localhost ~# fdisk /dev/sdb
- Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
- Building a new DOS disklabel. Changes will remain in memory only,
- until you decide to write them. After that, of course, the previous
- content won't be recoverable.
- Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
- Command (m for help): n
- Command action
- e extended
- p primary partition (1-4)
- p
- Partition number (1-4): 1
- First cylinder (1-261, default 1):
- Using default value 1
- Last cylinder or +size or +sizeM or +sizeK (1-261, default 261): 13
- Command (m for help): n
- Command action
- e extended
- p primary partition (1-4)
- p
- Partition number (1-4): 2
- First cylinder (14-261, default 14):
- Using default value 14
- Last cylinder or +size or +sizeM or +sizeK (14-261, default 261): 78
- Command (m for help): n
- Command action
- e extended
- p primary partition (1-4)
- p
- Partition number (1-4): 3
- First cylinder (79-261, default 79):
- Using default value 79
- Last cylinder or +size or +sizeM or +sizeK (79-261, default 261):
- Using default value 261
- Command (m for help): t
- Partition number (1-4): 1
- Hex code (type L to list codes): fd
- Changed system type of partition 1 to fd (Linux raid autodetect)
- Command (m for help): t
- Partition number (1-4): 2
- Hex code (type L to list codes): fd
- Changed system type of partition 2 to fd (Linux raid autodetect)
- Command (m for help): t
- Partition number (1-4): 3
- Hex code (type L to list codes): fd
- Changed system type of partition 3 to fd (Linux raid autodetect)
- Command (m for help): w
- The partition table has been altered!
- Calling ioctl() to re-read partition table.
- Syncing disks.
- Check the partition layout again with fdisk -l /dev/sdb; it should now match /dev/sda.
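If sfdisk is available, the whole partition table can instead be copied from the surviving disk in one step rather than re-created by hand. This is a sketch assuming /dev/sda is the good disk and /dev/sdb the blank replacement; double-check the device names before running it:
- root@localhost ~# sfdisk -d /dev/sda | sfdisk /dev/sdb
sfdisk -d dumps sda's partition table (including the fd type codes) and the second sfdisk replays it onto sdb.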
Add the newly created partitions into the RAID arrays:
- root@localhost ~# mdadm /dev/md0 -a /dev/sdb1
- mdadm: added /dev/sdb1
- root@localhost ~# mdadm /dev/md1 -a /dev/sdb2
- mdadm: added /dev/sdb2
- root@localhost ~# mdadm /dev/md2 -a /dev/sdb3
- mdadm: added /dev/sdb3
Check that the arrays that were just re-added are rebuilding:
- root@localhost ~# more /proc/mdstat
- Personalities : [raid1]
- md1 : active raid1 sdb2[1] sda2[0]
- 522048 blocks [2/2] [UU]
- md2 : active raid1 sdb3[2] sda3[0]
- 1469824 blocks [2/1] [U_]
- [===========>.........] recovery = 55.8% (821632/1469824) finish=0.1min speed=63202K/sec
- md0 : active raid1 sdb1[1] sda1[0]
- 104320 blocks [2/2] [UU]
- unused devices: <none>
Check that the RAID has returned to normal:
- root@localhost ~# more /proc/mdstat
- Personalities : [raid1]
- md1 : active raid1 sdb2[1] sda2[0]
- 522048 blocks [2/2] [UU]
- md2 : active raid1 sdb3[1] sda3[0]
- 1469824 blocks [2/2] [UU]
- md0 : active raid1 sdb1[1] sda1[0]
- 104320 blocks [2/2] [UU]
- unused devices: <none>
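Besides /proc/mdstat, each array's state can also be checked with mdadm itself; a minimal check, whose exact output wording varies with the mdadm version, is:
- root@localhost ~# mdadm --detail /dev/md2
A healthy mirror should list both members as active sync with no failed devices.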
Finally, remember to save the new RAID configuration to /etc/mdadm.conf; otherwise the RAID configuration cannot be restored after the system reboots:
- root@localhost ~# mdadm -Ds >/etc/mdadm.conf
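A slightly more explicit variant writes a DEVICE line first so mdadm knows which partitions to scan when assembling; this is a sketch assuming the member partitions used above:
- root@localhost ~# echo 'DEVICE /dev/sda[123] /dev/sdb[123]' > /etc/mdadm.conf
- root@localhost ~# mdadm --detail --scan >> /etc/mdadm.conf
mdadm --detail --scan is the long form of the mdadm -Ds used above.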
Conclusion: the performance impact of software RAID1 has not been tested yet, so all that can be said for now is that when performance demands are modest and data safety comes first, deploying the entire Linux system on software RAID1 is entirely feasible.