Category: Oracle

2012-08-21 11:19:42

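Once, during a disaster-recovery drill, I could not start ASM because the diskgroup would not mount. I asked the US storage team whether the disks had been restored, and they insisted everything had been done. I checked for a long time and could not find any problem.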

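In the end I looked at the disk header block with kfed and found that the data was all zeros: it was simply a brand-new disk. The US team had to admit that nothing had been restored......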
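When kfed is not yet built, a similar "is this a blank disk?" check can be approximated with standard tools. The following is only a sketch: is_zero_header is a hypothetical helper name, and the 4096-byte default is an assumption taken from the ASM metadata block size shown later in this post.

```shell
# Sketch: succeed if the first block of a device (or file) contains only zero bytes.
# is_zero_header is a hypothetical helper; the 4096-byte default is an assumption.
is_zero_header() {
    dev=$1
    size=${2:-4096}
    # Number of bytes actually read from the start of the device
    total=$(dd if="$dev" bs="$size" count=1 2>/dev/null | wc -c | tr -d ' ')
    # Number of bytes left after deleting every NUL byte
    nonzero=$(dd if="$dev" bs="$size" count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
    [ "$total" -eq "$size" ] && [ "$nonzero" -eq 0 ]
}
```

For example, is_zero_header /dev/oracleasm/disks/DISK4 && echo "header block is all zeros". A zeroed first block is only a hint that the disk was never restored; kfed read remains the real check.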

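Later I found that this tool has other important uses:


Various failures can easily damage the ASM disk header block, so backing it up and restoring it is especially important.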

Building kfed

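KFED is an undocumented tool that ships with ASM. Like the BBED command, it has to be built before it can be used. It can read and modify the metadata on ASM disks and, importantly, it also works when ASM cannot start, which makes it very useful for repairing some critical errors: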

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ikfed


Checking the ASM disk header

$ kfed read /dev/oracleasm/disks/DISK4
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
...
kfbh.check: 1539641569 ; 0x00c: 0x5bc510e1
...
kfdhdb.driver.provstr: ORCLDISKDISK4 ; 0x000: length=13
...
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: PLAY0 ; 0x028: length=5
kfdhdb.grpname: PLAY ; 0x048: length=4
kfdhdb.fgname: P1 ; 0x068: length=2
...
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 4194304 ; 0x0bc: 0x00400000
...
kfdhdb.dsksize: 1221 ; 0x0c4: 0x000004c5
...


A valid ASM disk should have kfbh.type=KFBTYP_DISKHEAD (ASM disk header).

The ASMLIB disk name should follow 'ORCLDISK' in the kfdhdb.driver.provstr field. Note that the ASMLIB disk name does not have to be the same as the ASM disk name.

Is this really the disk with ASM name PLAY0 (kfdhdb.dskname) and disk number 0 (kfdhdb.dsknum=0) in disk group PLAY (kfdhdb.grpname)? If you are not sure, check the ASM alert log entries around the time of the last successful mount of disk group PLAY.

Header status says MEMBER for this disk (kfdhdb.hdrsts=KFDHDR_MEMBER). That is what we want to see.

The ASM metadata block size is 4 KB (kfdhdb.blksize=4096), the allocation unit size is 4 MB for this disk (kfdhdb.ausize=4194304), and the disk size is 1221 AUs, i.e. 1221 x 4 MB = 4884 MB. Is that what you think it should be? Is that what you see at the OS level for that device?
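The size arithmetic above can be scripted. This is a minimal sketch that reads kfed read output on standard input; asm_disk_mb is a hypothetical helper name, and it assumes the field layout shown in the output above (field name, then the decimal value).

```shell
# Sketch: compute the disk size in MB from "kfed read" output on stdin.
# asm_disk_mb is a hypothetical helper; it assumes lines of the form
#   kfdhdb.ausize: 4194304 ; 0x0bc: 0x00400000
asm_disk_mb() {
    awk '
        $1 == "kfdhdb.ausize:"  { ausize = $2 }   # allocation unit size in bytes
        $1 == "kfdhdb.dsksize:" { dsksize = $2 }  # disk size in allocation units
        END {
            if (ausize == "" || dsksize == "") exit 1
            printf "%d\n", dsksize * ausize / 1048576
        }'
}
```

Usage: kfed read /dev/oracleasm/disks/DISK4 | asm_disk_mb, then compare the result with the size the OS reports for the device.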

If you see kfbh.type=KFBTYP_INVALID in the disk header on a disk you believe belongs to an ASM disk group, that may indicate that the ASM disk header is damaged. But don't jump to conclusions. Are you looking at the right disk? Is this the right disk partition? Can you access that disk via some other name - in a multipath setup? If you are not sure, or if the disk is in fact damaged, log an SR with Oracle Support to check it out.

I should say that the ASM disk header may look fine, but in fact be corrupt. For example the block checksum (kfbh.check) could be wrong in which case that would need to be corrected. Please log an SR with Oracle Support to assist with that problem.
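The two header checks described above (block type and header status) can be automated roughly as follows. This is a sketch only: check_asm_header is a hypothetical helper, it assumes the symbolic name is the last field on each line as in the output above, and it does not validate the checksum (kfbh.check), so a header that passes can still be corrupt.

```shell
# Sketch: check block type and header status in "kfed read" output on stdin.
# check_asm_header is a hypothetical helper; it does NOT verify kfbh.check.
check_asm_header() {
    awk '
        $1 == "kfbh.type:"     { btype = $NF }    # e.g. KFBTYP_DISKHEAD
        $1 == "kfdhdb.hdrsts:" { hdrsts = $NF }   # e.g. KFDHDR_MEMBER
        END {
            bad = 0
            if (btype != "KFBTYP_DISKHEAD") { print "unexpected block type: " btype; bad = 1 }
            if (hdrsts != "KFDHDR_MEMBER")  { print "unexpected header status: " hdrsts; bad = 1 }
            if (!bad) print "header type and status look as expected"
            exit bad
        }'
}
```

Usage: kfed read /dev/oracleasm/disks/DISK4 | check_asm_header. The function returns non-zero when either field differs from the expected value.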

Note that kfed was used with no additional options. In particular, no allocation unit number and no block number were specified, so the default values (0 for both) were used. The command used was:

$ kfed read /dev/oracleasm/disks/DISK4

Restoring the disk header

ASM versions below 10.2.0.5 or 11.1.0.7 do not back up the disk header automatically, so it is best to back it up by hand:

kfed read /dev/raw/raw1 text=raw1.txt
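To cover several disks at once, the same command can be wrapped in a loop. This is a sketch under assumptions: backup_asm_headers and the one-file-per-disk naming are hypothetical, and only the kfed read <disk> text=<file> form comes from this post.

```shell
# Sketch: dump the header block of each given disk to <outdir>/<diskname>.txt.
# backup_asm_headers and the file naming scheme are hypothetical.
backup_asm_headers() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for disk in "$@"; do
        out="$outdir/$(basename "$disk").txt"
        # Same command as above, with a per-disk output file
        kfed read "$disk" text="$out" || echo "failed to dump $disk" >&2
    done
}
```

Usage: backup_asm_headers /backup/asm_headers /dev/raw/raw1 /dev/raw/raw2 (the directory and the second device are illustrative). Keep the dumps somewhere other than the ASM disks themselves.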

The disk header does not change often, so after a failure it can be restored from that backup like this:

kfed merge /dev/raw/raw1 text=raw1.txt

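If there really is no backup, take a dump from a disk in the same group (or a similar disk), edit the key fields in that text file, and merge it back.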


For ASM 10.2.0.5 or 11.1.0.7 and above, the header can be restored directly with:

 kfed repair /dev/raw/raw1


Of course, this only works if the header block is the only damaged block. You can check whether the other blocks are intact:

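disk=/dev/oracleasm/disks/DISK4
# Read the AU size and metadata block size from the disk header (block 0).
# If the header itself is unreadable, set ausize and blksize by hand instead.
ausize=`kfed read $disk | grep ausize | tr -s ' ' | cut -d' ' -f2`
blksize=`kfed read $disk | grep blksize | tr -s ' ' | cut -d' ' -f2`
# Number of metadata blocks in one allocation unit
let n=$ausize/$blksize

# Print the block type of blocks 2 .. n-1 of the first allocation unit
for (( i=2; i<$n; i++ ))
do
  kfed read $disk blkn=$i | grep KFBTYP
done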

 The output must not contain any line like this:

kfbh.type=KFBTYP_INVALID
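With many blocks, the loop output is easier to read as a tally per block type. This is a small sketch: tally_block_types is a hypothetical helper, and it assumes the symbolic type name is the last field of each kfbh.type line, as in the kfed output shown earlier.

```shell
# Sketch: count how often each block type appears in the loop output (stdin).
# tally_block_types is a hypothetical helper; it assumes lines such as
#   kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
tally_block_types() {
    awk '{ count[$NF]++ } END { for (t in count) print count[t], t }' | sort -k2
}
```

Pipe the for loop into it (done | tally_block_types); any KFBTYP_INVALID row in the result means damage beyond the header block.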



Reference:

Conclusion:
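 Starting with Oracle 10.2.0.5 (the feature actually first appeared in Oracle 11g, patchset 11.1.0.7), ASM automatically keeps a backup copy of the disk header block. The backup is stored in the second-to-last block of the second AU (with the default 1 MB AU and 4 KB blocks, that is block 510 counted from the start of the disk). If the header block is damaged, it can be repaired with the kfed repair command. For anyone running production on ASM storage, upgrading to at least 10.2.0.5 as soon as possible is therefore a wise choice.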
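The location of the backup block follows from the AU size and block size. This is a sketch of the arithmetic only: header_backup_location is a hypothetical helper, and the aun=/blkn= form in its output assumes kfed accepts an allocation unit number alongside the block number used in the loop above.

```shell
# Sketch: locate the header backup block (second-to-last block of the second AU).
# header_backup_location is a hypothetical helper; sizes are in bytes.
header_backup_location() {
    ausize=$1
    blksize=$2
    per_au=$((ausize / blksize))   # metadata blocks per allocation unit
    blkn=$((per_au - 2))           # second-to-last block within an AU
    abs=$((per_au + blkn))         # AU 1 starts right after the per_au blocks of AU 0
    echo "aun=1 blkn=$blkn (block $abs from the start of the disk)"
}
```

header_backup_location 1048576 4096 prints aun=1 blkn=254 (block 510 from the start of the disk), which matches the figure above for a 1 MB AU.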

kfed - ASM metadata editor

