郭一军,网名guoyJoe,尖峰在线培训创始人、Oracle OCM、ITPUB论坛Oracle认证版版主、2013最佳精华贴、Senior DBA、Oracle OCT、 ACOUG MEMBER、CSDN 认证专家、RHCE、在线技术分享的倡导者和实践者。 http://www.jianfengedu.com http://jianfengedu.taobao.com
全部博文(72)
2014年(72)
分类: Oracle
2014-10-29 18:22:18
转载请注明出处:http://blog.chinaunix.net/uid-28460966-id-4584119.html
1、重现ORA-8102错主的实验步骤:
SQL> conn gyj/gyj
Connected.
SQL> drop table gyj_1000;
Table dropped.
SQL> create table gyj_1000(id int,name varchar2(100));
Table created.
SQL> SQL> begin
2 for i in 1 .. 5000 loop
3 insert into gyj_1000 values(i,'gyj'||i);
4 commit;
5 end loop;
6 end;
7 /
SQL> select user_id,username from dba_users;
USER_ID USERNAME
---------- ------------------------------
9 OUTLN
32 GYJ
0 SYS
5 SYSTEM
31 APPQOSSYS
14 DIP
30 DBSNMP
21 ORACLE_OCM
8 rows selected.
SQL> alter table gyj_1000 add primary key(id);
Table altered.
SQL> select CONSTRAINT_NAME from dba_constraints where table_name='GYJ_1000';
CONSTRAINT_NAME
------------------------------
SYS_C003766
SQL> conn / as sysdba
Connected.
SQL> select name,con# from con$ where owner#=32;
NAME CON#
------------------------------ ----------
SYS_C003767 3767
BIN$Ayykzh6uGBfgUwEAAH9D0w==$0 3765
SYS_C003768 3768
SQL> select name,con# from con$ where name='_NEXT_CONSTRAINT';
NAME CON#
------------------------------ ----------
_NEXT_CONSTRAINT 3769
SQL> select dbms_rowid.rowid_relative_fno(rowid) file#,dbms_rowid.rowid_block_number(rowid) block#,
2 dbms_rowid.rowid_row_number(rowid) row#
3 from con$
4 where name='_NEXT_CONSTRAINT';
FILE# BLOCK# ROW#
---------- ---------- ----------
1 289 12
[oracle@jfdb ~]$ bbed parfile=par.txt
Password:
BBED: Release 2.0.0.0.0 - Limited Production on Tue Sep 16 17:52:09 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
************* !!! For Oracle Internal Use only !!! ***************
BBED> set file 1 block 289
FILE# 1
BLOCK# 289
BBED> p *kdbr[12]
rowdata[0]
----------
ub1 rowdata[0] @1220 0x2c
BBED> x /rccnn
rowdata[0] @1220
----------
flag@1220: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@1221: 0x02
cols@1222: 4
col 0[1] @1223: .
col 1[16] @1225: _NEXT_CONSTRAINT
col 2[3] @1242: 3769
col 3[1] @1246: 0
BBED> d /v offset 1242 count 16
File: /u01/app/oracle/oradata/jfdb/system01.dbf (1)
Block: 289 Offsets: 1242 to 1257 Dba:0x00400121
-------------------------------------------------------
03c22646 01802c00 04018010 5f4e4558 l .?&F..,....._NEX
<16 bytes per line>
SQL> select dump(3769,16) from dual;
DUMP(3769,16)
---------------------
Typ=2 Len=3: c2,26,46
我把值改小:45
BBED> modify /x 45 offset 1245
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
File: /u01/app/oracle/oradata/jfdb/system01.dbf (1)
Block: 289 Offsets: 1245 to 1260 Dba:0x00400121
------------------------------------------------------------------------
4401802c 00040180 105f4e45 58545f43
<32 bytes per line>
BBED> sum apply
Check value for File 1, Block 289:
current = 0xa831, required = 0xa831
BBED> p *kdbr[12]
rowdata[0]
----------
ub1 rowdata[0] @1220 0x2c
BBED> x /rccnn
rowdata[0] @1220
----------
flag@1220: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@1221: 0x02
cols@1222: 4
col 0[1] @1223: .
col 1[16] @1225: _NEXT_CONSTRAINT
col 2[3] @1242: 3768
col 3[1] @1246: 0
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1570009088 bytes
Fixed Size 2228704 bytes
Variable Size 1442844192 bytes
Database Buffers 117440512 bytes
Redo Buffers 7495680 bytes
Database mounted.
Database opened.
SQL> alter table gyj_1000 drop primary key;
Table altered.
SQL> SQL> alter table gyj_1000 add primary key(id);
alter table gyj_1000 add primary key(id)
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-08102: index key not found, obj# 52, file 1, block 28280 (2)
2、分析ORA-8102错误
alter table gyj_1000 add primary key(id);
ORA-8102常见于索引键值与表上存的值不一致。(Corruption related to Index 索引)
ORA-
8102即可能是ORACLE的bug,也可能是由于硬件I/O错误所引起。硬件或者I/O子系统由于丢失写 Lost
Write造成块的逻辑上讹误,当一个Lost
Io发生,包含对key的修改或者没有写入到ORACLE数据文件上,这即可能发生在表块上也可能发生在索引块上。
查看这个对象号为52
的对象,发现是 SYS.CON$ 表中 I_CON2索引 ,这是 一个 BOOTSTRP$ 对象,而且 OBJ# 为51 ,所以这是一个
核心BOOTSTRP$ 对象,是不能通过startup migrate 和 event 38003
重建的,所以最终只能通过BBED去修改这个块中有问题的地方。
先通过下面这个SQL,查找表和索引之间到底相差什么
SQL> conn / as sysdba
Connected.
SQL> SELECT /*+ FULL(t1) */
2 owner#, NAME, con#
3 FROM CON$ t1
4 MINUS
5 SELECT /*+ index(t I_CON2) */
6 owner#, NAME, con#
7 FROM CON$ t;
OWNER# NAME CON#
---------- ------------------------------ ----------
0 _NEXT_CONSTRAINT 3776
SQL> select /*+ FULL(t1) */ owner#,name,con# from con$ t1 WHERE NAME='_NEXT_CONSTRAINT';
OWNER# NAME CON#
---------- ------------------------------ ----------
0 _NEXT_CONSTRAINT 3776
SQL> select /*+ index(t1 I_CON2) */ owner#,name,con# from con$ t1 WHERE NAME='_NEXT_CONSTRAINT';
OWNER# NAME CON#
---------- ------------------------------ ----------
0 _NEXT_CONSTRAINT 3775
利用这样的方法来查询表和索引之间的不一致 ,
通过查询结果 确实可以看到 表和索引存在不一样的,就是CON$表中有 CON#=3776的这一列,而索引的键值中却没有3776
(要知道索引的键值就是保存该索引对应的字段值啊)
3、此时已经出现了类似的不一致。通过trace文件发现con$表的数
[root@jfdb ~]# cat /u01/app/oracle/diag/rdbms/jfdb/jfdb/trace/alert_jfdb.log
[root@jfdb ~]# vi /u01/app/oracle/diag/rdbms/jfdb/jfdb/trace/jfdb_ora_6518.trc
Trace file /u01/app/oracle/diag/rdbms/jfdb/jfdb/trace/jfdb_ora_6518.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0
System name: Linux
Node name: jfdb
Release: 2.6.18-128.el5
Version: #1 SMP Wed Dec 17 11:41:38 EST 2008
Machine: x86_64
VM name: VMWare Version: 6
Instance name: jfdb
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 6518, image: oracle@jfdb (TNS V1-V3)
*** 2014-09-16 18:10:13.477
*** SESSION ID:(1512.7) 2014-09-16 18:10:13.477
*** CLIENT ID:() 2014-09-16 18:10:13.477
*** SERVICE NAME:(SYS$USERS) 2014-09-16 18:10:13.477
*** MODULE NAME:(SQL*Plus) 2014-09-16 18:10:13.477
*** ACTION NAME:() 2014-09-16 18:10:13.477
oer 8102.2 - obj# 52, rdba: 0x00406e78(afn 1, blk# 28280)
kdk key 8102.2:
ncol: 1, len: 4
key: (4): 03 c2 26 45 ++++ 读取到的键值是03 c2 26 45 (通过vi搜索 8102.2)
mask: (4096):
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
通过上诉的 分析 ,现在我们基本可以确定,问题是这样的:
在文件1 块28280中有一个索引条目,它应该指向CON$表中 CON#=3768 这一条记录,
这个索引条目 的索引键 应该是 3768 (对应16进制为c2 26 45) ,它的ROWID应该就是CON$表上 CON#=3768 这一条记录的实际ROWID;
但是现在 这个索引条目的索引键不是3768,所以才导致了我们的问题。
SQL> select utl_raw.cast_to_number('c22645') from dual;
UTL_RAW.CAST_TO_NUMBER('C22645')
--------------------------------
3768
4、定位问题
知道了问题所在,那现在就是定位具体错误位置的时候了,根据上面的分析,
我们首先要得到CON$表中 CON#=3764 这一行的实际ROWID:
SQL> select dbms_rowid.rowid_relative_fno(rowid) file#,dbms_rowid.rowid_block_number(rowid) block#,
2 dbms_rowid.rowid_row_number(rowid) row#
3 from con$
4 where name='_NEXT_CONSTRAINT' --AND con#=3768;
FILE# BLOCK# ROW#
---------- ---------- ----------
1 289 12
ROWID的格式为还要不在解释?===》对象号,1号文件,289号块,12行
我把后面con#='3768'注释掉了,这是因为用CON#=3768直接查不出来,原因应该就是这个查询使用了I_CON2引,而索引中没有个这个键值的条目
OK那么接下来我们来这个trace中搜索执行计划,你会发现如下信息:
============
Plan Table
============
--------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------+-----------------------------------+
| 0 | UPDATE STATEMENT | | | | 2 | |
| 1 | UPDATE | CON$ | | | | |
| 2 | INDEX UNIQUE SCAN | I_CON1 | 1 | 22 | 1 | 00:00:01 |
--------------------------------------+-----------------------------------+
Content of other_xml column
===========================
db_version : 11.2.0.3
parse_schema : SYS
plan_hash : 2574219287
plan_hash_2 : 950544504
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
DB_VERSION('11.2.0.3')
OUTLINE_LEAF(@"UPD$1")
INDEX(@"UPD$1" "CON$"@"UPD$1" ("CON$"."OWNER#" "CON$"."NAME"))
END_OUTLINE_DATA
*/
5、我们来看一下52号对象到底是啥?
QL> SQL> col object_name for a30
SQL> set long 9999
SQL> select object_name,object_id,object_type from dba_objects where object_id=52;
OBJECT_NAME OBJECT_ID OBJECT_TYPE
------------------------------ ---------- -------------------
I_CON2 52 INDEX
分析下:select * from bootstrap$ where obj#52;
SQL> select dbms_metadata.get_ddl('INDEX','I_CON2','SYS') from dual;
DBMS_METADATA.GET_DDL('INDEX','I_CON2','SYS')
--------------------------------------------------------------------------------
CREATE UNIQUE INDEX "SYS"."I_CON2" ON "SYS"."CON$" ("CON#")
PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "SYSTEM"
6、现在我们得到了这个记录的实际ROWID,下一步就是去索引块里去查找了ROWID了。
dump 报错的索引块--rdba: 0x00406e78(afn 1, blk# 28280)
SQL> alter system dump datafile 1 block 28280;
System altered.
大家看一下最后一行数据的ROWID
more /u01/app/oracle/diag/rdbms/jfdb/jfdb/trace/jfdb_ora_5641.trc
row#266[4806] flag: ------, lock: 0, len=12, data:(6): 00 40 32 e6 00 67
col 0; len 3; (3): c2 25 59
row#267[4794] flag: ------, lock: 0, len=12, data:(6): 00 40 32 e6 00 68
col 0; len 3; (3): c2 25 5a
row#268[4782] flag: ------, lock: 0, len=12, data:(6): 00 40 32 e6 00 6a
col 0; len 3; (3): c2 25 61
row#269[3954] flag: ------, lock: 0, len=12, data:(6): 00 40 01 21 00 0c
col 0; len 3; (3): c2 26 46
----- end of leaf block dump -----
注意索引中的ROWID=文件号+块号+行号
文件号=0x004 ==>0000 0000 0100 ==》取前面10位的二进制:第1号文件
块号 =0x0121==>第289号
行号 =0x0c ==>第12行
SQL> select dbms_rowid.rowid_relative_fno(rowid) file#,dbms_rowid.rowid_block_number(rowid) block#,
2 dbms_rowid.rowid_row_number(rowid) row#
3 from con$
4 where name='_NEXT_CONSTRAINT' --AND con#=3768;
FILE# BLOCK# ROW#
---------- ---------- ----------
1 289 12
创建主键约束时读取到的最大键值是c2 26 45,而index 键值中记录的next 值应该是c2 26 46.
说明索引键值与表上存的值不一致
7、那么怎么办呢 ?如何处理这个问题 ?我们直接对索引进行rebuild ,看行不行 ?
SQL> alter index I_CON2 rebuild;
alter index I_CON2 rebuild
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered
能不能rebuild ?
BBED> set block 28280
BLOCK# 28280
BBED> map /v
File: /u01/app/oracle/oradata/jfdb/system01.dbf (1)
Block: 28280 Dba:0x00406e78
------------------------------------------------------------
KTB Data Block (Index Leaf)
............................................
sb2 kd_off[270] @148
ub1 freespace[3366] @688
ub1 rowdata[4066] @4054
ub4 tailchk @8188
row#271[3858] flag: ------, lock: 0, len=12, data:(6): 00 40 01 21 00 0c
col 0; len 3; (3): c2 26 46
----- end of leaf block dump -----
前面我们讲过,对于index 键值的实际位置,其offset计算公式如下:
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0002.01f.000000b5 0x00c04884.001f.01 CB-- 0 scn 0x0000.00031295
0x02 0x000c.010.0000004c 0x0280000b.0021.35 C--- 0 scn 0x0000.0020726a
0x03 0x000e.009.00000086 0x0280003a.005d.26 C--- 0 scn 0x0000.0020766d
offset = kd_off + 44+8+itl*3 ,那么实际位置应该是:
3858+ 44+8+3*24=4076
SSQL> select 3858+ 44+8+3*24 from dual;
3858+44+8+3*24
--------------
3982