Chinaunix首页 | 论坛 | 博客
  • 博客访问: 2885638
  • 博文数量: 599
  • 博客积分: 16398
  • 博客等级: 上将
  • 技术积分: 6875
  • 用 户 组: 普通用户
  • 注册时间: 2009-11-30 12:04
个人简介

WINDOWS下的程序员出身,偶尔也写一些linux平台下小程序, 后转行数据库行业,专注于ORACLE和DB2的运维和优化。 同时也是ios移动开发者。欢迎志同道合的朋友一起研究技术。 数据库技术交流群:58308065,23618606

文章分类

全部博文(599)

文章存档

2014年(12)

2013年(56)

2012年(199)

2011年(105)

2010年(128)

2009年(99)

分类: Oracle

2012-05-17 10:52:28

前几天和朋友一起吃饭的时候,聊到如果一个数据文件被删除了,那么在这个文件中相关表是否还能操作。
在UNIX平台上,打开的文件都有一个文件句柄与之关联,一般的文件被删除,相应的文件句柄并没有释放。
此时我们是可以继续对这个文件进行操作的,直到文件句柄被释放。
看如下一个例子:
 
[oracle@dbtest ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Thu May 17 10:16:48 2012
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select * from scott.t      ;
        ID NAME
---------- ------------------------------------------------------------
         1 cpic
         2 huateng
         3 yanshoupeng
SQL> select dbms_rowid.rowid_relative_fno(rowid) file# from scott.t;
     FILE#
----------
         4
         4
         4
SQL> select name  from v$dbfile where file#=4;
NAME
--------------------------------------------------------------------------------
/test/orcl/orcl/users01.dbf

SCOTT下的表T存放到了文件 /test/orcl/orcl/users01.dbf 中。

SQL> insert into scott.t values(4,'a');
1 row created.
SQL> commit;
Commit complete.
SQL> alter system checkpoint ;
System altered.
SQL> select * from scott.t      ;
        ID NAME
---------- ------------------------------------------------------------
         4 a
         1 cpic
         2 huateng
         3 yanshoupeng

日常的操作是没有任何问题的。
此时我们删除这个文件。
SQL> ! rm -rf /test/orcl/orcl/users01.dbf

SQL> ! ls -ltr /test/orcl/orcl/
total 2625104
-rw-r--r--    1 oracle   oinstall        373 May 16 10:52 log.bbd
-rw-r-----    1 oracle   oinstall   52429312 May 16 14:21 redo01.log
-rw-r-----    1 oracle   oinstall   20979712 May 16 22:00 temp01.dbf
-rw-r-----    1 oracle   oinstall   52429312 May 17 05:00 redo02.log
-rw-r-----    1 oracle   oinstall   31465472 May 17 10:18 undotbs01.dbf
-rw-r-----    1 oracle   oinstall  513810432 May 17 10:18 system01.dbf
-rw-r-----    1 oracle   oinstall  314580992 May 17 10:18 sysaux01.dbf
-rw-r-----    1 oracle   oinstall   96477184 May 17 10:18 abcd_2.dbf
-rw-r-----    1 oracle   oinstall   96477184 May 17 10:18 abcd_1.dbf
-rw-r-----    1 oracle   oinstall   96477184 May 17 10:18 abcd_0.dbf
-rw-r-----    1 oracle   oinstall   52429312 May 17 10:20 redo03.log
-rw-r-----    1 oracle   oinstall    5488640 May 17 10:20 control03.ctl
-rw-r-----    1 oracle   oinstall    5488640 May 17 10:20 control02.ctl
-rw-r-----    1 oracle   oinstall    5488640 May 17 10:20 control01.ctl
SQL> select * from scott.t      ;
        ID NAME
---------- ------------------------------------------------------------
         4 a
         1 cpic
         2 huateng
         3 yanshoupeng
SQL> delete from scott.t;
4 rows deleted.
SQL> commit;
Commit complete.
 
SQL> insert into scott.t values(5,'b');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from scott.t;
        ID NAME
---------- ------------------------------------------------------------
         5 b
SQL> alter system checkpoint;
System altered.
SQL>  select * from scott.t;
        ID NAME
---------- ------------------------------------------------------------
         5 b
可以看到数据库还能继续操作的,像没有删除一样。
而且数据库日志中一点错误也没有。
[oracle@dbtest ~]$ tail -f /u01/oracle/admin/orcl/bdump/alert_orcl.log
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed May 16 14:32:04 2012
OS Pid: 516540 executed alter system set events 'immediate trace name heapdump level 10'
Wed May 16 14:58:51 2012
System State dumped to trace file /u01/oracle/admin/orcl/udump/orcl_ora_516540.trc
Thu May 17 05:00:33 2012
Thread 1 advanced to log sequence 18 (LGWR switch)
  Current log# 3 seq# 18 mem# 0: /test/orcl/orcl/redo03.log
 
虽然文件被删除了,但是相应的文件句柄并没有释放,ORACLE相关的进程通过这个文件句柄还能继续对文件进行操作,就像没有被删除一样。
以前的时候很多朋友都不理解,为何当前的日志文件被删除,ORACLE没有crash掉,还能继续运行,其实是一个原因。

[oracle@dbtest ~]$ ps -ef | grep ora_dbw
  oracle  884830       1   0 14:21:27      -  0:01 ora_dbw2_orcl
  oracle  758204  406292   0 10:21:29  pts/7  0:00 grep ora_dbw
  oracle  275238       1   0 14:21:27      -  0:01 ora_dbw3_orcl
  oracle  709866       1   0 14:21:27      -  0:01 ora_dbw1_orcl
  oracle  395236       1   0 14:21:27      -  0:01 ora_dbw0_orcl
[oracle@dbtest ~]$ cd /proc/395236
[oracle@dbtest 395236]$ ls
as      cred    ctl     cwd     fd      lwp     map     object  psinfo  sigact  status  sysent
[oracle@dbtest 395236]$ cd fd
[oracle@dbtest fd]$ ls
0   1   10  12  13  14  15  16  2   3   4   5   6   7   8   9
[oracle@dbtest fd]$ ls -ltr
total 3976
-r--r--r--    1 oracle   oinstall     849408 May 10 2008  16
-r--r--r--    1 oracle   oinstall     849408 May 10 2008  12
p---------    0 oracle   oinstall          0 May 16 10:54 9
-rw-rw----    0 oracle   oinstall          0 May 16 14:21 7
--w-------    1 oracle   oinstall        858 May 16 14:21 5
--w-------    1 oracle   oinstall     157667 May 17 05:00 8
--w-------    1 oracle   oinstall     157667 May 17 05:00 6
-rw-rw----    1 oracle   oinstall       1544 May 17 10:19 15
-rw-rw----    1 oracle   oinstall       1544 May 17 10:19 10
c---------    1 oracle   oinstall      2,  2 May 17 10:21 4
c---------    1 oracle   oinstall      2,  2 May 17 10:21 3
c---------    1 oracle   oinstall      2,  2 May 17 10:21 2
c---------    1 oracle   oinstall      2,  2 May 17 10:21 1
c---------    1 oracle   oinstall      2,  2 May 17 10:21 0
-rw-rw-rw-    1 oracle   oinstall          0 May 17 10:21 14
-rw-rw-rw-    1 oracle   oinstall          0 May 17 10:21 13

如果平台是LINUX平台,那么进入目录fd的时候,你将会看到文件和文件句柄的对应关系,及其那些文件被删除了。
我的平台是AIX的,和LINUX平台不太一样,需要借助lsof命令来查看。
遗憾的是我这边的lsof命令不能用,不能给大家演示。
假设被删除的文件的句柄是10,那么我们可以通过如下命令将文件COPY回去:
cp 10 /test/orcl/orcl/users01.dbf
然后进行一下recover就行了,在没有备份的情况下,如果能及时的把误删文件COPY回去,就能及时挽救我们的数据库。
 
大家可以参考这篇文章:
 
lsof aix 使用 lsof 查找打开的文件
 
如果打开这个文件的进程被关闭了,相应的文件句柄将会释放,此时就再也看不到这个文件了。
我们SHUTDOWN数据库,然后再启动就会报错:
SQL> select * from v$recover_file;
no rows selected
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1610612736 bytes
Fixed Size                  2084400 bytes
Variable Size             385876432 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/test/orcl/orcl/users01.dbf'

SQL> select * from v$recover_file;
     FILE# ONLINE         ONLINE_STATUS
---------- -------------- --------------
ERROR                                                                CHANGE#
----------------------------------------------------------------- ----------
TIME
-------------------
         4 ONLINE         ONLINE
FILE NOT FOUND                                                             0
 
日志文件中有如下错误:

[oracle@dbtest ~]$ tail -f /u01/oracle/admin/orcl/bdump/alert_orcl.log
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed May 16 14:32:04 2012
OS Pid: 516540 executed alter system set events 'immediate trace name heapdump level 10'
Wed May 16 14:58:51 2012
System State dumped to trace file /u01/oracle/admin/orcl/udump/orcl_ora_516540.trc
Thu May 17 05:00:33 2012
Thread 1 advanced to log sequence 18 (LGWR switch)
  Current log# 3 seq# 18 mem# 0: /test/orcl/orcl/redo03.log
Thu May 17 10:47:56 2012
Shutting down instance (abort)
License high water mark = 5
Instance terminated by USER, pid = 394304
Thu May 17 10:48:03 2012
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 en1 10.192.14.0 configured from OCR for use as a cluster interconnect
Interface type 1 en0 10.192.39.0 configured from OCR for use as  a public interface
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
  processes                = 150
  sessions                 = 170
  __shared_pool_size       = 352321536
  __large_pool_size        = 16777216
  __java_pool_size         = 16777216
  __streams_pool_size      = 0
  sga_target               = 1610612736
  control_files            = /test/orcl/orcl/control01.ctl, /test/orcl/orcl/control02.ctl,
/test/orcl/orcl/control03.ctl
  db_block_size            = 8192
  __db_cache_size          = 1207959552
  compatible               = 10.2.0.3.0
  log_archive_dest_1       = LOCATION=/test/orcl/arch
  log_archive_format       = %t_%s_%r.arc
  db_file_multiblock_read_count= 16
  db_recovery_file_dest    = /test/orcl/flashback
  db_recovery_file_dest_size= 2147483648
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                =
  dispatchers              = (PROTOCOL=TCP) (SERVICE=orclXDB)
  job_queue_processes      = 10
  background_dump_dest     = /u01/oracle/admin/orcl/bdump
  user_dump_dest           = /u01/oracle/admin/orcl/udump
  core_dump_dest           = /u01/oracle/admin/orcl/cdump
  audit_file_dest          = /u01/oracle/admin/orcl/adump
  db_name                  = orcl
  open_cursors             = 300
  pga_aggregate_target     = 1646264320
Thu May 17 10:48:04 2012
Oracle instance running with ODM: Veritas 5.0 ODM Library, Version 1.1
cluster interconnect IPC version:
        VERITAS IPC '5.0.3.400' 04:49:57 Apr 16 2010
IPC Vendor 86 proto 76
  Version 1.0
PMON started with pid=2, OS id=632182
DIAG started with pid=3, OS id=705994
PSP0 started with pid=4, OS id=570926
LMON started with pid=5, OS id=472866
LMD0 started with pid=6, OS id=640160
MMAN started with pid=7, OS id=884854
DBW0 started with pid=8, OS id=741972
DBW1 started with pid=9, OS id=373816
DBW2 started with pid=10, OS id=525912
DBW3 started with pid=11, OS id=557894
LGWR started with pid=12, OS id=395244
CKPT started with pid=13, OS id=366402
SMON started with pid=14, OS id=697382
RECO started with pid=15, OS id=668764
CJQ0 started with pid=16, OS id=415276
MMON started with pid=17, OS id=480782
Thu May 17 10:48:04 2012
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=18, OS id=456092
Thu May 17 10:48:04 2012
starting up 1 shared server(s) ...
Thu May 17 10:48:04 2012
lmon registered with NM - instance id 1 (internal mem no 0)
Thu May 17 10:48:05 2012
Reconfiguration started (old inc 0, new inc 2)
List of nodes:
 0
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 Resources and enqueues cleaned out
 Resources remastered 0
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
Reconfiguration complete
Thu May 17 10:48:05 2012
ALTER DATABASE   MOUNT
Thu May 17 10:48:09 2012
Setting recovery target incarnation to 2
Thu May 17 10:48:09 2012
Successful mount of redo thread 1, with mount id 1311324261
Thu May 17 10:48:09 2012
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Thu May 17 10:48:09 2012
ALTER DATABASE OPEN
Thu May 17 10:48:09 2012
Errors in file /u01/oracle/admin/orcl/bdump/orcl_dbw0_741972.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/test/orcl/orcl/users01.dbf'
ORA-17503: ksfdopn:4 Failed to open file /test/orcl/orcl/users01.dbf
ORA-17500: ODM err:File does not exist
ORA-1157 signalled during: ALTER DATABASE OPEN...
 
此时我们就只能通过备份恢复数据库了。

SQL> ! rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Thu May 17 10:49:10 2012
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
connected to target database: ORCL (DBID=1301704889, not open)
RMAN> restore datafile 4;
Starting restore at 2012-05-17 10:49:14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=149 devtype=DISK
channel ORA_DISK_1: restoring datafile 00004
input datafile copy recid=1 stamp=782931126 filename=/tmp/users01.dbf
destination for restore of datafile 00004: /test/orcl/orcl/users01.dbf
channel ORA_DISK_1: copied datafile copy of datafile 00004
output filename=/test/orcl/orcl/users01.dbf recid=5 stamp=783514156
Finished restore at 2012-05-17 10:49:17
RMAN> recover datafile 4;
Starting recover at 2012-05-17 10:49:20
using channel ORA_DISK_1
starting media recovery
archive log thread 1 sequence 7 is already on disk as file /test/orcl/arch/1_7_773924155.arc
archive log thread 1 sequence 8 is already on disk as file /test/orcl/arch/1_8_773924155.arc
archive log thread 1 sequence 9 is already on disk as file /test/orcl/arch/1_9_773924155.arc
archive log thread 1 sequence 10 is already on disk as file /test/orcl/arch/1_10_773924155.arc
archive log thread 1 sequence 11 is already on disk as file /test/orcl/arch/1_11_773924155.arc
archive log thread 1 sequence 12 is already on disk as file /test/orcl/arch/1_12_773924155.arc
archive log thread 1 sequence 13 is already on disk as file /test/orcl/arch/1_13_773924155.arc
archive log thread 1 sequence 14 is already on disk as file /test/orcl/arch/1_14_773924155.arc
archive log thread 1 sequence 15 is already on disk as file /test/orcl/arch/1_15_773924155.arc
archive log thread 1 sequence 16 is already on disk as file /test/orcl/arch/1_16_773924155.arc
archive log thread 1 sequence 17 is already on disk as file /test/orcl/arch/1_17_773924155.arc
archive log filename=/test/orcl/arch/1_7_773924155.arc thread=1 sequence=7
archive log filename=/test/orcl/arch/1_8_773924155.arc thread=1 sequence=8
archive log filename=/test/orcl/arch/1_9_773924155.arc thread=1 sequence=9
archive log filename=/test/orcl/arch/1_10_773924155.arc thread=1 sequence=10
archive log filename=/test/orcl/arch/1_11_773924155.arc thread=1 sequence=11
archive log filename=/test/orcl/arch/1_12_773924155.arc thread=1 sequence=12
archive log filename=/test/orcl/arch/1_13_773924155.arc thread=1 sequence=13
archive log filename=/test/orcl/arch/1_14_773924155.arc thread=1 sequence=14
archive log filename=/test/orcl/arch/1_15_773924155.arc thread=1 sequence=15
media recovery complete, elapsed time: 00:00:14
Finished recover at 2012-05-17 10:49:37
RMAN> exit

Recovery Manager complete.
SQL> alter database open;
Database altered.
SQL>
 
 
 
阅读(2431) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~