前几天和朋友一起吃饭的时候,聊到如果一个数据文件被删除了,那么在这个文件中相关表是否还能操作。
在UNIX平台上,打开的文件都有一个文件句柄与之关联,一般的文件被删除,相应的文件句柄并没有释放。
此时我们是可以继续对这个文件进行操作的,直到文件句柄被释放。
看如下一个例子:
[oracle@dbtest ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Thu May 17 10:16:48 2012
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select * from scott.t ;
ID NAME
---------- ------------------------------------------------------------
1 cpic
2 huateng
3 yanshoupeng
SQL> select dbms_rowid.rowid_relative_fno(rowid) file# from scott.t;
FILE#
----------
4
4
4
SQL> select name from v$dbfile where file#=4;
NAME
--------------------------------------------------------------------------------
/test/orcl/orcl/users01.dbf
SCOTT下的表T存放到了文件 /test/orcl/orcl/users01.dbf 中。
SQL> insert into scott.t values(4,'a');
1 row created.
SQL> commit;
Commit complete.
SQL> alter system checkpoint ;
System altered.
SQL> select * from scott.t ;
ID NAME
---------- ------------------------------------------------------------
4 a
1 cpic
2 huateng
3 yanshoupeng
日常的操作是没有任何问题的。
此时我们删除这个文件。
SQL> ! rm -rf /test/orcl/orcl/users01.dbf
SQL> ! ls -ltr /test/orcl/orcl/
total 2625104
-rw-r--r-- 1 oracle oinstall 373 May 16 10:52 log.bbd
-rw-r----- 1 oracle oinstall 52429312 May 16 14:21 redo01.log
-rw-r----- 1 oracle oinstall 20979712 May 16 22:00 temp01.dbf
-rw-r----- 1 oracle oinstall 52429312 May 17 05:00 redo02.log
-rw-r----- 1 oracle oinstall 31465472 May 17 10:18 undotbs01.dbf
-rw-r----- 1 oracle oinstall 513810432 May 17 10:18 system01.dbf
-rw-r----- 1 oracle oinstall 314580992 May 17 10:18 sysaux01.dbf
-rw-r----- 1 oracle oinstall 96477184 May 17 10:18 abcd_2.dbf
-rw-r----- 1 oracle oinstall 96477184 May 17 10:18 abcd_1.dbf
-rw-r----- 1 oracle oinstall 96477184 May 17 10:18 abcd_0.dbf
-rw-r----- 1 oracle oinstall 52429312 May 17 10:20 redo03.log
-rw-r----- 1 oracle oinstall 5488640 May 17 10:20 control03.ctl
-rw-r----- 1 oracle oinstall 5488640 May 17 10:20 control02.ctl
-rw-r----- 1 oracle oinstall 5488640 May 17 10:20 control01.ctl
SQL> select * from scott.t ;
ID NAME
---------- ------------------------------------------------------------
4 a
1 cpic
2 huateng
3 yanshoupeng
SQL> delete from scott.t;
4 rows deleted.
SQL> commit;
Commit complete.
SQL> insert into scott.t values(5,'b');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from scott.t;
ID NAME
---------- ------------------------------------------------------------
5 b
SQL> alter system checkpoint;
System altered.
SQL> select * from scott.t;
ID NAME
---------- ------------------------------------------------------------
5 b
可以看到数据库还能继续操作的,像没有删除一样。
而且数据库日志中一点错误也没有。
[oracle@dbtest ~]$ tail -f /u01/oracle/admin/orcl/bdump/alert_orcl.log
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed May 16 14:32:04 2012
OS Pid: 516540 executed alter system set events 'immediate trace name heapdump level 10'
Wed May 16 14:58:51 2012
System State dumped to trace file /u01/oracle/admin/orcl/udump/orcl_ora_516540.trc
Thu May 17 05:00:33 2012
Thread 1 advanced to log sequence 18 (LGWR switch)
Current log# 3 seq# 18 mem# 0: /test/orcl/orcl/redo03.log
虽然文件被删除了,但是相应的文件句柄并没有释放,ORACLE相关的进程通过这个文件句柄还能继续对文件进行操作,就像没有被删除一样。
以前的时候很多朋友都不理解,为何当前的日志文件被删除,ORACLE没有crash掉,还能继续运行,其实是一个原因。
[oracle@dbtest ~]$ ps -ef | grep ora_dbw
oracle 884830 1 0 14:21:27 - 0:01 ora_dbw2_orcl
oracle 758204 406292 0 10:21:29 pts/7 0:00 grep ora_dbw
oracle 275238 1 0 14:21:27 - 0:01 ora_dbw3_orcl
oracle 709866 1 0 14:21:27 - 0:01 ora_dbw1_orcl
oracle 395236 1 0 14:21:27 - 0:01 ora_dbw0_orcl
[oracle@dbtest ~]$ cd /proc/395236
[oracle@dbtest 395236]$ ls
as cred ctl cwd fd lwp map object psinfo sigact status sysent
[oracle@dbtest 395236]$ cd fd
[oracle@dbtest fd]$ ls
0 1 10 12 13 14 15 16 2 3 4 5 6 7 8 9
[oracle@dbtest fd]$ ls -ltr
total 3976
-r--r--r-- 1 oracle oinstall 849408 May 10 2008 16
-r--r--r-- 1 oracle oinstall 849408 May 10 2008 12
p--------- 0 oracle oinstall 0 May 16 10:54 9
-rw-rw---- 0 oracle oinstall 0 May 16 14:21 7
--w------- 1 oracle oinstall 858 May 16 14:21 5
--w------- 1 oracle oinstall 157667 May 17 05:00 8
--w------- 1 oracle oinstall 157667 May 17 05:00 6
-rw-rw---- 1 oracle oinstall 1544 May 17 10:19 15
-rw-rw---- 1 oracle oinstall 1544 May 17 10:19 10
c--------- 1 oracle oinstall 2, 2 May 17 10:21 4
c--------- 1 oracle oinstall 2, 2 May 17 10:21 3
c--------- 1 oracle oinstall 2, 2 May 17 10:21 2
c--------- 1 oracle oinstall 2, 2 May 17 10:21 1
c--------- 1 oracle oinstall 2, 2 May 17 10:21 0
-rw-rw-rw- 1 oracle oinstall 0 May 17 10:21 14
-rw-rw-rw- 1 oracle oinstall 0 May 17 10:21 13
如果平台是LINUX平台,那么进入目录fd的时候,你将会看到文件和文件句柄的对应关系,及其那些文件被删除了。
我的平台是AIX的,和LINUX平台不太一样,需要借助lsof命令来查看。
遗憾的是我这边的lsof命令不能用,不能给大家演示。
假设被删除的文件的句柄是10,那么我们可以通过如下命令将文件COPY回去:
cp 10 /test/orcl/orcl/users01.dbf
然后进行一下recover就行了,在没有备份的情况下,如果能及时的把误删文件COPY回去,就能及时挽救我们的数据库。
大家可以参考这篇文章:
lsof aix 使用 lsof 查找打开的文件
如果打开这个文件的进程被关闭了,相应的文件句柄将会释放,此时就再也看不到这个文件了。
我们SHUTDOWN数据库,然后再启动就会报错:
SQL> select * from v$recover_file;
no rows selected
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1610612736 bytes
Fixed Size 2084400 bytes
Variable Size 385876432 bytes
Database Buffers 1207959552 bytes
Redo Buffers 14692352 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/test/orcl/orcl/users01.dbf'
SQL> select * from v$recover_file;
FILE# ONLINE ONLINE_STATUS
---------- -------------- --------------
ERROR CHANGE#
----------------------------------------------------------------- ----------
TIME
-------------------
4 ONLINE ONLINE
FILE NOT FOUND 0
日志文件中有如下错误:
[oracle@dbtest ~]$ tail -f /u01/oracle/admin/orcl/bdump/alert_orcl.log
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed May 16 14:32:04 2012
OS Pid: 516540 executed alter system set events 'immediate trace name heapdump level 10'
Wed May 16 14:58:51 2012
System State dumped to trace file /u01/oracle/admin/orcl/udump/orcl_ora_516540.trc
Thu May 17 05:00:33 2012
Thread 1 advanced to log sequence 18 (LGWR switch)
Current log# 3 seq# 18 mem# 0: /test/orcl/orcl/redo03.log
Thu May 17 10:47:56 2012
Shutting down instance (abort)
License high water mark = 5
Instance terminated by USER, pid = 394304
Thu May 17 10:48:03 2012
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 en1 10.192.14.0 configured from OCR for use as a cluster interconnect
Interface type 1 en0 10.192.39.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
processes = 150
sessions = 170
__shared_pool_size = 352321536
__large_pool_size = 16777216
__java_pool_size = 16777216
__streams_pool_size = 0
sga_target = 1610612736
control_files = /test/orcl/orcl/control01.ctl, /test/orcl/orcl/control02.ctl,
/test/orcl/orcl/control03.ctl
db_block_size = 8192
__db_cache_size = 1207959552
compatible = 10.2.0.3.0
log_archive_dest_1 = LOCATION=/test/orcl/arch
log_archive_format = %t_%s_%r.arc
db_file_multiblock_read_count= 16
db_recovery_file_dest = /test/orcl/flashback
db_recovery_file_dest_size= 2147483648
undo_management = AUTO
undo_tablespace = UNDOTBS1
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB)
job_queue_processes = 10
background_dump_dest = /u01/oracle/admin/orcl/bdump
user_dump_dest = /u01/oracle/admin/orcl/udump
core_dump_dest = /u01/oracle/admin/orcl/cdump
audit_file_dest = /u01/oracle/admin/orcl/adump
db_name = orcl
open_cursors = 300
pga_aggregate_target = 1646264320
Thu May 17 10:48:04 2012
Oracle instance running with ODM: Veritas 5.0 ODM Library, Version 1.1
cluster interconnect IPC version:
VERITAS IPC '5.0.3.400' 04:49:57 Apr 16 2010
IPC Vendor 86 proto 76
Version 1.0
PMON started with pid=2, OS id=632182
DIAG started with pid=3, OS id=705994
PSP0 started with pid=4, OS id=570926
LMON started with pid=5, OS id=472866
LMD0 started with pid=6, OS id=640160
MMAN started with pid=7, OS id=884854
DBW0 started with pid=8, OS id=741972
DBW1 started with pid=9, OS id=373816
DBW2 started with pid=10, OS id=525912
DBW3 started with pid=11, OS id=557894
LGWR started with pid=12, OS id=395244
CKPT started with pid=13, OS id=366402
SMON started with pid=14, OS id=697382
RECO started with pid=15, OS id=668764
CJQ0 started with pid=16, OS id=415276
MMON started with pid=17, OS id=480782
Thu May 17 10:48:04 2012
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=18, OS id=456092
Thu May 17 10:48:04 2012
starting up 1 shared server(s) ...
Thu May 17 10:48:04 2012
lmon registered with NM - instance id 1 (internal mem no 0)
Thu May 17 10:48:05 2012
Reconfiguration started (old inc 0, new inc 2)
List of nodes:
0
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Reconfiguration complete
Thu May 17 10:48:05 2012
ALTER DATABASE MOUNT
Thu May 17 10:48:09 2012
Setting recovery target incarnation to 2
Thu May 17 10:48:09 2012
Successful mount of redo thread 1, with mount id 1311324261
Thu May 17 10:48:09 2012
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Thu May 17 10:48:09 2012
ALTER DATABASE OPEN
Thu May 17 10:48:09 2012
Errors in file /u01/oracle/admin/orcl/bdump/orcl_dbw0_741972.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/test/orcl/orcl/users01.dbf'
ORA-17503: ksfdopn:4 Failed to open file /test/orcl/orcl/users01.dbf
ORA-17500: ODM err:File does not exist
ORA-1157 signalled during: ALTER DATABASE OPEN...
此时我们就只能通过备份恢复数据库了。
SQL> ! rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Thu May 17 10:49:10 2012
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: ORCL (DBID=1301704889, not open)
RMAN> restore datafile 4;
Starting restore at 2012-05-17 10:49:14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=149 devtype=DISK
channel ORA_DISK_1: restoring datafile 00004
input datafile copy recid=1 stamp=782931126 filename=/tmp/users01.dbf
destination for restore of datafile 00004: /test/orcl/orcl/users01.dbf
channel ORA_DISK_1: copied datafile copy of datafile 00004
output filename=/test/orcl/orcl/users01.dbf recid=5 stamp=783514156
Finished restore at 2012-05-17 10:49:17
RMAN> recover datafile 4;
Starting recover at 2012-05-17 10:49:20
using channel ORA_DISK_1
starting media recovery
archive log thread 1 sequence 7 is already on disk as file /test/orcl/arch/1_7_773924155.arc
archive log thread 1 sequence 8 is already on disk as file /test/orcl/arch/1_8_773924155.arc
archive log thread 1 sequence 9 is already on disk as file /test/orcl/arch/1_9_773924155.arc
archive log thread 1 sequence 10 is already on disk as file /test/orcl/arch/1_10_773924155.arc
archive log thread 1 sequence 11 is already on disk as file /test/orcl/arch/1_11_773924155.arc
archive log thread 1 sequence 12 is already on disk as file /test/orcl/arch/1_12_773924155.arc
archive log thread 1 sequence 13 is already on disk as file /test/orcl/arch/1_13_773924155.arc
archive log thread 1 sequence 14 is already on disk as file /test/orcl/arch/1_14_773924155.arc
archive log thread 1 sequence 15 is already on disk as file /test/orcl/arch/1_15_773924155.arc
archive log thread 1 sequence 16 is already on disk as file /test/orcl/arch/1_16_773924155.arc
archive log thread 1 sequence 17 is already on disk as file /test/orcl/arch/1_17_773924155.arc
archive log filename=/test/orcl/arch/1_7_773924155.arc thread=1 sequence=7
archive log filename=/test/orcl/arch/1_8_773924155.arc thread=1 sequence=8
archive log filename=/test/orcl/arch/1_9_773924155.arc thread=1 sequence=9
archive log filename=/test/orcl/arch/1_10_773924155.arc thread=1 sequence=10
archive log filename=/test/orcl/arch/1_11_773924155.arc thread=1 sequence=11
archive log filename=/test/orcl/arch/1_12_773924155.arc thread=1 sequence=12
archive log filename=/test/orcl/arch/1_13_773924155.arc thread=1 sequence=13
archive log filename=/test/orcl/arch/1_14_773924155.arc thread=1 sequence=14
archive log filename=/test/orcl/arch/1_15_773924155.arc thread=1 sequence=15
media recovery complete, elapsed time: 00:00:14
Finished recover at 2012-05-17 10:49:37
RMAN> exit
Recovery Manager complete.
SQL> alter database open;
Database altered.
SQL>