使用TAF(通明应用切换)
RAC =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip.oracle.net)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip.oracle.net)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
(FAILOVER_MODE =(TYPE = SESSION)(METHOD = BASIC))
)
)
配置说明:
①FAILOVER_MODE部分中的TYPE主要有两个选项:SESSION和SELECT,区别是:当前实例故障时,SESSION选项会在另一个实例上会重新建立连接;SELECT选项不但在另一个实例上重新建立会话,而且如果之前正在执行行select操作,则select操作继续运行。选择SELECT选项明显会消耗更多资源,同时SELECT选项会导致杀select会话永远杀不掉。
②FAILOVER_MODE部分中的METHOD有两个选项:BASIC和PRECONNECT,区别是:BASIC选项是在感知到当前实例故障时才创建到其他实例的连接;PRECONNECT是在最初建立连接时就同时建立到所有实例的连接,感知到当前节点故障时,立刻就可以切换到其他实例。当故障发生时PRECONNECT比BASIC切换速度快,但平时会消耗更多的内存资源。
其他没写出来的选项使用了默认值,如:
①LOAD_BALANCE的默认值是off,即关闭LOAD BALANCE,连接时按地址列表顺序尝试,如果第一个地址能连上,则不会去连第二个。如果开启LOAD BALANCE,则每个实例的连接数基本相等。
②FAILOVER的默认值是on,即开启了FAILOVER功能。
③FAILOVER_MODE中RETRIES和DELAY,默认RETRIES为5次,每次DELAY1秒钟,切换。
这种配置方式需要注意的是:failover发生后,当前会话切到了其他实例上,此时需要执行rollback,否则一直报ORA-25405: transaction status unknown。也就是应用程序需要处理ORA-25405,一旦发现此错误,立即执行rollback。
TAF failover及杀会话、杀进程测试过程
--模拟一客户端连接到RAC
[oracle@logdb ~]$ sqlplus
SQL*Plus: Release 11.2.0.1.0 Production on Wed May 15 09:30:39 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
--查看当前会话连接的实例
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl1 SESSION BASIC NO
SQL> select * from t;
ID
----------
1
--产生事务
SQL> insert into t values (2);
1 row created.
--模拟实例orcl1故障
[oracle@rac1 ~]$ srvctl stop instance -d orcl -i orcl1 -o immediate
--当前会话继续进行操作,除执行rollback外,都会报ORA-25402: transaction must roll back
SQL> select * from t;
select * from t
*
ERROR at line 1:
ORA-25402: transaction must roll back
SQL> commit;
commit
*
ERROR at line 1:
ORA-25405: transaction status unknown
--该会话必须处理ORA-25405,处理方式就是rollback,否则无法无法进行任何操作。
SQL> rollback;
Rollback complete.
--rollback后一切正常了
SQL> select * from t;
ID
----------
1
--由于此时实例orcl1已关掉,所以当前会话已连到了orcl2上
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl2 SESSION BASIC YES
failover测试总结:failover发生后,当前会话可以切到其他实例上,但需要执行rollback,否则一直报ORA-25405: transaction status unknown。
下面进行杀会话杀进程测试:
--启动实例orcl1
[oracle@rac1 ~]$ srvctl start instance -d orcl -i orcl1
--当前会话并不会因为实例orcl1起来了,就回到原来的实例,仍然在实例orcl2上。
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl2 SESSION BASIC YES
--查当前会话的sid及serial#
SQL> select 'alter system kill session '''||sid||','||serial#||''';' from v$session where sid = (select sid from v$mystat where rownum=1);
'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';'
--------------------------------------------------------------------------------
alter system kill session '2211,7';
--当前会话执行一条长时间运行的问题sql
SQL> select count(*) from dba_objects,dba_segments,dba_tables;
--到实例orcl2上杀掉当前会话
SQL> select instance_name,status from v$instance;
INSTANCE_NAME STATUS
---------------- ------------
orcl2 OPEN
SQL> alter system kill session '2211,7';
System altered.
--当前会话被杀后,报了一个错误,然后一切正常
SQL> select count(*) from dba_objects,dba_segments,dba_tables;
select count(*) from dba_objects,dba_segments,dba_tables
*
ERROR at line 1:
ORA-25408: can not safely replay call
SQL> select * from t;
ID
----------
1
--同时发现当前会话从实例orcl2上连到了orcl1上
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl1 SESSION BASIC YES
--多次测试发现:只有发生过failover的会话,被杀掉后,发生实例切换,回到原来的实例上。这个应该与tnsnames配置有关(LOAD_BALANCE=off),重连时按地址列表顺序选择。
--杀进程测试
查当前会话的进程号
SQL> select 'kill -9 '||p.spid from v$session s,v$process p where s.paddr = p.addr and s.sid = (select sid from v$mystat where rownum=1);
'KILL-9'||P.SPID
--------------------------------
kill -9 20017
--当前会话执行一条长时间运行的问题sql
SQL> select count(*) from dba_objects,dba_segments,dba_tables;
--然后杀当前会话进程
[oracle@rac1 ~]$ kill -9 20017
--当前会话进程被杀掉后,正在执行的sql报错了,然后一切正常
SQL> select count(*) from dba_objects,dba_segments,dba_tables;
select count(*) from dba_objects,dba_segments,dba_tables
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 20438
Session ID: 98 Serial number: 17
--杀进程后,并没有导致当前会话切换到其他实例上
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl1 SESSION BASIC YES
--继续进行杀会话测试
--查当前会话连接到了哪个实例
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl1 SESSION BASIC YES
--查当前会话的sid和serial#
SQL> select 'alter system kill session '''||sid||','||serial#||''';' from v$session where sid = (select sid from v$mystat where rownum=1);
'ALTERSYSTEMKILLSESSION'''||SID||','||SERIAL#||''';'
--------------------------------------------------------------------------------
alter system kill session '2929,29';
--执行问题sql
SQL> select count(*) from dba_objects,dba_segments,dba_tables;
--到实例orcl1上杀会话
SQL> select instance_name,status from v$instance;
INSTANCE_NAME STATUS
---------------- ------------
orcl1 OPEN
SQL> alter system kill session '2929,29';
System altered.
--当前会话进程被杀掉后,正在执行的sql报错了,然后一切正常
SQL> select count(*) from dba_objects,dba_segments,dba_tables;
select count(*) from dba_objects,dba_segments,dba_tables
*
ERROR at line 1:
ORA-25408: can not safely replay call
--这次杀会话,也没有导致把会话切换到其他实例上。
SQL> select a.instance_name,b.failover_type,b.failover_method,b.failed_over from v$instance a,v$session b where sid=(select sid from v$mystat where rownum=1);
INSTANCE_NAME FAILOVER_TYPE FAILOVER_M FAI
---------------- ------------- ---------- ---
orcl1 SESSION BASIC YES
杀会话和杀进程总结:
如果当前会话发生过failover,杀会话和杀进程会导致当前会话切到原来的实例上;
如果当前会话没有发生过failover,杀会话和杀进程不会导致当前会话切到其他实例上;
杀会话和杀进程发生后无需额外的处理工作,这跟在单实例是一样的。