

Category: Oracle

2016-03-25 23:35:42

Oracle RAC Global Wait Events: gc current block busy and gc cr multi block request

1. RAC Global Wait Events

In a RAC environment, the most common wait events related to global cache coordination are global cache cr request, global cache busy, and enqueue waits.

When a process needs one or more blocks, it first checks whether they are already in its local cache. If not, it asks the global cache service to grant shared access to those blocks and then reads them. If the global cache service finds that the blocks are already cached on another instance, they are shipped between nodes directly via Cache Fusion, and the global cache cr request wait event appears during the transfer.

In 10g, global cache cr request was shortened to gc cr request.

The time it takes to ship a block from a remote cache to the local cache depends on whether the block is held in shared or exclusive mode. If the block is shared (SCUR), the remote cache simply sends a clone of it; otherwise a past image (PI) must be created first and then shipped. Clearly, the global cache cr request wait event is closely related to the db file sequential/scattered read wait events.

Typically, a RAC process waits up to 1 second when trying to read a block from the local or remote cache (again depending on the mode the block is held in). Waits longer than that suggest a slow interconnect. Cache Fusion traffic should travel over the private interconnect while client connections use the public network; if Cache Fusion is mistakenly routed over the public network, large numbers of global cache cr request waits appear. You can use the oradebug ipc command to verify that the interconnect is actually using the private network.
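A minimal SQL*Plus sketch of the oradebug ipc check mentioned above; the exact trace contents vary by version and platform, so treat the comments as a rough guide:

```sql
-- Connect with SYSDBA privileges and dump IPC information
-- (run on each instance you want to verify)
CONN / AS SYSDBA
ORADEBUG SETMYPID
ORADEBUG IPC
ORADEBUG TRACEFILE_NAME   -- prints the trace file to inspect

-- In the trace file, the interconnect address is listed (for example
-- in the SSKGXPT section); it should belong to the private
-- interconnect subnet, not the public network.
```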

In the wait-event diagram (not reproduced here), the gc [current/cr] [multiblock] request at the top is actually a placeholder event, as the upper-left corner of the figure also notes.

gc [current/cr] [multiblock] request stands for one of four events: gc current request, gc cr request, gc current multiblock request, and gc cr multiblock request.

CR and current are different concepts here: a read produces a cr request, while a modification produces a current request.

In many places Oracle 10g distinguishes between multi block and single block requests, which makes it easier to characterize the data-access pattern of the workload. In a RAC environment, this event is triggered whenever a session requests a block.

When a block is obtained after two or three network hops, a gc [current/cr] [2/3]-way event is recorded. 3-way means the master and the holder are different instances; 2-way means the master and the holder are the same instance. This is the best case: the block is obtained right after the request, with no busy state and no extra waiting along the way. These events imply that a block actually traveled over the network, generating traffic, whereas a grant 2-way generates comparatively little traffic.

gc [current/cr] block busy means a response was returned, but without an immediate send: the control flow came back while the actual block was not shipped to the requesting instance right away. gc [current/cr] block busy is the counterpart of gc [current/cr] [2/3]-way.

gc [current/cr] grant 2-way: when an instance requests a block, it receives a message granting it the right to access that block. If the block is not in the local cache, the next step is to read it from disk. (A side note: Oracle controls access to data at the row and object level, but the unit it actually operates on, and ships between instances, is the block. Each block has one master instance that manages it, plus zero or more participants; for example, an instance may keep an earlier image of the block in its local cache for read consistency, and an instance that modified the block may keep a past image (PI) of it in its local cache.)

gc current grant busy: an instance requesting a block is told the grant is busy. It is not clear under exactly which circumstances a grant busy event is produced.

gc [current/cr] [block/grant] congested: for both current and cr blocks (or grants), the request was eventually satisfied, but it was congested along the way, i.e. it spent too long (on the order of 1 ms or more) waiting in internal queues.

gc [current/cr] [failure/retry]: an error occurred and the block was not obtained.

gc buffer busy means multiple processes are accessing the same block at the same time, causing lock contention. With RAC you should isolate the nodes from each other as much as possible, whether by business function, by region, or by any other means, so that each node's workload touches different data objects. Minimizing inter-node resource contention is the key to getting the best performance out of a RAC cluster.
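As a hedged sketch, the segments behind gc buffer busy contention can often be found from the segment-level statistics; the view and statistic name below are standard, but the top-10 cutoff is only illustrative:

```sql
-- Top segments by global-cache buffer busy waits (run as a DBA user)
SELECT * FROM (
  SELECT owner, object_name, object_type, value AS gc_buffer_busy_waits
  FROM   v$segment_statistics
  WHERE  statistic_name = 'gc buffer busy'
  ORDER  BY value DESC
) WHERE ROWNUM <= 10;
```

The same segments should also show up in the AWR "Segments by Global Cache Buffer Busy" section.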

From the moment a session submits a consistent-read request until it obtains the result, the session is in a SLEEP state; what the user sees is the global cache cr request wait event, and the wait time records the duration of this interval.

Typically, large numbers of global cache cr request waits have the following causes.

(1) The interconnect between nodes is slow, or its bandwidth is too narrow. This can be fixed by moving to a faster interconnect.

(2) Contention on hot blocks.

(3) CPU overload, or too few LMS background processes. Normally only two LMS background processes compete for CPU; increasing the number of LMS processes or raising their priority helps them get more CPU. The hidden parameter _lm_lms sets the number of LMS processes.

(4) Large numbers of uncommitted transactions, or slow disk I/O.
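For cause (3), a quick hedged check of how many LMS processes each instance is actually running (gv$process is standard; the count per instance should normally match gcs_server_processes):

```sql
-- Count LMS background processes per instance
SELECT inst_id, COUNT(*) AS lms_count
FROM   gv$process
WHERE  pname LIKE 'LMS%'
GROUP  BY inst_id
ORDER  BY inst_id;
```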

Global cache statistics can be queried with:

SQL> select name,value from v$sysstat where name like '%global cache%';

2. The gc current/cr block busy Wait Events



2.1 The gc current block busy Wait Event

When a request needs a block in current mode, it sends a request to the master instance. The requestor eventually gets the block via cache fusion transfer. However, sometimes the block transfer is delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online logfile immediately.



One can use the session-level dynamic performance views v$session and v$session_event to find the programs or sessions causing the most waits on this event:


SQL> select a.sid, a.time_waited, b.program, b.module
       from v$session_event a, v$session b
      where a.sid = b.sid
        and a.event = 'gc current block busy'
      order by a.time_waited;


2.2 The gc cr block busy Wait Event

When a request needs a block in CR mode, it sends a request to the master instance. The requestor eventually gets the block via cache fusion transfer. However, sometimes the block transfer is delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online logfile immediately.


One can use the session-level dynamic performance views v$session and v$session_event to find the programs or sessions causing the most waits on this event:


SQL> select a.sid, a.time_waited, b.program, b.module
       from v$session_event a, v$session b
      where a.sid = b.sid
        and a.event = 'gc cr block busy'
      order by a.time_waited;


2.3 Additional Notes

gc current block busy is a contention wait on a current block in the RAC global cache. The duration of this wait event is made up of three components:

 

Time to process current block request in the cache = (pin time + flush time + send time)


gc current block flush time

The current block flush time is part of the service (or processing) time for a current block. The pending redo needs to be flushed to the log file by LGWR before LMS sends it. The operation is asynchronous in that LMS queues the request, posts LGWR, and continues processing. The LMS would check its log flush queue for completions and then send the block, or go to sleep and be posted by LGWR. The redo log write time and redo log sync time can influence the overall service time significantly.

 

The flush time exists to protect the instance recovery mechanism: after a current block is modified on the local instance, the redo for that change must be written to the logfile (LGWR must complete the write and return) before the LMS process may ship the block to another node.
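The flush-time component can be estimated from system statistics; a hedged sketch (the statistic names exist in 10g/11g v$sysstat, but the centisecond-to-millisecond conversion is an assumption that should be checked against your version's AWR formulas):

```sql
-- Approximate average current-block flush time per instance, in ms
SELECT s1.inst_id,
       ROUND(s1.value * 10 / NULLIF(s2.value, 0), 2) AS avg_flush_ms
FROM   gv$sysstat s1, gv$sysstat s2
WHERE  s1.name = 'gc current block flush time'   -- reported in centiseconds
AND    s2.name = 'gc current blocks served'
AND    s1.inst_id = s2.inst_id;
```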

 

gc buffer busy acquire/release waits are often a by-product of gc current block busy: when multiple processes on the same instance access the same block concurrently, the first process enters the gc current block busy wait, while the subsequent processes on the buffer waiter list fall into gc buffer busy acquire/release waits (a user on the same instance has started a remote operation on the same resource and the request has not completed yet, or the block was requested by another node and had not been released by the local instance when the new local access was made). There is a queuing effect here: if gc current block busy is slow, the gc buffer busy acquire/release waits queued behind it will be even slower:


Pin time = (time to read the block into cache) + (time to modify/process the buffer)

Busy time = (average pin time) * (number of interested users waiting ahead of me)


Flush time is not limited to current blocks (see the AWR metric Avg global cache current block flush time (ms)); cr blocks have a flush time as well (Avg global cache cr block flush time (ms)).


Setting _cr_server_log_flush to false disables the cr server's redo flush (LMS waits for LGWR to flush the pending redo during CR fabrication; without going too much into detail, you can turn the behaviour off by setting _cr_server_log_flush to false). However, this parameter has no effect on the flush time for current blocks, and using it is strongly discouraged.

3. The gc cr multi block request Wait Event

gc cr multi block request is short for global cache cr multi block request; since 10g, global cache has been abbreviated to gc. It is a common wait event in RAC systems.

Multi-block requests are generally caused by full table scans or full index scans. gc cr multi block request drives CPU work for scheduling and managing memory, and therefore consumes CPU time.
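As a hedged sketch, the segments targeted by those scans can often be spotted from the CR blocks shipped over the interconnect; 'gc cr blocks received' is a standard segment statistic, and the top-10 cutoff is illustrative:

```sql
-- Segments receiving the most CR blocks over the interconnect,
-- often the objects behind full scans that trigger this wait
SELECT * FROM (
  SELECT owner, object_name, object_type, value AS cr_blocks_received
  FROM   v$segment_statistics
  WHERE  statistic_name = 'gc cr blocks received'
  ORDER  BY value DESC
) WHERE ROWNUM <= 10;
```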

For gc cr multi block request problems, separate the applications at the RAC level: let different nodes handle different applications, with the nodes configured as standbys for each other, so that when a node goes down another node can take over its applications and preserve high availability.



RAC Performance Analysis: the gc buffer busy acquire Wait Event



Overview
---------------------
gc buffer busy is a common wait event in RAC databases. Starting with 11g, gc buffer busy is split into gc buffer busy acquire and gc buffer busy release.

gc buffer busy acquire: session#1 tries to access a buffer on a remote instance, but another session#2 on the same (local) instance requested the same buffer before session#1 and that request has not completed, so session#1 waits on gc buffer busy acquire.

gc buffer busy release: a session#2 on a remote instance requested the same buffer before session#1 and that request has not completed, so session#1 waits on gc buffer busy release.
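On 11g and later, the blocking relationships behind these waits can be sketched from v$wait_chains; the view and columns below are standard, while the filter is only illustrative:

```sql
-- Show wait chains involving gc buffer busy waits
SELECT chain_id, instance, sid,
       blocker_instance, blocker_sid, wait_event_text
FROM   v$wait_chains
WHERE  chain_signature LIKE '%gc buffer busy%'
ORDER  BY chain_id;
```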

Causes / Solutions
---------------------
Hot blocks
The AWR section Segments by Global Cache Buffer Busy records the most frequently contended gc buffers.
The fix depends on the type of hot block: partitioned tables, partitioned indexes, reverse-key indexes, and so on. This is similar to buffer busy waits in a single-instance database.

Inefficient SQL statements
Inefficient SQL causes unnecessary buffer accesses, increasing the chance of buffer busy contention. The top SQL can be found in AWR. The fix is to tune the SQL to reduce buffer accesses. Again, this is similar to buffer busy waits in a single-instance database.
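Besides AWR, the SQL with the heaviest global-cache waits can be pulled from the library cache; a hedged sketch (cluster_wait_time is a standard v$sql column, the cutoff is illustrative):

```sql
-- Top SQL by time spent on global-cache (cluster) waits
SELECT * FROM (
  SELECT sql_id, executions, cluster_wait_time, elapsed_time
  FROM   v$sql
  ORDER  BY cluster_wait_time DESC
) WHERE ROWNUM <= 10;
```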

Cross-instance data access
In a RAC database, the same data is requested from different instances.
If the application allows it, we recommend distributing different application functions/modules across different instances so that the same data is not accessed from multiple instances. This reduces buffer contention and avoids gc waits.

- Oracle bugs
We recommend installing the latest Oracle-recommended Patch Set and PSU.
For Patch Set and PSU information, see: Oracle Recommended Patches -- Oracle Database (Doc ID 756671.1)

Case Study
---------------------
A gc buffer busy acquire case, shared here.

- The application reported abnormal transaction processing and the database hung. The on-site DBA collected hanganalyze right away (hanganalyze is essential for analyzing a database hang).

Commands to collect hanganalyze on a RAC database:
SQL> conn / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all hanganalyze 3

From the hanganalyze output it was easy to see more than 1000 chains with similar wait relationships, for example:

Chain 1 Signature: 'gc current request'<='gc buffer busy acquire'<='enq: TX - contention'
Chain 2 Signature: 'gc current request'<='gc buffer busy acquire'<='buffer busy waits'

Chain 1243 Signature: 'gc current request'<='gc buffer busy acquire'<='enq: TA - contention'
Chain 1244 Signature: 'gc current request'<='gc buffer busy acquire'<='enq: TA - contention'



Hanganalyze shows that most sessions in the database were waiting, directly or indirectly, on 'gc current request'<='gc buffer busy acquire'.



- In some cases the dia0 trace file also records hang information:

  inst# SessId  Ser#     OSPID PrcNm Event

  ----- ------ ----- --------- ----- -----

      1   1152     3  21364904    FG gc buffer busy acquire

      1   2481     3  26607642    FG gc current request

Chain 1 Signature: 'gc current request'<='gc buffer busy acquire'

Chain 1 Signature Hash: 0x8823aa2a 



- In some cases dba_hist_active_sess_history also records the hang information.


1. During the hang window, 691 sessions were waiting on 'enq: TA - contention'; the holder of 'enq: TA - contention' was session#931, serial#39657.




2. Session#931, serial#39657 was itself waiting, on 'gc buffer busy acquire'; the holder of 'gc buffer busy acquire' was session#1324, serial#22503.



3. Session#1324, serial#22503 was also waiting, on 'gc current request'.




Analysis of dba_hist_active_sess_history yields the same session wait chain:
'gc current request'<='gc buffer busy acquire'<='enq: TA - contention'
which is consistent with the hanganalyze output.

- Based on the session wait chain above, the cause of the hang was identified as known Oracle Bug 13787307 - Hang in RAC with 'gc current request'<='gc buffer busy acquire' signature.



- Solution:
Install Patch 13787307, or set _gc_bypass_readers=false as a temporary workaround.
Also, earlier 11.2 releases have similar known issues; installing the latest patch set (11.2.0.3/4) plus the latest PSU is recommended.
For Patch Set and PSU information, see: Oracle Recommended Patches -- Oracle Database (Doc ID 756671.1)



[Oracle RAC tuning] Running different gcs_server_processes values on different RAC nodes can cause extra gc cr multi block request waits; for example, gcs_server_processes=5 on one instance and gcs_server_processes=4 on another may produce additional gc cr multi block request waits. The problem still exists in RAC 11.2.0.3. The recommendation is to use the same gcs_server_processes on all instances.

 

When you encounter the gc cr multi block request wait event, check the following parameters:

show parameter gcs_server_process

show parameter cpu_count

show parameter db_file_multiblock_read_count

OS parameters: udp_recvspace, udp_sendspace

NUM_CPUS and NUM_CPU_CORES (in v$osstat)
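The parameter checks above can be gathered in one pass across all instances; a hedged sketch using the standard gv$parameter view:

```sql
-- Compare the relevant parameters across instances; mismatched
-- gcs_server_processes values are the condition described above
SELECT inst_id, name, value
FROM   gv$parameter
WHERE  name IN ('gcs_server_processes', 'cpu_count',
                'db_file_multiblock_read_count')
ORDER  BY name, inst_id;
```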

 

Yes, it is supported/allowed to set different GCS_SERVER_PROCESSES for RAC instances.
This is confirmed in the Online Doc of both 10.2 and 11.2:
10.2:

Real Application Clusters: Multiple instances can have different values.

11.2:

Oracle RAC Multiple instances can have different values.

There is a Bug 11693109 which can lead to ORA-481 during DRM when there is a rather unbalanced LMS set up.
This Bug has been fixed by 11.2.0.3.

2. For a RAC database, we recommend leaving Oracle to determine the initial value of GCS_SERVER_PROCESSES when all your CPUs are in place. However, if you have encountered serious GC waits caused by a lack of LMS processes, you may need to increase the number of LMS processes. The number of database instances running on a given cluster node should be limited such that the total number of real-time LMS processes on a node does not exceed the number of CPUs (cores) - 1.

 

GCS_SERVER_PROCESSES

LMS Real Time Priority in RAC 10g Release 2 – Things to Consider Before Changing (Doc ID 433105.1)
Version: 10.2.0.1 to 11.1.0.6
The rule of thumb is that you should keep the number of real time LMS processes to be at least 1 less than the number of CPUs (cores) in the system.

RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent) (Doc ID 810394.1)
Version 10.2.0.1 to 11.2.0.3
The number of database instances running on a given cluster node should be limited such that the total number of real-time LMS processes on a node
does not exceed the recommended number of #CPUs – 1.

10gR2 RAC Reference (INTERNAL ONLY) (Doc ID 341965.1)
the number of LMS processes can be set with the parameter GCS_SERVER_PROCESSES, and the value is dependent on the number of CPUs. At startup time, we start (CPU_COUNT / 4) LMS processes, but there are always at least two LMS processes.

Auto-Adjustment of LMS Process Priority in Oracle RAC with 11.2.0.3 and later (Doc ID 1392248.1)
For 11.2.0.3 Oracle RAC database instances, the number of LMS processes that will be started can be calculated using the following formula:
n = “number of CPUs reported by the OS and used for CPU_COUNT per default”
n < 4 => 1 LMS process will be started
4 <= n < 16 =>  2 LMS processes will be started
n >=16 => 2 LMS + 1 LMS processes for every 32 CPU will be started.

 

Keywords: lms unbalanced

Bug 11693109 – ORA-481 during DRM  results in LMS communication issues between nodes / hang (Doc ID 11693109.8)
ORA-481 during DRM, which results in LMS communication issues between nodes.
This problem can occur if there is a rather unbalanced LMS set up
eg: where there is 4 LMSn on Node1 and Node6 and  16 LMSn on Node2/Node3/Node4/Node5
<==Already fixed by 11.2.0.3

 

# no -a | grep ipqmaxlen
ipqmaxlen = 512
# no -a | grep rfc1323
rfc1323 = 1
# no -a | grep sb_max
sb_max = 41943040
# no -a | grep tcp_recvspace
tcp_recvspace = 65536
# no -a | grep tcp_sendspace
tcp_sendspace = 65536
# no -a | grep udp_recvspace
udp_recvspace = 20971520
# no -a | grep udp_sendspace
udp_sendspace = 2097152

 

IBM POWER7 AIX and Oracle Database performance considerations — 10g & 11g (Doc ID 1507249.1)

These values are generally suggested for Oracle, and can be considered as starting points:

· sb_max >= 1MB (1048576) and must be greater than the maximum tcp or udp send or recvspace (if you are using RAC and a very large udp_recvspace, you might need to increase sb_max)

· tcp_sendspace = 262144

· tcp_recvspace = 262144

· udp_sendspace = db_block_size * db_file_multiblock_read_count

· udp_recvspace= 10 * (udp_sendspace)

· rfc1323 = 1 (see Recent suggestions and open issues)

· Ephemerals (non-defaults suggested for a large number of connecting hosts or a high degree of parallel query; also to avoid install-time warnings)

· tcp_ephemeral_low=9000

· tcp_ephemeral_high=65500

· udp_ephemeral_low=9000

· udp_ephemeral_high=65500
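The udp_sendspace rule above can be computed directly from the instance parameters; a hedged sketch (the 65536 floor is an assumption taken from common AIX sizing advice, not from this note, so validate the result against the platform documentation):

```sql
-- Recommended AIX udp_sendspace =
--   db_block_size * db_file_multiblock_read_count (assumed floor: 65536)
SELECT GREATEST(bs.value * mbrc.value, 65536) AS recommended_udp_sendspace
FROM   (SELECT TO_NUMBER(value) AS value
        FROM   v$parameter WHERE name = 'db_block_size') bs,
       (SELECT TO_NUMBER(value) AS value
        FROM   v$parameter WHERE name = 'db_file_multiblock_read_count') mbrc;
```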

——————————————————-

How many databases are on the node, and how many CPUs does the node have?

The customer may increase gcs_server_processes if possible; refer to Doc ID 558185.1 for general guidelines:
LMS and Real Time Priority in Oracle RAC 10g and 11g (Doc ID 558185.1)

 

Oracle Database (RDBMS) on Unix AIX,HP-UX,Linux,Mac OS X,Solaris,Tru64 Unix Operating Systems Installation and Configuration Requirements
Quick Reference (8.0.5 to 11.2) (Doc ID 169706.1)

 

1. Please run the commands below and provide the output:

lsattr -E -l sys0
no -a
show parameter db_file

2. Refer to Doc ID 169706.1, in particular Section 2.8.2 of the 11.2 Install Guide for AIX.

3. Please upload an AWR/ASH report after you reset db_file_multiblock_read_count=8.

 

$ netstat -s |grep overflow
0 socket buffer overflows <=======Not found
0 ipintrq overflows

 

1) gc cr multi block request latency is expected when db_file_multiblock_read_count is set too high (rather than 8 or 16); reduce it.

2) gc cr multi block request latency is expected when UDP-related kernel parameters such as udp_sendspace and udp_recvspace are not set to the recommended value of 65536.

3) gc cr multi block request latency is expected when the LMS processes are not running at real-time (high) priority:
LMS and Real Time Priority in Oracle RAC 10g and 11g (Doc ID 558185.1)
Auto-Adjustment of LMS Process Priority in Oracle RAC with 11.2.0.3 and later (Doc ID 1392248.1)
NOTE: 840982.1 - How to check whether the LMS process is in Real Time mode on Linux













