Category: Oracle
2011-02-14 21:18:35
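In recent years you have probably heard Oracle's claim that RAC delivers high availability, fault tolerance, and an excellent return on hardware investment. RAC adoption keeps growing, and as a DBA you may already be supporting a RAC environment. This chapter introduces the Oracle wait events in a RAC environment and shows how to identify and resolve the wait events specific to RAC. We focus mainly on the global cache waits, because these wait events affect the entire cluster database.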
8.1 What Is Special About Wait Events in RAC?
Before we discuss wait events in a RAC environment, you need to understand how the buffer cache works in RAC.
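In a single-instance environment there is only one set of shared memory segments for the database. All I/O operations, SQL processing, and library cache operations go through that one set of shared memory segments. In other words, the buffer cache and shared pool are local to that particular instance. At any point in time, the processes attached to that instance access only one set of memory structures.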
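In a RAC environment the situation is quite different. Multiple instances share the same database, and these instances usually run on different servers (nodes). The buffer cache is split across the instances: each instance has its own buffer cache as part of its own SGA. A buffer (also called a block) may be in the buffer cache of the local instance or in the buffer cache of an instance on another node. Processes local to a machine access these buffer caches to read their contents. In addition to the local foreground processes, background processes from remote machines access the local buffer cache. The remote instance's Lock Manager Service (LMS) processes access the global buffer cache, while the DBWR process accesses the local buffer cache.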
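Because the buffer cache is global and spread across multiple instances, the management operations associated with the buffer cache and shared pool differ from those of a single-instance Oracle environment, and so do the wait events.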
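Note: RAC requires a shared disk system to store the datafiles. This is usually achieved with raw devices or a cluster file system; both allow multiple nodes to access the datafiles concurrently.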
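Note: To keep the discussion easy to follow, the terms block and buffer are used in similar contexts. A block is stored on disk and can be loaded into a buffer in any of the buffer caches. Oracle always accesses blocks through the buffer cache. If a block is already loaded in the buffer cache of any instance and can be transferred without extra work on the holding node, it is shipped to the requesting instance's buffer cache. Otherwise, the block is read from disk into a buffer.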
Global Buffer Cache in RAC
The data buffer cache is shared by multiple instances, and this shared cache is called the global cache. Each instance has its own buffer cache, and all of the buffer caches together make up the global cache.
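Because the cache is global, consistent read (CR) processing differs from the single-instance case. In a single-instance environment, when a process needs to modify or read a block, it reads the block from disk into memory, pins the buffer, and modifies it. In a RAC environment, a process that needs a block could likewise read it from disk and modify it in a buffer. But because the buffer cache is global, the block may already have been read by another process and may sit in one of the other buffer caches. In that case, reading the block from disk could lead to data corruption. The following sections explain how Oracle avoids this.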
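The lookup order implied above (local cache first, then another instance's cache, then disk) can be sketched in a few lines of Python. This is only an illustrative model, not Oracle's implementation: the `Instance` class, the `get_block` function, and the block addresses are all hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple


class Instance:
    """A toy RAC instance holding its own slice of the global cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[int, str] = {}  # block address -> block contents


def get_block(requester: Instance, instances: List[Instance],
              disk: Dict[int, str], dba: int) -> Tuple[str, str]:
    """Return (contents, source) for block `dba`, preferring cached copies over disk."""
    if dba in requester.cache:
        return requester.cache[dba], "local cache"
    for other in instances:
        if other is not requester and dba in other.cache:
            # Another instance holds a possibly newer copy: take that one,
            # because the on-disk version may be stale.
            requester.cache[dba] = other.cache[dba]
            return requester.cache[dba], f"remote cache ({other.name})"
    requester.cache[dba] = disk[dba]
    return requester.cache[dba], "disk"


disk = {100: "v1", 200: "v1"}
a, b = Instance("A"), Instance("B")
b.cache[100] = "v2"  # B changed block 100 in memory; the disk copy is stale

first = get_block(a, [a, b], disk, 100)   # served from B's cache
second = get_block(a, [a, b], disk, 200)  # nobody caches it, so read from disk
third = get_block(a, [a, b], disk, 100)   # now cached locally on A
```

Had instance A read block 100 straight from disk, it would have seen the stale "v1" contents, which is the corruption risk described above.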
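1. Parallel Cache Management
In Oracle Parallel Server (OPS), global buffer management was called Parallel Cache Management (PCM), and buffer locks (also called PCM locks) were used to protect the buffers in the cache. A PCM lock is simply a data structure that usually covers a set of blocks. The number of PCM locks is configured in init.ora with the GC_FILES_TO_LOCKS parameter. A detailed discussion of PCM locks is beyond the scope of this chapter. Here we discuss cache management in a RAC environment and the related wait events.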
Lock Mastering and Resource Affinity
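In a normal cluster environment, resources are divided among the nodes, which are usually the different servers participating in the cluster. Each node, called a master node, owns a subset of the total resources, and the control of those resources is handled from the respective nodes. This process is known as resource mastering or lock mastering. If a node wants to acquire a resource that is mastered by another node, the requesting node asks the master node (or the holding node) to grant access. This way a resource is handled by only one node at a time, which avoids data corruption in a clustered environment.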
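In Oracle's RAC implementation, if a node (the requestor) requests a resource that is mastered by another node (the master node), the requestor sends the request to the Global Cache Service (GCS) on the master node, and the GCS grants access.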
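The request-and-grant flow can be modeled roughly as follows. This is a sketch under stated assumptions: the `Master` class and the modulo placement rule in `master_of` are hypothetical and only stand in for whatever placement the cluster software actually uses.

```python
class Master:
    """Toy master node tracking which node currently holds each resource it masters."""

    def __init__(self, name):
        self.name = name
        self.holder = {}  # resource id -> name of the node holding it

    def request(self, resource, requester):
        """Grant access if the resource is free or already held by the requester."""
        current = self.holder.get(resource)
        if current is None or current == requester:
            self.holder[resource] = requester
            return True
        return False  # the requester must wait until the holder releases

    def release(self, resource, requester):
        """Give the resource back so that another node can be granted access."""
        if self.holder.get(resource) == requester:
            del self.holder[resource]


def master_of(resource, masters):
    # Hypothetical placement rule: resource id modulo the number of nodes.
    return masters[resource % len(masters)]


masters = [Master("node1"), Master("node2")]
m = master_of(7, masters)                # 7 % 2 == 1, so node2 masters resource 7
granted_first = m.request(7, "node1")    # free, so granted
granted_second = m.request(7, "node2")   # denied while node1 holds it
m.release(7, "node1")
granted_third = m.request(7, "node2")    # granted after the release
```

Because every request for a resource goes through its single master, only one node handles the resource at a time.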
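In a RAC environment, the Global Resource Directory (GRD) is maintained by the Global Enqueue Service (GES) and the Global Cache Service (GCS), and the GCS handles the lock mastering operations. Resources are remastered dynamically based on resource affinity, which improves performance because ownership of those resources becomes local. For example, if one instance uses a resource more frequently than the others, mastership of that resource is moved to that instance. This places mastership with the instance that uses the resource most, and improves cluster performance by reducing lock management operations. This feature is called dynamic remastering. It is controlled by the initialization parameter _LM_DYNAMIC_REMASTERING; setting _LM_DYNAMIC_REMASTERING=FALSE disables it.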
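The idea of moving mastership toward the busiest instance can be illustrated with a small counter-based model. The `Resource` class and the `remaster_margin` of 3 are invented for this sketch; the text does not describe the rule Oracle actually applies.

```python
from collections import Counter


class Resource:
    """Toy resource whose mastership follows the instance that uses it most."""

    def __init__(self, master):
        self.master = master
        self.accesses = Counter()

    def access(self, instance, remaster_margin=3):
        """Record an access; remaster if `instance` leads the current master by the margin."""
        self.accesses[instance] += 1
        lead = self.accesses[instance] - self.accesses[self.master]
        if instance != self.master and lead >= remaster_margin:
            self.master = instance
            self.accesses.clear()  # start a fresh observation window
        return self.master


res = Resource(master="inst1")
# inst2 touches the resource three times while inst1 never does.
history = [res.access("inst2") for _ in range(3)]
```

After the third access `inst2` becomes the master, so its later requests for this resource no longer need to go to a remote node.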
Buffers in the buffer cache can be held in the modes below. The state of each buffer is visible in the V$BH view; the possible states are:
FREE – not currently in use
XCUR – exclusive current
SCUR – shared current
CR – consistent read
READ – being read from disk
MREC – in media recovery mode
IREC – in instance recovery mode
WRI – write clone mode
PI – past image
2. Consistent Read Processing
A buffer in the buffer cache can be in any of the states above; here we focus only on XCUR, SCUR, and CR. Any SELECT operation requires the buffer in SCUR mode for the duration of the statement. DML commands require the buffer in XCUR mode (also called the current mode buffer), and the process making the change (the DML operator in this case) owns the buffer exclusively. If any other process needs access to the block during that time, it clones the buffer within the buffer cache and works from the clone, which is called a CR copy. The process that performs the clone increments the consistent gets statistic in V$SYSSTAT.
Note: Buffer cloning is the process of building a copy of the buffer as of the point in time at which its contents were consistent, using the undo vectors known as the pre-image or past image (PI).
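Conceptually, building a CR copy means rolling the current buffer back until only the changes visible to the reader remain. The sketch below is a deliberately simplified model: it treats a block as a single value and an undo record as a `(change_scn, value_before_change)` pair, both of which are assumptions made for illustration rather than Oracle's actual structures.

```python
def build_cr_copy(current_value, undo_records, query_scn):
    """Roll a block back to `query_scn` by applying undo newest-first.

    undo_records: list of (change_scn, value_before_change),
    in the order the changes were made.
    """
    value = current_value
    for change_scn, before in reversed(undo_records):
        if change_scn <= query_scn:
            break  # this change and all earlier ones are visible to the query
        value = before  # undo a change made after the query's SCN
    return value


# The block went a -> b (SCN 10) -> c (SCN 20) -> d (SCN 30).
undo = [(10, "a"), (20, "b"), (30, "c")]

cr_at_25 = build_cr_copy("d", undo, query_scn=25)  # sees changes up to SCN 20
cr_at_5 = build_cr_copy("d", undo, query_scn=5)    # sees none of the changes
cr_at_40 = build_cr_copy("d", undo, query_scn=40)  # sees the current contents
```

The current buffer stays untouched; each reader gets its own rolled-back copy.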
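During buffer cloning, a buffer whose contents change frequently could be cloned without limit, which would reduce the buffer cache available for everything else. To keep the buffer cache from filling up with cloned buffers, Oracle limits the number of clones: at most six copies per DBA (Data Block Address). Once that limit is reached, Oracle waits for the buffer; normally it waits 10 milliseconds before trying to clone or read the buffer again. The number of copies per DBA is controlled by the initialization parameter _DB_BLOCK_MAX_CR_DBA, which defaults to 6. Having more CR copies does not hurt buffer cache management much, because CR copies are not subject to the normal MFU (Most Frequently Used) algorithm. CR copies are always kept at the cold end of the buffer cache unless _DB_AGING_FREEZE_CR is set to FALSE or a recycle cache is configured. Note that CR buffers at the cold end of the buffer cache can be flushed out at any time.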
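The per-DBA cap can be pictured as a simple counter that refuses new clones once the limit is hit. This is a minimal sketch: `CrCloneLimiter` and its methods are hypothetical names, and the actual wait and retry are left to the caller.

```python
from collections import defaultdict

MAX_CR_PER_DBA = 6  # mirrors the _DB_BLOCK_MAX_CR_DBA default quoted above


class CrCloneLimiter:
    """Toy bookkeeping for the number of CR copies cached per data block address."""

    def __init__(self, limit=MAX_CR_PER_DBA):
        self.limit = limit
        self.copies = defaultdict(int)  # dba -> CR copies currently cached

    def try_clone(self, dba):
        """Return True if a new CR copy may be built, False if the caller must wait and retry."""
        if self.copies[dba] >= self.limit:
            return False
        self.copies[dba] += 1
        return True

    def age_out(self, dba):
        """A CR copy at the cold end was flushed, which frees one slot."""
        if self.copies[dba] > 0:
            self.copies[dba] -= 1


limiter = CrCloneLimiter()
results = [limiter.try_clone(0x1234) for _ in range(7)]  # the seventh attempt is refused
limiter.age_out(0x1234)
after_age_out = limiter.try_clone(0x1234)                # a slot is free again
```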
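The Oracle Database buffer cache algorithm is no longer purely LRU/MRU based. The newer algorithm is based on how frequently a buffer is accessed. The most frequently accessed buffers are kept at the hot end of the buffer cache, and the most recently accessed buffer (the MRU buffer) is placed in the middle of the buffer cache (earlier Oracle versions placed it at the hot end). This is called midpoint insertion. Under the MFU algorithm a buffer heats up according to its frequency of use and moves slowly toward the hot end; if it is not accessed (touched) for a while, its temperature drops. Each buffer structure carries a counter called the touch count, and the buffer is moved to the hot or cold region based on that counter. If the buffer is not touched within three seconds (controlled by the _DB_AGING_TOUCH_TIME parameter, which defaults to 3), its touch count is halved. CR buffers are placed at the cold end of the buffer cache.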
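The touch-count behavior described above (count up on access, halve after an idle interval) can be modeled as below. The `TouchedBuffer` class, the explicit `now` timestamps, and the use of integer halving are assumptions of this sketch, not details taken from Oracle.

```python
AGING_TOUCH_TIME = 3  # seconds, the _DB_AGING_TOUCH_TIME default quoted above


class TouchedBuffer:
    """Toy buffer header carrying a touch count and the time of the last touch."""

    def __init__(self, now=0):
        self.touch_count = 0
        self.last_touch = now

    def touch(self, now):
        """Record an access to the buffer."""
        self.touch_count += 1
        self.last_touch = now

    def age(self, now):
        """Halve the counter if the buffer has sat untouched for the aging interval."""
        if now - self.last_touch >= AGING_TOUCH_TIME:
            self.touch_count //= 2
            self.last_touch = now  # start a new idle window
        return self.touch_count


buf = TouchedBuffer()
for t in (0, 1, 2, 2):
    buf.touch(t)

still_hot = buf.age(now=3)  # only 1 second idle: the count stays at 4
cooled = buf.age(now=6)     # 4 seconds idle: halved to 2
colder = buf.age(now=9)     # another 3 idle seconds: halved to 1
```

A buffer that keeps being touched retains a high count and drifts toward the hot end, while an idle one decays toward the cold end.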
We have covered the basics of the buffer cache and CR processing in a single-instance database; next we look at CR processing in a RAC environment. As you have seen, in RAC the buffer cache is managed by two or more instances. We will review how the buffer transfer occurred in the pre-RAC days and then review how it is changed in the RAC environment.
4. Pings and False Pings
In OPS (Oracle Parallel Server), whenever a process (belonging to one instance) wants to read a resource/buffer (we’ll call it a resource for simplicity), it acquires a lock on the resource and reads it into its own buffer cache. The Distributed Lock Manager (DLM) structures keep track of the resources and owners in their own lock structures. In this scenario, if the resource is acquired and used by the other instance, Oracle sends a request to the other instance to release the lock on that resource. For example, instance A wants a block that was used in instance B’s cache. To perform the transfer, instance B would write the block to disk and instance A would read it. The writing of a block to disk upon request of another instance is called a ping.
There is another type of disk write that happens when one instance writes a block to disk because another instance requires the same lock, even though that lock covers different blocks. This is called a false ping. Once the holder downgrades the lock and writes the buffer to disk, the requestor can acquire the lock and read the block into its own buffer cache. For example, instance A wants a block that shares the same PCM lock as a block in the cache of instance B. Configuring additional PCM locks greatly reduces false pings, but covering every block with its own lock would be too resource-intensive unless the database is very small. A single READ request can require multiple write operations as well as the read, and the disk is used as the data transfer medium. True pings and false pings put a heavy load on the I/O subsystem, and affect the scalability of the system if the application is not partitioned correctly based on the workload characteristics. This is one of the strong reasons for workload partitioning in Oracle Parallel Server environments.
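The difference between a ping and a false ping comes down to which blocks share a lock. The sketch below assumes a hypothetical modulo mapping from blocks to a fixed pool of PCM locks; the real mapping is configured through GC_FILES_TO_LOCKS and is not modeled here.

```python
def lock_for(block, n_locks):
    # Hypothetical mapping: blocks are spread over a fixed pool of PCM locks.
    return block % n_locks


def classify_ping(requested_block, dirty_blocks_on_holder, n_locks):
    """Return (kind, blocks_written) for a request against the holder's dirty blocks."""
    wanted = lock_for(requested_block, n_locks)
    # Every dirty block covered by the requested lock must be written to disk.
    written = sorted(b for b in dirty_blocks_on_holder
                     if lock_for(b, n_locks) == wanted)
    if requested_block in written:
        return "ping", written
    if written:
        return "false ping", written
    return "no ping", written


dirty_on_b = {5, 9, 10}  # blocks instance B has modified in its cache

true_ping = classify_ping(9, dirty_on_b, n_locks=4)    # B really holds block 9
false_ping = classify_ping(13, dirty_on_b, n_locks=4)  # 13 shares a lock with 5 and 9
more_locks = classify_ping(13, dirty_on_b, n_locks=16) # finer locks avoid the writes
```

With 4 locks, a request for block 13 forces blocks 5 and 9 to disk although neither was wanted; with 16 locks the same request causes no write at all.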
In addition, allocating PCM locks for each database file imposes a large administrative overhead, because the administrator has to configure the fixed, releasable, and hash locks according to each file's access frequency and concurrency. Improper configuration of lock objects causes excessive false pings, and systems supposedly designed for scalability never scale to the required level.
The DLM keeps track of the ownership of the blocks (attributes), such as which instance holds which blocks in shared and exclusive mode. At any point in time, only one instance can hold a block in exclusive mode, while more than one instance can hold that block in shared mode. During a lock downgrade (or ping), the holder writes the changes to the redo log, flushes the redo to disk, and downgrades the lock from exclusive mode to null/shared mode. The requestor can then acquire the lock in the required mode and read the block into its own buffer cache.
Cache Fusion
Starting with Oracle8i Database, CR server processing was simplified and a new background process, the Block Server Process (BSP), was introduced to handle CR processing. In this case, when a requestor pings the holder for a CR copy, the CR copy is constructed from the holder’s undo information and shipped to the requestor’s buffer cache over the interconnect (high speed and dedicated). The disk is not used as a data transfer medium. The interconnect is used to fuse the buffers across the caches; this data transfer method is called Cache Fusion. The BSP handles the cache transfer between instances.
Starting with Oracle9i Database, the Global Cache Service (GCS) handles the Cache Fusion traffic. The current buffer (XCUR) can also be transferred through a network connection, which is usually a dedicated fast interconnect. The internals of the current mode transfer are very complex and beyond the scope of this discussion, but one interesting thing to note is that Oracle limits the number of CR copies per DBA that can be created and shipped to the other instance through the network.
There is a fairness counter kept in every CR buffer, and the holder increments the counter after it makes a CR copy and sends it to the requestor. Once the holder reaches the threshold defined by the parameter _FAIRNESS_THRESHOLD, it stops making more CR copies, flushes the redo to the disk, and downgrades the locks.
From here onward, the requestor has to read the block from disk after acquiring the lock for the buffer from the lock master. The _FAIRNESS_THRESHOLD parameter defaults to 4, which means up to 4 consistent version buffers are shipped to the other instance's cache, and thereafter the CR server stops responding to CR requests for that particular DBA. The view V$CR_BLOCK_SERVER reports the request counts and the FAIRNESS_DOWN_CONVERTS count.
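The fairness counter can be pictured with a small model of the holder's side. The `CrServer` class and its return strings are hypothetical; only the default of 4 and the meaning of 0 (fairness down converts disabled) come from the text.

```python
FAIRNESS_THRESHOLD = 4  # the _FAIRNESS_THRESHOLD default quoted above


class CrServer:
    """Toy holder that ships CR copies until the fairness threshold is reached."""

    def __init__(self, threshold=FAIRNESS_THRESHOLD):
        self.threshold = threshold
        self.served = {}  # dba -> CR copies shipped since the last down-convert
        self.fairness_down_converts = 0

    def request_cr(self, dba):
        """Ship a CR copy, or down-convert once the threshold has been reached."""
        count = self.served.get(dba, 0)
        # A threshold of 0 is treated as "fairness down converts disabled".
        if self.threshold and count >= self.threshold:
            self.served[dba] = 0
            self.fairness_down_converts += 1
            return "down-convert: requestor reads from disk"
        self.served[dba] = count + 1
        return "CR copy shipped"


srv = CrServer()
responses = [srv.request_cr(42) for _ in range(5)]  # four copies, then a down-convert

never_converts = CrServer(threshold=0)
all_shipped = [never_converts.request_cr(42) for _ in range(10)]
```

The fifth request for the same DBA triggers the down-convert, which is what the FAIRNESS_DOWN_CONVERTS column counts.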
V$CR_BLOCK_SERVER
The column CR_REQUESTS contains the number of requests received for a particular block at a specific version or a specific SCN. Any request that involves SCN verification is called a consistent get. The total number of requests handled by the LMS process equals the sum of CR_REQUESTS and CURRENT_REQUESTS. The total is split into DATA_REQUESTS, UNDO_REQUESTS, and TX_REQUESTS (undo header block requests).
select cr_requests cr,
In some cases, constructing the CR copy may be too much work for the holder, for example when it would have to read data and undo blocks from disk or from another instance’s cache, which is too CPU intensive. In this case the holder simply sends the incomplete CR copy to the requestor’s cache, and the requestor completes the CR copy itself through block cleanout, which may include reading several undo blocks and data blocks. This is known as the light work rule, and the LIGHT_WORKS column shows the number of times the light work rule was applied when constructing CR blocks.
The number of times the light work rule was applied is visible in the view V$CR_BLOCK_SERVER:
select cr_requests,light_works
Reducing _FAIRNESS_THRESHOLD below its default value will provide some performance improvement if the data request to down convert (downgrade) ratio is greater than 30 percent. Setting the threshold to a lower value will greatly reduce the interconnect traffic for the CR messages. Setting _FAIRNESS_THRESHOLD to 0 disables the fairness down converts and is usually advised for systems that mainly perform SELECT operations.
In Oracle8i Database, the holder will down convert the lock and write the buffer to the disk, and the requester will always have to read the block from the disk. From Oracle9i Database, the holder down converts the lock to shared mode from exclusive, and the shared mode buffer will be transferred to the requestor’s cache through the interconnect.
select data_requests,fairness_down_converts
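Once the two columns have been read from V$CR_BLOCK_SERVER, the 30 percent check is simple arithmetic. The sketch below assumes the ratio means fairness down converts as a percentage of data requests, which is one reading of the text; the function names and the sample numbers are made up for illustration.

```python
def down_convert_pct(data_requests, fairness_down_converts):
    """Fairness down converts expressed as a percentage of data requests."""
    if data_requests == 0:
        return 0.0
    return 100.0 * fairness_down_converts / data_requests


def should_lower_threshold(data_requests, fairness_down_converts, cutoff_pct=30.0):
    """True when the down-convert percentage exceeds the 30 percent rule of thumb."""
    return down_convert_pct(data_requests, fairness_down_converts) > cutoff_pct


# Hypothetical sample values, not taken from a real system.
pct = down_convert_pct(data_requests=2000, fairness_down_converts=800)
busy = should_lower_threshold(2000, 800)   # 40 percent is above the cutoff
quiet = should_lower_threshold(2000, 200)  # 10 percent is below the cutoff
```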