2016-04-22 22:34:10
Disclaimer: This document touches briefly on many important and complex concepts and does not provide a detailed explanation of any one topic since the intent is to present the material in the most expedient manner. The goal is simply to help the reader become familiar enough with the product to successfully design and implement an Oracle GoldenGate environment. To that end, it is important to note that the activities of design, unit testing and integration testing which are crucial to a successful implementation have been intentionally left out of the guide. All the sample scripts are provided as is. Oracle consulting service is highly recommended for any customized implementation.
Executive Overview
Oracle GoldenGate is often used in mission-critical systems with stringent high availability requirements. In an Oracle to Oracle scenario, minimal latency between the source and the destination databases is important to achieve minimal downtime and minimal data loss in case of a failover. In an Oracle Real Application Clusters (RAC) configuration, access to the database is not dependent on the availability of any one server in the cluster. Oracle GoldenGate, however, could be impacted by the loss of a single server, since many critical components run on only one of the cluster servers.
This paper addresses how to achieve high availability for Oracle GoldenGate in a cluster configuration using Oracle Clusterware. Oracle Clusterware will ensure that Oracle GoldenGate can tolerate server failures by moving processing to another available server in the cluster. As a result Oracle GoldenGate processing is dependent on database availability rather than server availability thereby ensuring minimal latency.
This paper includes sample code that can be used to configure Oracle Clusterware to manage Oracle GoldenGate manager. The example is generic and can serve as a starting point for a more customized Oracle GoldenGate high availability implementation.
Please refer to Oracle support note 1313703.1 for the latest version of this document.
Also please refer to Oracle Support note 790189.1. The example in this document falls under the category “Using Oracle Clusterware to protect any kind of application”.
High Availability for Oracle GoldenGate
In a typical production environment the following Oracle GoldenGate processes will be running (also see Figure 1 below):
· Manager at source and target systems (always). The manager process oversees the other Oracle GoldenGate processes.
· Zero or more extract processes on the source system to capture from the transaction logs.
· Zero or more extract processes on the source system to send data to one or more target systems. In Oracle GoldenGate terminology these processes are called data pumps.
· Zero or more replicat processes on the target system to apply changes from a trail to a target database.
· Zero or more server collector processes on the target system that receive trails sent by a pump (extract) running on another server.
Oracle GoldenGate processes
Figure 1 below shows a high-level overview of the Oracle GoldenGate processes for a typical uni-directional Oracle source to Oracle target replication scenario. The boxes indicate the high-level source and target configuration without going into details on the actual implementation, although Figure 1 does assume the source configuration is an Oracle RAC database. For additional information about the numbers in Figure 1, refer to the legend below. The numbers are not meant to indicate any ordering.
Figure 1. Oracle GoldenGate processes
Legend to Figure 1:
1. Oracle GoldenGate manager starts extract (capture/pump) and replicat (apply) processes.
2. Extract (capture) spawns an extract reader thread for every node on the source database. The coordinator thread receives, interprets and orders redo from all readers.
3. Extract (capture) writes the source trail in transaction commit order.
4. Extract (pump) reads the source trail as transactions are written to the trail.
5. Extract (pump) contacts the manager process on the target system over TCP/IP to initialize trail transfer. Manager indicates to extract which port to use to send the trail.
6. Manager on the target spawns a server collector process on a TCP/IP port.
7. After the initialization with manager extract (pump) sends trails directly to server (collector).
8. Server (collector) writes the data it receives to a local (target) trail.
9. Replicat (apply) reads the target trail as it arrives.
10. Replicat (apply) applies the transactions to the target database using the Oracle Call Interface (OCI) and/or SQL.
Single server high availability
In a single server configuration, Oracle GoldenGate high availability is achieved through configuration parameters of the manager process. The manager parameters AUTOSTART and AUTORESTART attempt to prevent Oracle GoldenGate outages and ensure that Oracle GoldenGate processes are restarted as long as the manager process is running. Oracle GoldenGate manager starts server collector processes automatically when it receives a request from a remote extract to send trails.
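For illustration, a minimal manager parameter file using these parameters might look as follows. The port, retry counts, and paths below are assumptions for the sketch, not recommendations; in a real installation the file lives in $GGATE_HOME/dirprm.

```shell
# Sketch of a manager parameter file (mgr.prm) with AUTOSTART/AUTORESTART.
# Written under /tmp here purely for illustration.
mkdir -p /tmp/ggdemo/dirprm
cat > /tmp/ggdemo/dirprm/mgr.prm <<'EOF'
PORT 7809
-- Start every extract and replicat when manager starts
AUTOSTART ER *
-- Restart failed processes: up to 5 retries, 2 minutes apart
AUTORESTART ER *, RETRIES 5, WAITMINUTES 2, RESETMINUTES 60
EOF
```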
Certain failures beyond Oracle GoldenGate's control may cause Oracle GoldenGate outages, for example:
· Network failure.
· Database crashes or problems (e.g. database (or database instance) is down).
· Data integrity issues (e.g. problems caused by an out-of-sync condition).
· Server issues (e.g. OS crash).
In the event that the above outages result in a failure of the manager process, the manager process must be restarted. It is important to monitor the manager process since in some cases, operator intervention could be required. Management Pack for Oracle GoldenGate provides capabilities to set up notifications such as email alerts etc. Third-party products can also be used for monitoring.
Cluster high availability
An Oracle GoldenGate configuration in a cluster builds on top of the single server configuration. The approaches you would use in a single server configuration to keep Oracle GoldenGate extract and replicat available, using AUTOSTART and AUTORESTART parameters in the manager parameter file, still apply.
Oracle GoldenGate cluster high availability prerequisites
In a cluster configuration – on the source and/or on the target – Oracle GoldenGate runs on a single server at any time. If that server goes down (e.g. due to system maintenance, a crash, etc.) then Oracle GoldenGate can be started on another server. In order for Oracle GoldenGate to resume processing, you must, at a minimum, place the recovery-related Oracle GoldenGate files in a centrally shared location:
· checkpoint files ($GGATE_HOME/dirchk)
· trail files
Additionally, the parameter files ($GGATE_HOME/dirprm) will have to be identical across the different nodes in the cluster. You may also choose to share these among all Oracle GoldenGate installations on the different nodes.
The shared files should be stored on shared storage that is available independent of the availability of any particular server. For example, a local directory shared with other servers over NFS is not recommended, since the server hosting the shared directory may not be available at all times. Make sure that your file system fulfills your availability and performance requirements.
You may choose to install Oracle GoldenGate in a central location, for example on a cluster file system, so that every server can run Oracle GoldenGate from this single installation. Note, however, that Oracle GoldenGate must not run on more than one server at any point in time.
Shared Storage
Most shared storage solutions, including general purpose cluster file systems, can be used to install Oracle GoldenGate or to store the files that Oracle GoldenGate needs to recover. The following options are available from Oracle at no additional cost:
· Oracle Cluster File System (OCFS2) – available only on Linux.
· Oracle Automatic Storage Management (ASM) Cluster File System (ACFS). Oracle ACFS was introduced with Oracle Database 11g Release 2.
· Oracle DataBase File System (DBFS). Oracle DBFS was introduced with Oracle Database 11g Release 2.
Oracle Cluster File System (OCFS2)
One of the options you may choose on Linux is Oracle Cluster File System (OCFS2), which is included in recent Linux distributions (and in the Linux kernel in some distributions). OCFS2 is an open source general purpose cluster file system. Instead of installing Oracle GoldenGate in a local directory, you would install Oracle GoldenGate in a directory that is mounted on an OCFS2 volume. Refer to the OCFS2 website for more information.
OCFS2 can also be used for Oracle Database storage, although Oracle recommends the use of Oracle Automatic Storage Management (ASM) starting with Oracle Database 10g.
Oracle ASM Cluster File System (ACFS)
Oracle Database 11g Release 2 introduces the Oracle Automatic Storage Management Cluster File System (ASM Cluster File System, ACFS). ACFS is a general purpose single-node (standalone) or cluster file system on top of ASM (but outside a database). ACFS can be accessed using industry-standard Network Attached Storage (NAS) file access protocols: Network File System (NFS) and Common Internet File System (CIFS).
ACFS file systems would generally be mounted on all nodes in a cluster. As a result ACFS can be used to install Oracle GoldenGate to make it accessible to all nodes, and to store processing files required to failover between nodes in case of a failure.
ACFS is part of ASM as part of the Oracle Database 11g Release 2 Grid Infrastructure installation. Oracle Database 11g Release 2 Clusterware and ASM must be used for Oracle Database 11g Release 2, but can also be used for Oracle Database 11g Release 1 or Oracle Database 10g Release 2.
Note: ACFS can be used for Oracle GoldenGate trail files with no restrictions.
Oracle GoldenGate can be installed on ACFS, and in a cluster configuration the recovery-related files can also be stored on ACFS to make them accessible to all nodes. However, ACFS mounted on multiple servers concurrently does not currently support file locking, so you would need to mount ACFS on only one server at a time. If ACFS is mounted on one server at a time, then file locking is supported.
For more information about Oracle Database 11g Release 2 ACFS, please refer to the Oracle Database Storage Administrator's Guide in the Oracle Database 11g Release 2 documentation set.
Database File System (DBFS)
Oracle Database 11g Release 2 also introduces a Database File System (DBFS). In DBFS files are stored as secure files which are internally stored as LOB data values in the Oracle Database. In-database storage provides high availability, security and encryption capabilities that may not be otherwise available on general purpose file systems. In a cluster configuration the DBFS can be accessed from multiple nodes, and hence it can act as a cluster file system.
Files in DBFS can be managed through a set of PL/SQL APIs. In order to mount a DBFS as an OS file system another component, the DBFS client (dbfs_client) is required. For Oracle Database 11.2.0.1 you can only mount a DBFS file system on Linux.
DBFS can be used for Oracle GoldenGate trail files with no restrictions.
Oracle GoldenGate can be installed on DBFS, and in a cluster configuration the recovery-related files can also be stored on DBFS to make them accessible to all nodes. However, DBFS mounted on multiple servers concurrently does not currently support file locking, so you would need to mount DBFS on only one server at a time.
GoldenGate bounded recovery files are supported on DBFS only from GoldenGate version 11.2.1 and Oracle Database 11g Release 2 (or higher).
For more information about DBFS, its restrictions as well as how to configure a DBFS, please refer to the Oracle Database SecureFile and Large Objects Developer's Guide as part of the Oracle Database 11g Release 2 documentation set (http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10645/toc.htm).
About this best practice document
This best practice document will not be able to provide an out-of-the-box solution to any and all issues causing Oracle GoldenGate outages. Any configuration can be made more highly available beyond the generic example in this paper through additional implementation-specific coding.
In any and all critical implementations you have to configure notification procedures to minimize the impact of any Oracle GoldenGate outages, irrespective whether the outages are related or unrelated to Oracle GoldenGate.
The main issues that this best practice document helps address are server-related issues, to some degree database problems, and in rare cases network issues. Other problems, such as data issues, will have to be solved outside Oracle GoldenGate, often only through manual intervention. Once resolved, Oracle GoldenGate will resume processing where it left off, assuming the recovery-related files are available.
The implementation of the best practice in this document requires Oracle Clusterware 10g Release 2 or higher. Earlier versions of Oracle Clusterware cannot be used. The example included in this document should work on most Unix and Linux environments (it has been tested on Linux with Oracle Database 11g Release 1 and Release 2 and Oracle GoldenGate v10 and v10.4). For Windows the same approach can be used but the sample script will have to be modified to run on Windows.
Third-party cluster management software such as Veritas Cluster Server, Sun Cluster, Redhat Cluster Manager etc. likely provide similar capabilities but will not be discussed in this document. This document may still be a useful reference for other cluster management solutions but the included example will not work.
Oracle GoldenGate with Oracle Clusterware
This section provides an introduction into Oracle Clusterware and indicates how to install Oracle GoldenGate in a cluster configuration.
About Oracle Clusterware
Oracle Clusterware is Oracle's cluster management software. The software manages node membership and provides functionality such as resource management and high availability.
Starting with Oracle Clusterware 10g Release 2 Oracle Clusterware provides the capability to manage third-party applications. There are commands to register an application and instruct Oracle Clusterware how to manage the application in a clustered environment. This capability will be used to register the Oracle GoldenGate manager process as an application managed through Oracle Clusterware.
Oracle Clusterware can be installed standalone without an Oracle RAC database and still manage a cluster of servers and various applications running on these servers. As such Oracle Clusterware can also be installed on more than just the database servers to form a single cluster. For example you may use 4 database servers and 2 additional Oracle GoldenGate servers in a single cluster. The Oracle Database would only run on the 4 database servers and Oracle GoldenGate would only run on one of the two available Oracle GoldenGate servers (with failover to the other server dedicated to Oracle GoldenGate in case that server fails).
Oracle Clusterware Configuration
This section discusses how to include Oracle GoldenGate in an Oracle Clusterware configuration. It is assumed that Oracle Clusterware has already been installed and is functioning. Refer to the Oracle documentation for how to install Oracle Clusterware.
Oracle GoldenGate installation
Oracle GoldenGate must be available on every server in the same location (e.g. /u01/app/ggate). You may choose to perform a local installation on every server, or a single installation on a shared file system. You will need shared storage for the recovery-related files. On a Unix/Linux platform you can use a symbolic link to a central location for the shared directories.
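The symbolic-link approach mentioned above can be sketched as follows on Unix/Linux. The paths here are purely illustrative stand-ins for a local installation directory and a shared (e.g. OCFS2 or ACFS) mount point:

```shell
# Point the recovery-related directories of a local GoldenGate install
# at shared storage via symbolic links (illustrative paths only).
GGATE_HOME=/tmp/ggdemo/ggate          # stands in for the local install dir
SHARED=/tmp/ggdemo/shared             # stands in for a shared cluster mount
mkdir -p "$SHARED/dirchk" "$SHARED/dirdat" "$GGATE_HOME"
ln -sfn "$SHARED/dirchk" "$GGATE_HOME/dirchk"   # checkpoint files
ln -sfn "$SHARED/dirdat" "$GGATE_HOME/dirdat"   # trail files
```

With this layout, whichever node starts Oracle GoldenGate resolves the same checkpoint and trail files through identical local paths.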
The environments used to validate the examples in this document used a shared OCFS2 volume and an ACFS volume for the Oracle GoldenGate installation.
Virtual IP address (VIP)
Oracle Clusterware uses the concept of a Virtual IP address (VIP) to manage high availability for applications that require incoming network traffic (including the Oracle RAC database). A VIP is an IP address on the public subnet that can be used to access a server. If the server hosting the VIP were to go down, then Oracle Clusterware will migrate the VIP to a surviving server to minimize interruptions for the application accessing the server (through the VIP). This concept enables faster failovers compared to time-out based failovers on a server's actual IP address in case of a server failure.
For Oracle GoldenGate, you should use a VIP to access the manager process to isolate access to the manager process from the physical server that is running Oracle GoldenGate. Remote pumps must use the VIP to contact the Oracle GoldenGate manager. The Management Pack for Oracle GoldenGate, if used, should use the VIP to contact the Oracle GoldenGate manager. The VIP must be an available IP address on the public subnet and cannot be determined through DHCP. Ask a system administrator for an available fixed IP address for Oracle GoldenGate managed through Oracle Clusterware.
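In a data pump parameter file, this means the RMTHOST entry names the VIP rather than any physical host. A sketch follows; the process name, table spec, VIP address, and port are examples only:

```shell
# Sketch of a pump parameter file addressing the target manager via its VIP.
# Written under /tmp purely for illustration.
mkdir -p /tmp/ggdemo/dirprm
cat > /tmp/ggdemo/dirprm/pump1.prm <<'EOF'
EXTRACT pump1
-- Contact the target manager through the VIP, never a physical address
RMTHOST 192.168.1.23, MGRPORT 7809
RMTTRAIL ./dirdat/rt
PASSTHRU
TABLE scott.*;
EOF
```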
START, CHECK, STOP, CLEAN and ABORT routines
Oracle Clusterware must be instructed how to start the program, check whether it is running, and stop it. Programs using various programming languages, including shell scripts, can be registered and run by Oracle Clusterware.
START
Oracle GoldenGate manager is the process that starts all other Oracle GoldenGate processes. The only process that Oracle Clusterware should start is the manager process. Use the AUTOSTART parameter in the manager parameter file to start extract and replicat processes. You can use wild cards (AUTOSTART ER *) to start all extract and replicat processes when manager is started, but note that any initial-load extracts and/or replicats will also start with the unlimited ER * wild card. Based on a naming convention you use, you may use more restrictive wild cards (e.g. AUTOSTART EXTRACT cdc*) or list the specific extracts/replicats you want to start automatically. Refer to the Oracle GoldenGate Reference Guide for the details on AUTOSTART.
Also note that once manager is started through Oracle Clusterware, it is Oracle Clusterware that manages its availability. If you would stop manager through the command interface ggsci, then Oracle Clusterware will attempt to restart it. Use the Oracle Clusterware commands (see the example in the next chapter) to stop Oracle GoldenGate and prevent Oracle Clusterware from attempting to restart it.
All Oracle GoldenGate processes except manager can still be controlled through ggsci. In the recommended setup, Oracle Clusterware will not interfere with ggsci commands that manipulate Oracle GoldenGate processes. In most production environments however you probably want all processes to be running all the time. Use AUTORESTART in the manager parameter file for manager to automatically attempt to restart any processes that would go down. Also, make sure to have the necessary notification procedures in place if processes were to go down and stay down for some reason.
CHECK
The validation whether Oracle GoldenGate is running is equivalent to making sure the Oracle GoldenGate manager runs. Use the AUTORESTART parameter in the manager parameter file to ensure that extract and replicat processes will be restarted if/when they go down. Also make sure to have a notification infrastructure in place to prevent Oracle GoldenGate processes from staying down for an extended period of time due to errors that are beyond Oracle GoldenGate's control (e.g. data errors).
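A minimal check can be reduced to verifying that the PID the manager recorded is still alive. The sketch below demonstrates only the generic logic, using the current shell's PID as a stand-in, since the manager's process-file format varies by GoldenGate version:

```shell
# Return 0 if the process whose PID is stored in a file is alive.
gg_check() {
  pidfile="$1"
  [ -r "$pidfile" ] || return 1                # no PID file: not running
  pid=$(head -1 "$pidfile" | tr -dc '0-9')     # first line, digits only
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null  # signal 0 = existence test
}
# Demonstration with the current shell standing in for manager:
echo $$ > /tmp/ggdemo_mgr.pid
gg_check /tmp/ggdemo_mgr.pid && echo "manager running"   # prints "manager running"
```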
You may choose to implement very extensive checking to ensure all Oracle GoldenGate processes are running fine, but it does not make much sense to let Oracle Clusterware manage any processes other than the Oracle GoldenGate manager. The only reason why Oracle Clusterware might be able to start a process when Oracle GoldenGate manager cannot would be related to environment settings such as ORACLE_HOME. Such settings ought to be corrected so that Oracle GoldenGate manager can always start its processes.
STOP
Stop must stop all Oracle GoldenGate processes, including manager. Stop may be called during a planned downtime (e.g. a server is taken out of a cluster for maintenance reasons) and/or if you manually instruct Oracle Clusterware to relocate Oracle GoldenGate to a different server (e.g. to change the load on a server). If a server crashes then all processes will go down with it, in which case they can be started on another server.
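The stop routine boils down to a short ggsci command sequence. In the sketch below the commands are written to a file so they can be inspected without a GoldenGate installation; in a real action script you would pipe them straight into $GGATE_HOME/ggsci. The trailing `!` suppresses ggsci's confirmation prompt:

```shell
# ggsci command sequence for a clean stop (illustrative).
cat > /tmp/ggdemo_stop.ggsci <<'EOF'
STOP ER *
STOP MANAGER !
EOF
```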
CLEAN
Clean was introduced with Oracle Clusterware 11g Release 2. It will not be used for Oracle Clusterware 10g Release 2 or 11g Release 1. Clean is called when there is a need to clean up the resource. It is a non-graceful operation.
ABORT
Abort was introduced with Oracle Clusterware 11g Release 2. It will not be used for Oracle Clusterware 10g Release 2 or 11g Release 1. Abort is called if any of the resource components hang to abort the ongoing action. Abort is not required to be included.
Program registration, start and stop
Once Oracle GoldenGate has been installed across the cluster and a script to start, check and stop (and for Oracle Clusterware 11g Release 2, clean and optionally abort) has been written and has been made accessible to all nodes, Oracle GoldenGate can be registered in Oracle Clusterware. Use the Clusterware commands to create, register and set privileges on the VIP and the Oracle GoldenGate application. Once registered, use the Oracle Clusterware commands to start, relocate and stop Oracle GoldenGate. For detailed steps see the example in the next section.
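A minimal action-script skeleton combining the start, check, and stop routines might look as follows. Everything here — the install path, the crude liveness heuristic, and the ggsci invocations — is an assumption to be adapted to the actual installation:

```shell
# Skeleton of a Clusterware action script for the GoldenGate manager,
# written to /tmp for illustration. Clusterware would call it with the
# action name as its first argument: action.sh {start|check|stop}
cat > /tmp/ggdemo_action.sh <<'EOF'
#!/bin/sh
GGATE_HOME=${GGATE_HOME:-/u01/app/ggate}     # assumed install location
cd "$GGATE_HOME" 2>/dev/null
case "$1" in
  start) echo "START MANAGER" | "$GGATE_HOME/ggsci" ;;
  check) pgrep -f "$GGATE_HOME.*mgr" >/dev/null ;;   # crude liveness test
  stop)  printf 'STOP ER *\nSTOP MANAGER !\n' | "$GGATE_HOME/ggsci" ;;
  *)     echo "usage: $0 {start|check|stop}" >&2; exit 1 ;;
esac
EOF
chmod +x /tmp/ggdemo_action.sh
```

A stub ggsci can exercise the dispatch without a real installation, e.g. `GGATE_HOME=/path/to/stub /tmp/ggdemo_action.sh start`.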
The Oracle Clusterware commands are documented in an appendix in the Oracle Clusterware Administration and Deployment Guide:
· Oracle Database 10g Release 2:
· Oracle Database 11g Release 1:
· Oracle Database 11g Release 2:
Examples
This section goes step-by-step through a couple of examples. This section covers a separate example for Oracle Clusterware 10g Release 2 and 11g Release 1 versus Oracle Clusterware 11g Release 2. Please refer to the relevant section below depending on which Oracle Clusterware release you use.
Oracle Clusterware 10g Release 2 and 11g Release 1
This section provides an example for Oracle Clusterware 10g Release 2 and 11g Release 1.
Step 1: Add an application VIP
The first step is to create an application VIP. The VIP will be used to access Oracle GoldenGate (e.g. by a remote pump or by the Management Pack for Oracle GoldenGate). Oracle Clusterware will assign the VIP to a physical server, and migrate the VIP if that server were to go down or if you instruct Clusterware to do so.
To create the application VIP, login as the OS Oracle software owner (oracle in this example) and run:
CLUSTERWARE_HOME/bin/crs_profile -create ggatevip \
-t application \
-a CLUSTERWARE_HOME/bin/usrvip \
-o oi=eth0,ov=192.168.1.23,on=255.255.255.0
with:
· CLUSTERWARE_HOME as the Oracle home in which Oracle Clusterware is installed (e.g. /u01/app/oracle/crs111).
· ggatevip is the name of the application VIP that you will create.
· oi=eth0; eth0 is the public interface in this example.
· ov=192.168.1.23; the virtual IP address is 192.168.1.23 in this example.
· on=255.255.255.0; the subnet mask. This should be the same subnet mask for the public (general) IP address.
There are more options you can set through the crs_* commands. For example, you can indicate what nodes can be used to host the application, if there is a preference for a node to run the application, and whether you want to always start the application upon reboot, never, or restore the last state upon reboot (the default). Please refer to the Oracle Clusterware documentation for details:
· Oracle Clusterware 10g Release 2:
· Oracle Clusterware 11g Release 1:
Next, register the VIP as oracle:
CLUSTERWARE_HOME/bin/crs_register ggatevip
Because the assignment of an IP address is done by the root user, you have to set the ownership of the VIP to the root user. Connect as root and execute:
CLUSTERWARE_HOME/bin/crs_setperm ggatevip -o root
As root, allow oracle to run the script to start the VIP.
CLUSTERWARE_HOME/bin/crs_setperm ggatevip -u user:oracle:r-x
Then, as oracle, start the VIP:
CLUSTERWARE_HOME/bin/crs_start ggatevip
To validate whether the VIP is running and on which node it is running, execute:
CLUSTERWARE_HOME/bin/crs_stat ggatevip -t
For example:
[oracle@rac2 bin]$ crs_stat ggatevip -t
Name Type Target State Host
------------------------------------------------------------
ggatevip application ONLINE ONLINE rac2