Product Environment:
Windows 2000 Server SP4
Oracle Fail Safe 3.3.4
Issue:
Installed a 10.2.0.3 database on a Fail Safe cluster where Oracle9i was already installed.
Adding the database resource to Fail Safe failed with an ORA-00000 error.
Verifying the standalone database showed that startup works normally, but the database shutdown
performed during verification fails.
The following explanation is from MetaLink:
CAUSE DETERMINATION
====================
The issue is caused by the following setup:
When a Fail Safe database is started up, the database is monitored via rsrcmon.exe, which is an executable spawned by the Cluster Service for MSCS. Rsrcmon.exe uses the resource DLL provided by Oracle Fail Safe (FsResOdbs.DLL) to determine how to monitor a database. At the time that rsrcmon.exe is initialized, the Fail Safe resource DLL loads the libraries from the highest database version installed at that moment and uses them to monitor the database.
If Oracle 10g is installed into a new home on a Fail Safe cluster where an Oracle9i database is already running and being monitored as part of a resource group, then the 9i libraries are already loaded, and will be used to connect to the database.
When the Oracle10g Database is brought online as part of the Add to Group operation, the operation actually fails, but the ORA-00000 gives the impression that it succeeded. The root cause of this problem is a mismatch between the client libraries being used by Fail Safe, and the actual database version being added to the group.
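The load-once behavior described above can be pictured with a toy model. This is only an illustrative sketch, not Oracle's or Microsoft's code: the class name ResourceMonitor, the method can_monitor, and the use of bare major version numbers (9, 10) are all hypothetical. It assumes, as the note states, that newer client libraries can talk to older databases but not the other way round.

```python
class ResourceMonitor:
    """Toy stand-in for an rsrcmon.exe process (hypothetical model)."""

    def __init__(self, installed_versions):
        # The client libraries are chosen once, when the process starts,
        # from the highest version installed at that moment.
        self.loaded_version = max(installed_versions)

    def can_monitor(self, db_version):
        # Assumption from the note: libraries of version N can monitor
        # databases of version N or lower, but not newer ones.
        return db_version <= self.loaded_version


homes = [9]                          # only Oracle9i is installed
monitor = ResourceMonitor(homes)     # monitor starts and loads 9i libraries
homes.append(10)                     # Oracle10g is installed afterwards

stale = monitor.can_monitor(10)      # still using the 9i libraries

monitor = ResourceMonitor(homes)     # restart: libraries are re-selected
fresh = monitor.can_monitor(10) and monitor.can_monitor(9)
```

In this model, installing the new home changes nothing for the running monitor; only creating a new monitor (restarting the process) picks up the 10g libraries.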
SOLUTION / ACTION PLAN
=======================
The solution to this is to force all rsrcmon.exe processes on each cluster node to be restarted. Restarting rsrcmon.exe after Oracle10g has been installed will allow the Oracle10g libraries to be loaded, as opposed to the Oracle9i libraries. This will then allow Fail Safe to properly communicate with all databases in the cluster, including the existing Oracle9i databases, as well as the new Oracle10g databases.
To respawn the rsrcmon.exe processes, do the following:
1. Pick a node to work on first.
2. Move any existing groups on that node to another node in the cluster.
3. Stop the "Cluster Service" on that node.
4. After "Cluster Service" is stopped, verify via Task Manager that there are no running rsrcmon.exe processes. If there are still running rsrcmon.exe processes after stopping the "Cluster Service", these orphaned processes can be killed via Task Manager.
5. After confirming that there are no longer any rsrcmon.exe processes, restart the "Cluster Service" on that node.
6. Move groups back to this node, and then repeat the process on the next node.
7. Repeat on all nodes until the Cluster Service has been restarted on every node in the cluster. Then, attempt the 'Add to Group' operation again.
Alternatively, a reboot of each node in the cluster will accomplish the same thing.
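The check in step 4 can also be scripted instead of being done by eye in Task Manager. The following is a minimal sketch under stated assumptions: it relies on the tasklist command, which ships with Windows XP / Server 2003 and later but not with Windows 2000 as installed, and on a Python interpreter being available on the node. The function names parse_pids and running_rsrcmon_pids are made up for this example.

```python
import csv
import io
import subprocess

IMAGE_NAME = "rsrcmon.exe"


def parse_pids(tasklist_csv, image_name=IMAGE_NAME):
    """Return PIDs of matching rows in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Data rows look like: "rsrcmon.exe","1234","Services","0","5,432 K"
        # Informational lines (such as the "INFO: No tasks ..." message)
        # and blank lines have no numeric PID column and are skipped.
        if (len(row) >= 2
                and row[0].lower() == image_name.lower()
                and row[1].isdigit()):
            pids.append(int(row[1]))
    return pids


def running_rsrcmon_pids():
    """Query the local node. Assumes tasklist.exe is on the PATH."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq " + IMAGE_NAME, "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_pids(result.stdout)
```

After stopping the Cluster Service on a node, calling running_rsrcmon_pids() there should return an empty list; any PIDs it does return belong to the orphaned processes that step 4 says to end before restarting the service.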