I encountered a strange problem today with a 3 node cluster. Let's start with the facts first:
- RHEL 5.3 64bit
- Oracle Clusterware 10.2.0.4 + bundle patch#4
- Oracle ASM 10.2.0.4.1 (that is PSU 1)
- ASMLib in use
- Cluster members: nodea, nodeb, nodec (their real names are known to the author)
I was about to create a clustered ASM instance when it happened. I
just completed dbca’s “configure automatic storage management” option
which created the listeners (listener_) on each host as
well as the ASM instance itself. ASM was started on all cluster nodes:
- [oracle@nodea bin]$ for i in a b c; do srvctl status asm -n node$i; done
-
ASM instance +ASM1 is running on node nodea.
-
ASM instance +ASM2 is running on node nodeb.
-
ASM instance +ASM3 is running on node nodec.
So far so good, then I decided to query +ASM1 to see if the disks are
present (I have just finished a snapclone on the storage array).
- [oracle@nodea~]$ sqlplus / as sysdba
-
SQL*Plus: Release 10.2.0.4.0 - Production on Tue Oct 6 15:22:51 2009
-
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
-
Connected to an idle instance.
-
-
SQL>
Pardon? This is the moment where I think it would be great if
wordpress could play back my astonishment and surprise about the
nonexistent instance. What’s going on there? I could clearly see the
instance was up!
- [oracle@nodea ~]$ ps -ef | grep smon
-
oracle 23013 21570 0 15:23 pts/1 00:00:00 grep smon
-
oracle 28834 1 0 10:34 ? 00:00:00 asm_smon_+ASM1
Whatever I tried, I couldn’t connect to the instance. Surprisingly,
starting and stopping through srvctl worked just fine. I then thought
that something was wrong with the profile of the ASM instance. But all
looked normal, both in crs_stat -p and srvctl config asm -n nodea. I
then wanted to connect to ASM via the listener. I quickly added a
tnsnames.ora entry:
- ASM1 =
-
(DESCRIPTION =
-
(ADDRESS =
-
(PROTOCOL = TCP)
-
(HOST = nodea-vip)
-
(PORT = 1521))
-
(CONNECT_DATA = (SERVER = DEDICATED)
-
(UR = A)
-
(SERVICE_NAME = +ASM)
-
(INSTANCE_NAME = +ASM1)
-
)
-
)
Adding UR=A to the connect_data section is a trick bypass a listener
which normally blocks incoming connection requests, especially for ASM.
Now the listener didn’t want to play ball either.
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
It got even more mysterious. I then checked if the listener was present as a process:
- [oracle@nodeadbs]$ ps -ef | grep tns
-
oracle 14632 12235 0 Sep14 pts/6 00:00:00 vi tnsnames.ora
-
oracle 28655 1 0 10:34 ? 00:00:00 /u01/app/oracle/product/10.2.0/asm_1//bin/tnslsnr LISTENER_NODEA -inherit
-
oracle 28678 18161 0 10:34 pts/5 00:00:00 grep tns
Where does the double-slash come from after the $ORACLE_HOME I wonder? It’s not in the profile:
- [oracle@nodea admin]$ /u01/crs/oracle/product/crs/bin/crs_stat -p ora.nodea.LISTENER_NODEA.lsnr
-
NAME=ora.nodea.LISTENER_NODEA.lsnr
-
TYPE=application
-
ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/asm_1/bin/racgwrap
-
ACTIVE_PLACEMENT=0
-
AUTO_START=1
-
CHECK_INTERVAL=600
-
DESCRIPTION=CRS application for listener on node
-
FAILOVER_DELAY=0
-
FAILURE_INTERVAL=0
-
FAILURE_THRESHOLD=0
-
HOSTING_MEMBERS=nodea
-
OPTIONAL_RESOURCES=
-
PLACEMENT=restricted
-
REQUIRED_RESOURCES=ora.nodea.vip
-
RESTART_ATTEMPTS=5
-
SCRIPT_TIMEOUT=600
-
START_TIMEOUT=0
-
STOP_TIMEOUT=0
-
UPTIME_THRESHOLD=7d
-
USR_ORA_ALERT_NAME=
-
USR_ORA_CHECK_TIMEOUT=0
-
USR_ORA_CONNECT_STR=/ as sysdba
-
USR_ORA_DEBUG=0
-
USR_ORA_DISCONNECT=false
-
USR_ORA_FLAGS=
-
USR_ORA_IF=
-
USR_ORA_INST_NOT_SHUTDOWN=
-
USR_ORA_LANG=
-
USR_ORA_NETMASK=
-
USR_ORA_OPEN_MODE=
-
USR_ORA_OPI=false
-
USR_ORA_PFILE=
-
USR_ORA_PRECONNECT=none
-
USR_ORA_SRV=
-
USR_ORA_START_TIMEOUT=0
-
USR_ORA_STOP_MODE=immediate
-
USR_ORA_STOP_TIMEOUT=0
-
USR_ORA_VIP=
The great thing about Linux (and Unix) is that it gives you insight
to an immense amount of troubleshooting information, especially in the
/proc file system. /proc/ gives you the environment variables
the process was started with in the “environ” file. In my case, the ASM
instance was started with PID 4680 (output formatted for readability)
- [oracle@nodea ~]$ cat /proc/4680/environ
-
_USR_ORA_INST_NOT_SHUTDOWN=
-
_CAA_OPTIONAL_RESOURCES=
-
ORA_CRS_HOME=/u01/crs/oracle/product/crs/
-
SELINUX_INIT=YES
-
CONSOLE=/dev/console
-
_USR_ORA_LANG=
-
_CAA_FAILOVER_DELAY=0
-
TERM=linux
-
_CAA_UPTIME_THRESHOLD=7d
-
_CAA_STATE=:OFFLINE, _USR_ORA_PRECONNECT=none
-
_USR_ORA_DEBUG=0
-
_CAA_NAME=ora.nodea.ASM1.asm
-
_CAA_REASON=user
-
_USR_ORA_OPEN_MODE=mount
-
_CAA_FAILURE_INTERVAL=0
-
_CAA_FAILURE_THRESHOLD=0
-
_USR_ORA_CONNECT_STR=/ as sysdba
-
_USR_ORA_FLAGS=
-
LD_LIBRARY_PATH=/u01/app/oracle/product/10.2.0/asm_1//lib:/u01/crs/oracle/product/crs//lib
-
_USR_ORA_START_TIMEOUT=0
-
INIT_VERSION=sysvinit-2.86
-
_USR_ORA_SRV=
-
_USR_ORA_VIP=
-
_CAA_REQUIRED_RESOURCES=
-
_CAA_HOSTING_MEMBERS=nodea
-
_CAA_START_TIMEOUT=0
-
_CAA_ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/asm_1/bin/racgwrap _CAA_STOP_TIMEOUT=0
-
_CAA_AUTO_START=1
-
PATH=
-
_USR_ORA_NETMASK=
-
RUNLEVEL=3
-
_CAA_TARGET=:ONLINE,PWD=/u01/app/oracle/product/10.2.0/asm_1/bin
-
_USR_ORA_IF=
-
_USR_ORA_ALERT_NAME=
-
_USR_ORA_DISCONNECT=false
-
PREVLEVEL=
-
N _CAA_PLACEMENT=restricted
-
_USR_ORA_OPI=false
-
_CAA_DESCRIPTION=CRS application for ASM instance
-
HOME=/SHLVL=1
-
_CAA_SCRIPT_TIMEOUT=600
-
_CAA_CLIENT_LOCALE=
-
_USR_ORA_CHECK_TIMEOUT=0
-
LD_ASSUME_KERNEL=
-
_CAA_CHECK_INTERVAL=600
-
_CAA_ACTIVE_PLACEMENT=0
-
_CAA_TYPE=application
-
_USR_ORA_STOP_TIMEOUT=0
-
_USR_ORA_STOP_MODE=immediate
-
_USR_ORA_PFILE=
-
_CAA_RESTART_ATTEMPTS=5
-
ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm_1/
-
ORACLE_CONFIG_HOME=/u01/crs/oracle/product/crs/
-
ORACLE_SID=+ASM1
-
ORA_NET2_DESC=8,11
-
ORACLE_SPAWNED_PROCESS=1
-
SKGP_HIDDEN_ARGS=<FATAL/S/x0/xB/x0/x8B53435A/4651/4651/xA>0
-
[oracle@nodea ~]$
Among all these variables (ever wondered which effect environment
variables have for debugging?) you can spot ORACLE_HOME, which has a
slash appended to it. Actually, there are lots of paths with
double-slashes – odd. I always though that wasn’t the case in CRS? So
what about the listener’s environment variables?
- [oracle@nodea ~]$ cat /proc/27625/environ
-
...
-
ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm_1/
-
...
I spotted that the path to ORACLE_HOME ended with a trailing slash, so there’s a pattern here.
A quick check on the other instances revealed that their listener/ASM
instance don’t have a slash appended to $ORACLE_HOME. Where does this
come from? The profile told us the the $ORACLE_HOME/bin/racgwrap script
is used to start/stop the resource so I diff’d the first instance’s
racgwrap script against the second instance’s and voila – no trailing
slash. Actually, it was a trailing slash followed by a whitespace.
The logical steps was then to stop the ASM instance followed by
stopping the listener. I then edited racgwrap and removed the trailing
slash and white space after ORACLE_HOME. Did it make the difference?
- [oracle@nodea dbs]$ srvctl start listener -n nodea
-
[oracle@nodea dbs]$ ps -ef | grep tns
-
oracle 14632 12235 0 Sep14 pts/6 00:00:00 vi tnsnames.ora
-
oracle 28655 1 0 10:34 ? 00:00:00 /u01/app/oracle/product/10.2.0/asm_1/bin/tnslsnr LISTENER_NODEA -inherit
-
oracle 28678 18161 0 10:34 pts/5 00:00:00 grep tns
Great – no more double slashes in the path. What about ASM:
- [oracle@nodea dbs]$ srvctl start asm -n nodea
-
[oracle@nodea dbs]$ . oraenv
-
ORACLE_SID = [+ASM1] ?
- [oracle@nodea dbs]$ sqlplus / as sysdba
-
SQL*Plus: Release 10.2.0.4.0 - Production on Tue Oct 6 10:34:50 2009
-
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
-
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
-
With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options
-
-
SQL> exit
Phew, so all good now. Why that managed to creep in I don’t know at all, but I was glad I managed to fix it.
阅读(3478) | 评论(0) | 转发(0) |