Chinaunix首页 | 论坛 | 博客
  • 博客访问: 219484
  • 博文数量: 57
  • 博客积分: 1376
  • 博客等级: 中尉
  • 技术积分: 658
  • 用 户 组: 普通用户
  • 注册时间: 2010-05-09 09:51
文章分类

全部博文(57)

文章存档

2012年(1)

2011年(56)

分类: Oracle

2011-08-17 14:24:28

I encountered a strange problem today with a 3 node cluster. Let's start with the facts first:

  • RHEL 5.3 64bit
  • Oracle Clusterware 10.2.0.4 + bundle patch#4
  • Oracle ASM 10.2.0.4.1 (that is PSU 1)
  • ASMLib in use
  • Cluster members: nodea, nodeb, nodec (their real names are known to the author)

I was about to create a clustered ASM instance when it happened. I just completed dbca’s “configure automatic storage management” option which created the listeners (listener_) on each host as well as the ASM instance itself. ASM was started on all cluster nodes:

  1. [oracle@nodea bin]$ for i in a b c; do srvctl status asm -n node$i; done
  2. ASM instance +ASM1 is running on node nodea.
  3. ASM instance +ASM2 is running on node nodeb.
  4. ASM instance +ASM3 is running on node nodec.

So far so good, then I decided to query +ASM1 to see if the disks are present (I have just finished a snapclone on the storage array).

  1. [oracle@nodea~]$ sqlplus / as sysdba
  2. SQL*Plus: Release 10.2.0.4.0 - Production on Tue Oct 6 15:22:51 2009
  3. Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
  4. Connected to an idle instance.

  5. SQL>

Pardon? This is the moment where I think it would be great if wordpress could play back my astonishment and surprise about the nonexistent instance. What’s going on there? I could clearly see the instance was up!

  1. [oracle@nodea ~]$ ps -ef | grep smon
  2. oracle 23013 21570 0 15:23 pts/1 00:00:00 grep smon
  3. oracle 28834 1 0 10:34 ? 00:00:00 asm_smon_+ASM1

Whatever I tried, I couldn’t connect to the instance. Surprisingly, starting and stopping through srvctl worked just fine. I then thought that something was wrong with the profile of the ASM instance. But all looked normal, both in crs_stat -p and srvctl config asm -n nodea. I then wanted to connect to ASM via the listener. I quickly added a tnsnames.ora entry:

  1. ASM1 =
  2.   (DESCRIPTION =
  3.     (ADDRESS =
  4.       (PROTOCOL = TCP)
  5.       (HOST = nodea-vip)
  6.       (PORT = 1521))
  7.   (CONNECT_DATA = (SERVER = DEDICATED)
  8.        (UR = A)
  9.        (SERVICE_NAME = +ASM)
  10.        (INSTANCE_NAME = +ASM1)
  11.    )
  12.   )

Adding UR=A to the connect_data section is a trick bypass a listener which normally blocks incoming connection requests, especially for ASM. Now the listener didn’t want to play ball either.

ERROR: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

It got even more mysterious. I then checked if the listener was present as a process:

  1. [oracle@nodeadbs]$ ps -ef | grep tns
  2. oracle 14632 12235 0 Sep14 pts/6 00:00:00 vi tnsnames.ora
  3. oracle 28655 1 0 10:34 ? 00:00:00 /u01/app/oracle/product/10.2.0/asm_1//bin/tnslsnr LISTENER_NODEA -inherit
  4. oracle 28678 18161 0 10:34 pts/5 00:00:00 grep tns

Where does the double-slash come from after the $ORACLE_HOME I wonder? It’s not in the profile:

  1. [oracle@nodea admin]$ /u01/crs/oracle/product/crs/bin/crs_stat -p ora.nodea.LISTENER_NODEA.lsnr
  2. NAME=ora.nodea.LISTENER_NODEA.lsnr
  3. TYPE=application
  4. ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/asm_1/bin/racgwrap
  5. ACTIVE_PLACEMENT=0
  6. AUTO_START=1
  7. CHECK_INTERVAL=600
  8. DESCRIPTION=CRS application for listener on node
  9. FAILOVER_DELAY=0
  10. FAILURE_INTERVAL=0
  11. FAILURE_THRESHOLD=0
  12. HOSTING_MEMBERS=nodea
  13. OPTIONAL_RESOURCES=
  14. PLACEMENT=restricted
  15. REQUIRED_RESOURCES=ora.nodea.vip
  16. RESTART_ATTEMPTS=5
  17. SCRIPT_TIMEOUT=600
  18. START_TIMEOUT=0
  19. STOP_TIMEOUT=0
  20. UPTIME_THRESHOLD=7d
  21. USR_ORA_ALERT_NAME=
  22. USR_ORA_CHECK_TIMEOUT=0
  23. USR_ORA_CONNECT_STR=/ as sysdba
  24. USR_ORA_DEBUG=0
  25. USR_ORA_DISCONNECT=false
  26. USR_ORA_FLAGS=
  27. USR_ORA_IF=
  28. USR_ORA_INST_NOT_SHUTDOWN=
  29. USR_ORA_LANG=
  30. USR_ORA_NETMASK=
  31. USR_ORA_OPEN_MODE=
  32. USR_ORA_OPI=false
  33. USR_ORA_PFILE=
  34. USR_ORA_PRECONNECT=none
  35. USR_ORA_SRV=
  36. USR_ORA_START_TIMEOUT=0
  37. USR_ORA_STOP_MODE=immediate
  38. USR_ORA_STOP_TIMEOUT=0
  39. USR_ORA_VIP=

The great thing about Linux (and Unix) is that it gives you insight to an immense amount of troubleshooting information, especially in the /proc file system. /proc/ gives you the environment variables the process was started with in the “environ” file. In my case, the ASM instance was started with PID 4680 (output formatted for readability)

  1. [oracle@nodea ~]$ cat /proc/4680/environ
  2. _USR_ORA_INST_NOT_SHUTDOWN=
  3. _CAA_OPTIONAL_RESOURCES=
  4. ORA_CRS_HOME=/u01/crs/oracle/product/crs/
  5. SELINUX_INIT=YES
  6. CONSOLE=/dev/console
  7. _USR_ORA_LANG=
  8. _CAA_FAILOVER_DELAY=0
  9. TERM=linux
  10. _CAA_UPTIME_THRESHOLD=7d
  11. _CAA_STATE=:OFFLINE, _USR_ORA_PRECONNECT=none
  12. _USR_ORA_DEBUG=0
  13. _CAA_NAME=ora.nodea.ASM1.asm
  14. _CAA_REASON=user
  15. _USR_ORA_OPEN_MODE=mount
  16. _CAA_FAILURE_INTERVAL=0
  17. _CAA_FAILURE_THRESHOLD=0
  18. _USR_ORA_CONNECT_STR=/ as sysdba
  19. _USR_ORA_FLAGS=
  20. LD_LIBRARY_PATH=/u01/app/oracle/product/10.2.0/asm_1//lib:/u01/crs/oracle/product/crs//lib
  21. _USR_ORA_START_TIMEOUT=0
  22. INIT_VERSION=sysvinit-2.86
  23. _USR_ORA_SRV=
  24. _USR_ORA_VIP=
  25. _CAA_REQUIRED_RESOURCES=
  26. _CAA_HOSTING_MEMBERS=nodea
  27. _CAA_START_TIMEOUT=0
  28. _CAA_ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/asm_1/bin/racgwrap _CAA_STOP_TIMEOUT=0
  29. _CAA_AUTO_START=1
  30. PATH=
  31. _USR_ORA_NETMASK=
  32. RUNLEVEL=3
  33. _CAA_TARGET=:ONLINE,PWD=/u01/app/oracle/product/10.2.0/asm_1/bin
  34. _USR_ORA_IF=
  35. _USR_ORA_ALERT_NAME=
  36. _USR_ORA_DISCONNECT=false
  37. PREVLEVEL=
  38. N _CAA_PLACEMENT=restricted
  39. _USR_ORA_OPI=false
  40. _CAA_DESCRIPTION=CRS application for ASM instance
  41. HOME=/SHLVL=1
  42. _CAA_SCRIPT_TIMEOUT=600
  43. _CAA_CLIENT_LOCALE=
  44. _USR_ORA_CHECK_TIMEOUT=0
  45. LD_ASSUME_KERNEL=
  46. _CAA_CHECK_INTERVAL=600
  47. _CAA_ACTIVE_PLACEMENT=0
  48. _CAA_TYPE=application
  49. _USR_ORA_STOP_TIMEOUT=0
  50. _USR_ORA_STOP_MODE=immediate
  51. _USR_ORA_PFILE=
  52. _CAA_RESTART_ATTEMPTS=5
  53. ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm_1/
  54. ORACLE_CONFIG_HOME=/u01/crs/oracle/product/crs/
  55. ORACLE_SID=+ASM1
  56. ORA_NET2_DESC=8,11
  57. ORACLE_SPAWNED_PROCESS=1
  58. SKGP_HIDDEN_ARGS=<FATAL/S/x0/xB/x0/x8B53435A/4651/4651/xA>0
  59. [oracle@nodea ~]$

Among all these variables (ever wondered which effect environment variables have for debugging?) you can spot ORACLE_HOME, which has a slash appended to it. Actually, there are lots of paths with double-slashes – odd. I always though that wasn’t the case in CRS? So what about the listener’s environment variables?

  1. [oracle@nodea ~]$ cat /proc/27625/environ
  2. ...
  3. ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm_1/
  4. ...

I spotted that the path to ORACLE_HOME ended with a trailing slash, so there’s a pattern here.

A quick check on the other instances revealed that their listener/ASM instance don’t have a slash appended to $ORACLE_HOME. Where does this come from? The profile told us the the $ORACLE_HOME/bin/racgwrap script is used to start/stop the resource so I diff’d the first instance’s racgwrap script against the second instance’s and voila – no trailing slash. Actually, it was a trailing slash followed by a whitespace.

The logical steps was then to stop the ASM instance followed by stopping the listener. I then edited racgwrap and removed the trailing slash and white space after ORACLE_HOME. Did it make the difference?

  1. [oracle@nodea dbs]$ srvctl start listener -n nodea
  2. [oracle@nodea dbs]$ ps -ef | grep tns
  3. oracle 14632 12235 0 Sep14 pts/6 00:00:00 vi tnsnames.ora
  4. oracle 28655 1 0 10:34 ? 00:00:00 /u01/app/oracle/product/10.2.0/asm_1/bin/tnslsnr LISTENER_NODEA -inherit
  5. oracle 28678 18161 0 10:34 pts/5 00:00:00 grep tns

Great – no more double slashes in the path. What about ASM:


  1. [oracle@nodea dbs]$ srvctl start asm -n nodea
  2. [oracle@nodea dbs]$ . oraenv
  3. ORACLE_SID = [+ASM1] ?

  1. [oracle@nodea dbs]$ sqlplus / as sysdba
  2. SQL*Plus: Release 10.2.0.4.0 - Production on Tue Oct 6 10:34:50 2009
  3. Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
  4. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
  5. With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options

  6. SQL> exit

Phew, so all good now. Why that managed to creep in I don’t know at all, but I was glad I managed to fix it.

阅读(3478) | 评论(0) | 转发(0) |
0

上一篇:奇怪的问题

下一篇:Logical standby creatation

给主人留下些什么吧!~~