After applying the 20200414 PSU to an 11.2.0.4 RAC, crsctl stat res -t shows ora.drivers.acfs as OFFLINE. Usually nobody uses ACFS, so it can be left alone, and the story could end right here.
[root@db01 ~]# crsctl status resource ora.registry.acfs -f
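The attribute settings are identical before and after patching.
A search on My Oracle Support (MOS) suggests it is still related to the OS kernel. But everything was ONLINE right after installation, so why did it go OFFLINE after the PSU upgrade? Your patch broke this system, pay up!
On with the blind analysis.
Some searching (time to test my luck).
Found that this resource is started by orarootagent. That piece of knowledge is basically useless here.
Stumbled on the reason ora.diskmon is OFFLINE, see NOTE:1346881.1 - 11.2.0.3 Grid Infrastructure diskmon Will be Offline by Default in Non-Exadata Environment. At least something was gained: that particular OFFLINE is now explained.
More searching (luck strikes again).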
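Someone gave an overview of what this service does (not that I wanted to know): ACFS is a general-purpose, portable cluster file system that runs on many operating systems and is installed as part of the Grid Infrastructure installation in Oracle 11.2 and later. ACFS was first available on Linux and Windows (starting with Oracle 11.2.0.1), and on Solaris and AIX starting with Oracle 11.2.0.2. On Linux/Unix, ACFS is POSIX and X/OPEN compliant and can be accessed remotely through NAS protocols such as NFS and CIFS.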
When it is running normally, lsmod shows the modules loaded:
[root@db01 ~]# lsmod | grep oracle
oracleacfs 1990406 0
oracleadvm 250040 0
oracleoks 427672 2 oracleacfs,oracleadvm
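The check above can be wrapped in a small helper. This is only a minimal sketch, assuming the three module names from the listing (oracleoks, oracleadvm, oracleacfs) are the ones that matter; the function name missing_acfs_modules is made up for illustration. It reads lsmod-style text on stdin, so it can also be tried against saved output.

```shell
#!/bin/sh
# Hypothetical helper: print every ACFS/ADVM kernel module that does NOT
# appear in lsmod-style text read from stdin.
missing_acfs_modules() {
    input=$(cat)
    for mod in oracleoks oracleadvm oracleacfs; do
        # lsmod prints the module name in the first column
        if ! printf '%s\n' "$input" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            echo "$mod"
        fi
    done
}

# Usage on a live system:
#   lsmod | missing_acfs_modules
```

Empty output means all three modules are present; any name printed is a module that is not loaded.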
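So what is this lsmod that everything points to? It is a Linux command that lists the loaded kernel modules. No problem if it is unfamiliar (its relevance comes up again later).
Found that several commands start with acfs, all located under $GRID_HOME/bin/. For example, on Linux, typing acfs as the grid user and pressing Tab lists a number of related maintenance commands. An eye-opener (not that I want to learn them).
[root@db01 ~]# acfsroot disable    # disable ACFS
[root@db01 ~]# acfsroot uninstall  # uninstall ACFS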
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
[root@db01 ~]# acfsload stop   # unload the ACFS modules
[root@db01 ~]# acfsload start  # load the ACFS modules
ACFS-9391: Checking for existing ADVM/ACFS installation.
ACFS-9392: Validating ADVM/ACFS installation files for operating system.
ACFS-9393: Verifying ASM Administrator setup.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9154: Loading 'oracleoks.ko' driver.
FATAL: Module oracleoks not found.
ACFS-9109: oracleoks.ko driver failed to load.
ACFS-9127: Not all ADVM/ACFS drivers have been loaded.
[root@db01 ~]#
Install ACFS and print the detailed steps:
[root@db01 ~]# acfsroot install -v
ACFS-9500: Location of Oracle Home is '/u01/app/11.2/grid' as determined from the internal configuration data
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9155: Checking for existing 'oracleoks.ko' driver installation.
ACFS-9155: Checking for existing 'oracleoks.ko' driver installation.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9503: ADVM and ACFS driver media location is '/u01/app/11.2/grid/install/usm/Oracle/EL6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin'
ACFS-9504: Copying file '/u01/app/11.2/grid/install/usm/Oracle/EL6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin/oracleadvm.ko' to the path '/lib/modules/2.6.32-696.23.1.el6.x86_64/extra/usm/oracleadvm.ko'
ACFS-9504: Copying file '/u01/app/11.2/grid/install/usm/Oracle/EL6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin/oracleoks.ko' to the path '/lib/modules/2.6.32-696.23.1.el6.x86_64/extra/usm/oracleoks.ko'
ACFS-9504: Copying file '/u01/app/11.2/grid/install/usm/Oracle/EL6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin/oracleacfs.ko' to the path '/lib/modules/2.6.32-696.23.1.el6.x86_64/extra/usm/oracleacfs.ko'
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9154: Loading 'oracleoks.ko' driver.
FATAL: Module oracleoks not found.
ACFS-9109: oracleoks.ko driver failed to load.
ACFS-9428: Message 9428 not found; product=usm; facility=acfs
ACFS-9310: ADVM/ACFS installation failed.
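The ACFS-9504 lines in the verbose output name the /lib/modules/<release> directory the drivers were copied into. A rough sketch for pulling that release out of saved output so it can be compared with uname -r by eye; the function name acfs_target_kernels and the file name install.log are placeholders, not anything shipped by Oracle.

```shell
#!/bin/sh
# Hypothetical helper: read 'acfsroot install -v' output on stdin and print
# each distinct /lib/modules/<release> directory named in ACFS-9504 lines.
acfs_target_kernels() {
    sed -n "s|^ACFS-9504: .* to the path '/lib/modules/\([^/]*\)/.*|\1|p" | sort -u
}

# Usage, assuming the output above was saved to install.log:
#   acfs_target_kernels < install.log
#   uname -r
```

This only reports where the files were copied; it does not say whether the driver build itself matches that kernel.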
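Everything keeps coming back to the file oracleoks.ko and the error ACFS-9109.
Searching for this file (through a painful comparison using find / -name oracleoks.ko) shows that it did indeed change after the PSU was installed. There is a whole paper to be written here: "On the Importance of VM Snapshots for Debugging Efficiency".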
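The before/after comparison can be made less painful by recording a checksum for every copy of the file. A minimal sketch, assuming cksum is available; the function name driver_snapshot and the /tmp file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print "checksum size path" for every oracleoks.ko
# under the given directory, sorted by path so two runs can be diffed.
driver_snapshot() {
    find "$1" -name oracleoks.ko -type f -exec cksum {} + 2>/dev/null | sort -k 3
}

# Usage (file names are placeholders):
#   driver_snapshot / > /tmp/oks_before.txt    # before applying the PSU
#   driver_snapshot / > /tmp/oks_after.txt     # after applying the PSU
#   diff /tmp/oks_before.txt /tmp/oks_after.txt
```

Any line that differs between the two snapshots is a copy of the driver that was added, removed, or replaced by the patch.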
Run the start command:
[root@db01 ~]# crsctl start res ora.drivers.acfs -init
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'db01'
CRS-5016: Process "/u01/app/11.2/grid/bin/acfsload" spawned by agent "/u01/app/11.2/grid/bin/orarootagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2/grid/log/db01/agent/ohasd/orarootagent_root//orarootagent_root.log"
CRS-2674: Start of 'ora.drivers.acfs' on 'db01' failed
CRS-4000: Command Start failed, or completed with errors.
The log shows:
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] execCmd ret = 1
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9391: Checking for existing ADVM/ACFS installation.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9392: Validating ADVM/ACFS installation files for operating system.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9393: Verifying ASM Administrator setup.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9308: Loading installed ADVM/ACFS drivers.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9154: Loading 'oracleoks.ko' driver.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)FATAL: Module oracleoks not found.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9109: oracleoks.ko driver failed to load.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)ACFS-9127: Not all ADVM/ACFS drivers have been loaded.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:CLSN00010:)
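To avoid reading the agent log line by line, the message codes can be pulled out in one go. A small sketch, assuming a grep that supports -o (GNU and BSD grep do); the function name acfs_codes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct ACFS-nnnn message codes found in
# log text read from stdin, sorted.
acfs_codes() {
    grep -o 'ACFS-[0-9][0-9]*' | sort -u
}

# Usage with the log path reported by CRS-5016 above, feeding each code
# to oerr (run as the grid user):
#   acfs_codes < /u01/app/11.2/grid/log/db01/agent/ohasd/orarootagent_root/orarootagent_root.log |
#   while read -r code; do oerr acfs "${code#ACFS-}"; done
```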
So the key appears to be the ACFS-9109 error. Let's see what it says.
[grid@db01 ~]$ oerr acfs 9109
09109, 0, "%s driver failed to load."
// *Cause: The driver failed to load.
// *Action: View the system specific OS kernel log
// (for instance, /var/log/messages on Linux, Event Log on Windows).
// If the drivers have not previously been unloaded
// ('crsctl stop crs', 'acfsload stop', 'acfsroot uninstall'), it is
// not possible to reload them.
// If a specific error has occurred, than clear the error condition
// and try again. If the OS and\or architecture is not
// supported by the drivers, than contact
// Oracle Support Services for an updated driver package.
[grid@db01 ~]$ oerr acfs 9127
09127, 0, "Not all ADVM/ACFS drivers have been loaded."
// *Cause: ADVM/ACFS device drivers have been started but not all
// of them are detected as running.
// *Action: Try 'acfsload stop' followed by 'acfsload start'.
// If that does not start all drivers, than contact Oracle Support
// Services.
After rigorous testing:
[root@db01 ~]# acfsdriverstate version
ACFS-9325: Driver OS kernel version = 2.6.32-696.23.1.el6.x86_64(x86_64).
ACFS-9326: Driver Oracle version = 190625.
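The big conclusion:
after acfsload stop, lsmod | grep acfs shows nothing;
after acfsload start, the modules show up again.
So the fix is: follow the official Doc ID 1369107.1 and upgrade the OS kernel to a release the certification matrix lists as supported.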
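The stop/start check above can be scripted. This is only a sketch, not a procedure from Oracle: the run and reload_acfs_drivers names and the DRY_RUN switch are invented here, error handling is left out, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the stop/start cycle described above.
# With DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reload_acfs_drivers() {
    run acfsload stop
    run acfsload start
    # Show whichever Oracle kernel modules are loaded afterwards
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ lsmod | grep oracle"
    else
        lsmod | grep oracle
    fi
}

# Usage (as root, with $GRID_HOME/bin on PATH):
#   DRY_RUN=0; reload_acfs_drivers
```

If the final lsmod step prints nothing, the drivers still failed to load and the kernel log is the next place to look, as the ACFS-9109 action text suggests.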
--------------------------------------------------------------
References:
How To Install/Reinstall Or Deinstall ACFS Modules/Installation Manually? (Doc ID 1371067.1)
ACFS Support On OS Platforms (Certification Matrix). (Doc ID 1369107.1)
Commands:
crs_stat -p ora.diskmon
crsctl start resource ora.cssd
crsctl modify resource "ora.cssd" -attr "AUTO_START=1"
or
crsctl modify resource "ora.diskmon" -attr "AUTO_START=1"
crsctl modify resource "ora.cssd" -attr "AUTO_START=never"
crsctl modify resource "ora.diskmon" -attr "AUTO_START=never"
(Seems useful; if nothing above helped at all, take a look at this one.)
https://blog.csdn.net/gsforget321/article/details/88392277 (shows several commonly used ACFS maintenance commands)