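My environment is Red Hat 7.2 + Oracle RAC 11.2.0.4. The system had been running for a while, but when I logged in today I happened to notice that the instance on node 2 was down, even though all of the CRS services were healthy.
So I checked the alert log on node 2: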
vi /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/alert_rac1122.log
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_o000_295057.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
DBW4 (ospid: 28544): terminating the instance due to error 27157
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_j001_295484.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Mon Jul 13 11:05:48 2020
System state dump requested by (instance=2, osid=28544 (DBW4)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_diag_28495_20200713110548.trc
Dumping diagnostic data in directory=[cdmp_20200713110548], requested by (instance=2, osid=28544 (DBW4)), summary=[abnormal instance termination].
Instance terminated by DBW4, pid = 28544
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
ORA-27300: OS system dependent operation:semctl failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgpwrm1
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Mon Jul 13 11:05:59 2020
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = UNLIMITED
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
Total System Global Area size is 450 GB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 230401 (page size 2048 KB, total size 450 GB) system wide to
get 100% of the System Global Area allocated with large pages
Initial number of CPU is 96
Number of processor cores in the system is 48
Number of processor sockets in the system is 4
Private Interface 'eno2:1' configured from GPnP for use as a private interconnect.
[name='eno2:1', type=1, ip=xx.xx.xx.155, mac=xxxxxxxx, net=, mask=, use=haip:cluster_interconnect/62]
Public Interface 'eno1' configured from GPnP for use as a public interface.
[name='eno1', type=1, ip=xx.xx.xx.122, mac=70-57-bf-39-1c-25, net=xx.xx.xx.0/24, mask=, use=public/1]
Public Interface 'eno1:1' configured from GPnP for use as a public interface.
[name='eno1:1', type=1, ip=xx.xx.xx.124, mac=70-57-bf-39-1c-25, net=xx.xx.xx.0/24, mask=, use=public/1]
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
NUMA status: NUMA system w/ 4 process groups
cellaffinity.ora status: cannot find affinity map at '/etc/oracle/cell/network-config/cellaffinity.ora' (see trace file for details)
CELL communication will use 1 IP group(s):
Grp 0:
Picked latch-free SCN scheme 3
Mon Jul 13 11:06:10 2020
WARNING: db_recovery_file_dest is same as db_create_file_dest
Autotune of undo retention is turned on.
SYS auditing is disabled
NUMA system with 4 nodes detected
Starting up:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: rac2
Release: 3.10.0-327.el7.x86_64
Version: #1 SMP Thu Oct 29 17:29:29 EDT 2015
Machine: x86_64
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/db_1/dbs/initrac1122.ora
System parameters with non-default values:
processes = 8192
sessions = 12384
spfile = "+DATA/rac112/spfilerac112.ora"
nls_language = "AMERICAN"
nls_territory = "CHINA"
sga_target = 450G
control_files = "+DATA/rac112/controlfile/current.261.1044461323"
control_files = "+DATA/rac112/controlfile/current.260.1044461323"
db_block_size = 8192
compatible = ""
log_archive_dest_1 = "location=+DATA/RAC112/DBFRA"
cluster_database = TRUE
db_create_file_dest = "+DATA"
db_recovery_file_dest = "+DATA"
db_recovery_file_dest_size= 440700M
thread = 2
undo_tablespace = "UNDOTBS2"
instance_number = 2
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=rac112XDB)"
remote_listener = "rac-scan:1521"
audit_file_dest = "/u01/app/oracle/admin/rac112/adump"
audit_trail = "DB"
db_name = "rac112"
open_cursors = 300
pga_aggregate_target = 115200M
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Mon Jul 13 11:06:12 2020
PMON started with pid=2, OS id=295770
Error occured while spawning process PMON; error = 27153
USER (ospid: 295705): terminating the instance due to error 27153
Instance terminated by USER, pid = 295705
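An alert log this long is easier to read once the ORA- codes and the "terminating the instance" lines are pulled out. The following is a minimal sketch of that idea, not an Oracle tool: the function name `summarize` and the two regular expressions are my own, and the sample text is a shortened copy of the log lines above.

```python
import re
from collections import Counter

# Shortened copy of lines from the alert log above.
SAMPLE = """\
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
DBW4 (ospid: 28544): terminating the instance due to error 27157
Mon Jul 13 11:06:12 2020
PMON started with pid=2, OS id=295770
Error occured while spawning process PMON; error = 27153
USER (ospid: 295705): terminating the instance due to error 27153
"""

# Hypothetical patterns written for this log layout only.
ORA_RE = re.compile(r"^(ORA-\d{5}):")
TERM_RE = re.compile(
    r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)"
)


def summarize(text):
    """Count ORA- codes and collect (process, ospid, error) terminations."""
    counts = Counter()
    terminations = []
    for line in text.splitlines():
        m = ORA_RE.match(line)
        if m:
            counts[m.group(1)] += 1
            continue
        m = TERM_RE.match(line)
        if m:
            terminations.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return counts, terminations


if __name__ == "__main__":
    counts, terminations = summarize(SAMPLE)
    for code, n in sorted(counts.items()):
        print(code, n)
    for proc, pid, err in terminations:
        print(f"{proc} (ospid {pid}) terminated the instance, error {err}")
```

To run it against the real file, read alert_rac1122.log and pass its contents to `summarize` instead of `SAMPLE`.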
[oracle@rac2 trace]$ oerr ora 27157
27157, 0000, "OS post/wait facility removed"
// *Cause: the post/wait facility for which the calling process is awaiting
// action is removed from the system
// *Action: check errno and contact Oracle Support
[oracle@rac2 trace]$ oerr ora 27300
27300, 00000, "OS system dependent operation:%s failed with status: %s"
// *Cause: OS system call error
// *Action: contact Oracle Support
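oerr only says "check errno", so it helps to look up the two status values from the log (43 and 22) on the operating system itself. This small sketch uses only the Python standard library; `describe_errno` is a name I made up. On Linux, 43 is EIDRM and 22 is EINVAL, which matches the "Identifier removed" and "Invalid argument" texts in the ORA-27301 lines. For semop, EIDRM means the semaphore set was removed from the system while the process was waiting on it.

```python
import errno
import os


def describe_errno(status):
    """Return (symbolic name, message) for an OS errno value on this host."""
    name = errno.errorcode.get(status, "UNKNOWN")
    return name, os.strerror(status)


if __name__ == "__main__":
    # Status values taken from the ORA-27300 lines in the alert log.
    for status in (43, 22):
        name, message = describe_errno(status)
        print(status, name, message)
```

The numbers are platform specific, so this should be run on the database server (or another Linux host), not on a desktop with a different OS.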
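A Baidu search suggested that max user processes was set too low. Going by the timestamps in the log, the nproc parameter had indeed been changed around that time (the first block below appears to be the old settings and the second the new ones):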
grid soft nproc 4096
grid hard nproc 3088654
grid soft nofile 1024
grid hard nofile 65536
oracle soft nproc 4096
oracle hard nproc 3088654
oracle soft nofile 1024
oracle hard nofile 65536
grid soft nproc 9000
grid hard nproc 3088654
grid soft nofile 10240
grid hard nofile 655360
oracle soft nproc 9000
oracle hard nproc 3088654
oracle soft nofile 10240
oracle hard nofile 655360
Running ulimit -a shows that the modified values are already in effect.
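One caveat: ulimit -a reports the limits of the current shell only. Limits from limits.conf are applied when a new session starts, so a process that was already running keeps the values it started with. On Linux these can be read from /proc/<pid>/limits. The sketch below parses that file; `parse_limits` and `limits_of` are my own names, and the numbers in `SAMPLE` are only illustrative, not captured from this server.

```python
import re
from pathlib import Path

# Illustrative excerpt in the /proc/<pid>/limits layout (values are made up).
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max processes             4096                 3088654              processes
Max open files            1024                 65536                files
"""


def parse_limits(text):
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def limits_of(pid="self"):
    """Read the limits of a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    print(parse_limits(SAMPLE)["Max processes"])
    if Path("/proc/self/limits").exists():
        print(limits_of()["Max processes"])
```

Passing the OS pid of an Oracle background process to `limits_of` shows what that process actually runs with, which may differ from what a fresh login shell reports.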
SQL> show parameter processes;
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 1
db_writer_processes integer 12
gcs_server_processes integer 5
global_txn_processes integer 1
job_queue_processes integer 1000
log_archive_max_processes integer 4
processes integer 8192
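To put the numbers side by side, here is a rough sketch that diffs the two limits blocks for the oracle user and compares the soft nproc value with the processes parameter shown above. It assumes the first block is the old configuration and the second is the new one. The names `OLD`, `NEW`, `PROCESSES`, `parse_limits_conf` and `changed` are mine. It is only an arithmetic check, not a diagnosis: nproc limits the tasks of the whole OS user, not just one instance.

```python
# oracle entries from the two limits blocks (assumed: OLD first, NEW second).
OLD = """\
oracle soft nproc 4096
oracle hard nproc 3088654
oracle soft nofile 1024
oracle hard nofile 65536
"""
NEW = """\
oracle soft nproc 9000
oracle hard nproc 3088654
oracle soft nofile 10240
oracle hard nofile 655360
"""
PROCESSES = 8192  # "processes" from show parameter processes


def parse_limits_conf(text):
    """Parse 'domain type item value' lines into {(domain, type, item): value}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        domain, kind, item, value = line.split()
        table[(domain, kind, item)] = int(value) if value.isdigit() else value
    return table


def changed(old, new):
    """Return the entries whose value differs between the two tables."""
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


if __name__ == "__main__":
    old, new = parse_limits_conf(OLD), parse_limits_conf(NEW)
    for key, (before, after) in sorted(changed(old, new).items()):
        print(*key, before, "->", after)
    for label, table in (("old", old), ("new", new)):
        soft = table[("oracle", "soft", "nproc")]
        relation = "below" if soft < PROCESSES else "at or above"
        print(f"{label} soft nproc {soft} is {relation} processes={PROCESSES}")
```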