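This is my first log entry since I started working in operations. Until now I have kept everything in private notes; from here on I will try writing things up here, since organizing them should make them more useful.
A small goal for myself: post twice a week from now on.
My environment is Red Hat 7.2 + Oracle RAC 11.2.0.4. The system had been running for some time when I logged in today and happened to notice that the instance on node 2 was down, while all CRS services looked normal.
So I checked the alert log on node 2: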
vi /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/alert_rac1122.log
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_o000_295057.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
DBW4 (ospid: 28544): terminating the instance due to error 27157
Mon Jul 13 11:05:48 2020
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_j001_295484.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Mon Jul 13 11:05:48 2020
System state dump requested by (instance=2, osid=28544 (DBW4)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_diag_28495_20200713110548.trc
Dumping diagnostic data in directory=[cdmp_20200713110548], requested by (instance=2, osid=28544 (DBW4)), summary=[abnormal instance termination].
Instance terminated by DBW4, pid = 28544
Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
ORA-27300: OS system dependent operation:semctl failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgpwrm1
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Mon Jul 13 11:05:59 2020
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = UNLIMITED

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB

RECOMMENDATION:
Total System Global Area size is 450 GB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 230401 (page size 2048 KB, total size 450 GB) system wide to
get 100% of the System Global Area allocated with large pages
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 96
Number of processor cores in the system is 48
Number of processor sockets in the system is 4
Private Interface 'eno2:1' configured from GPnP for use as a private interconnect.
[name='eno2:1', type=1, ip=xx.xx.xx.155, mac=xxxxxxxx, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eno1' configured from GPnP for use as a public interface.
[name='eno1', type=1, ip=xx.xx.xx.122, mac=70-57-bf-39-1c-25, net=xx.xx.xx.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eno1:1' configured from GPnP for use as a public interface.
[name='eno1:1', type=1, ip=xx.xx.xx.124, mac=70-57-bf-39-1c-25, net=xx.xx.xx.0/24, mask=255.255.255.0, use=public/1]
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
NUMA status: NUMA system w/ 4 process groups
cellaffinity.ora status: cannot find affinity map at '/etc/oracle/cell/network-config/cellaffinity.ora' (see trace file for details)
CELL communication will use 1 IP group(s):
Grp 0:
Picked latch-free SCN scheme 3
Mon Jul 13 11:06:10 2020
WARNING: db_recovery_file_dest is same as db_create_file_dest
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NUMA system with 4 nodes detected
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: rac2
Release: 3.10.0-327.el7.x86_64
Version: #1 SMP Thu Oct 29 17:29:29 EDT 2015
Machine: x86_64
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/db_1/dbs/initrac1122.ora
System parameters with non-default values:
processes = 8192
sessions = 12384
spfile = "+DATA/rac112/spfilerac112.ora"
nls_language = "AMERICAN"
nls_territory = "CHINA"
sga_target = 450G
control_files = "+DATA/rac112/controlfile/current.261.1044461323"
control_files = "+DATA/rac112/controlfile/current.260.1044461323"
db_block_size = 8192
compatible = "11.2.0.4.0"
log_archive_dest_1 = "location=+DATA/RAC112/DBFRA"
cluster_database = TRUE
db_create_file_dest = "+DATA"
db_recovery_file_dest = "+DATA"
db_recovery_file_dest_size= 440700M
thread = 2
undo_tablespace = "UNDOTBS2"
instance_number = 2
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=rac112XDB)"
remote_listener = "rac-scan:1521"
audit_file_dest = "/u01/app/oracle/admin/rac112/adump"
audit_trail = "DB"
db_name = "rac112"
open_cursors = 300
pga_aggregate_target = 115200M
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
xx.xx.xx.155
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Mon Jul 13 11:06:12 2020
PMON started with pid=2, OS id=295770
Error occured while spawning process PMON; error = 27153
USER (ospid: 295705): terminating the instance due to error 27153
Instance terminated by USER, pid = 295705
Looking up the error codes points to a problem with operating system kernel parameters:
[oracle@rac2 trace]$ oerr ora 27157
27157, 0000, "OS post/wait facility removed"
// *Cause: the post/wait facility for which the calling process is awaiting
// action is removed from the system
// *Action: check errno and contact Oracle Support
[oracle@rac2 trace]$ oerr ora 27300
27300, 00000, "OS system dependent operation:%s failed with status: %s"
// *Cause: OS system call error
// *Action: contact Oracle Support
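The "status" values in the ORA-27300 lines are ordinary OS errno numbers, so they can be decoded directly on the host. Below is a minimal sketch in Python. It assumes it is run on the same Linux machine, because errno numbering is platform-specific, and the helper name `describe_errno` is made up for illustration.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS errno number on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 43 and 22 are the status values reported by ORA-27300 in the alert log.
    for code in (43, 22):
        print(code, describe_errno(code))
```

On Linux, 43 is EIDRM ("Identifier removed"), which matches the ORA-27301 text: the semaphore set was removed while the process was waiting in semop. 22 is EINVAL, consistent with the later semctl call on a semaphore set that no longer existed.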
Searching on Baidu, the posts I found all said that max user processes was set too low. Going by the timestamps in the log, I had indeed been changing the nproc settings at that time:
Before the change:
grid soft nproc 4096
grid hard nproc 3088654
grid soft nofile 1024
grid hard nofile 65536

oracle soft nproc 4096
oracle hard nproc 3088654
oracle soft nofile 1024
oracle hard nofile 65536
After the change:
grid soft nproc 9000
grid hard nproc 3088654
grid soft nofile 10240
grid hard nofile 655360

oracle soft nproc 9000
oracle hard nproc 3088654
oracle soft nofile 10240
oracle hard nofile 655360
Running ulimit -a in a new session already shows the modified values.
The reason for raising the limits was that the Oracle processes parameter is set to 8192:
SQL> show parameter processes;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes                      integer     1
db_writer_processes                  integer     12
gcs_server_processes                 integer     5
global_txn_processes                 integer     1
job_queue_processes                  integer     1000
log_archive_max_processes            integer     4
processes                            integer     8192
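The comparison behind the change is that the soft nproc limit for the oracle user should not be lower than the instance's processes setting. The rough Python sketch below parses limits.conf-style lines and checks that. The function names are illustrative, and the rule "soft nproc >= processes" is the working assumption of this post, not an official Oracle sizing formula.

```python
def parse_limits(text: str) -> dict:
    """Parse '<domain> <type> <item> <value>' lines into {(domain, type, item): int}."""
    limits = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip blank/malformed lines and non-numeric values like 'unlimited'
        domain, kind, item, value = parts
        limits[(domain, kind, item)] = int(value)
    return limits


def nproc_covers_processes(limits: dict, user: str, processes: int) -> bool:
    """True if the user's soft nproc limit is at least the Oracle 'processes' value."""
    soft = limits.get((user, "soft", "nproc"))
    return soft is not None and soft >= processes


# Values copied from the before/after listings in this post.
BEFORE = """
oracle soft nproc 4096
oracle hard nproc 3088654
"""
AFTER = """
oracle soft nproc 9000
oracle hard nproc 3088654
"""

if __name__ == "__main__":
    processes = 8192  # from 'show parameter processes'
    print("before:", nproc_covers_processes(parse_limits(BEFORE), "oracle", processes))
    print("after: ", nproc_covers_processes(parse_limits(AFTER), "oracle", processes))
```

With the values from this post, the old soft limit of 4096 is below 8192, while the new limit of 9000 is above it.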
My guess is that the modified limits had not actually taken effect. After rebooting the server, the problem was resolved.
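One way to test that guess without a reboot is to look at the limits a running process actually holds. Values in limits.conf are applied when a new session is opened, and processes that are already running keep the limits they started with. On Linux those are exposed in /proc/<pid>/limits. The sketch below is a minimal parser; the function names and the sample text are made up for illustration.

```python
import re
from pathlib import Path


def parse_proc_limits(text: str) -> dict:
    """Map limit name -> (soft, hard) from the text of /proc/<pid>/limits."""
    limits = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        # Columns are padded with runs of spaces; limit names contain single spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def read_limits(pid) -> dict:
    """Read and parse the limits of a live process (Linux only)."""
    return parse_proc_limits(Path(f"/proc/{pid}/limits").read_text())


# Illustrative sample in the kernel's column layout, using the old values from this post.
SAMPLE = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max processes             4096                 3088654              processes\n"
    "Max open files            1024                 65536                files\n"
)

if __name__ == "__main__":
    for name, (soft, hard) in parse_proc_limits(SAMPLE).items():
        print(f"{name}: soft={soft} hard={hard}")
```

For a live check, `read_limits(pid)` could be pointed at the PID of an Oracle background process. If it still reported 4096 for "Max processes" after limits.conf was edited, that process was started under the old limits, and only a restart from a fresh session (or a reboot, as done here) would pick up the new ones.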