2016-03-05 20:02:45.720 [ORAROOTAGENT(26842)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 26842
2016-03-05 20:02:53.572 [CLSECHO(27164)]CRS-10001: 05-Mar-16 20:02 AFD-9204: false
2016-03-05 20:03:45.967 [ORAROOTAGENT(26842)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:21:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:04:46.133 [ORAROOTAGENT(28430)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 28430
2016-03-05 20:05:01.921 [MDNSD(4300)]CRS-5602: mDNS service stopping by request.
2016-03-05 20:05:05.559 [MDNSD(4300)]CRS-8504: Oracle Clusterware MDNSD process with operating system process ID 4300 is exiting
2016-03-05 20:05:10.100 [OHASD(3131)]CRS-10301: Unable to contact Oracle high availability service on peer nodes for cluster wide commands. Details at (:CCL00001:) in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd.trc.
2016-03-05 20:05:46.196 [ORAROOTAGENT(28430)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:23:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:06:46.354 [ORAROOTAGENT(28952)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 28952
2016-03-05 20:07:46.412 [ORAROOTAGENT(28952)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:25:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:08:46.588 [ORAROOTAGENT(29487)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 29487
2016-03-05 20:09:46.644 [ORAROOTAGENT(29487)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:27:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:10:46.819 [ORAROOTAGENT(30584)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 30584
2016-03-05 20:10:46.857 [OCSSD(21147)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc
2016-03-05 20:10:46.857 [OCSSD(21147)]CRS-1603: CSSD on node tnc2 shutdown by user.
2016-03-05 20:10:46.856 [CSSDAGENT(21135)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:15:27} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_cssdagent_root.trc.
2016-03-05 20:10:47.001 [ORAAGENT(4287)]CRS-5818: Aborted command 'clean' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:9:6} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_oraagent_oracle.trc.
2016-03-05 20:10:51.009 [ORAAGENT(4287)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_oraagent_oracle.trc"
2016-03-05 20:10:51.013 [OHASD(3131)]CRS-2757: Command 'Clean' timed out waiting for response from the resource 'ora.asm'. Details at (:CRSPE00163:) {0:9:6} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd.trc.
Sat Mar 05 20:10:51 2016
Errors in file /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc (incident=193):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/oracle/diag/crs/tnc2/crs/incident/incdir_193/ocssd_i193.trc
2016-03-05 20:10:52.062 [OCSSD(21147)]CRS-8503: Oracle Clusterware OCSSD process with operating system process ID 21147 experienced fatal signal or exception code 6
Sweep [inc][193]: completed
2016-03-05 20:10:57.452 [OHASD(3131)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00163:) {0:15:27} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd.trc.
2016-03-05 20:10:57.541 [CSSDAGENT(30935)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 30935
2016-03-05 20:10:57.696 [OHASD(3131)]CRS-2878: Failed to restart resource 'ora.asm'
2016-03-05 20:10:57.711 [OHASD(3131)]CRS-2878: Failed to restart resource 'ora.cssd'
2016-03-05 20:10:57.721 [GPNPD(4339)]CRS-2329: GPNPD on node tnc2 shut down.
2016-03-05 20:10:57.735 [GPNPD(4339)]CRS-8504: Oracle Clusterware GPNPD process with operating system process ID 4339 is exiting
2016-03-05 20:10:58.862 [GIPCD(4408)]CRS-8504: Oracle Clusterware GIPCD process with operating system process ID 4408 is exiting
2016-03-05 20:23:00.782 [CLSECHO(9793)]CRS-10001: 05-Mar-16 20:23 AFD-9204: false
2016-03-05 21:32:19.942 [CLSECHO(11011)]CRS-10001: 05-Mar-16 21:32 AFD-9204: false
$ tail -100f /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc
016-03-05 20:10:50.299585 : CSSD:1096018240: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1010 with cvtimewait status 4294967186
2016-03-05 20:10:50.401486 : CSSD:1121986880: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011537, LATS 2188456, lastSeqNo 1011534, uniqueness 1457225306, timestamp 1457226649/2189006
2016-03-05 20:10:50.401595 : CSSD:1118832960: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011538, LATS 2188456, lastSeqNo 1011535, uniqueness 1457225306, timestamp 1457226649/2189396
2016-03-05 20:10:50.416981 : CSSD:1085483328: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011539, LATS 2188466, lastSeqNo 1011536, uniqueness 1457225306, timestamp 1457226649/2189446
2016-03-05 20:10:51.303616 : CSSD:1096018240: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-03-05 20:10:51.351707 :GIPCHTHR:1114102080: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30130loopCount 32
2016-03-05 20:10:51.404980 : CSSD:1121986880: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011540, LATS 2189456, lastSeqNo 1011537, uniqueness 1457225306, timestamp 1457226650/2190006
2016-03-05 20:10:51.405087 : CSSD:1118832960: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011541, LATS 2189456, lastSeqNo 1011538, uniqueness 1457225306, timestamp 1457226650/2190406
2016-03-05 20:10:51.420767 : CSSD:1085483328: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011542, LATS 2189476, lastSeqNo 1011539, uniqueness 1457225306, timestamp 1457226650/2190446
Trace file /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
DDE: Flood control is not active
Incident 193 created, dump file: /u01/app/oracle/diag/crs/tnc2/crs/incident/incdir_193/ocssd_i193.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
CLSB:1106217280: Oracle Clusterware infrastructure error in OCSSD (OS PID 21147): Fatal signal 6 has occurred in program ocssd thread 1106217280; nested signal count is 1
Action:
Initially Analyze the alert log files and trace files
Start investigating on Failure Node and Living node in a cluster, Possible reasons for node eviction as follows
1) Check alert log file on Node 1, Some times If there is problem with Interconnect, Node 1 kicks out Node 2 from the cluster.
2) Blocking port can Make Node 2 Evicts from Cluster
Stop and Start CRS on the Failure Node will resolve the issue
--Nikhil Tatineni--
--12c RAC Cluster--
Action:
Initially Analyze the alert log files and trace files
Start investigating on Failure Node and Living node in a cluster, Possible reasons for node eviction as follows
1) Check alert log file on Node 1, Some times If there is problem with Interconnect, Node 1 kicks out Node 2 from the cluster.
2) Blocking port can Make Node 2 Evicts from Cluster
Stop and Start CRS on the Failure Node will resolve the issue
--Nikhil Tatineni--
--12c RAC Cluster--