Saturday, March 5, 2016

CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []

2016-03-05 20:02:45.720 [ORAROOTAGENT(26842)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 26842
2016-03-05 20:02:53.572 [CLSECHO(27164)]CRS-10001: 05-Mar-16 20:02 AFD-9204: false
2016-03-05 20:03:45.967 [ORAROOTAGENT(26842)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:21:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:04:46.133 [ORAROOTAGENT(28430)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 28430
2016-03-05 20:05:01.921 [MDNSD(4300)]CRS-5602: mDNS service stopping by request.
2016-03-05 20:05:05.559 [MDNSD(4300)]CRS-8504: Oracle Clusterware MDNSD process with operating system process ID 4300 is exiting
2016-03-05 20:05:10.100 [OHASD(3131)]CRS-10301: Unable to contact Oracle high availability service on peer nodes for cluster wide commands. Details at (:CCL00001:) in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd.trc.
2016-03-05 20:05:46.196 [ORAROOTAGENT(28430)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:23:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:06:46.354 [ORAROOTAGENT(28952)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 28952
2016-03-05 20:07:46.412 [ORAROOTAGENT(28952)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:25:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:08:46.588 [ORAROOTAGENT(29487)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 29487
2016-03-05 20:09:46.644 [ORAROOTAGENT(29487)]CRS-5818: Aborted command 'check' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:27:2} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_orarootagent_root.trc.
2016-03-05 20:10:46.819 [ORAROOTAGENT(30584)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 30584
2016-03-05 20:10:46.857 [OCSSD(21147)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc
2016-03-05 20:10:46.857 [OCSSD(21147)]CRS-1603: CSSD on node tnc2 shutdown by user.
2016-03-05 20:10:46.856 [CSSDAGENT(21135)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:15:27} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_cssdagent_root.trc.
2016-03-05 20:10:47.001 [ORAAGENT(4287)]CRS-5818: Aborted command 'clean' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:9:6} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_oraagent_oracle.trc.
2016-03-05 20:10:51.009 [ORAAGENT(4287)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd_oraagent_oracle.trc"
2016-03-05 20:10:51.013 [OHASD(3131)]CRS-2757: Command 'Clean' timed out waiting for response from the resource 'ora.asm'. Details at (:CRSPE00163:) {0:9:6} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd.trc.
Sat Mar 05 20:10:51 2016
Errors in file /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc  (incident=193):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/oracle/diag/crs/tnc2/crs/incident/incdir_193/ocssd_i193.trc

2016-03-05 20:10:52.062 [OCSSD(21147)]CRS-8503: Oracle Clusterware OCSSD process with operating system process ID 21147 experienced fatal signal or exception code 6
Sweep [inc][193]: completed
2016-03-05 20:10:57.452 [OHASD(3131)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00163:) {0:15:27} in /u01/app/oracle/diag/crs/tnc2/crs/trace/ohasd.trc.
2016-03-05 20:10:57.541 [CSSDAGENT(30935)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 30935
2016-03-05 20:10:57.696 [OHASD(3131)]CRS-2878: Failed to restart resource 'ora.asm'
2016-03-05 20:10:57.711 [OHASD(3131)]CRS-2878: Failed to restart resource 'ora.cssd'
2016-03-05 20:10:57.721 [GPNPD(4339)]CRS-2329: GPNPD on node tnc2 shut down. 
2016-03-05 20:10:57.735 [GPNPD(4339)]CRS-8504: Oracle Clusterware GPNPD process with operating system process ID 4339 is exiting
2016-03-05 20:10:58.862 [GIPCD(4408)]CRS-8504: Oracle Clusterware GIPCD process with operating system process ID 4408 is exiting
2016-03-05 20:23:00.782 [CLSECHO(9793)]CRS-10001: 05-Mar-16 20:23 AFD-9204: false
2016-03-05 21:32:19.942 [CLSECHO(11011)]CRS-10001: 05-Mar-16 21:32 AFD-9204: false

$ tail -100f  /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc
016-03-05 20:10:50.299585 :    CSSD:1096018240: clssscWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 1010 with cvtimewait status 4294967186
2016-03-05 20:10:50.401486 :    CSSD:1121986880: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011537, LATS 2188456, lastSeqNo 1011534, uniqueness 1457225306, timestamp 1457226649/2189006
2016-03-05 20:10:50.401595 :    CSSD:1118832960: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011538, LATS 2188456, lastSeqNo 1011535, uniqueness 1457225306, timestamp 1457226649/2189396
2016-03-05 20:10:50.416981 :    CSSD:1085483328: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011539, LATS 2188466, lastSeqNo 1011536, uniqueness 1457225306, timestamp 1457226649/2189446
2016-03-05 20:10:51.303616 :    CSSD:1096018240: clssscWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-03-05 20:10:51.351707 :GIPCHTHR:1114102080: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30130loopCount 32
2016-03-05 20:10:51.404980 :    CSSD:1121986880: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011540, LATS 2189456, lastSeqNo 1011537, uniqueness 1457225306, timestamp 1457226650/2190006
2016-03-05 20:10:51.405087 :    CSSD:1118832960: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011541, LATS 2189456, lastSeqNo 1011538, uniqueness 1457225306, timestamp 1457226650/2190406
2016-03-05 20:10:51.420767 :    CSSD:1085483328: clssnmvDHBValidateNCopy: node 1, tnc1, has a disk HB, but no network HB, DHB has rcfg 352673332, wrtcnt, 1011542, LATS 2189476, lastSeqNo 1011539, uniqueness 1457225306, timestamp 1457226650/2190446
Trace file /u01/app/oracle/diag/crs/tnc2/crs/trace/ocssd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
DDE: Flood control is not active
Incident 193 created, dump file: /u01/app/oracle/diag/crs/tnc2/crs/incident/incdir_193/ocssd_i193.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
    CLSB:1106217280: Oracle Clusterware infrastructure error in OCSSD (OS PID 21147): Fatal signal 6 has occurred in program ocssd thread 1106217280; nested signal count is 1

Action:

Initially Analyze the alert log files and trace files

Start investigating on Failure Node and Living node in a cluster, Possible reasons for node eviction as follows 

1) Check alert log file on Node 1,  Some times If there is problem with Interconnect, Node 1 kicks out Node 2 from the cluster.
2) Blocking port can Make Node 2 Evicts from Cluster
Stop and Start CRS on the Failure Node will resolve the issue

--Nikhil Tatineni--
--12c RAC Cluster--

Querys to monitor RAC

following few  Query's will help to find out culprits-  Query to check long running transaction from last 8 hours  Col Sid Fo...