Friday, February 19, 2016

Oracle RAC: Diagwait

Stabilize a RAC environment by setting "diagwait" so that the logs capture information about Oracle Clusterware problems

When a node is evicted from the cluster, it sometimes does not have enough time to write sufficient error information to the logs. When the CPU is overloaded, the daemons may not be able to flush their error messages to the logs and traces before the node goes down. When we increase diagwait, Clusterware waits for additional time so that this information can be written to the logs and traces.

NOTE: From (Doc ID 559365.1)
Starting with 11.2.0.1, customers do not need to set diagwait, as the architecture has been changed.

In order to set diagwait, we need to stop the clusterware stack on all servers:

[root@tnc1 ~]# crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'tnc1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'tnc1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'tnc1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'tnc1'
CRS-2673: Attempting to stop 'ora.mgmtdb' on 'tnc1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.p1tncd.srvnsp.svc' on 'tnc1'
CRS-2677: Stop of 'ora.p1tncd.srvnsp.svc' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.tnc1.vip' on 'tnc1'
CRS-2673: Attempting to stop 'ora.p1tncd.db' on 'tnc1'
CRS-2673: Attempting to stop 'ora.crsd' on 'tnc2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'tnc2'
CRS-2673: Attempting to stop 'ora.cvu' on 'tnc2'
CRS-2677: Stop of 'ora.cvu' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.tnc1.vip' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.p1tncd.db' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'tnc1'
CRS-2673: Attempting to stop 'ora.SDXH.dg' on 'tnc1'
CRS-2677: Stop of 'ora.FRA.dg' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.SDXH.dg' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.mgmtdb' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.GRID.dg' on 'tnc1'
CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'tnc1'
CRS-2677: Stop of 'ora.MGMTLSNR' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.GRID.dg' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'tnc1'
CRS-2673: Attempting to stop 'ora.GRID.dg' on 'tnc2'
CRS-2673: Attempting to stop 'ora.p1tncd.srvnsp.svc' on 'tnc2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'tnc2'
CRS-2677: Stop of 'ora.asm' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'tnc2'
CRS-2677: Stop of 'ora.p1tncd.srvnsp.svc' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.p1tncd.db' on 'tnc2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'tnc2'
CRS-2677: Stop of 'ora.scan1.vip' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.tnc2.vip' on 'tnc2'
CRS-2677: Stop of 'ora.GRID.dg' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.p1tncd.db' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'tnc2'
CRS-2673: Attempting to stop 'ora.SDXH.dg' on 'tnc2'
CRS-2677: Stop of 'ora.tnc2.vip' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.SDXH.dg' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'tnc2'
CRS-2677: Stop of 'ora.asm' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'tnc2'
CRS-2677: Stop of 'ora.ons' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'tnc2'
CRS-2677: Stop of 'ora.net1.network' on 'tnc2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'tnc2' has completed
CRS-2677: Stop of 'ora.crsd' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'tnc2'
CRS-2673: Attempting to stop 'ora.evmd' on 'tnc2'
CRS-2673: Attempting to stop 'ora.storage' on 'tnc2'
CRS-2677: Stop of 'ora.storage' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'tnc2'
CRS-2677: Stop of 'ora.evmd' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'tnc1'
CRS-2677: Stop of 'ora.net1.network' on 'tnc1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'tnc1' has completed
CRS-2677: Stop of 'ora.crsd' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'tnc1'
CRS-2673: Attempting to stop 'ora.evmd' on 'tnc1'
CRS-2673: Attempting to stop 'ora.storage' on 'tnc1'
CRS-2677: Stop of 'ora.storage' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'tnc1'
CRS-2677: Stop of 'ora.evmd' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'tnc2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'tnc2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'tnc2'
CRS-2677: Stop of 'ora.cssd' on 'tnc2' succeeded
CRS-2677: Stop of 'ora.asm' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'tnc1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'tnc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'tnc1'
CRS-2677: Stop of 'ora.cssd' on 'tnc1' succeeded

[oracle@tnc2 ~]$ crsctl check cluster -all
**************************************************************
tnc1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
tnc2:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************

[oracle@tnc2 ~]$ ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
oracle   11706  6825  0 14:02 pts/2    00:00:00 egrep crsd.bin|ocssd.bin|evmd.bin|oprocd
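
The check above relies on reading the ps output by eye: the only line returned is the egrep command itself, which means no clusterware daemon is running. A minimal sketch of the same check as a reusable shell function follows. The name stack_is_down is hypothetical and not part of any Oracle tooling; it only filters a process listing passed on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle utility): succeeds only when no
# clusterware daemon appears in a process listing read from standard input.
stack_is_down() {
    # Drop the grep/egrep command's own entry, then look for the daemons
    # named in the ps check above.
    if grep -v 'grep' | grep -E -q 'crsd\.bin|ocssd\.bin|evmd\.bin|oprocd'; then
        return 1    # at least one daemon is still running
    fi
    return 0        # no daemon found, the stack is down
}

# Usage on a live node (left commented out so the sketch has no side effects):
#   ps -ef | stack_is_down && echo "CRS stack is down"
```

Run it on every node before changing diagwait; a non-zero exit status means at least one of the daemons is still up on that node.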

Once the CRS stack is down, change diagwait to the desired value. As per the Oracle documentation, set it to 13 as the "root" user from any one node of the cluster, as follows:
#crsctl set css diagwait 13 -force

Start the cluster after configuring diagwait:
#crsctl start cluster -all

Get the value of diagwait:
#crsctl get css diagwait
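
The whole procedure can be strung together in a small wrapper, sketched here under stated assumptions. The DRY_RUN variable and the run and set_diagwait function names are hypothetical and introduced only for illustration; the crsctl commands are the ones used in this post. With the default DRY_RUN=1 the script only prints what it would execute, so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch of the diagwait procedure from this post.
# DRY_RUN is a hypothetical switch: 1 (default) prints the commands,
# any other value executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# set_diagwait VALUE: stop the stack, set diagwait, restart, read it back.
# The && chain stops at the first command that fails.
set_diagwait() {
    run crsctl stop cluster -all &&
    run crsctl set css diagwait "$1" -force &&
    run crsctl start cluster -all &&
    run crsctl get css diagwait
}

# Example call (commented out so that sourcing the file does nothing):
#   set_diagwait 13
```

Note that this sketch does not confirm that the stack is really down on every node between the stop and the set; keep the crsctl check cluster -all and ps checks from above as a manual step before running it with DRY_RUN=0.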

--Nikhil Tatineni--
--RAC Cluster--
