Sunday, February 14, 2016

12c-RAC-CRS-4535

[root@tnc1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
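
The status check above can be scripted so that only the unhealthy components are printed. This is a minimal sketch, not an Oracle-supplied tool: the helper name `crs_not_online` is hypothetical, and it assumes healthy lines end with "is online", as they do in the output shown here.

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin and prints
# every CRS-nnnn status line that does NOT report its component as online.
crs_not_online() {
    grep '^CRS-[0-9]*:' | grep -v 'is online[[:space:]]*$' || true
}

# Typical use (as root, on the affected node):
#   crsctl check crs | crs_not_online
```

An empty result means every daemon reported itself online; any line printed is a component to investigate.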

Checked the CRS alert log on node 1 and found the following:


[oracle@tnc1 trace]$ pwd
/u01/app/oracle/diag/crs/tnc1/crs/trace

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
[oracle@tnc1 trace]$ ls -ltr alert.log 
-rw-rw---- 1 root oinstall 90611 Feb 13 23:18 alert.log

'''''''''''''''''''''''''''''''''''''''''''''''''''''


2016-02-13 19:45:50.055 [CLSECHO(10483)]CRS-10001: 13-Feb-16 19:45 AFD-9204: false
2016-02-13 19:59:42.749 [ORAAGENT(5595)]CRS-5011: Check of resource "p1tncd" failed: details at "(:CLSN00007:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/crsd_oraagent_oracle.trc"
2016-02-13 20:01:36.304 [ORAROOTAGENT(5604)]CRS-5017: The resource action "ora.net1.network start" encountered the following error: 
2016-02-13 20:01:36.304+CRS-5008: Invalid attribute value: eth0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/crsd_orarootagent_root.trc".
2016-02-13 20:01:36.340 [CRSD(5467)]CRS-2878: Failed to restart resource 'ora.net1.network'
2016-02-13 20:01:36.358 [CRSD(5467)]CRS-2769: Unable to failover resource 'ora.net1.network'.
2016-02-13 20:01:37.056 [CRSD(5467)]CRS-2771: Maximum restart attempts reached for resource 'ora.tnc1.vip'; will not restart.
2016-02-13 20:01:37.218 [CRSD(5467)]CRS-2771: Maximum restart attempts reached for resource 'ora.scan1.vip'; will not restart.
2016-02-13 20:01:42.686 [ORAAGENT(5595)]CRS-5016: Process "/u01/app/12.1.0.2/grid/bin/lsnrctl" spawned by agent "ORAAGENT" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/crsd_oraagent_oracle.trc"
2016-02-13 20:25:49.009 [OCSSD(4377)]CRS-1612: Network communication with node tnc2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.230 seconds
2016-02-13 20:25:51.325 [ORAAGENT(5595)]CRS-5011: Check of resource "p1tncd" failed: details at "(:CLSN00007:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/crsd_oraagent_oracle.trc"
2016-02-13 20:25:56.070 [OCSSD(4377)]CRS-1611: Network communication with node tnc2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.170 seconds
2016-02-13 20:25:54.158 [ORAAGENT(3949)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/ohasd_oraagent_oracle.trc"
2016-02-13 20:26:00.852 [ORAAGENT(5595)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/crsd_oraagent_oracle.trc"
2016-02-13 20:26:01.112 [OCSSD(4377)]CRS-1610: Network communication with node tnc2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.120 seconds
2016-02-13 20:26:03.259 [OCSSD(4377)]CRS-1607: Node tnc2 is being evicted in cluster incarnation 350858546; details at (:CSSNM00007:) in /u01/app/oracle/diag/crs/tnc1/crs/trace/ocssd.trc.
Sat Feb 13 20:26:09 2016

Fatal NI connect error 12514, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=169.254.248.60)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=tnc_cluster)(CID=(PROGRAM=ologgerd)(HOST=tnc1.localdomain)(USER=root))))

  VERSION INFORMATION:
TNS for Linux: Version 12.1.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.1.0.2.0 - Production
  Time: 13-FEB-2016 20:26:09
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12564
    
TNS-12564: TNS:connection refused
    ns secondary err code: 0
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0

Fatal NI connect error 12514, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.180)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=tnc_cluster)(CID=(PROGRAM=ologgerd)(HOST=tnc1.localdomain)(USER=root))))

  VERSION INFORMATION:
TNS for Linux: Version 12.1.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.1.0.2.0 - Production
  Time: 13-FEB-2016 20:26:09
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12564
    
TNS-12564: TNS:connection refused
    ns secondary err code: 0
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
2016-02-13 20:26:10.976 [ORAAGENT(5595)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/crsd_oraagent_oracle.trc"
2016-02-13 20:26:11.262 [ORAROOTAGENT(4906)]CRS-5818: Aborted command 'check' for resource 'ora.drivers.acfs'. Details at (:CRSAGF00113:) {0:9:3} in /u01/app/oracle/diag/crs/tnc1/crs/trace/ohasd_orarootagent_root.trc.
2016-02-13 20:26:11.902 [ORAROOTAGENT(4906)]CRS-5014: Agent "ORAROOTAGENT" timed out starting process "/u01/app/12.1.0.2/grid/bin/acfsload" for action "check": details at "(:CLSN00009:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/ohasd_orarootagent_root.trc"
2016-02-13 20:26:14.393 [ORAAGENT(3949)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/tnc1/crs/trace/ohasd_oraagent_oracle.trc"
2016-02-13 20:26:14.690 [CRSD(5467)]CRS-1024: The Cluster Ready Service on this node terminated because the ASM instance was not active on this node. Details at (:PROCR00009:) in /u01/app/oracle/diag/crs/tnc1/crs/trace/crsd.trc.
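
A long alert log like the one above is easier to read as a timeline of message codes. The sketch below assumes the line layout seen in this excerpt: a `YYYY-MM-DD HH:MM:SS.mmm` timestamp at the start of the line, followed somewhere by a `CRS-nnnn:` code. The function name `crs_timeline` is made up for illustration.

```shell
# Hypothetical helper: print "date time code" for each timestamped alert-log
# line that carries a CRS-nnnn message code. Reads the named files, or stdin.
crs_timeline() {
    awk 'match($0, /CRS-[0-9]+:/) && /^[0-9][0-9][0-9][0-9]-/ {
        # $1 = date, first 8 chars of $2 = HH:MM:SS, then the code without ":"
        print $1, substr($2, 1, 8), substr($0, RSTART, RLENGTH - 1)
    }' "$@"
}

# Typical use, with the alert log path shown earlier:
#   crs_timeline /u01/app/oracle/diag/crs/tnc1/crs/trace/alert.log
```

Lines without a leading timestamp, such as the TNS error stack, are skipped, so the sequence of network failure, node eviction, and CRSD termination is easier to follow.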


Root cause: ORAROOTAGENT failed to bring up the network resources on this server (CRS-5017 / CRS-5008 for 'ora.net1.network'), and CRS on the node was eventually terminated (CRS-1024).

I also found one more error in the CRS alert log:
TNS-12564: TNS:connection refused
    ns secondary err code: 0
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0

Resolution: As root, forcefully stopped the CRS stack on node 1 and started it again.
Also checked bash_profile and the network configuration files to avoid the "TNS-12564" errors.
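
Because the alert log reported CRS-5008 for interface eth0, one way to check the network configuration is to compare the interfaces registered with Clusterware against the ones the OS actually has. This is only a sketch. It assumes `oifcfg getif` prints one interface per line with the name in the first column and the role (for example `public`) in the last, and that the host is Linux with `/sys/class/net`. Both function names are hypothetical.

```shell
# Hypothetical helper: from `oifcfg getif` output on stdin, print the names
# of interfaces whose role (assumed to be the last column) includes "public".
public_ifaces() {
    awk '$NF ~ /public/ { print $1 }'
}

# Hypothetical helper: for each interface name on stdin, report whether the
# OS knows it (Linux sysfs lookup).
report_ifaces() {
    while read -r name; do
        if [ -e "/sys/class/net/$name" ]; then
            echo "$name present"
        else
            echo "$name MISSING"
        fi
    done
}

# Typical use (Grid Infrastructure bin directory in PATH):
#   oifcfg getif | public_ifaces | report_ifaces
```

A MISSING line would match the "Invalid attribute value" message, since Clusterware would then be pointing at an interface name the host does not have.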


[root@tnc1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'tnc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'tnc1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'tnc1'
CRS-2677: Stop of 'ora.gpnpd' on 'tnc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'tnc1' succeeded

[root@tnc1 ~]# crsctl start crs

[oracle@tnc1 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.FRA.dg
               ONLINE  ONLINE       tnc1                     STABLE
               ONLINE  ONLINE       tnc2                     STABLE
ora.GRID.dg
               ONLINE  ONLINE       tnc1                     STABLE
               ONLINE  ONLINE       tnc2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       tnc1                     STABLE
               ONLINE  ONLINE       tnc2                     STABLE
ora.SDXH.dg
               ONLINE  ONLINE       tnc1                     STABLE
               ONLINE  ONLINE       tnc2                     STABLE
ora.asm
               ONLINE  ONLINE       tnc1                     Started,STABLE
               ONLINE  ONLINE       tnc2                     Started,STABLE
ora.net1.network
               ONLINE  ONLINE       tnc1                     STABLE
               ONLINE  ONLINE       tnc2                     STABLE
ora.ons
               ONLINE  ONLINE       tnc1                     STABLE
               ONLINE  ONLINE       tnc2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       tnc1                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       tnc2                     169.254.89.38 10.10.
                                                                          10.190,STABLE
ora.cvu
      1        ONLINE  ONLINE       tnc2                     STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       tnc2                     Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       tnc2                     STABLE
ora.p1tncd.db
      1        ONLINE  ONLINE       tnc1                     Open,STABLE
      2        ONLINE  ONLINE       tnc2                     Open,STABLE
ora.p1tncd.srvnsp.svc
      1        ONLINE  ONLINE       tnc1                     STABLE
      2        OFFLINE OFFLINE                                 STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       tnc1                     STABLE
ora.tnc1.vip
      1        ONLINE  ONLINE       tnc1                     STABLE
ora.tnc2.vip
      1        ONLINE  ONLINE       tnc2                     STABLE
--------------------------------------------------------------------------------
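
After a restart, the table above can be scanned for anything that did not come back. The following is a rough sketch that assumes the `crsctl status resource -t` layout shown here: resource names start at column 0 with `ora.`, and the state lines below them are indented. The function name `crs_resources_not_online` is invented for this example.

```shell
# Hypothetical helper: reads `crsctl status resource -t` output on stdin and
# prints each state line that mentions OFFLINE, INTERMEDIATE, or UNKNOWN,
# prefixed with the resource it belongs to.
crs_resources_not_online() {
    awk '
        /^ora\./ { res = $1; next }              # remember current resource
        /^[ \t]/ && /(OFFLINE|INTERMEDIATE|UNKNOWN)/ {
            $1 = $1                              # squeeze whitespace
            print res ": " $0
        }
    '
}

# Typical use:
#   crsctl status resource -t | crs_resources_not_online
```

In the listing above, this would flag only the second instance of ora.p1tncd.srvnsp.svc, whose target is also OFFLINE, so it is stopped by configuration and has not failed.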

--NIKHIL TATINENI--
--12C Oracle RAC-- 
