Monday, July 18, 2016

Clusterware startup sequence RAC



Clusterware startup sequence
When Clusterware starts, the operating system's init daemon runs the Clusterware startup script registered in /etc/inittab.
During installation, Oracle adds this startup entry to /etc/inittab.
When a node comes up after a server reboot, init reads /etc/inittab and starts Oracle High Availability Services (ohasd):
cat /etc/inittab
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
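
An inittab entry has four colon-separated fields: id, runlevels, action, and process. The sketch below splits the entry shown above into those fields so the meaning of each part is visible. The function name parse_inittab_entry is a hypothetical helper for illustration, not an Oracle or init utility.

```shell
#!/bin/sh
# Hypothetical helper: split an inittab entry (id:runlevels:action:process)
# into its four fields. Only the first three colons are treated as
# separators, so the process field may itself contain colons.
parse_inittab_entry() {
    entry=$1
    id=${entry%%:*}          # text before the first colon
    rest=${entry#*:}
    runlevels=${rest%%:*}    # runlevels in which the entry is active
    rest=${rest#*:}
    action=${rest%%:*}       # e.g. respawn: restart the process if it exits
    process=${rest#*:}       # command line that init executes
    printf 'id=%s\nrunlevels=%s\naction=%s\nprocess=%s\n' \
        "$id" "$runlevels" "$action" "$process"
}

parse_inittab_entry 'h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null'
```

For the entry above this prints id=h1, runlevels=35, action=respawn, and the init.ohasd command line as the process. In other words, init.ohasd runs in runlevels 3 and 5 and is restarted by init whenever it exits.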

If ohasd crashes on the node, the init daemon restarts it, because the inittab entry uses the respawn action.
The ohasd daemon is responsible for starting the rest of the Clusterware stack.

ohasd starts:
-> CssdAgent
-> OraRootAgent
-> OraAgent
-> CssdMonitor
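
To see what ohasd has actually started on a node, the stack can be queried with crsctl. The sketch below wraps two standard 11.2 commands, "crsctl check crs" and "crsctl stat res -t -init", in a hypothetical function named check_stack. It assumes crsctl from the Grid home is on the PATH and that the caller has the privileges to run it.

```shell
#!/bin/sh
# Hypothetical wrapper: report the health of the Clusterware stack.
# Assumes the Grid Infrastructure bin directory is on PATH.
check_stack() {
    if ! command -v crsctl >/dev/null 2>&1; then
        echo "crsctl not found in PATH" >&2
        return 127
    fi
    # Health of OHAS, CRS, CSS and EVM on the local node.
    crsctl check crs || return $?
    # Resources managed by ohasd itself (the lower stack).
    crsctl stat res -t -init
}
```

Calling check_stack on a cluster node prints the status of the high availability services followed by the ohasd-managed resources. On a host without the Grid software it only reports that crsctl is missing.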



i).Cluster Ready Services (CRS)
$ ps -ef | grep crs | grep -v grep
root 25863 1 1 Oct27 ? 11:37:32 /opt/oracle/grid/product/11.2.0/bin/crsd.bin reboot
crsd.bin => Starts, stops, monitors, and fails over resources. It maintains the OCR and restarts resources when a failure occurs.
This applies to RAC systems. For Oracle Restart and ASM, ohasd is used instead.

ii).Cluster Synchronization Service (CSS)
$ ps -ef | grep -v grep | grep css
root 19541 1 0 Oct27 ? 00:05:55 /opt/oracle/grid/product/11.2.0/bin/cssdmonitor
root 19558 1 0 Oct27 ? 00:05:45 /opt/oracle/grid/product/11.2.0/bin/cssdagent
oragrid 19576 1 6 Oct27 ? 2-19:13:56 /opt/oracle/grid/product/11.2.0/bin/ocssd.bin
cssdmonitor => Monitors node hangs (via oprocd functionality), OCSSD process hangs (via oclsomon functionality), and vendor clusterware (via vmon functionality). It is a multithreaded process that runs with elevated priority.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor 

cssdagent => Spawned by the OHASD process. It takes over the role of the 10g oprocd daemon and is responsible for I/O fencing. Killing this process causes a node reboot. It starts, stops, and checks the status of the ocssd.bin daemon.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent 

ocssd.bin => Manages cluster node membership and runs as the oragrid user. Failure of this process results in a node restart.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin 

iii) Event Management (EVM)
$ ps -ef | grep evm | grep -v grep
oragrid 24623 1 0 Oct27 ? 00:30:25 /opt/oracle/grid/product/11.2.0/bin/evmd.bin
oragrid 25934 24623 0 Oct27 ? 00:00:00 /opt/oracle/grid/product/11.2.0/bin/evmlogger.bin -o /opt/oracle/grid/product/11.2.0/evm/log/evmlogger.info -l /opt/oracle/grid/product/11.2.0/evm/log/evmlogger.log
evmd.bin => Distributes cluster events to all cluster members so that they are aware of cluster changes.
evmlogger.bin => Started by evmd.bin. It reads the configuration files, determines which events to subscribe to from EVMD, and runs user-defined actions for those events.

iv).Oracle Root Agent
$ ps -ef | grep -v grep | grep orarootagent
root 19395 1 0 Oct17 ? 12:06:57 /opt/oracle/grid/product/11.2.0/bin/orarootagent.bin
root 25853 1 1 Oct17 ? 16:30:45 /opt/oracle/grid/product/11.2.0/bin/orarootagent.bin
orarootagent.bin => A specialized oraagent process that helps crsd manage resources owned by root, such as the network and the Grid virtual IP address.
The listing shows two orarootagent.bin processes. In 11.2 these are most likely two separate agent instances, one spawned by ohasd and one by crsd, rather than threads of a single process.

v).Cluster Time Synchronization Service (CTSS)
$ ps -ef | grep ctss | grep -v grep
root 24600 1 0 Oct27 ? 00:38:10 /opt/oracle/grid/product/11.2.0/bin/octssd.bin reboot
octssd.bin => Provides time management in a cluster for Oracle Clusterware.

vi).Oracle Agent
$ ps -ef | grep -v grep | grep oraagent
oragrid 5337 1 0 Nov14 ? 00:35:47 /opt/oracle/grid/product/11.2.0/bin/oraagent.bin
oracle 8886 1 1 10:25 ? 00:00:05 /opt/oracle/grid/product/11.2.0/bin/oraagent.bin
oragrid 19481 1 0 Oct27 ? 01:45:19 /opt/oracle/grid/product/11.2.0/bin/oraagent.bin
oraagent.bin => Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1). 

ORACLE HIGH AVAILABILITY SERVICES STACK
i) Cluster Logger Service
$ ps -ef | grep -v grep | grep ologgerd
root 24856 1 0 Oct27 ? 01:43:48 /opt/oracle/grid/product/11.2.0/bin/ologgerd -m mg5hfmr02a -r -d /opt/oracle/grid/product/11.2.0/crf/db/mg5hfmr01a
ologgerd => Receives information from all the nodes in the cluster and persists it in a Cluster Health Monitor (CHM) repository database. This service runs on only two nodes in a cluster.

ii).System Monitor Service (osysmond)
$ ps -ef | grep -v grep | grep osysmond
root 19528 1 0 Oct27 ? 09:42:16 /opt/oracle/grid/product/11.2.0/bin/osysmond
osysmond => The monitoring and operating system metric collection service. It sends its data to the cluster logger service and runs on every node in a cluster.

iii). Grid Plug and Play (GPNPD):
$ ps -ef | grep gpn
oragrid 19502 1 0 Oct27 ? 00:21:13 /opt/oracle/grid/product/11.2.0/bin/gpnpd.bin
gpnpd.bin => Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.

iv).Grid Interprocess Communication (GIPC):
$ ps -ef | grep -v grep | grep gipc
oragrid 19516 1 0 Oct27 ? 01:51:41 /opt/oracle/grid/product/11.2.0/bin/gipcd.bin
gipcd.bin => A support daemon that enables Redundant Interconnect Usage.

v). Multicast Domain Name Service (mDNS):
$ ps -ef | grep -v grep | grep dns
oragrid 19493 1 0 Oct27 ? 00:01:18 /opt/oracle/grid/product/11.2.0/bin/mdnsd.bin
mdnsd.bin => Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux, UNIX, and Windows.

vi).Oracle Grid Naming Service (GNS)
$ ps -ef | grep -v grep | grep gns
gnsd.bin => Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.
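
Rather than running one ps | grep pipeline per daemon, the checks above can be combined into a single pass. The sketch below is a hypothetical helper, report_daemons, that reads process command lines on stdin and reports which of the daemons named in this post are present. The daemon list is taken from the listings above. gnsd.bin, ologgerd, and evmlogger.bin are left out on the assumption that they are not expected on every node.

```shell
#!/bin/sh
# Daemon binaries named in the process listings of this post.
EXPECTED_DAEMONS="ohasd.bin crsd.bin ocssd.bin cssdmonitor cssdagent evmd.bin octssd.bin orarootagent.bin oraagent.bin gpnpd.bin gipcd.bin mdnsd.bin osysmond"

# Hypothetical helper: read one command line per row on stdin
# (for example from "ps -eo args") and print "<name> up" or
# "<name> down" for each expected daemon.
report_daemons() {
    listing=$(cat)
    for name in $EXPECTED_DAEMONS; do
        # Match the name as the executable's file name: preceded by "/" or
        # start of line, followed by a space or end of line.
        if printf '%s\n' "$listing" | grep -Eq "(^|/)${name}( |\$)"; then
            echo "$name up"
        else
            echo "$name down"
        fi
    done
}
```

On a cluster node it would be used as: ps -eo args | report_daemons. Because the function only reads stdin, it can also be run against a saved ps listing from another node.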
--Nikhil Tatineni--
--11GR2 RAC-- 
Reference - https://blogs.oracle.com/myoraclediary/entry/clusterware_processes_in_11g_rac
