Sunday, November 1, 2015

Oracle GoldenGate: General Monitoring


we use goldengate to replicate data at transaction level :) 

Goldengate manager: 
When we configure goldengate replication for the table, initially we add supplemental logging for the table. we can add supplemental logging at database level, schema level and Table level. ok how do add  and check supplemental logging for table?

In order to check supplemental logging for the table, we have to perform dblogin using goldengate user and password as follows

GGSCI (tnc61.ffdc.com) 31> dblogin userid ggsuser, password oracle
Successfully logged into database.

GGSCI (tnc61.ffdc.com as ggsuser@oval007p) 32> info trandata world.account                 
Logging of supplemental redo log data is enabled for table NSPDBA.ACCOUNT.
Columns supplementally logged for table world.account: ACCOUNT_ADDRESS, ACCOUNT_ID, ACCOUNT_NAME, ACCOUNT_TEL, BALANCE
Monitoring manager:
Manager runs on goldengate home and on default port number 7809 :)
we start manager and stop manager as follows 
GGSCI (tnc61.ffdc.com) 3> stop mgr !
Sending STOP request to MANAGER ...
Request processed.

GGSCI (tnc61.ffdc.com) 4> start mgr !
Manager started.

Manager runs on goldengate home on default port number 7809 '
GGSCI (tnc61.ffdc.com) 5> info mgr 
Manager is running (IP port tnc61.ffdc.com.7809, Process ID 3905)


GGSCI (tnc61.ffdc.com) 13> info mgr childstatus
Manager is running (IP port tnc61.ffdc.com.7809, Process ID 4035).

4 status of the goldengate process:starting, running, stopped and abended 

info all command tells about the status of the process running on the goldengate home


GGSCI (tnc61.ffdc.com) 12> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     RUNNING     ENTNEW      00:00:02      00:00:02    ""
EXTRACT     RUNNING     EXTORD7     00:00:02      00:00:07    """"
EXTRACT     STOPPED     PEXTORD7    00:00:00      5257:36:07  """"
EXTRACT     STOPPED     PXTNEW      00:00:00      5257:36:08 
REPLICAT    STOPPED     REORD117    00:00:00      5257:38:02  '''

When ever any goldengate is started, stopped,  abended it creates report file in "dirprm" directory under goldengate home.  If any process is abended in goldengate, the root cause can be found in reportfile and ggserror.log 

for example :  Extract is abended 

GGSCI (tnc61.ffdc.com) 12> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     ABENDED     ENTNEW      00:00:02      00:46:02    

we can view report file from goldengate command line utility as follows

GGSCI (tnc61.ffdc.com) 13> view report ENTNEW
***********************************************************************
                 Oracle GoldenGate Capture for Oracle
   Version 12.1.2.1.0 OGGCORE_12.1.2.1.0_PLATFORMS_140727.2135.1_FBO
   Linux, x64, 64bit (optimized), Oracle 12c on Aug  7 2014 10:40:21
Copyright (C) 1995, 2014, Oracle and/or its affiliates. All rights reserved.
                    Starting at 2015-11-01 05:44:28
***********************************************************************
Operating System Version:
Linux
Version #1 SMP Sat Feb 25 12:50:01 EST 2012, Release 2.6.18-308.el5xen
Node: tnc61.ffdc.com
Machine: x86_64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited
Process id: 4456
Description: 
***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
2015-11-01 05:44:28  INFO    OGG-03059  Operating system character set identified as UTF-8.
2015-11-01 05:44:28  INFO    OGG-02695  ANSI SQL parameter syntax is used for parameter parsing.
extract ENTNEW
SETENV (ORACLE_SID=mis007p)
2015-11-01 05:44:28  INFO    OGG-02095  Successfully set environment variable ORACLE_SID=mis007p.
SETENV (ORACLE_HOME="/u01/app/oracle/product/12.1.0/dbhome_1")
2015-11-01 05:44:28  INFO    OGG-02095  Successfully set environment variable ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1.
userid ggsuser, password ******
Source Context :
  SourceModule            : [ggdb.ora.sess]
  SourceID                : [/scratch/aime1/adestore/views/aime1_adc4150384/oggcore/OpenSys/src/gglib/ggdbora/ocisess.c]
  SourceFunction          : [oci_try]
  SourceLine              : [816]
  ThreadBacktrace         : [17] elements
                          : [/u01/app/goldengate/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x2b2c93978dae]                   : [/u01/app/goldengate/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x
340) [0x2b2c93973ac0]]                        : [/u01/app/goldengate/libgglog.so(_MSG_ERR_ORACLE_OCI_ERROR_WITH_DESC(CSourceContext*, int, char const*
, char const*, CMessageFactory::MessageDisposition)+0x41) [0x2b2c9395c9bf]]
                          : [/u01/app/goldengate/extract(OCISESS_context_def::oci_try(int, char const*, ...)+0x4e2) [0x6fc8b2]]              : [/u01/app/goldengate/extract(OCISESS_context_def::logon(ggs::gglib::ggapp::CLoginName const&, ggs::ggl
ib::ggapp::CDBObjName<(DBObjType)12> const&, ggs::gglib::ggunicode::UString const&, int, bool, int)+0x724) [0x6f9614]]                    : [/u01/app/goldengate/extract(DBOCI_init_connection_logon(ggs::gglib::ggapp::CLoginName const&, ggs::gg
lib::ggapp::CDBObjName<(DBObjType)12> const&, char const*, int, int, int, char*)+0x83) [0x6be633]                    : [/u01/app/goldengate/extract [0x6ef4dd]]
                          : [/u01/app/goldengate/extract [0x6ef6c3]]
                          : [/u01/app/goldengate/extract(gl_odbc_param(char const*, char const*, char (&) [2048])+0x5a) [0x6ef9fa]
                          : [/u01/app/goldengate/extract(get_infile_params(time_elt_def*, time_elt_def*, char**)+0x10a5) [0x59a175
                          : [/u01/app/goldengate/extract(extract_main(int, char**)+0x109) [0x5bfd89]]                    : [/u01/app/goldengate/extract(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x4f) [0x6ab4df]]                         : [/u01/app/goldengate/extract(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading:
:Thread::ThreadArgs*)+0x104) [0x6ab9e4]]
                          : [/u01/app/goldengate/extract(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x8b) [0x6abdab]
                          : [/u01/app/goldengate/extract(main+0x3f) [0x5bfa2f]]
                          : [/lib64/libc.so.6(__libc_start_main+0xf4) [0x307221d994]]
                          : [/u01/app/goldengate/extract [0x534319]]
2015-11-01 05:44:28  ERROR   OGG-00664  OCI Error beginning session (status = 28001-ORA-28001: the password has expired).
2015-11-01 05:44:28  ERROR   OGG-01668  PROCESS ABENDING.

from the above report file, we found that goldengate process is abended with ERROR 0GG-00664

ggserror.log exists in goldengate home 
from here you can found errors in the alert log file 
GGSCI (tnc61.ffdc.com) 14> view ggsevt

Monitoring Extract:
Extract process runs on goldengate home, it works on online redologs and open the trails and flush the data into the local trail directory by using checkpoints. Checkpoint information about the extract can be known by issuing following command 


GGSCI (tnc61.ffdc.com) 16> info ENTNEW, showch
EXTRACT    ENTNEW    Last Started 2015-11-01 05:46   Status RUNNING
Checkpoint Lag       00:00:01 (updated 00:00:01 ago)
Process ID           4521
Log Read Checkpoint  Oracle Integrated Redo Logs
                     2015-11-01 06:30:57
                     SCN 0.2983087 (2983087)

Current Checkpoint Detail:
Read Checkpoint #1
  Oracle Integrated Redo Log
  Startup Checkpoint (starting position in the data source):
    Timestamp: 2015-03-19 07:42:58.000000
    SCN: Not available
  Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
    Timestamp: 2015-11-01 06:30:57.000000
    SCN: 0.2983086 (2983086)
  Current Checkpoint (position of last record read in the data source):
    Timestamp: 2015-11-01 06:30:57.000000
    SCN: 0.2983087 (2983087)
  BR Previous Recovery Checkpoint:
    Timestamp: 2015-03-26 18:47:36.861436
    SCN: Not available
  BR Begin Recovery Checkpoint:
    Timestamp: 2015-03-27 02:47:52.000000
    SCN: 0.2315296 (2315296)
  BR End Recovery Checkpoint:
    Timestamp: 2015-03-27 02:47:52.000000
    SCN: 0.2315296 (2315296)
Write Checkpoint #1
  GGS Log Trail
  Current Checkpoint (current write position):
    Sequence #: 7
    RBA: 1470
    Timestamp: 2015-11-01 06:30:58.940110
    Extract Trail: /u01/app/trails/ba
    Trail Type: EXTTRAIL
Header:
  Version = 2
  Record Source = A
  Type = 13
  # Input Checkpoints = 1
  # Output Checkpoints = 1
File Information:
  Block Size = 2048
  Max Blocks = 100
  Record Length = 2048
  Current Offset = 0
Configuration:
  Data Source = 3
  Transaction Integrity = 1
  Task Type = 0
Status:
  Start Time = 2015-11-01 05:46:20
  Last Update Time = 2015-11-01 06:30:58
  Stop Status = A

  Last Result = 0

Read checkpoint from online redo logs and write checkpoint on goldengate trail :) 

When ever long running transaction is started on database, extracts start and opens the trails in local trail directory. When this transaction is running for longer time, we can watch lag info at column "Time Since Chkpt" 
for example: 

GGSCI (tnc61.ffdc.com) 12> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     RUNNING     ENTNEW      00:00:02      01:34:02    ""

This is time since extract collected last transaction on database 

Monitoring pump: 
Pump is configured in the scenario, where we need to broadcast data between instances or low network bandwidth  where we replicate data between data centers

example of  pump param file 
GGSCI (tnc61.ffdc.com) 19> view param PEXTORD7
extract pextord7 
PASSTHRU
RMTHOST tnc61.ffdc.com, MGRPORT 7809
RMTTRAIL /u01/app/trails/aa
REPORTCOUNT EVERY 10 MINUTES, RATE

TABLE world.orders;

Functionality of the pump: pump copy local trail from local directory to the remote trail directory. Here pump contacts the manager ( running on target goldengate home) on target server. Manager forward the request to its child process collector. collector allows pump to flush the trail into the remote trail directory 

General issues which can obstruct functionality of the pump
--if the manager is down (or) remote trail directory is full, pumps copying data to remote trail server will get abended
--we see data inconsistency issues when pump is unable to perform ETROLLOVER automatically and struck end of the trail
--when ever source server, went down or for server reboot. manager of target server will place lock on the remote trail. placing lock on remote trail make pump unable to flush the data into the same trail with same seqno and pump will get abended 

Note 1--
When you perform ETROLLOVER of the pump make sure the RBA is equal to the size of trail 
Example: 

GGSCI (tnc61.ffdc.com) 20> info PEXTORD7
EXTRACT    PEXTORD7  Last Started 2015-03-26 18:47   Status STOPPED
Checkpoint Lag       00:00:00 (updated 5258:54:05 ago)
Log Read Checkpoint  File /u01/app/trails/aa000008
                     2015-03-26 18:47:54.581514  RBA 1473

GGSCI (tnc61.ffdc.com) 24> sh ls -ltr /u01/app/trails/aa*     
-rw-r----- 1 oracle dba 1473 Oct  4 03:18 /u01/app/trails/aa000008
-rw-r----- 1 oracle dba 1473 Nov  1 05:46 /u01/app/trails/aa000009
-rw-r----- 1 oracle dba 1473 Nov  1 05:46 /u01/app/trails/aa000010

From above we can confirm that size of the trail is equal to size of trail  and we can perform ETROLLOVER 

Note2 -- 
Do you want to check RMTTRAIL Seqno and RBA of the pump? we can issue following command and check which trail seqno and RBA pump is writing to remote trail directory 

GGSCI (tnc61.ffdc.com) 27> info rmttrail /u01/app/trails/aa

       Extract Trail: /u01/app/trails/aa
             Extract: EXTORD7
               Seqno: 10
                 RBA: 1473
           File Size: 100M

       Extract Trail: /u01/app/trails/aa
             Extract: PEXTORD7
               Seqno: 4
                 RBA: 1530
           File Size: 100M

From the above command we found, Pump PEXTORD7 is writing to trail "/u01/app/trails/aa" and seqno 4 and RBA 1530 

Monitoring Replicat: 
Monitoring replicat is very crucial job in the environment. You need guts to challenge to tune query's made by sql on database. Make sure you are good at reading the trail data using logdump utility. 
--Note : if the query's made by the sql is taking longer time on database, this query's can consume more cpu and elapsed time and can degrade performance of database 

Functionality of the Replicat: 
Replicat run on target database on goldengate home and applies data from remote trail directory to target database. Replicat  will make consistent sql depend on data it have on trail :) 

GGSCI (tnc61.ffdc.com as ggsuser@dbl0p) 36> info REORD117
REPLICAT   REORD117  Last Started 2015-11-01 07:33   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
Process ID           6112
Log Read Checkpoint  File /u01/app/goldengate/ab000000
                     First Record  RBA 0

GGSCI (tnc61.ffdc.com as ggsuser@dbl0p) 36> info REORD117
REPLICAT   REORD117  Last Started 2015-11-01 07:33   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
Process ID           6112
Log Read Checkpoint  File /u01/app/goldengate/ab000000
                     First Record  RBA 1420

If you can watch, Watch RBA is increasing, indicates replicat is moving, not only for replicat, you can watch same behavior for the pump


GGSCI (tnc61.ffdc.com) 29> info all

Program         Status                Group      Lag at Chkpt     Time Since Chkpt

REPLICAT    RUNNING     REORD117    00:00:00              00:10:31  

if you look , Column "Time since Chkpt" is increasing
Replicat applies transaction on database and updates check point table 
If the replicat is taking longer time to apply the sql's on database it stops applying to check point table. the time "Time Since Chkpt" column will get increased. This behavior indicates, there is long running transaction  on database or some statement running on database are blocking DML made by replicat (or) there is problem with sql made by the replicat on  target database. 

---Nikhil Tatineni---
---Oracle in memory---





Querys to monitor RAC

following few  Query's will help to find out culprits-  Query to check long running transaction from last 8 hours  Col Sid Fo...