Database In Memory : Oracle GoldenGate: General Monitoring

we use goldengate to replicate data at transaction level :)

Goldengate manager:
When we configure goldengate replication for the table, initially we add supplemental logging for the table. we can add supplemental logging at database level, schema level and Table level. ok how do add and check supplemental logging for table?

In order to check supplemental logging for the table, we have to perform dblogin using goldengate user and password as follows

GGSCI (tnc61.ffdc.com) 31> dblogin userid ggsuser, password oracle

Successfully logged into database.

GGSCI (tnc61.ffdc.com as ggsuser@oval007p) 32> info trandata world.account

Logging of supplemental redo log data is enabled for table NSPDBA.ACCOUNT.

Columns supplementally logged for table world.account: ACCOUNT_ADDRESS, ACCOUNT_ID, ACCOUNT_NAME, ACCOUNT_TEL, BALANCE

Monitoring manager:

Manager runs on goldengate home and on default port number 7809 :)
we start manager and stop manager as follows

GGSCI (tnc61.ffdc.com) 3> stop mgr !

Sending STOP request to MANAGER ...

Request processed.

GGSCI (tnc61.ffdc.com) 4> start mgr !

Manager started.

Manager runs on goldengate home on default port number 7809 '

GGSCI (tnc61.ffdc.com) 5> info mgr

Manager is running (IP port tnc61.ffdc.com.7809, Process ID 3905)

GGSCI (tnc61.ffdc.com) 13> info mgr childstatus

Manager is running (IP port tnc61.ffdc.com.7809, Process ID 4035).

4 status of the goldengate process:starting, running, stopped and abended
info all command tells about the status of the process running on the goldengate home

GGSCI (tnc61.ffdc.com) 12> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING ENTNEW 00:00:02 00:00:02 ""

EXTRACT RUNNING EXTORD7 00:00:02 00:00:07 """"

EXTRACT STOPPED PEXTORD7 00:00:00 5257:36:07 """"

EXTRACT STOPPED PXTNEW 00:00:00 5257:36:08

REPLICAT STOPPED REORD117 00:00:00 5257:38:02 '''

When ever any goldengate is started, stopped, abended it creates report file in "dirprm" directory under goldengate home. If any process is abended in goldengate, the root cause can be found in reportfile and ggserror.log

for example : Extract is abended

GGSCI (tnc61.ffdc.com) 12> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

EXTRACT ABENDED ENTNEW 00:00:02 00:46:02

we can view report file from goldengate command line utility as follows

GGSCI (tnc61.ffdc.com) 13> view report ENTNEW

***********************************************************************

Oracle GoldenGate Capture for Oracle

Version 12.1.2.1.0 OGGCORE_12.1.2.1.0_PLATFORMS_140727.2135.1_FBO

Linux, x64, 64bit (optimized), Oracle 12c on Aug 7 2014 10:40:21

Starting at 2015-11-01 05:44:28

***********************************************************************

Operating System Version:

Linux

Version #1 SMP Sat Feb 25 12:50:01 EST 2012, Release 2.6.18-308.el5xen

Node: tnc61.ffdc.com

Machine: x86_64

soft limit hard limit

Address Space Size : unlimited unlimited

Heap Size : unlimited unlimited

File Size : unlimited unlimited

CPU Time : unlimited unlimited

Process id: 4456

Description:

***********************************************************************

** Running with the following parameters **

***********************************************************************

2015-11-01 05:44:28 INFO OGG-03059 Operating system character set identified as UTF-8.

2015-11-01 05:44:28 INFO OGG-02695 ANSI SQL parameter syntax is used for parameter parsing.

extract ENTNEW

SETENV (ORACLE_SID=mis007p)

2015-11-01 05:44:28 INFO OGG-02095 Successfully set environment variable ORACLE_SID=mis007p.

SETENV (ORACLE_HOME="/u01/app/oracle/product/12.1.0/dbhome_1")

2015-11-01 05:44:28 INFO OGG-02095 Successfully set environment variable ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1.

userid ggsuser, password ******

Source Context :

SourceModule : [ggdb.ora.sess]

SourceID : [/scratch/aime1/adestore/views/aime1_adc4150384/oggcore/OpenSys/src/gglib/ggdbora/ocisess.c]

SourceFunction : [oci_try]

SourceLine : [816]

ThreadBacktrace : [17] elements

: [/u01/app/goldengate/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x2b2c93978dae] : [/u01/app/goldengate/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x

340) [0x2b2c93973ac0]] : [/u01/app/goldengate/libgglog.so(_MSG_ERR_ORACLE_OCI_ERROR_WITH_DESC(CSourceContext*, int, char const*

, char const*, CMessageFactory::MessageDisposition)+0x41) [0x2b2c9395c9bf]]

: [/u01/app/goldengate/extract(OCISESS_context_def::oci_try(int, char const*, ...)+0x4e2) [0x6fc8b2]] : [/u01/app/goldengate/extract(OCISESS_context_def::logon(ggs::gglib::ggapp::CLoginName const&, ggs::ggl

ib::ggapp::CDBObjName<(DBObjType)12> const&, ggs::gglib::ggunicode::UString const&, int, bool, int)+0x724) [0x6f9614]] : [/u01/app/goldengate/extract(DBOCI_init_connection_logon(ggs::gglib::ggapp::CLoginName const&, ggs::gg

lib::ggapp::CDBObjName<(DBObjType)12> const&, char const*, int, int, int, char*)+0x83) [0x6be633] : [/u01/app/goldengate/extract [0x6ef4dd]]

: [/u01/app/goldengate/extract [0x6ef6c3]]

: [/u01/app/goldengate/extract(gl_odbc_param(char const*, char const*, char (&) [2048])+0x5a) [0x6ef9fa]

: [/u01/app/goldengate/extract(get_infile_params(time_elt_def*, time_elt_def*, char**)+0x10a5) [0x59a175

: [/u01/app/goldengate/extract(extract_main(int, char**)+0x109) [0x5bfd89]] : [/u01/app/goldengate/extract(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x4f) [0x6ab4df]] : [/u01/app/goldengate/extract(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading:

:Thread::ThreadArgs*)+0x104) [0x6ab9e4]]

: [/u01/app/goldengate/extract(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x8b) [0x6abdab]

: [/u01/app/goldengate/extract(main+0x3f) [0x5bfa2f]]

: [/lib64/libc.so.6(__libc_start_main+0xf4) [0x307221d994]]

: [/u01/app/goldengate/extract [0x534319]]

2015-11-01 05:44:28 ERROR OGG-00664 OCI Error beginning session (status = 28001-ORA-28001: the password has expired).

2015-11-01 05:44:28 ERROR OGG-01668 PROCESS ABENDING.

from the above report file, we found that goldengate process is abended with ERROR 0GG-00664

ggserror.log exists in goldengate home

from here you can found errors in the alert log file

GGSCI (tnc61.ffdc.com) 14> view ggsevt

Monitoring Extract:

Extract process runs on goldengate home, it works on online redologs and open the trails and flush the data into the local trail directory by using checkpoints. Checkpoint information about the extract can be known by issuing following command

GGSCI (tnc61.ffdc.com) 16> info ENTNEW, showch

EXTRACT ENTNEW Last Started 2015-11-01 05:46 Status RUNNING

Checkpoint Lag 00:00:01 (updated 00:00:01 ago)

Process ID 4521

Log Read Checkpoint Oracle Integrated Redo Logs

2015-11-01 06:30:57

SCN 0.2983087 (2983087)

Current Checkpoint Detail:

Read Checkpoint #1

Oracle Integrated Redo Log

Startup Checkpoint (starting position in the data source):

Timestamp: 2015-03-19 07:42:58.000000

SCN: Not available

Recovery Checkpoint (position of oldest unprocessed transaction in the data source):

Timestamp: 2015-11-01 06:30:57.000000

SCN: 0.2983086 (2983086)

Current Checkpoint (position of last record read in the data source):

Timestamp: 2015-11-01 06:30:57.000000

SCN: 0.2983087 (2983087)

BR Previous Recovery Checkpoint:

Timestamp: 2015-03-26 18:47:36.861436

SCN: Not available

BR Begin Recovery Checkpoint:

Timestamp: 2015-03-27 02:47:52.000000

SCN: 0.2315296 (2315296)

BR End Recovery Checkpoint:

Timestamp: 2015-03-27 02:47:52.000000

SCN: 0.2315296 (2315296)

Write Checkpoint #1

GGS Log Trail

Current Checkpoint (current write position):

Sequence #: 7

RBA: 1470

Timestamp: 2015-11-01 06:30:58.940110

Extract Trail: /u01/app/trails/ba

Trail Type: EXTTRAIL

Header:

Version = 2

Record Source = A

Type = 13

# Input Checkpoints = 1

# Output Checkpoints = 1

File Information:

Block Size = 2048

Max Blocks = 100

Record Length = 2048

Current Offset = 0

Configuration:

Data Source = 3

Transaction Integrity = 1

Task Type = 0

Status:

Start Time = 2015-11-01 05:46:20

Last Update Time = 2015-11-01 06:30:58

Stop Status = A

Last Result = 0

Read checkpoint from online redo logs and write checkpoint on goldengate trail :)

When ever long running transaction is started on database, extracts start and opens the trails in local trail directory. When this transaction is running for longer time, we can watch lag info at column "Time Since Chkpt"
for example:

GGSCI (tnc61.ffdc.com) 12> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING ENTNEW 00:00:02 01:34:02 ""

This is time since extract collected last transaction on database

Monitoring pump:
Pump is configured in the scenario, where we need to broadcast data between instances or low network bandwidth where we replicate data between data centers

example of pump param file

GGSCI (tnc61.ffdc.com) 19> view param PEXTORD7

extract pextord7

PASSTHRU

RMTHOST tnc61.ffdc.com, MGRPORT 7809

RMTTRAIL /u01/app/trails/aa

REPORTCOUNT EVERY 10 MINUTES, RATE

TABLE world.orders;

Functionality of the pump: pump copy local trail from local directory to the remote trail directory. Here pump contacts the manager ( running on target goldengate home) on target server. Manager forward the request to its child process collector. collector allows pump to flush the trail into the remote trail directory

General issues which can obstruct functionality of the pump

--if the manager is down (or) remote trail directory is full, pumps copying data to remote trail server will get abended

--we see data inconsistency issues when pump is unable to perform ETROLLOVER automatically and struck end of the trail

--when ever source server, went down or for server reboot. manager of target server will place lock on the remote trail. placing lock on remote trail make pump unable to flush the data into the same trail with same seqno and pump will get abended

Note 1--

When you perform ETROLLOVER of the pump make sure the RBA is equal to the size of trail

Example:

GGSCI (tnc61.ffdc.com) 20> info PEXTORD7

EXTRACT PEXTORD7 Last Started 2015-03-26 18:47 Status STOPPED

Checkpoint Lag 00:00:00 (updated 5258:54:05 ago)

Log Read Checkpoint File /u01/app/trails/aa000008

2015-03-26 18:47:54.581514 RBA 1473

GGSCI (tnc61.ffdc.com) 24> sh ls -ltr /u01/app/trails/aa*

-rw-r----- 1 oracle dba 1473 Oct 4 03:18 /u01/app/trails/aa000008

-rw-r----- 1 oracle dba 1473 Nov 1 05:46 /u01/app/trails/aa000009

-rw-r----- 1 oracle dba 1473 Nov 1 05:46 /u01/app/trails/aa000010

From above we can confirm that size of the trail is equal to size of trail and we can perform ETROLLOVER

Note2 --

Do you want to check RMTTRAIL Seqno and RBA of the pump? we can issue following command and check which trail seqno and RBA pump is writing to remote trail directory

GGSCI (tnc61.ffdc.com) 27> info rmttrail /u01/app/trails/aa

Extract Trail: /u01/app/trails/aa

Extract: EXTORD7

Seqno: 10

RBA: 1473

File Size: 100M

Extract Trail: /u01/app/trails/aa

Extract: PEXTORD7

Seqno: 4

RBA: 1530

File Size: 100M

From the above command we found, Pump PEXTORD7 is writing to trail "/u01/app/trails/aa" and seqno 4 and RBA 1530

Monitoring Replicat:
Monitoring replicat is very crucial job in the environment. You need guts to challenge to tune query's made by sql on database. Make sure you are good at reading the trail data using logdump utility.
--Note : if the query's made by the sql is taking longer time on database, this query's can consume more cpu and elapsed time and can degrade performance of database

Functionality of the Replicat:
Replicat run on target database on goldengate home and applies data from remote trail directory to target database. Replicat will make consistent sql depend on data it have on trail :)

GGSCI (tnc61.ffdc.com as ggsuser@dbl0p) 36> info REORD117

REPLICAT REORD117 Last Started 2015-11-01 07:33 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:03 ago)

Process ID 6112

Log Read Checkpoint File /u01/app/goldengate/ab000000

First Record RBA 0

GGSCI (tnc61.ffdc.com as ggsuser@dbl0p) 36> info REORD117

REPLICAT REORD117 Last Started 2015-11-01 07:33 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:03 ago)

Process ID 6112

Log Read Checkpoint File /u01/app/goldengate/ab000000

First Record RBA 1420

If you can watch, Watch RBA is increasing, indicates replicat is moving, not only for replicat, you can watch same behavior for the pump

GGSCI (tnc61.ffdc.com) 29> info all

Program Status Group Lag at Chkpt Time Since Chkpt

REPLICAT RUNNING REORD117 00:00:00 00:10:31

if you look , Column "Time since Chkpt" is increasing

Replicat applies transaction on database and updates check point table
If the replicat is taking longer time to apply the sql's on database it stops applying to check point table. the time "Time Since Chkpt" column will get increased. This behavior indicates, there is long running transaction on database or some statement running on database are blocking DML made by replicat (or) there is problem with sql made by the replicat on target database.

---Nikhil Tatineni---

---Oracle in memory---

Database In Memory

Sunday, November 1, 2015

Oracle GoldenGate: General Monitoring

Querys to monitor RAC

Followers