Friday, October 30, 2015

Oracle GoldenGate: How to find out Transaction INFO?


We want to know: what kind of transaction is the Extract working on in the database, and which program is calling it?

Step1:
-------
Find out the Extract info and get the XID using the command below from GGSCI:

GGSCI> send extract extractname, showtrans duration 30 MIN
Sending SHOWTRANS request to EXTRACT ENTNEW ...
Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 6, RBA 2239791
-----------------------------------------------------------
XID: 1.8.209025
Items: 0
Extract: ENTNEW
Redo Thread: 1
Start Time: 2015-26-22:00:10:06

SCN: 0.2239791 (2239791)
Redo Seq: 6
Redo RBA: 1470

Status: Running


Step2: 
-------
Using the XID, find the transaction address (ADDR) in v$transaction on the database. The XID from step 1 (1.8.209025) splits into three components: xidusn.xidslot.xidsqn.

select addr from v$transaction
where xidusn=1 and xidslot=8 and xidsqn=209025;


STEP3: 
--------
Using the ADDR from step 2, find the SID on the database:

select sid, serial#, event, machine, sql_id, seconds_in_wait,
       prev_sql_id, module, program, action
from v$session where taddr='&ADDR';
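
The two lookups can also be combined into one query. A minimal sketch, assuming the same XID from step 1; everything else here is standard v$ views:

-- map a GoldenGate XID straight to the session running it
select s.sid, s.serial#, s.sql_id, s.event, s.module, s.program
from v$transaction t, v$session s
where s.taddr = t.addr
and t.xidusn=1 and t.xidslot=8 and t.xidsqn=209025;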


STEP4: 
---------
Using the SQL_ID from step 3, find the SQL that is taking a long time in the GG environment:

select
hash_value, address,
executions,buffer_gets, disk_reads,
round(buffer_gets/decode(executions, 0, 1, executions), 1) avg_gets,
round(disk_reads/decode(executions, 0, 1, executions), 1) avg_disk,
last_load_time,
module,
sql_fulltext
from v$sqlarea
where sql_id='&sql_id';

From step 4 we get the SQL text and the SQL_ID for the long-running transaction :) 
Depending on the situation, we either wait for the transaction to complete on the database or skip it:

GGSCI > send extract ENTNEW, skiptrans 1.8.209025



------Oracle in memory ---
------Nikhil Tatineni ---

Goldengate Replicat is running very slow


Hi Friends :) Hope you are good! Today I want to discuss the performance of a Replicat running from the Oracle GoldenGate home.

Scenario: Replicat is running very slow 

Initial Investigation:
The Replicat works on the target database, where it applies records from the remote trail directory to the target tables. As it reads records from the trail, it uses the information in them to construct consistent SQL to apply to the target table. If the SQL generated by the Replicat is not using indexes, if the optimizer is generating execution plans that exclude indexes, or if there are blocking sessions or wait events on the database, performance suffers. These are the different things we need to look into.

There were no blocking sessions. 
I immediately took an AWR report and found DB time was quite a bit higher than usual. Checking "Top foreground events by total wait time" I found db file sequential read. Hint: maybe the Replicat is using a bad index. 
Later I checked "SQL ordered by Elapsed Time" and found my Replicat at the top of it. 
I got the SQL_ID and ran the SQL Tuning Advisor on it. Results as follows:

-----------
Plan hash value: 748041570

--------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |      |     1 |    39 |   184K  (2)| 00:36:49 |
|   1 |  UPDATE            |      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL|      |     1 |    39 |   184K  (2)| 00:36:49 |
--------------------------------------------------------------------------

Root cause: the Replicat was performing a full table scan for every update, creating lag in the environment.

Potential fix: creating an index on the predicate columns, as suggested by the SQL Tuning Advisor, improved the performance of the Replicat by 100%.
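
For illustration, a sketch of the fix; the table and column names below are hypothetical stand-ins for the actual predicate columns reported by the advisor:

-- hypothetical: index the columns in the UPDATE's WHERE clause so the
-- Replicat's statement can use an index scan instead of a full table scan
create index orders_cust_ix on orders (customer_id, order_date);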


-----When we add supplemental logging, the primary key columns are logged into the online redo. Here we have to understand that the keys should be the same on both the source table and the target table. If the keys are different, we can see these kinds of issues in the environment.-----
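
For illustration, a sketch of how the keys are usually kept aligned, using a hypothetical table hr.employees: ADD TRANDATA enables supplemental logging on the source, and KEYCOLS in the Replicat MAP pins the key columns to use on the target when the defined keys differ:

-- on the source, from GGSCI (after DBLOGIN): enable supplemental logging
ADD TRANDATA hr.employees

-- in the Replicat parameter file: force the key columns used on the target
MAP hr.employees, TARGET hr.employees, KEYCOLS (employee_id);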

---Happy weekend ---
--Oracle in memory --
--Nikhil Tatineni ---

GoldenGate Replicats blocking each other

Scenario: Multiple Replicats working on the same table and blocking each other. 
The Replicats are not moving, creating a 14-hour lag. Data is out of sync between instances.


Root cause: 
We are replicating very high-volume tables, and we had split 1 Replicat into 6 using the @RANGE function to avoid lag (a sketch of such a split is shown below). So all 6 Replicats work on the same table, i.e. 6 Replicats perform concurrent operations on the same block. Concurrent operations on a block are controlled by the table's INITRANS storage parameter; on our database it was only 1. That did not allow the Replicats to perform concurrent operations on the block, resulting in a 14-17 hour lag.
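
For illustration, a minimal sketch of how one Replicat's MAP statement is split across six Replicats with @RANGE; the table name is hypothetical:

-- Replicat 1 parameter file: handles range 1 of 6
MAP sales.big_table, TARGET sales.big_table, FILTER (@RANGE (1, 6));

-- Replicat 2 parameter file: handles range 2 of 6
MAP sales.big_table, TARGET sales.big_table, FILTER (@RANGE (2, 6));

-- ... and so on up to FILTER (@RANGE (6, 6)) in the sixth Replicat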

Potential fix: 
Finally we increased the INITRANS storage parameter for the table from 1 to 10, 
so the Replicats can perform concurrent operations on the table's blocks, which resolved this issue (as sketched below).
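
A sketch of the change, with a hypothetical table name. Note that INITRANS applies only to newly formatted blocks, so existing blocks keep the old ITL setting unless the segment is rebuilt:

-- raise the initial ITL slots per block from 1 to 10 (new blocks only)
alter table sales.big_table initrans 10;

-- rebuilding the segment reformats existing blocks with the new INITRANS;
-- a MOVE invalidates indexes, so rebuild them afterwards
alter table sales.big_table move;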

I strongly recommend, when splitting Replicats to handle large data volumes, to look into the table's INITRANS storage parameter. If you split a single Replicat into n parallel Replicats, make sure INITRANS on the table is at least n; otherwise the Replicats will queue on ITL slots.


---Nikhil Tatineni
--Oracle in memory 

GoldenGate Extract : Bounded Recovery


Scenario: We placed lag monitoring scripts on the GoldenGate servers. One of our Extracts is running with a 17-hour lag! We are engaged :)

Initial investigation: checked the database for long-running transactions and blocking sessions. The database is clear. 
Then started looking at the GoldenGate components. 
Checked the report file generated by the Extract and found process information as follows:

2015-10-05 07:59:03  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p8606_Redo Thread 1: start=SeqNo: 118841, RBA: 25235984, SCN: 3326.2439638626 (14287500865122), Timestamp: 2015-10-05 07:59:02.000000, end=SeqNo: 118841, RBA: 25249280, SCN: 3326.2439638675 (14287500865171), Timestamp: 2015-10-05 07:59:02.000000, Thread: 1.
2015-10-05 07:59:03  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p8606_Redo Thread 2: start=SeqNo: 117802, RBA: 84960272, SCN: 3326.2439638649 (14287500865145), Timestamp: 2015-10-05 07:59:02.000000, Thread: 2, end=SeqNo: 117802, RBA: 84960768, SCN: 3326.2439638649 (14287500865145), Timestamp: 2015-10-05 07:59:02.000000, Thread: 2.
2015-10-05 07:59:03  INFO    OGG-01738  BOUNDED RECOVERY: CHECKPOINT: for object pool 3: p8606_Redo Thread 3: start=SeqNo: 118013, RBA: 16379408, SCN: 3326.2439638623 (14287500865119), Timestamp: 2015-10-05 07:59:01.000000, Thread: 3, end=SeqNo: 118013, RBA: 16379904, SCN: 3326.2439638623 (14287500865119), Timestamp: 2015-10-05 07:59:01.000000, Thread: 3.

Bounded Recovery happens when there are long-running transactions or batch jobs that run beyond the default Bounded Recovery interval (4 hours). 
We know that Extract writes only committed transactions to the trail, but batch jobs can run on the database for a long time and contain both committed and uncommitted changes. 
While Extract is waiting for the commit, the committed/uncommitted redo may meanwhile be flushed to the archive logs / flash recovery area on the database. 
At that point, if the archive logs are deleted by RMAN, the changes performed by the batch jobs would be lost to Extract. 
To avoid these problems, the Oracle / GG team developed a mechanism to save those transactions to disk, under the GoldenGate home in the BR directory. 
It saves all long-running / uncommitted transaction state in the BR directory, from where Extract can pick the records back up whenever the commit eventually arrives.
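
Bounded Recovery is configured with the BR parameter in the Extract parameter file. A minimal sketch, assuming the default 4-hour interval and a BR subdirectory under the GoldenGate home:

-- in the Extract parameter file: write BR checkpoint files under ./BR
-- and take a Bounded Recovery checkpoint every 4 hours (the default)
BR BRDIR ./BR, BRINTERVAL 4H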

Read checkpoints on the database: 
The current checkpoint and the recovery checkpoint are read checkpoints that keep the redo log positions related to the Extract process: 
the current checkpoint marks the point up to which all committed transactions have been written to the trail by Extract, 
and 
the recovery checkpoint is the position where the oldest open (long-running or batch) transaction starts. 
Write checkpoints are kept in the trail. 
When the Extract stops, the cached data is lost. When the Extract restarts, it has to go back to the recovery checkpoint to read the redo log files, discard all transactions committed before the current checkpoint, and rebuild the cached data for open transactions.
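
These checkpoints can be inspected from GGSCI; the recovery checkpoint shown there is how far back in the redo the Extract would have to read on restart:

GGSCI> info extract EXTGRP, showch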

Root Cause: 
I believe the Extract got stuck finding a record while reading from the BR directory. We restarted the Extract, it performed Bounded Recovery, and the lag cleared.

GGSCI (tnc569) 3> info all
Program     Status      Group       Lag at Chkpt    Time since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXTGRP      00:00:00        00:00:03


Oracle in memory --- :) 
Nikhil Tatineni --- 



Thursday, October 29, 2015

Datapump scenarios

A few scenarios on Data Pump.

Scenario: We are trying to export a database / schema / table and the Data Pump directory doesn't have enough space to hold the dump files. If someone asks you for the size of the dump file before exporting, how do we find out the size of the dump file before the export? We use the following syntax:

Syntax: expdp schema/password estimate_only=y directory=directoryname schemas=schemaname 

In the above syntax, I want to know the size of the dump file that would be created when I export the whole schema; you can run the same estimate at the database, schema, or table level.

Scenario: we are trying to export a database and it is recommended to take a consistent export to avoid data consistency errors (ORA-1555):

Syntax: expdp schema/password directory=directoryname dumpfile=dumpfile.dmp logfile=logfile.log schemas=schemaname flashback_scn=scnnumberfromdatabase 
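
To get the SCN to pass in, either of these queries on the source database works:

-- current SCN from v$database
select current_scn from v$database;

-- or via the DBMS_FLASHBACK package
select dbms_flashback.get_system_change_number from dual;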

Scenario: an application DBA is running an export job on the database. We received a few alerts stating that database CPU went high, and we were asked to check the status of the export Data Pump job on the database. What are you going to do in this scenario?

Here we have to ask: is there a data dictionary view that tells us about Data Pump operations? Yes, of course: 
we get the information from the views dba_datapump_jobs and user_datapump_jobs.

These views show all active Data Pump jobs on the database.
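
For example, a quick status check against this view:

-- list active Data Pump jobs and their current state
select owner_name, job_name, operation, job_mode, state, degree
from dba_datapump_jobs;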

Syntax: expdp schema/password directory=directoryname dumpfile=dumpfile.dmp logfile=logfile.log schemas=world flashback_scn=1234567 job_name=export_schema 

We can identify different Data Pump jobs using job_name (use a valid identifier, since the job name also becomes the name of the job's master table).

We can monitor Data Pump jobs and check their progress using v$session_longops:
select sid,serial#,sofar,totalwork  from v$session_longops;

select x.job_name,b.state,b.job_mode,b.degree
, x.owner_name,z.sql_text, p.message
, p.totalwork, p.sofar
, round((p.sofar/p.totalwork)*100,2) done
, p.time_remaining
from dba_datapump_jobs b
left join dba_datapump_sessions x on (x.job_name = b.job_name)
left join v$session y on (y.saddr = x.saddr)
left join v$sql z on (y.sql_id = z.sql_id)
left join v$session_longops p ON (p.sql_id = y.sql_id)
WHERE y.module='Data Pump Worker'
AND p.time_remaining > 0;

select sid, serial#
from v$session s, dba_datapump_sessions d
where s.saddr = d.saddr;


Scenario: we are exporting a schema and a few tables in the WORLD schema and we want to improve the performance of the export jobs. This is the challenge we have on board.

To handle this scenario we used the PARALLEL option to improve the performance of the Data Pump jobs:

syntax: expdp system/password@tns directory=directoryname dumpfile=dumpfile_%U.dmp logfile=logfile.log schemas=world parallel=4 

PARALLEL should be an integer chosen based on the number of cores available on the server; note the %U substitution variable in DUMPFILE, which lets each parallel worker write to its own file. 
In most scenarios this improves the performance of Data Pump jobs. But we faced another scenario in our environment: even with the PARALLEL option, the export of one table was very slow. We gathered stats on the table and the performance of the Data Pump job improved :) :) Can someone explain how stats and Data Pump jobs are related?
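
For reference, a sketch of the stats gather (the owner and table names are hypothetical). One plausible link is that Data Pump's size estimates and its distribution of work across parallel workers draw on segment and optimizer statistics (ESTIMATE=STATISTICS uses them directly), so stale stats can lead to a poor unload strategy:

-- gather fresh optimizer statistics on the slow table
begin
  dbms_stats.gather_table_stats(
    ownname => 'WORLD',
    tabname => 'BIG_TABLE',
    cascade => TRUE);  -- also gather stats on the table's indexes
end;
/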


Scenario: when exporting a schema, depending on the requirement, we export the whole schema, only its DDL (metadata), or only its data. Many of us get confused about what a DDL backup and a data backup are with Data Pump. We use the following CONTENT parameter values:

content=all | exports both the metadata (DDL) and the row data. 
content=metadata_only | exports only the object definitions (DDL); no row data is written. 
content=data_only | exports only the row data; no object definitions are written.

For example, to export only the DDL of the WORLD schema, the syntax is as follows:

expdp world/world directory=directoryname dumpfile=dumpfile.dmp logfile=logfile.log schemas=world content=metadata_only 

----Nikhil Tatineni--

Oracle In Memory --

Tuesday, August 18, 2015

ERROR OGG-00446 Missing filename opening checkpoint file

Scenario: 
Added a new Replicat process; the Replicat abended with the following error.

Root Cause Analysis:
ggsci> view report repri

Replicat Abends with -----
NLS_LANGUAGE     = "AMERICAN"
NLS_TERRITORY    = "AMERICA"
NLS_CHARACTERSET = "US7ASCII"
Source Context :
  SourceModule            : [er.checkpoint]
  SourceID                : [/scratch/aime1/adestore/views/aime1_staxj04/oggcore/OpenSys/src/app/er/checkpoint.cpp]
  SourceFunction          : [get_checkpoint_context(time_elt_def *, time_elt_def *)]
  SourceLine              : [45]
2015-07-29 08:50:43  ERROR   OGG-00446  Missing filename opening checkpoint file.
2015-07-29 08:50:43  ERROR   OGG-01668  PROCESS ABENDING.


Resolution:
Make sure the checkpoint table is created in the GoldenGate schema.

Steps to create checkpoint table
Step1:
GGSCI (nt4567) 25> dblogin userid GGADMIN, PASSWORD XXXXXXXXX
Successfully logged into database.


Step2:
GGSCI (nt4567) 26> add checkpointtable GGADMIN.ckptab
Successfully created checkpoint table GGADMIN.ckptab

Step3: 
After creating the checkpoint table, delete the abended Replicat and re-create it using the checkpoint table (as sketched below) :)
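
A sketch of this step, assuming a hypothetical trail ./dirdat/rt for the Replicat:

GGSCI> dblogin userid GGADMIN, PASSWORD XXXXXXXXX
GGSCI> delete replicat repri
GGSCI> add replicat repri, exttrail ./dirdat/rt, checkpointtable GGADMIN.ckptab
GGSCI> start replicat repri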

--Nikhil Tatineni---


ERROR OGG-00665 OCI Error calling OCITransRollback (status = 3114-ORA-03114)

Scenario: 
The Replicat abended with the following error: "ERROR OGG-00665"

Root cause:
To find the root cause, we checked the report file and found the following error:

GGSCI (pida4909) 2> view report RIVIQ1
Source Context :
  SourceModule            : [ggdb.ora.sess]
  SourceID                : [/scratch/aime1/adestore/views/aime1_staxj04/oggcore/OpenSys/src/gglib/ggdbora/ocisess.c]
  SourceFunction          : [OCISESS_context_def::oci_try(int, const char *, ..]
  SourceLine              : [651]

2015-07-29 10:21:52  ERROR   OGG-00665  OCI Error calling OCITransRollback (status = 3114-ORA-03114: not connected to ORACLE), SQL<Not available>.
***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************

The error tells us the Replicat lost its connection to the database, so the question becomes: why is this Replicat unable to connect to the database?

Resolution (quick checks sketched below):
1) Check the database is up and running 
2) Check the listener is up and running 
3) Check with the SA for the server reboot schedule
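
A few quick checks from the GoldenGate server; the TNS alias orcl is a hypothetical stand-in for your connect string:

# is the listener up and the service registered?
lsnrctl status
# can the alias be resolved and reached from the GoldenGate host?
tnsping orcl
# can the GoldenGate user actually log in?
sqlplus GGADMIN@orcl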

-- Oracle In-memory >>
-- Nikhil Tatineni >>
