Saturday, January 31, 2015

DBPITR :RMAN



I want to discuss Database Point InTime Recovery day !


As a DBA, 1st thumb rule, we need to have backup for database to avoid logical corruption, datafile recovery, block recovery, tablespace point in-time recovery and for disaster recovery so on..


we know,  database will log errors in alert.log file in " background_dump_dest " location,  we can find out SCN when database is crashed from alert.log file.  


Scenario:


Let's assume that some batch jobs running on database inserted wrong data into the tables, corrupted whole database, now we have to perform Database Point Intime Recovery as follows,


Step: 1 we can find out information about jobs, at what time jobs started on database ?,  get this information from application team or query "DBA_SCHEDULER_JOBS" view on database


let's assume that, jobs started on database at "31-01-2015 11:15:00" we have to perform, database point time recovery using this timestamp, I want to match current timestamp with SCN of database, we can use both SCN and TIMESTAMP to restore and recover the database.


STEP1:

SQL> select timestamp_to_scn(to_timestamp(’31-01-2015 11:15:00’,’DD/MM/YYYY HH24:MI:SS')) as scn from dual; SCN ---------- 912264

In STEP:2 I am bringing down database and starting the database in mount stage

SQL> shut immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup mount; ORACLE instance started. Total System Global Area 841162752 bytes Fixed Size 1348188 bytes Variable Size 499125668 bytes Database Buffers 335544320 bytes Redo Buffers 5144576 bytes Database mounted.

In STEP3: I am connecting to RMAN, and restoring database to point using SCN or Timestamp in step:1

 oracle@sfo ~]$ . oraenv

ORACLE_SID = [r456] ?     
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@sfo ~]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jan 31 13:03:41 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: R456 (DBID=707274850, not open)

RMAN>  RUN
{ SET UNTIL '31-01-2015 11:15:00'; restore database; RECOVER DATABASE; }


--Output


RMAN> Starting restore at 31-JAN-15 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=20 device type=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: SID=21 device type=DISK channel ORA_DISK_1: starting datafile backup set restore channel ORA_DISK_1: specifying datafile(s) to restore from backup set channel ORA_DISK_1: restoring datafile 00002 to /u01/app/oracle/oradata/r456/sysaux01.dbf channel ORA_DISK_1: restoring datafile 00005 to /u01/app/oracle/oradata/r456/GGS1.dbf channel ORA_DISK_1: restoring datafile 00007 to /u01/app/oracle/oradata/r456/NSPDB01.dbf channel ORA_DISK_1: restoring datafile 00008 to /u01/app/oracle/oradata/r456/GG_USER01.dbf channel ORA_DISK_1: restoring datafile 00009 to /u01/app/oracle/oradata/r456/GG01.dbf channel ORA_DISK_1: reading from backup piece /u01/RMANBACKUP/08pu3lje_1_1 channel ORA_DISK_2: starting datafile backup set restore channel ORA_DISK_2: specifying datafile(s) to restore from backup set channel ORA_DISK_2: restoring datafile 00001 to /u01/app/oracle/oradata/r456/system01.dbf channel ORA_DISK_2: restoring datafile 00003 to /u01/app/oracle/oradata/r456/undotbs01.dbf channel ORA_DISK_2: restoring datafile 00004 to /u01/app/oracle/oradata/r456/users01.dbf channel ORA_DISK_2: restoring datafile 00006 to /u01/app/oracle/oradata/r456/OSPDB01.dbf channel ORA_DISK_2: restoring datafile 00010 to /u01/app/oracle/oradata/r456/masood01.dbf channel ORA_DISK_2: reading from backup piece /u01/RMANBACKUP/09pu3lje_1_1 channel ORA_DISK_1: piece handle=/u01/RMANBACKUP/08pu3lje_1_1 tag=TAG20150131T122829 channel ORA_DISK_1: restored backup piece 1 channel ORA_DISK_1: restore complete, elapsed time: 00:01:46 channel ORA_DISK_2: piece handle=/u01/RMANBACKUP/09pu3lje_1_1 tag=TAG20150131T122829 channel ORA_DISK_2: restored backup piece 1 channel ORA_DISK_2: restore complete, elapsed time: 00:01:46 Finished restore at 31-JAN-15


Starting recover at 31-JAN-15 using channel ORA_DISK_1 using channel ORA_DISK_2 starting media recovery media recovery complete, elapsed time: 00:00:01 Finished recover at 31-JAN-15


RMAN successfully restored and recovered the database 

job is completed


STEP:4: Open database and validate data in database


[oracle@sfo ~]$ . oraenv ORACLE_SID = [r456] ? The Oracle base remains unchanged with value /u01/app/oracle [oracle@sfo ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.3.0 Production on Sat Jan 31 13:15:44 2015 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select open_mode from v$database; OPEN_MODE -------------------- MOUNTED SQL> SQL> alter database open; Database altered. SQL> SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination USE_DB_RECOVERY_FILE_DEST Oldest online log sequence 5 Next log sequence to archive 7 Current log sequence 7




SQL> select open_mode, database_name from v$database;

OPEN_MODE, DATABASE_NAME
--------------------------------------------------------------------------------
READ WRITE, R456

---Theory

DBPITR within the current incarnation is performed using the current control file. When performing DBPITR, you can avoid errors by using the SET UNTIL command to set the target time at the beginning of the process, rather than specifying the UNTIL clause on the RESTORE and RECOVER commands individually. This ensures that the datafiles restored from backup will have timestamps early enough to be used in the subsequent RECOVER operation.


The following example performs DBPITR on the target database until SCN 912264:

RUN { SET UNTIL 912264;
restore database; RECOVER DATABASE; } Note: You can also use time expressions, restore points, or log sequence numbers to specify the SET UNTIL time: SET UNTIL TIME 'Nov 15 2004 09:00:00'; SET UNTIL SEQUENCE 9923; SET UNTIL RESTORE POINT before_update; # Alternatives: # SET UNTIL TIME 'Nov 15 2004 09:00:00'; # SET UNTIL SEQUENCE 9923; RESTORE DATABASE;


----------

Conclusion:

DBPITR is a technique to restore database to point in time, if the size of the database is large 8TB, we cannot perform "DBPITR", if your database is logically corrupted, DBTIR takes more time to restore and recover database, need application downtime. As alternative we can perform FLASHBACK USING SCN number !!


Thank you !! Oracle in memory !!:)

Knowledge on goldengate

Knowledge on Oracle Goldengate



components of goldengate:
Manager,
Collector,
Extract,
Local trails,
Pump,
Replicat,
Remote trails,
checkpointtable,

Manager & Collector: Manager runs on both Source and Target oracle goldengate home with default port number 7809, we introduce different parameters in manager process, example manager purging rules, auto restart parameters, reporting rules etc. In goldengate, DBA interact with  oracle goldengate manager  process, we send stop and start ogg process request to manager, manager completes the request. when pump tries to copy trail from source to  target server, manager calls collector to receive it, collector allows the pump to flush the trail in to remote trail directory.
we can Implement purging rules in manager process to purge old trails in Trail Directory 
purgeoldextracts  /local_trail/*, usecheckpoints, minkeepdays 2,  frequencyminutes 60
we can Implement Autorestart parameters in manager process, if any process abends it will automatically restart the abended process 
autorestart extract e*, retries 10, waitminutes 2, resetminutes 1440
create warning in ggserror.log for critical errors 
lagcriticalseconds 30


Extract processes:
Extract works on source database, it collects transactions against source database, write data to local trails, as we know that SCN number of database is equal to commit sequence number CSN of goldengate. we can configure extract at table level & schema level. there are different type of extracts, classic and Integrated extracts. Classic extract works on online redo logs / archive logs to collect transactions against source tables where Integrated extract work's with log mining server to extract data against source tables. we can introduce different type of parameters in EXTRACT parameter file depend on type of  replication configuration ( DML/DDL).

to be continued .....


REPLICAT WITH HANDLECOLLISIONS

welcome back ...

Scenario:

Today I found a strange behaviour of REPLICAT in Goldengate environment, We migrated database from AIX 6.1 to RHEL 6.x. Pumps are exactly working fine on AIX servers which are writing data to remote trail directory on RHEL servers , replicats are running fine on RHEL OGG HOME today morning. They are abended with generating alot of discards. To find out RCA, checked trails using logdump utlity trail is perfectly fine. Renamed discard file and started replicat's again, again they are abended with same error by generating alot of discards. Later, I opened SERVICE REQUEST with oracle, Oracle requested me to add handlecollisions to replicat parameter file which can avoid this replicat from abending and remove after replicat processing all the trails in target server. After adding HANDLECOLLISIONS parameter, replicat didn't generated any discards. it is perfectly working fine !!!! STRANGE REPLICAT !!!!!

#####################

BEST SCENARIO to use HANDLECOLLISIONS 
a) During initial load, it will convert missing update into insert
b) When your primary table and target tables are not in sync, it prevents replicat from abending !
c) Replicat’s ABENDED after target server crash; perform following tasks 

  1. Issue alter <pump name> etrollover on source host
  2. Start pump
  3. Verify if replicats all abended on the same record
  4. And if so, verify in the DB that the record existed
  5. Update PRM files for Replicats and add HANDLECOLLISIONS
  6. Start replicats
  7. After replicats start and seem to be running successfully, remove HANDLECOLLISIONS
Hardly, I never recommend you to use handle-collisions in replicat, i have seen replicat is inserting duplicate records in to the table !!!

ERROR 0GG-01031

PROBLEM:  Pump abended with ERROR OGG-01031

GGSCI > view report POGG9

 ERROR   OGG-01031  There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/opt/app/ggs/trails/ap000029" (error 13, Permission denied)).

ERROR   OGG-01668  PROCESS ABENDING.

Cause: Stale  replicat process on target is holding the lock on the remote file. 

Resolution: 
  • Go to the target server
  • Stop all the REPLICATS that are running on target server.
  • Start abended pump
  • start all replicat's, Recreate the replicats if they don’t  start  or abended without any error, using the process


Querys to monitor RAC

following few  Query's will help to find out culprits-  Query to check long running transaction from last 8 hours  Col Sid Fo...