Written By

Introduction

We have seen series of Global data services, now this article is purely for Data Guard friendly global data services or we can say replication friendly services. We are aware of the service failover to the available instances upon GDS configuration when the preferred instance is not available but now this explains when to relocate service to the available instances after some specific lag between Primary and standby databases (or) Golden Gate replicated databases.

Replication Lag Based Routing

The reporting jobs/queries we are not offloading to the standby database with the Active Data Guard feature (or) we route the services to bi-replication environments if the data is same (exact) to the production/primary database. As we said above the service failover practice is simple and straight forward instructions to run on available instances if the preferred instance not available and when the attribute “-FAILOVER_PRIMARY” configured, But this will not give the solution to relocate service based on the LAG on standby/bi-replicated with the primary database.

In the real-world example, if the standby with Active Data Guard up and running with the reporting queries/jobs, but how can you guarantee whether the standby does have very recent data? Well there is answer by using “STANDBY_MAX_DATA_DELAY” but this will trigger when the query runs and then if the exception occurs then the session will be terminated and will not failover service to primary. This is the challenging with the traditional services.

Now with the Global data services, it is simplified with single attribute in the services. If the service is already created without the attribute then we can still modify service. i.e. “-lag” which provides the Lag tolerance based routing.  In this demo we are using Data Guard with primary and standby database and GDS fully configured and operational.

GDSCTL>databases
Database: "canada" Registered: Y State: Ok ONS: N. Role: PH_STNDBY Instances: 1 Region: westcan
Service: "cobol_process" Globally started: Y Started: N
Scan: N Enabled: Y Preferred: Y
Service: "nvision_report" Globally started: Y Started: Y
Scan: N Enabled: Y Preferred: Y
Registered instances:
psfin%1
Database: "india" Registered: Y State: Ok ONS: N. Role: PRIMARY Instances: 1 Region: apac
Service: "cobol_process" Globally started: Y Started: Y
Scan: N Enabled: Y Preferred: Y
Service: "nvision_report" Globally started: Y Started: N
Scan: N Enabled: Y Preferred: Y
Registered instances:
psfin%11

GDSCTL>services
Service "cobol_process.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%11", name: "ORC1", db: "INDIA", region: "apac", status: ready.
Service "nvision_report.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.

GDSCTL>

DGMGRL> show configuration

Configuration - hadg

Protection Mode: MaxPerformance
Members:
india - Primary database
canada - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS (status updated 39 seconds ago)

DGMGRL>

If we see above two databases are registered with GDS and two services 1) COBOL_PROCESS  for Primary 2) NVISION_REPORT for Standby are running and the Data Guard configuration is in success status with fully functional.  

Lag based routing service

Now we will create new service for standby role support but if in case of any LAG then after the lag time the service will be failover to primary database and all the connections will be routed to primary even though you connect to the service name, regarding the service failover we have already seen but specific to lag based routing we are going to see how it works.

GDSCTL>add service -service app_engine_lag10 -gdspool psfin -preferred_all -role PHYSICAL_STANDBY -lag 10 -failover_primary
Catalog connection is established
GDSCTL>services
Service "cobol_process.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%11", name: "ORC1", db: "INDIA", region: "apac", status: ready.
Service "nvision_report.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.

 We have added service under the GDS pool PSFIN and the service added successfully, now we will start the service.

GDSCTL>start service -service app_engine_lag10 -gdspool psfin

GDSCTL>services
Service "app_engine_lag10.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.
Service "cobol_process.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%11", name: "ORC1", db: "INDIA", region: "apac", status: ready.
Service "nvision_report.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.

GDSCTL>

After starting services, you can see that service is started from physical standby database Canada. We can also describe the service properties to know how the service is configured like below

GDSCTL>config service -service app_engine_lag10 -gdspool psfin
Name: app_engine_lag10
Network name: app_engine_lag10.psfin.oradbcloud
Pool: psfin
Started: Yes
Preferred all: Yes
Locality: ANYWHERE
Region Failover: No
Role: PHYSICAL_STANDBY
Primary Failover: Yes
Lag: 10
Runtime Balance: SERVICE_TIME
Connection Balance: LONG
Notification: Yes
TAF Policy: NONE
Policy: AUTOMATIC
DTP: No
Failover Method: NONE
Failover Type: NONE
Failover Retries:
Failover Delay:
Edition:
PDB:
Commit Outcome:
Retention Timeout:
Replay Initiation Timeout:
Session State Consistency:
SQL Translation Profile:

Databases
------------------------
Database Preferred Status
-------- --------- ------
canada Yes Enabled
india Yes Enabled

GDSCTL>

After the service adding to GDS, we can optionally add the TNS entries into tnsnames.ora into guest homes, like below.

APP_ENGINE =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = ORA-C2.localdomain)(PORT = 1555))
)
(CONNECT_DATA =
(SERVICE_NAME = app_engine_lag10.psfin.oradbcloud)
)
)

[oracle@ORA-C2 admin]$ sqlplus sys/oracle@app_engine as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jun 15 05:08:51 2016

Copyright (c) 1982, 2014, Oracle. All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PHYSICAL STANDBY

SQL>

[oracle@ORA-C2 admin]$ lsnrctl status |grep app_engine
Service "app_engine_lag10.psfin.oradbcloud" has 1 instance(s).
[oracle@ORA-C2 admin]$

 The service APP_ENGINE now connected to the standby database as configured.

Data Guard configuration review

As said above, the Data Guard configuration contains one primary database “INDIA” and one standby database “CANADA” with Data Guard broker configured in Maximum Performance Mode.

DGMGRL> show database india

Database - india

Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
ORC1

Database Status:
SUCCESS

DGMGRL> show database canada

Database - canada

Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 0 Byte/s
Real Time Query: ON
Instance(s):
ORC1

Database Status:
SUCCESS

DGMGRL> 

Now we will enforce to have 10 minutes of LAG between primary and standby databases, so that the service should be connecting to primary database upon the LAG time threshold.     

DGMGRL> edit database canada set state='APPLY-OFF';
Succeeded.
DGMGRL> show database canada

Database - canada

Role: PHYSICAL STANDBY
Intended State: APPLY-OFF
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: (unknown)
Real Time Query: OFF
Instance(s):
ORC1

Database Status:
SUCCESS

DGMGRL>

We have stopped the media recovery process to enlarge the lag between primary and standby databases. For timestamp we can see below entries from the standby alert log.

Wed Jun 15 04:48:59 2016
Managed Standby Recovery Canceled (ORC1)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL


After the MRP canceled, we can below latest services status and the services running as per the predefined targets.

GDSCTL>services
Service "app_engine_lag10.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.
Service "cobol_process.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%11", name: "ORC1", db: "INDIA", region: "apac", status: ready.
Service "nvision_report.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.

GDSCTL>

 

We’ve set the LAG for 10 minutes, hence we have to wait for 10 minutes and then look at the behavior of the service and whether it is going to still run on standby or primary?

DGMGRL> show database canada

Database - canada

Role: PHYSICAL STANDBY
Intended State: APPLY-OFF
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 10 minutes 50 seconds (computed 1 second ago)
Average Apply Rate: (unknown)
Real Time Query: OFF
Instance(s):
ORC1

Database Status:
SUCCESS

DGMGRL>

 

The apply lag threshold is reached 10 minutes and now we will see where the service is currently running.

[oracle@ORA-C2 ~]$ sqlplus sys/oracle@app_engine as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Jun 15 05:13:03 2016

Copyright (c) 1982, 2014, Oracle. All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME DATABASE_ROLE
------------------------------ ----------------
INDIA PRIMARY

SQL>

In fact the created service should be running from standby database, but now it is currently running on the primary database because of the LAG limit is not meeting the requirements and hence as per the failover configuration policy the service is currently running from the primary database.

SQL> select name,network_name,global,max_lag_time from v$active_services where name='app_engine_lag10';

NAME NETWORK_NAME GLO MAX_LAG_TIME
------------------- ---------------------------------------- --- ------------------------------
app_engine_lag10 app_engine_lag10.psfin.oradbcloud YES 10

 

In more detail, we have checked the view v$active_services and the service is global type and the maximum lag time is 10 minutes.

DGMGRL> edit database canada set state='APPLY-ON';
Succeeded.
DGMGRL> show database canada

Database - canada

Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 122.00 KByte/s
Real Time Query: ON
Instance(s):
ORC1

Database Status:
SUCCESS

DGMGRL>

From the above test we learned how the service is relocated automatically after the specific lag time reached on standby database, now we have enabled log apply on standby database and we will see the behavior of the app_engine service whether it is going to run on primary or standby database.

SQL> select db_unique_name,database_role from v$database
2 ;

DB_UNIQUE_NAME DATABASE_ROLE
------------------------------ ----------------
CANADA PHYSICAL STANDBY

GDSCTL>services
Service "app_engine_lag10.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.
Service "cobol_process.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%11", name: "ORC1", db: "INDIA", region: "apac", status: ready.
Service "nvision_report.psfin.oradbcloud" has 1 instance(s). Affinity: ANYWHERE
Instance "psfin%1", name: "ORC1", db: "CANADA", region: "westcan", status: ready.

GDSCTL>

 

As soon as the standby log apply is on and within the 10 minutes lag duration, the service now again running from the desired database i.e. standby database.

Conclusion

We’ve learned what is the replication lag based routing, how it helps in replicated environment such as Data Guard or Golden gate, Apart from that we have created new service for standby with lag tolerance of 10 minutes for the reporting purpose and then we have stimulated by stopping log apply on standby database to check how and where the new service APP_ENGINE is going to function, like wise after starting the log apply on standby and as soon as the lag is reduced finally the service again running from the standby database. In huge replicated environments such operations no need to do by manually or no monitoring required everything GDS can do.