Introduction

This article is focused on how to monitor and troubleshoot the Data Guard when managing with Broker. There are many ways to monitor Data Guard such as using Generic/Traditional views, Using Grid control/Cloud control and on top of that Data Guard can be monitored using Data Guard Broker utility DGMGRL. This article is completely based on Data Guard broker and how to deal with it and it covers even 12c new features of Data Guard .

Data Guard Broker

To create or manage Data Guard(physical or logical or snapshot standby) the broker is not an mandatory to use, but it gives lot of comfort in maintenance and administration such as switchover or failover with single line commands instead of executing multiple commands from the both primary and standby sites.

Data Guard broker has its own background process called DMON on each site of Primary and Standby database(s). This process will start working with the influence of DG_BROKER_START to TRUE. If you want to start managing Data Guard with broker you have to create configuration by mentioning what are the members of the Data Guard configuration. This configuration is maintained in the configuration files which are available in each site of the default location of "DG_BROKER_CONFIG_FILE_n", these configuration files will be having all the information of Data Guard configuration. In order to investigate any issues around Data Guard broker, it has its own log files we can find them in same diag destination where the alert log is located with the file name drc$ORACLE_SID.log and you can find this process in many posts/blogs , I would like to provide brief information on this as am going to deal with Data Guard broker throughout the article.

 It is very highly recommended to manage Data Guard with Data Guard broker if enabled in order to avoid misconception in configuration and parameters.

 Monitoring and Troubleshooting

 I would like to start with a question before discussing about on how to monitor and toubleshooting Data Guard using Broker.

Question) If customer asked you to verify the configuration of the Data Guard when using generic method (SQL Plus) , What are the things you have to review?

Answer) When it comes to Data Guard, for every DBA the very first thing is to check is there any LAG on Standby database with Primary , For that we have to use several views and they need to run on primary and standby databases in order to justify that my Data Guard configuration is working fine. But when it comes to Data Guard Broker the life becomes easy, Data Guard can be managed with a single line command, i.e.

DGMGRL> show configuration;

Configuration - ckpt12c

Protection Mode: MaxPerformance

Databases:

canada - Primary database

   canfar - Far Sync

     india - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:

SUCCESS

DGMGRL>

So the configuration is pretty good and no issues found from either Primary or Standby or Far Sync Instance(12c) and you can judge simply by checking the status whether configuration status is in SUCCESS or ERROR. if i want to check the LAG between Primary and Standby no need to review the views any more, I can get all the information with a single command again.

DGMGRL> show database india;

Database - india

Role:             PHYSICAL STANDBY

Intended State:   APPLY-ON

Transport Lag:     0 seconds (computed 0 seconds ago)

Apply Lag:         0 seconds (computed 0 seconds ago)

Apply Rate:       8.81 MByte/s

Real Time Query:   ON

Instance(s):

   drmcdb

Database Status:

SUCCESS

DGMGRL>

The output illustrates the database role, whether the recovery is in progress or not, LAG rate and whether Standby is running in Real Time Query or not so on. So with Data Guard broker so handy, no need to use the traditional method in order to justify quickly.

Instance Status

If we want check the status of instance instance, we have to use the ORACLE_SID with the commmand and hence we can review brief status with the few configuration settings.

 DGMGRL> show instance verbose drmcdb;

 Instance 'drmcdb' of database 'india'

 Host Name: CKPT-ORA-04

PFILE:

Properties:

   StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=CKPT-ORA-04(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=INDIA_DGMGRL)(INSTANCE_NAME=drmcdb)(SERVER=DEDICATED)))'

   StandbyArchiveLocation         = 'USE_DB_RECOVERY_FILE_DEST'

   AlternateLocation               = ''

   LogArchiveTrace                 = '0'

   LogArchiveFormat               = '%t_%s_%r.dbf'

   TopWaitEvents                   = '(monitor)'

Instance Status:

SUCCESS

DGMGRL>

 

Send Queue entries

This command shows all the log files that are not successfully not archived on the standby databases (On Primary)

DGMGRL> show database canada SendQEntries

PRIMARY_SEND_QUEUE

       STANDBY_NAME       STATUS     RESETLOGS_ID           THREAD             LOG_SEQ       TIME_GENERATED       TIME_COMPLETED   FIRST_CHANGE#     NEXT_CHANGE#       SIZE (KBs)

              india     ARCHIVED       854451579               1                   80 04/23/2015 22:00:40 04/24/2015 13:33:28         2355595         2386718           41995

                         CURRENT       854451579               1                   81 04/24/2015 13:33:28                               2386718                             18618

 

Receive Queue entries This command returns the archive log sequences which are received but not applied yet. (On Standby)

DGMGRL> show database india recvqentries

STANDBY_RECEIVE_QUEUE

             STATUS     RESETLOGS_ID           THREAD             LOG_SEQ       TIME_GENERATED      TIME_COMPLETED   FIRST_CHANGE#     NEXT_CHANGE#       SIZE (KBs)

 

Top Wait Events

Shows the top wait events of long waiting time of the instance, and it can be run either on primary or standby database.

DGMGRL> show database india topwaitevents

TOP SYSTEM WAIT EVENTS

               Event           Wait Time

   rdbms ipc message           968194150

SQL*Net message from client           226616830

       PX Idle Wait           203594395

     DIAG idle wait           101791332

         lreg timer             50899217

DGMGRL>

From 12c we can have Far Sync Instance and Data Guard broker can monitor even Far Sync instance, For more details on Far Sync consider checking the article.

DGMGRL> show far_sync canfar;

 Far Sync - canfar

 Transport Lag:     0 seconds (computed 1 second ago)

Instance(s):

   canfar

 Far Sync Status:

SUCCESS

 DGMGRL>

Redo Transport Status

It returns if any error status of edo transport for the statndby databases which are in ENABLE mode.

DGMGRL> show database canada LogXptStatus;

LOG TRANSPORT STATUS

PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME               STATUS

               mcdb               india

DGMGRL>

Status Report It returns overall report of the database with oracle errors (Ex: ORA-12514) and also the severity code.

DGMGRL> show database canada statusreport

STATUS REPORT

       INSTANCE_NAME   SEVERITY ERROR_TEXT

DGMGRL>

If output is in ERROR status then actual troubleshooting comes into place and you have to view several monitorable properties and they are covered with the troubleshooting Data Guard part which follows. 

Test Case : How to deal with Data Guard Broker Errors

So far we are limited to monitoring but when something is gone wrong and resulted the output of the configuration with ERROR, then the real troubleshooting of Data Guard broker starts, In order to demonstrate Troubleshooting Data Guard using a broker I've included with a live example which makes you feel that managing Data Guard broker is easy, whereas many others considered Data Guard broker as lousy.

Usually the error outputs from the Data Guard broker may not straight forward as we see in alert log, So we will see how to deal with them. We can confirm whether the configuration is in good status or not by below command of DGMGRL and it may result with either warning or error depends on the issue, So if we see from below output the configuration status is in Warning status and we can find the errors associated with the Primary, Physical Standby databases.

 DGMGRL> show configuration;

 Configuration - ckpt12c

 Protection Mode: MaxAvailability

Databases:

canada - Primary database

   india - Physical standby database

     Warning: ORA-16792: configurable property value is inconsistent with database setting

 Fast-Start Failover: DISABLED

 Configuration Status:

WARNING

 DGMGRL>

 From the above output there are no errors or warnings from primary database and we can see warning from the standby database. Each DBA could follow a different path of troublshooting, Here in this example we will see how easy it is to diagnosis and fix the issues of Data Guard Broker. In my opinion the best utility to know the error details is "oerr" , we can know the description of error with oerr as follows.

 -bash-3.2$ oerr ora 16792

16792, 0000, "configurable property value is inconsistent with database setting"

// *Cause: The values of one or more configurable properties were

//         inconsistent with database in-memory settings or server parameter

//         file settings. This may happen by directly altering initialization

//         parameters instead of editing configurable property values using

//         Data Guard broker.

// *Action: Query the InconsistentProperties property on the database or check

//         the Data Guard broker log to find which properties are set

//         inconsistently. Reset these properties to make them consistent

//         with the database settings. Alternatively, enable the database

//        or the entire configuration to allow the configurable property

//         settings to be propagated to the initialization parameters.

-bash-3.2$


So from the error description it shows "Configurable property value is inconsistent with database settings" , from this we can sense the issue is around inconsistent properties. With the only output of configuration status it's not easy to identify and fix the things, we may need to review many things to confirm. After reviewing the configuration of whole Data Guard, now we have to review the status of the database which is in trouble, we don't need to review the status of other databases, because they are in good status.

 DGMGRL> show database india

 Database - india

 Role:             PHYSICAL STANDBY

Intended State:   APPLY-ON

Transport Lag:     0 seconds (computed 0 seconds ago)

Apply Lag:         0 seconds (computed 0 seconds ago)

Apply Rate:       8.99 MByte/s

Real Time Query:   ON

Instance(s):

   drmcdb

     Warning: ORA-16714: the value of property StandbyFileManagement is inconsistent with the database setting

 Database Status:

WARNING

 DGMGRL>

 -bash-3.2$ oerr ora 16714

16714, 0000, "the value of property %s is inconsistent with the database setting"

// *Cause: The value of the specified configuration property was inconsistent

//         with database in-memory settings or server parameter file

//         settings. This may be caused by changing an initialization

//         parameter that corresponds to a configuration property.

// *Action: Query the InconsistentProperties property on the database to

//         determine the which properties are set inconsistently. Reset the

//         properties to make them with the database settings.

-bash-3.2$

From the output of standby database, it's clear that issue is around the property StandbyFileManagement and its mismatching with the Database and the configuration file, There are several other commands of DGMGRL which can provide overall status report of the database i.e.

 DGMGRL> show database india statusreport

STATUS REPORT

       INSTANCE_NAME   SEVERITY ERROR_TEXT

             drmcdb   WARNING ORA-16714: the value of property StandbyFileManagement is inconsistent with the database setting

DGMGRL>

If there are multiple errors with property values then we can query for the "Inconsistentproperties" for the database as

DGMGRL> show database india inconsistentproperties

INCONSISTENT PROPERTIES

   INSTANCE_NAME       PROPERTY_NAME         MEMORY_VALUE         SPFILE_VALUE         BROKER_VALUE

         drmcdb StandbyFileManagement                 AUTO                 AUTO               MANUAL

DGMGRL>

 

If we want to review all the property values of standby then we can use "verbose" command associated with show database.

DGMGRL> show database verbose india

 Database - india

 Role:             PHYSICAL STANDBY

Intended State:   APPLY-ON

Transport Lag:     0 seconds (computed 0 seconds ago)

Apply Lag:         0 seconds (computed 0 seconds ago)

Apply Rate:       8.98 MByte/s

Real Time Query:   ON

Instance(s):

   drmcdb

     Warning: ORA-16714: the value of property StandbyFileManagement is inconsistent with the database setting

Properties:

   DGConnectIdentifier             = 'india'

   ObserverConnectIdentifier       = ''

   LogXptMode                     = 'sync'

   RedoRoutes                     = ''

   DelayMins                       = '0'

   Binding                         = 'optional'

   MaxFailure                     = '0'

   MaxConnections                 = '1'

   ReopenSecs                     = '300'

   NetTimeout                     = '30'

   RedoCompression                 = 'DISABLE'

   LogShipping                     = 'ON'

   PreferredApplyInstance         = ''

   ApplyInstanceTimeout           = '0'

   ApplyLagThreshold               = '0'

   TransportLagThreshold           = '0'

   TransportDisconnectedThreshold = '30'

   ApplyParallel                   = 'AUTO'

   StandbyFileManagement           = 'MANUAL'

   ArchiveLagTarget               = '0'

   LogArchiveMaxProcesses         = '4'

   LogArchiveMinSucceedDest       = '1'

   DbFileNameConvert              = ''

   LogFileNameConvert             = ''

   FastStartFailoverTarget         = ''

   InconsistentProperties         = '(monitor)'

   InconsistentLogXptProps         = '(monitor)'

   SendQEntries                   = '(monitor)'

   LogXptStatus                   = '(monitor)'

   RecvQEntries                   = '(monitor)'

   StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=CKPT-ORA-04(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=INDIA_DGMGRL)(INSTANCE_NAME=drmcdb)(SERVER=DEDICATED)))'

   StandbyArchiveLocation         = 'USE_DB_RECOVERY_FILE_DEST'

   AlternateLocation               = ''

   LogArchiveTrace                 = '0'

   LogArchiveFormat               = '%t_%s_%r.dbf'

   TopWaitEvents                   = '(monitor)'

Database Status:

WARNING

DGMGRL>

 

Up to here, we are in conclusion that issue on standby is around "StandbyFileManagement" property value, We have fetched most of the information by using DGMGRL , apart from the commands we can also consult the broker log files and we can get some more information.

From Primary Broker log file

(as i mentioned above the broker log files will be located in the diag destination where the alert log is located.)

Primary completed health check

Data Guard Broker Status Summary:

Type                       Name                             Severity Status

Configuration               ckpt12c                           Warning ORA-16608

Primary Database           canada                            Success ORA-00000

Physical Standby Database   india                             Warning ORA-16792

From the primary database the error code is "ORA-00000" which is in success status and the Standby database error code is "ORA-16792" which points to same output of "show configuration"

 From Standby Broker Log file

By standby broker log file it is crystal clear that "StandbyFileManagement" has property value "MANUAL" in configuration file/metadata and where the value is "AUTO" at database level and the SPFILE we are using.

 

In order to confirm what we are viewing from the Broker log file, we can crosscheck the parameter output from the database level and the Broker level as

 From Broker Interface

 DGMGRL> show database india StandbyFileManagement;

StandbyFileManagement = 'MANUAL'

DGMGRL>

 From Database

 SQL> show parameter standby_file_management

 NAME                                 TYPE       VALUE

------------------------------------ ----------- ------------------------------

standby_file_management             string     AUTO

SQL>

It's now clear that parameter value differs from metadata and from the database level, So we need to fix this parameter to meet the configuration status to "SUCCESS".

DGMGRL> edit database india set property StandbyFileManagement = AUTO;

Property "standbyfilemanagement" updated

DGMGRL>

DGMGRL> show database india StandbyFileManagement;

StandbyFileManagement = 'auto'

DGMGRL>

 After updating the property StandbyFileManagement , now we can check is there still any inconsistent properties with the database.

DGMGRL> show database india inconsistentproperties

INCONSISTENT PROPERTIES

   INSTANCE_NAME       PROPERTY_NAME         MEMORY_VALUE         SPFILE_VALUE         BROKER_VALUE

DGMGRL>

 There are no more inconsistent parameters from the standby after fixing the parameter "StandbyFileManagement", Finally we can check the configuration status and database status.

 DGMGRL> show configuration;

 Configuration - ckpt12c

 Protection Mode: MaxAvailability

Databases:

canada - Primary database

   india - Physical standby database

 Fast-Start Failover: DISABLED

 Configuration Status:

SUCCESS

 DGMGRL>

 

DGMGRL> show database india ;

 Database - india

 Role:             PHYSICAL STANDBY

Intended State:   APPLY-ON

Transport Lag:     0 seconds (computed 0 seconds ago)

Apply Lag:         0 seconds (computed 0 seconds ago)

Apply Rate:       8.91 MByte/s

Real Time Query:   ON

Instance(s):

   drmcdb

 Database Status:

SUCCESS

 DGMGRL>

Now we cannot see any more errors either from configuration level or from database, This is an example and we can come accros many errors/warnings related to configuration and hence depending on the requirement(sync issues, configuration issues) we may have to consider disable/enable configuration, disable/enable database and even more disable/enable broker.

Conclusion

Managing Data Guard with broker looks not easy to manage, but if we handle broker in a simplified and correct path its more easy to fix the issue and DGMGRL makes life so easy in monitoring the configuration of the Data Guard , databases and more easy to perform role transitions. There is no surprise that DGMGRL can do all operations what we can do using Generic method.

References:

http://docs.oracle.com/cd/B28359_01/server.111/b28295/dbpropref.htm#i105769