P.O.I.S.E.D. is the acronym I invented for the performance tuning method described by Chris Lawson in The Art and Science of Oracle Performance Tuning; it stands for Problem, Observation, Interpretation, Solution, Execution, and Documentation.

P.O.I.S.E.D. is not just a performance tuning method; it applies to any problem that must be solved. Every problem-solving exercise has these six stages, whether the exercise takes six minutes or six weeks. You may scoff, but I’m a great believer in articulating principles; if I cannot articulate my principles, what is the likelihood of my observing them? Failing to articulate them might lead to situations like this.

Emergency room nurse on the midnight shift:

We just admitted a man who is complaining that his head is hurting.

Emergency room intern:

Tell him to take a Tylenol and come back in the morning if his head is still hurting.

Emergency room nurse:

Should I be concerned about the knife sticking out of his head?

Or this:

Developer:

Some reports are slow after the database upgrade.

Database administrator:

Let me check for chained rows.

The Observation and Interpretation phases feed each other. The type of problem dictates the information that should be collected in the Observation phase; for example, if the problem is systemwide performance degradation, an AWR or “hanganalyze” report may be appropriate. The Interpretation phase concludes with a determination of root cause. From a management perspective, this is a very good place to be even though a solution has not yet been constructed; absent a determination of root cause, there is uncertainty, something that management abhors. Some problems are not solved easily, and sometimes it becomes necessary to backtrack to the previous step, as shown in the diagram.
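
For reference, the two reports mentioned above can be generated from SQL*Plus; a minimal sketch follows (the AWR script prompts for the snapshot IDs, and the hanganalyze trace level shown is only an illustrative choice).

  -- AWR report; the script prompts for report type, number of days, and snapshot IDs
  @?/rdbms/admin/awrrpt.sql

  -- Hang analysis, as SYSDBA, while the hang is in progress
  oradebug setmypid
  oradebug hanganalyze 3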

Example

Problem

The ETL batch job does not always complete on time. The SLA is violated, and the business users cannot view their dashboards or run their reports beginning at the agreed-upon time.

Observation

The following diagnostic information was available:

  • Log files (showing the start and stop times for each step of the ETL job) for a good run and a bad run.
  • AWR reports for a good run and a bad run.
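
To produce AWR reports that bracket a particular run, the first step is to identify the snapshot IDs covering it; a minimal sketch using the standard AWR snapshot history view is shown below.

  -- List recent AWR snapshots so that pairs bracketing the good and bad runs can be chosen
  select snap_id, begin_interval_time, end_interval_time
  from   dba_hist_snapshot
  order  by snap_id;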

Interpretation

During the bad run, the log file clearly showed that one step took much longer than usual. The AWR report showed long-running “INSERT INTO … SELECT FROM” statements whose durations corresponded to the steps of the batch job. A SQLT report was requisitioned for the offending SQL statement; it confirmed that the statement had run much longer than usual during the bad run and that its query plan was not the same from day to day. However, most of the time there was little variation in the execution time; only on a single occasion had the execution time been dramatically different. When questioned, the developers said that the volume of data and its distribution changed only slightly from day to day.
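
The plan instability can also be confirmed directly from the AWR repository; a minimal sketch is shown below (the substitution variable is a placeholder for the offending sql_id).

  -- Plan hash value and elapsed time per snapshot for one statement;
  -- a changing plan_hash_value indicates that the optimizer chose different plans
  select s.snap_id,
         sn.begin_interval_time,
         s.plan_hash_value,
         round(s.elapsed_time_delta / 1e6) as elapsed_seconds,
         s.executions_delta
  from   dba_hist_sqlstat s
         join dba_hist_snapshot sn
           on  sn.snap_id = s.snap_id
           and sn.dbid = s.dbid
           and sn.instance_number = s.instance_number
  where  s.sql_id = '&offending_sql_id'
  order  by s.snap_id;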

Solution

Changes in performance can have many causes, but in this case there was enough evidence to conclude that changing query plans were the culprit whenever the ETL job took longer than usual. It was also clear that it was not the same SQL statement that was affected each time. The proposed solution was to freeze the query plans of all the long-running “INSERT INTO … SELECT FROM” statements comprising the batch job. Two options were considered: “custom SQL profiles” such as those created by the script coe_xfr_sql_profile.sql (available on My Oracle Support) and SQL Plan Management (SPM) baselines.
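
For illustration, here is a minimal sketch of loading a baseline from the cursor cache with the DBMS_SPM package; the sql_id and plan hash value are placeholders for the offending statement and its known-good plan.

  -- Capture the known-good plan of one statement as a SQL plan baseline
  declare
    n pls_integer;
  begin
    n := dbms_spm.load_plans_from_cursor_cache(
           sql_id          => '&offending_sql_id',
           plan_hash_value => &good_plan_hash_value);
    dbms_output.put_line(n || ' plan(s) loaded');
  end;
  /

Plans loaded in this way are marked as accepted, so the optimizer will prefer them over newly generated plans as long as the use of SQL plan baselines is enabled.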

Execution

SPM baselines were chosen. To track whether the solution was working, a query of the form “select SYSDATE, […] from v$sql where sql_id in (…)” was implemented to record the plan hash values every time the batch job ran.
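
A minimal sketch of such a tracking query is shown below; the tracking table and the list of SQL_IDs are hypothetical placeholders.

  -- plan_tracking_log is a hypothetical table: (run_date date, sql_id varchar2(13), plan_hash_value number)
  insert into plan_tracking_log (run_date, sql_id, plan_hash_value)
  select distinct sysdate, sql_id, plan_hash_value
  from   v$sql
  where  sql_id in ('&sql_id_1', '&sql_id_2');

  commit;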

Documentation

You’re reading it!