Hello, you are not logged in.  Login or sign up
Experts >> Steven Feuerstein's PL/SQL Obsession >> Quick and Useful Tips (Qusefuls) >> Quseful #9: Refactoring
Search Toad World Search
Quseful #9: Refactoring
 

All files referenced in document available from: www.oracleplsqlprogramming.com/downloads/demo.zip.

Introduction

Very, very few of us write perfect programs the first time, or the second time, or…. You get the idea. Our code is never perfect and can always be improved. Martin Fowler developed a technique he calls "refactoring," and it has become quite popular in the world of Java. Here is Mr. Fowler's description of refactoring:

"Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.

"It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written."

To demonstrate this approach in my trainings, I put together the following document and associated code. I start with a very poorly designed program to compare two files for equality. I then build a regression test for that program (a critical step in effective refactoring) using Quest Code Tester. With that test in place, I go through a multi-step transformation of this program until it actually works and it has a much better structure.

The explanations for each stage in the transformation are, I admit, a bit sketchy. Perhaps you will have the opportunity to attend one of my trainings in which I will do a full presentation of this process.  I encourage you, however, to start with the initial implementation and…

  • Critique it: what's wrong with it? What could be improved?
  • Go through my various iterations and make sure you understand what I did – and perhaps even do it yourself.
  • Or maybe follow your own progression from ugly code to perfect code (or at least, much better code).
  • Give me feedback on my steps and my changes. Do you disagree with any of them? Do you have a suggestion for what I could do better?

Before – the First, Quick and Dirty Implementation

I just started a new job as a PL/SQL developer. For my first task, I was told to "fix" the files_are_equal function. The person who I replaced had written it just before leaving. It is supposed to check to see whether two files contain the same contents. There was very little documentation about the program, but I was told that I should assume the following:

  • The program compares the contents of text files only, not binary data.
  • The directories and file names passed to the program cannot be null.
  • The files may not exist at all and the files can be empty.
  • If the second directory is NULL, then assume both files are located in the same directory.
  • The each line in the files is NULL, then they will be considered to be the same (that is, NULL= NULL).

The code I inherited

I took a look at the code and quickly realized that "quick and dirty" was probably an overstatement for all the thought that had gone into writing this program. Here is a rough summary (perhaps "intention" would be a better world to use) of the logic I found in this function:

  • Pass in the "check this" and "against this" file information (file name and directory).
  • Open both files, in read-only mode.
  • Read the next line from each file.
  • If they match, then go on to the next line. If they do not match, then stop.
  • Close both files and return the result.
  • If any error occurs, return FALSE.

Quick and Dirty: the starting point for refactoring (38 lines) 

I start with a very poorly designed program to compare two files for equality. I then build a regression test for that program (a critical step in effective refactoring) using Quest Code Tester. With that test in place, I go through a multi-step transformation of this program until it actually works and it has a much better structure.

CREATE OR REPLACE FUNCTION files_are_equal (
   file1_name_in   IN   VARCHAR2,
   dir1_name_in    IN   VARCHAR2,
   file2_name_in   IN   VARCHAR2,
   dir2_name_in    IN   VARCHAR2 := NULL
)
   RETURN BOOLEAN
IS
   v_file1id         UTL_FILE.file_type;
   v_file1line       VARCHAR2 (32767);
--
   l_file2id         UTL_FILE.file_type;
   l_file2line       VARCHAR2 (32767);
--
   identical_files   BOOLEAN            DEFAULT TRUE;
BEGIN
   v_file1id := UTL_FILE.fopen (dir1_name_in, file1_name_in, 'R', 32767);
   l_file2id :=
      UTL_FILE.fopen (NVL (dir2_name_in, dir1_name_in),
                      file2_name_in,
                      'R',
                      32767
                     );
   LOOP
      UTL_FILE.get_line (v_file1id, v_file1line);
      UTL_FILE.get_line (l_file2id, l_file2line);
      identical_files := v_file1line = l_file2line;
   END LOOP;
   UTL_FILE.fclose (v_file1id);
   UTL_FILE.fclose (l_file2id);
   RETURN identical_files;
EXCEPTION
   WHEN OTHERS
   THEN
      RETURN FALSE;
END files_are_equal;

Critique of This Very Quick and Dirty Program

Here are some of the things that bothered me about this program:

  • There is no program header describing the author, copyright, modification history, general description, etc. It is an "anonymous" program.
  • Inconsistent naming conventions for local variables. Some start with "l_" (which is, for me at least, the preferred convention) and others do not.
  • Lots of hard-coded values: length of the line strings, the specification of the maximum length in the calls to FOPEN, the mode under which the file is opened.
  • Take a look at that loop: it appears to be an infinite loop. It doesn't contain any sort of EXIT statement or boundary condition to terminate the loop. Sure, the loop will stop when UTL_FILE.GET_LINE raises NO_DATA_FOUND, but really that is a scary-looking piece of code.
  • The comparison logic is simplistic and buggy. At first glance, it makes some kind of sense, but delve deeper and this procedure is deeply flawed. In fact, this function will never return TRUE when the two files are the same. Can you see why?
  • There are no comments to explain the basic logic and, in particular, the thinking behind the approach taken to making the comparison.
  • Does it really make sense to simply return FALSE if any error has occurred? Certainly you could argue that two files cannot possibly be equal if an error occurred in the comparison, but usually you would not want to hide the fact that an error occurred. And one would at least distinguish between the NO_DATA_FOUND exception (raised by the UTL_FILE.GET_LINE procedure reading past the last line) and other, "real" errors.
  • And if one those more serious errors does occur, I do not close the files. This might leave one or both files open, which could cause other problems in my application.

WHAT OTHER ISSUES CAN YOU FIND?

The Refactoring Sequence

Major changes

Name of file

The original code for the files_are_equal implementation

eqfiles_before_ref.sf

Establish a baseline regression test, so I can analyze the impact of my changes. I offer the following:

  • Export of the Quest Code Tester test definition for files_are_equal. 
  • Procedure to run the Code Tester test in "batch" mode. 
  • Command line script to test each iteration of refactoring.

Note: Quest Code Tester 1.6.1 and above is required.

q##files_are_equal.qut

q##files_are_equal.sp

eqfiles.tst

Create helper subprograms: initialize, cleanup and hide logic for comparison (it is more complicated than appears at first glance). Replace direct call to GET_LINE with a program that "hides" the NO_DATA_FOUND. Hmmm, that means we can no longer rely on NO_DATA_FOUND to stop the loop execution.

!! Perhaps we should build in an emergency bail-out in case of infinite loops?

eqfiles_helper_programs.sf

loop_killer.pkg

Get rid of hard-coded values. Best to create a container for UTL_FILE constants.

eqfiles_no_literals.sf

utl_file_constants.pkg

Address fundamental logic problems in the "check for equality" algorithm. Also, recognizing that one file could be shorter than the other, we should probably deal separate with the questions "Are the files the same? and "Should I stop the loop?"

So the check_for_equality gets more inputs and gets much more complicated.

We should not leave this stage until we get 100% success (green light) from our regression test.

eqfiles_real_comparison.sf

Bullet proof the code: now that the code seems to be working, let's think about validating assumptions, generally taking care of unusual scenarios, and improve error handling.

  • Assertions for inputs
  • Don't return FALSE for any exception. In fact at this point, NO_DATA_FOUND should not be raised by the program, so we may want to separate out this case.

Now we see the need for another helper package of PL/SQL limitations.

eqfiles_bullet_proof.sf

eqfiles_with_qem.sf

plsql_limits.pks

Final cleanup:

  • Consolidate multiple variables into single records, make the parameter passing more concise and consistent. 
  • Add program header

eqfiles_moving_parts.sf

Reorg code into helper programs - eqfiles_helper_programs.sf

CREATE OR REPLACE FUNCTION files_are_equal (
   file1_name_in   IN   VARCHAR2,
   dir1_name_in    IN   VARCHAR2,
   file2_name_in   IN   VARCHAR2,
   dir2_name_in    IN   VARCHAR2 := NULL
)
   RETURN BOOLEAN
IS
   l_file1id     UTL_FILE.file_type;
   l_file1line   VARCHAR2 (32767);
   l_file1eof    BOOLEAN;
--
   l_file2id     UTL_FILE.file_type;
   l_file2line   VARCHAR2 (32767);
   l_file2eof    BOOLEAN;
--
   l_identical   BOOLEAN            DEFAULT TRUE;
   PROCEDURE initialize
   IS
   BEGIN
      l_file1id := UTL_FILE.fopen (dir1_name_in, file1_name_in, 'R', 32767);
      l_file2id :=
         UTL_FILE.fopen (NVL (dir2_name_in, dir1_name_in),
                         file2_name_in,
                         'R',
                         32767
                        );
   END initialize;
   PROCEDURE cleanup
   IS
   BEGIN
      UTL_FILE.fclose (l_file1id);
      UTL_FILE.fclose (l_file2id);
   END cleanup;
/*
Avoid direct call to GET_LINE becausee it leads to poorly
structured code (application logic in the exception section).
Instead, trap the NO_DATA_FOUND exception and return a Boolean flag.
*/
   PROCEDURE get_next_line_from_file (
      file_in    IN       UTL_FILE.file_type,
      line_out   OUT      VARCHAR2,
      eof_out    OUT      BOOLEAN
   )
   IS
   BEGIN
      UTL_FILE.get_line (file_in, line_out);
      eof_out := FALSE;
   EXCEPTION
      WHEN NO_DATA_FOUND
      THEN
         line_out := NULL;
         eof_out := TRUE;
   END get_next_line_from_file;
   PROCEDURE check_for_equality (
      line1_in        IN       VARCHAR2,
      line2_in        IN       VARCHAR2,
      identical_out   OUT      BOOLEAN
   )
   IS
   BEGIN
      identical_out := line1_in = line2_in;
   END check_for_equality;
BEGIN
   initialize;
   WHILE (l_identical AND NOT l_file1eof AND NOT l_file2eof)
   LOOP
      get_next_line_from_file (l_file1id, l_file1line, l_file1eof);
      get_next_line_from_file (l_file2id, l_file2line, l_file2eof);
      check_for_equality (l_file1line, l_file2line, l_identical);
   END LOOP;
   cleanup;
   RETURN l_identical;
EXCEPTION
   WHEN OTHERS
   THEN
      cleanup;
      RETURN FALSE;
END files_are_equal;
Remove hard-codings - eqfiles_no_literals.sf
CREATE OR REPLACE FUNCTION files_are_equal (
   file1_name_in IN VARCHAR2
 , dir1_name_in IN VARCHAR2
 , file2_name_in IN VARCHAR2
 , dir2_name_in IN VARCHAR2 := NULL
)
   RETURN BOOLEAN
IS
   l_file1id     UTL_FILE.file_type;
   l_file1line   utl_file_constants.max_linesize_t;
   l_file1eof    BOOLEAN;
   --
   l_file2id     UTL_FILE.file_type;
   l_file2line   utl_file_constants.max_linesize_t;
   l_file2eof    BOOLEAN;
   --
   l_identical   BOOLEAN                           DEFAULT TRUE;
   PROCEDURE initialize
   IS
   BEGIN
      l_file1id :=
         UTL_FILE.fopen (dir1_name_in
                       , file1_name_in
                       , utl_file_constants.read_only ()
                       , utl_file_constants.max_linesize ()
                        );
      l_file2id :=
         UTL_FILE.fopen (NVL (dir2_name_in, dir1_name_in)
                       , file2_name_in
                       , utl_file_constants.read_only ()
                       , utl_file_constants.max_linesize ()
                        );
   END initialize;

CREATE OR REPLACE PACKAGE utl_file_constants
IS
   SUBTYPE max_linesize_t IS VARCHAR2 (32767);
   SUBTYPE def_linesize_t IS VARCHAR2 (1024);
   FUNCTION read_only
      RETURN VARCHAR2;
   FUNCTION write_only
      RETURN VARCHAR2;
   FUNCTION append
      RETURN VARCHAR2;
   FUNCTION min_linesize
      RETURN PLS_INTEGER;
   FUNCTION max_linesize
      RETURN PLS_INTEGER;
   FUNCTION def_linesize
      RETURN PLS_INTEGER;
END utl_file_constants;

Fix problems in comparison logic - eqfiles_moving_parts.sf

/*
   Isolate the comparison logic into a single procedure.
   Return flags indicating whether or not to continue
   reading from the file and if the two are still identical
*/
PROCEDURE check_for_equality (
   file1_line_in   IN       VARCHAR2,
   file2_line_in   IN       VARCHAR2,
   l_file1eof_in   IN       BOOLEAN,
   l_file2eof_in   IN       BOOLEAN,
   identical_out   OUT      BOOLEAN,
   read_next_out   OUT      BOOLEAN
)
IS
BEGIN
   IF l_file1eof_in AND l_file2eof_in
   THEN
      /* Made it to the end of both files simultaneously. That's good news! */
      identical_out := TRUE;
      read_next_out := FALSE;
   ELSIF l_file1eof_in OR l_file2eof_in
   THEN
      /* Reached end of one before the other. Not identical! */
      identical_out := FALSE;
      read_next_out := FALSE;
   ELSE
      /*
      Only continue IF the two lines are identical.
      And if they are both null/empty, consider them to be equal.
      */
      identical_out :=
            file1_line_in = file2_line_in
         OR (l_file1eof_in IS NULL AND l_file2eof_in IS NULL);
      read_next_out := identical_out;
   END IF;
END check_for_equality;
BEGIN
   initialize;
   WHILE (l_keep_checking)
   LOOP
      get_next_line (l_file1id, l_file1line, l_file1eof);
      get_next_line (l_file2id, l_file2line, l_file2eof);
      check_for_equality (l_file1line
                        , l_file2line
                        , l_file1eof
                        , l_file2eof
                        , l_identical
                        , l_keep_checking
                         );
   END LOOP;
   cleanup;
   RETURN l_identical;

Bullet-proof the code (assertions, exception mgt) - eqfiles_bullet_proof.sf

/*
   Very simple generic assertion program to increase the likelihood
   that I will actually file1 to make sure assumptions are being followed.
*/
PROCEDURE assert (condition_in IN BOOLEAN, msg_in IN VARCHAR2)
IS
BEGIN
   IF NOT condition_in OR condition_in IS NULL
   THEN
      raise_application_error (-20000, msg_in);
   END IF;
END assert;
   /*
   Consolidate all initialization logic:
     - Validate all assumptions regarding inputs.
     - Open the files.
   */
PROCEDURE initialize
IS
BEGIN
   /*
   Make sure inputs are valid.
   */
   assert (dir1_name_in IS NOT NULL, 'Directory cannot be NULL.');
   assert (file1_name_in IS NOT NULL, 'File name cannot be NULL.');
   assert (file2_name_in IS NOT NULL, 'File name cannot be NULL.');
   /*
   Open both files, read-only.
   */
   l_file1id :=
      UTL_FILE.fopen (dir1_name_in,
                      file1_name_in,
                      utl_file_constants.read_only (),
                      utl_file_constants.max_linesize ()
                     );
   l_file2id :=
      UTL_FILE.fopen (NVL (dir2_name_in, dir1_name_in),
                      file2_name_in,
                      utl_file_constants.read_only (),
                      utl_file_constants.max_linesize ()
                     );
END initialize;
PROCEDURE cleanup
   IS
   BEGIN
      /* Close any files that are still open. */
      IF UTL_FILE.is_open (l_file1id)
      THEN
         UTL_FILE.fclose (l_file1id);
      END IF;
      IF UTL_FILE.is_open (l_file2id)
      THEN
         UTL_FILE.fclose (l_file2id);
      END IF;

Bullet-proof II – use of q$error_manager - eqfiles_with_qemsf

PROCEDURE cleanup (sqlcode_in IN PLS_INTEGER DEFAULT plsql_limits.c_no_error)
IS
BEGIN
   /* Close any files that are still open. */
   IF UTL_FILE.is_open (l_file1id)
   THEN
      UTL_FILE.fclose (l_file1id);
   END IF;
   IF UTL_FILE.is_open (l_file2id)
   THEN
      UTL_FILE.fclose (l_file2id);
   END IF;
    /* If I have an error, then log the information and raise it back
      out of the function.
      I am using the Quest Error Manager freeware utility below.
      www.oracleplsqlprogramming.com/downloads/qem.zip
   */
   IF sqlcode_in <> plsql_limits.c_no_error
   THEN
      q$error_manager.raise_error
         (error_name_in      => 'UNANTICIPATED-ERROR',
          text_in            => 'Unexpected error when attempting to compare two files for equality.',
          name1_in           => 'FILE1 DIR',
          value1_in          => dir1_name_in,
          name2_in           => 'FILE2 DIR',
          value2_in          => dir2_name_in,
          name3_in           => 'FILE1 NAME',
          value3_in          => file1_name_in,
          name4_in           => 'FILE2 FILE',
          value4_in          => file2_name_in
         );
   END IF;
END cleanup;

Reduce moving parts, add program header - eqfiles_moving_parts.sf  ("Final" version - some comments removed to use less paper!)

CREATE OR REPLACE FUNCTION files_are_equal 
/*
| File name: eqfiles_after_ref.sql
|
| Overview: Compare two files to see if they have the same contents.
|           Note: If no "file2 this" directory, then we use the
|           same directory as "file1 this".
|
| Author(s): Steven Feuerstein
|
| Modification History:
|   Date          Who         What
|   19-AUG-2007   SF          Refactored program for PL/SQL Mosaic course
|   23-SEP-2005   SF          Created program (see eqfiles_before_ref.sf)
*/
(
   file1_name_in   IN   VARCHAR2,
   dir1_name_in    IN   VARCHAR2,
   file2_name_in   IN   VARCHAR2,
   dir2_name_in    IN   VARCHAR2 := NULL
)
   RETURN BOOLEAN
IS
   TYPE file_info_rt IS RECORD (
      file_id     UTL_FILE.file_type,
      next_line   utl_file_constants.max_linesize_t,
      eof         BOOLEAN
   );
   l_file1           file_info_rt;
   l_file2           file_info_rt;
   --
   l_keep_checking   BOOLEAN      DEFAULT TRUE;
   l_identical       BOOLEAN      DEFAULT FALSE;
   PROCEDURE assert (condition_in IN BOOLEAN, msg_in IN VARCHAR2)
   IS
   BEGIN
      IF NOT condition_in OR condition_in IS NULL
      THEN
         raise_application_error (-20000, msg_in);
      END IF;
   END assert;
   PROCEDURE initialize (file1_out OUT file_info_rt, file2_out OUT file_info_rt)
   IS
   BEGIN
      /* Make sure inputs are valid. */
      assert (dir1_name_in IS NOT NULL, 'Directory cannot be NULL.');
      assert (file1_name_in IS NOT NULL, 'File name cannot be NULL.');
      assert (file2_name_in IS NOT NULL, 'File name cannot be NULL.');
      /* Open both files, read-only. */
      file1_out.file_id :=
         UTL_FILE.fopen (LOCATION          => dir1_name_in,
                         filename          => file1_name_in,
                         open_mode         => utl_file_constants.read_only (),
                         max_linesize      => utl_file_constants.max_linesize
                                                                           ()
                        );
      file2_out.file_id :=
         UTL_FILE.fopen (LOCATION          => NVL (dir2_name_in, dir1_name_in),
                         filename          => file2_name_in,
                         open_mode         => utl_file_constants.read_only (),
                         max_linesize      => utl_file_constants.max_linesize
                                                                           ()
                        );
   END initialize;
   PROCEDURE get_next_line_from_file (file_inout IN OUT file_info_rt)
   IS
   BEGIN
      UTL_FILE.get_line (file_inout.file_id, file_inout.next_line);
      file_inout.eof := FALSE;
   EXCEPTION
      WHEN NO_DATA_FOUND
      THEN
         file_inout.eof := TRUE;
   END get_next_line_from_file;
   PROCEDURE check_for_equality (
      file1_in        IN       file_info_rt,
      file2_in        IN       file_info_rt,
      identical_out   OUT      BOOLEAN,
      read_next_out   OUT      BOOLEAN
   )
   IS
   BEGIN
      IF file1_in.eof AND file2_in.eof
      THEN
         /* Made it to the end of both files simultaneously. That's good news! */
         identical_out := TRUE;
         read_next_out := FALSE;
      ELSIF file1_in.eof OR file2_in.eof
      THEN
         /* Reached end of one before the other. Not identical! */
         identical_out := FALSE;
         read_next_out := FALSE;
      ELSE
         /*
         Only continue IF the two lines are identical.
         And if they are both null/empty, consider them to be equal.
         */
         identical_out :=
               file1_in.next_line = file2_in.next_line
            OR (file1_in.next_line IS NULL AND file2_in.next_line IS NULL);
         read_next_out := identical_out;
      END IF;
   END check_for_equality;
   PROCEDURE cleanup (sqlcode_in IN PLS_INTEGER
            DEFAULT plsql_limits.c_no_error)
   IS
   BEGIN
      /* Close any files that are still open. */
      IF UTL_FILE.is_open (l_file1.file_id)
      THEN
         UTL_FILE.fclose (l_file1.file_id);
      END IF;
      IF UTL_FILE.is_open (l_file2.file_id)
      THEN
         UTL_FILE.fclose (l_file2.file_id);
      END IF;
      /* If I have an error, then log the information and raise it back
         out of the function.
         I am using the Quest Error Manager freeware utility below.
         www.oracleplsqlprogramming.com/downloads/qem.zip
      */
      IF sqlcode_in <> plsql_limits.c_no_error
      THEN
         q$error_manager.raise_error
            (error_name_in      => 'UNANTICIPATED-ERROR',
             text_in            => 'Unexpected error when attempting to compare two files for equality.',
             name1_in           => 'FILE1 DIR',
             value1_in          => dir1_name_in,
             name2_in           => 'FILE2 DIR',
             value2_in          => dir2_name_in,
             name3_in           => 'FILE1 NAME',
             value3_in          => file1_name_in,
             name4_in           => 'FILE2 FILE',
             value4_in          => file2_name_in
            );
      END IF;
   END cleanup;
BEGIN
   initialize (l_file1, l_file2);
   WHILE (l_keep_checking)
   LOOP
      get_next_line_from_file (l_file1);
      get_next_line_from_file (l_file2);
      check_for_equality (l_file1, l_file2, l_identical, l_keep_checking);
   END LOOP;
   cleanup;
   RETURN l_identical;
EXCEPTION
   WHEN OTHERS
   THEN
      cleanup (SQLCODE);
      RAISE;
END files_are_equal;
/

 

Copyright 2008 by Quest Software  | Terms Of Use | Privacy Statement | Contact Us