All files referenced in document available from: www.oracleplsqlprogramming.com/downloads/demo.zip.
Introduction
Very, very few of us write perfect programs the first time, or the second time, or…. You get the idea. Our code is never perfect and can always be improved. Martin Fowler developed a technique he calls "refactoring," and it has become quite popular in the world of Java. Here is Mr. Fowler's description of refactoring:
"Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.
"It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written."
To demonstrate this approach in my trainings, I put together the following document and associated code (listed on page 4 and downloadable from here). I start with a very poorly designed program to compare two files for equality. I then build a regression test for that program (a critical step in effective refactoring) using Quest Code Tester. With that test in place, I go through a multi-step transformation of this program until it actually works and it has a much better structure.
The explanations for each stage in the transformation are, I admit, a bit sketchy. Perhaps you will have the opportunity to attend one of my trainings in which I will do a full presentation of this process.
I encourage you, however, to start with the initial implementation and…
-
Critique it: what's wrong with it? What could be improved?
-
Go through my various iterations and make sure you understand what I did – and perhaps even do it yourself.
-
Or maybe follow your own progression from ugly code to perfect code (or at least, much better code).
-
Give me feedback on my steps and my changes. Do you disagree with any of them? Do you have a suggestion for what I could do better?
Before – the First, Quick and Dirty Implementation (38 lines)
I just started a new job as a PL/SQL developer. For my first task, I was told to "fix" the files_are_equal function. The person who I replaced had written it just before leaving. It is supposed to check to see whether two files contain the same contents. There was very little documentation about the program, but I was told that I should assume the following:
-
The program compares the contents of text files only, not binary data.
-
The directories and file names passed to the program cannot be null.
-
The files may not exist at all and the files can be empty.
-
If the second directory is NULL, then assume both files are located in the same directory.
-
The each line in the files is NULL, then they will be considered to be the same (that is, NULL= NULL).
The code I inherited
I took a look at the code and quickly realized that "quick and dirty" was probably an overstatement for all the thought that had gone into writing this program. Here is a rough summary (perhaps "intention" would be a better world to use) of the logic I found in this function:
-
Pass in the "check this" and "against this" file information (file name and directory).
-
Open both files, in read-only mode.
-
Read the next line from each file.
-
If they match, then go on to the next line. If they do not match, then stop.
-
Close both files and return the result.
-
If any error occurs, return FALSE.
Quick and Dirty: the starting point for refactoring
CREATE OR REPLACE FUNCTION files_are_equal (
file1_name_in IN VARCHAR2
, dir1_name_in IN VARCHAR2
, file2_name_in IN VARCHAR2
, dir2_name_in IN VARCHAR2 := NULL
)
RETURN BOOLEAN
IS
v_file1id UTL_FILE.file_type;
v_file1line VARCHAR2 (32767);
--
l_file2id UTL_FILE.file_type;
l_file2line VARCHAR2 (32767);
--
identical_files BOOLEAN DEFAULT TRUE;
BEGIN
v_file1id := UTL_FILE.fopen (dir1_name_in, file1_name_in, 'R', 32767);
l_file2id :=
UTL_FILE.fopen (NVL (dir2_name_in, dir1_name_in)
, file2_name_in, 'R', 32767);
LOOP
UTL_FILE.get_line (v_file1id, v_file1line);
UTL_FILE.get_line (l_file2id, l_file2line);
identical_files := v_file1line = l_file2line;
END LOOP;
UTL_FILE.fclose (v_file1id);
UTL_FILE.fclose (l_file2id);
RETURN identical_files;
EXCEPTION
WHEN OTHERS
THEN
RETURN FALSE;
END files_are_equal;
Critique of This Very Quick and Dirty Program
Here are some of the things that bothered me about this program:
-
There is no program header describing the author, copyright, modification history, general description, etc. It is an "anonymous" program.
-
Inconsistent naming conventions for local variables. Some start with "l_" (which is, for me at least, the preferred convention) and others do not.
-
Lots of hard-coded values: length of the line strings, the specification of the maximum length in the calls to FOPEN, the mode under which the file is opened.
-
Take a look at that loop: it appears to be an infinite loop. It doesn't contain any sort of EXIT statement or boundary condition to terminate the loop. Sure, the loop will stop when UTL_FILE.GET_LINE raises NO_DATA_FOUND, but really that is a scary-looking piece of code.
-
The comparison logic is simplistic and buggy. At first glance, it makes some kind of sense, but delve deeper and this procedure is deeply flawed. In fact, this function will never return TRUE when the two files are the same. Can you see why?
-
There are no comments to explain the basic logic and, in particular, the thinking behind the approach taken to making the comparison.
-
Does it really make sense to simply return FALSE if any error has occurred? Certainly you could argue that two files cannot possibly be equal if an error occurred in the comparison, but usually you would not want to hide the fact that an error occurred. And one would at least distinguish between the NO_DATA_FOUND exception (raised by the UTL_FILE.GET_LINE procedure reading past the last line) and other, "real" errors.
-
And if one those more serious errors does occur, I do not close the files. This might leave one or both files open, which could cause other problems in my application.
WHAT OTHER ISSUES CAN YOU FIND?
The Refactoring Sequence
|
Major changes
|
Name of file
|
|
The original code for the files_are_equal implementation
|
eqfiles_before_ref.sf
|
|
Establish a baseline regression test, so I can analyze the impact of my changes. I offer the following:
· Export of the Quest Code Tester test definition for files_are_equal.
· Procedure to run the Code Tester test in "batch" mode.
· Command line script to test each iteration of refactoring.
Note: Quest Code Tester 1.6.1 and above is required.
|
q##files_are_equal.qut
q##files_are_equal.sp
eqfiles.tst
|
|
Create helper subprograms: initialize, cleanup and hide logic for comparison (it is more complicated than appears at first glance). Replace direct call to GET_LINE with a program that "hides" the NO_DATA_FOUND. Hmmm, that means we can no longer rely on NO_DATA_FOUND to stop the loop execution.
!! Perhaps we should build in an emergency bail-out in case of infinite loops?
|
eqfiles_helper_programs.sf
loop_killer.pkg
|
|
Get rid of hard-coded values. Best to create a container for UTL_FILE constants.
|
eqfiles_no_literals.sf
utl_file_constants.pkg
|
|
Address fundamental logic problems in the "check for equality" algorithm. Also, recognizing that one file could be shorter than the other, we should probably deal separate with the questions "Are the files the same? and "Should I stop the loop?"
So the check_for_equality gets more inputs and gets much more complicated.
We should not leave this stage until we get 100% success (green light) from our regression test.
|
eqfiles_real_comparison.sf
|
|
Bullet proof the code: now that the code seems to be working, let's think about validating assumptions, generally taking care of unusual scenarios, and improve error handling.
· Assertions for inputs
· Don't return FALSE for any exception. In fact at this point, NO_DATA_FOUND should not be raised by the program, so we may want to separate out this case.
Now we see the need for another helper package of PL/SQL limitations.
|
eqfiles_bullet_proof.sf
eqfiles_with_qem.sf
plsql_limits.pks
|
|
Final cleanup:
· Consolidate multiple variables into single records, make the parameter passing more concise and consistent.
· Add program header
|
eqfiles_moving_parts.sf
|
Reorg code into helper programs - eqfiles_helper_programs.sf
CREATE OR REPLACE FUNCTION files_are_equal (
file1_name_in IN VARCHAR2
, dir1_name_in IN VARCHAR2
, file2_name_in IN VARCHAR2
, dir2_name_in IN VARCHAR2 := NULL
)
RETURN BOOLEAN
IS
l_file1id UTL_FILE.file_type;
l_file1line VARCHAR2 (32767);
l_file1eof BOOLEAN;
--
l_file2id UTL_FILE.file_type;
l_file2line VARCHAR2 (32767);
l_file2eof BOOLEAN;
--
l_identical BOOLEAN DEFAULT TRUE;
PROCEDURE initialize
IS
BEGIN
l_file1id := UTL_FILE.fopen (dir1_name_in, file1_name_in, 'R', 32767);
l_file2id :=
UTL_FILE.fopen (NVL (dir2_name_in, dir1_name_in)
, file2_name_in
, 'R'
, 32767
);
END initialize;
PROCEDURE cleanup
IS