Apr
24
Written by:
Jeff Smith
4/24/2008
Despite the best intentions of the DBA or the person who designed the data model, the collection of primary keys, unique indexes, constraints, and triggers are not a foolproof method of keeping duplicate records creeping into your tables and views. Any business rule in the database can be disabled. An even more likely scenario is the business rule changing after the fact. So what’s a person to do when they need to clean up their records?
Cue Toad’s ‘Data Duplicates’ feature.

Simply point to a table, choose the columns you want to use as an identifier, then Toad will find each group of rows with duplicate values. Toad then allows you to delete or modify the records in context on the fly! Let’s take a quick look to see just how easy this can be.
Step 1: Select your table and identifying fields

We’ll work with the HR.EMPLOYEES table and pretend that some silly person decided that a manager could only hire 1 employee per day, and now we need to go fire some people. Silly? Yes, but I bet some of you have seen similar ‘silly’ rules be implemented in the database without much forethought and as a consequence you’ve been stuck doing the cleanup work.
You see on this screen you have the option to treat all columns as identifiers or selected ones. Choose the option that fits your needs.
Once selected, Toad can show you the fields that have duplicates, the number of duplicates, and retrieve the selected rows.
Step 2: Analyze the Results

After selecting the fields on the first tab, simply click on the ‘Duplicate Data (Summary)’ tab. Now Toad shows us the HIRE_DATE || MANAGER_ID matching pairs and a count of each. In this case, our table has 3 duplicate records where these 2 column fields are not unique. So we have 2 people that either has to have to be fired or have their records changed to protect the innocent.
You may wonder, how is Toad finding these records? Let’s look at a simpler example of where we choose the HIRE_DATE as the new unique field. Again, I’m not using real world examples that make sense here, but it does give you a good idea of how the screen works.

Toad found the 7 groups of HIRE_DATES that have duplicate values attached. Or put in another way, Toad found 11 rows that would prevent a UNIQUE constraint being applied to the HIRE_DATE column. Here is the SQL that Toad generated to find these records for you:
Select HIRE_DATE , count(*) as Occurrences
from HR.EMPLOYEES
HAVING count(*) > 1
group by HIRE_DATE;
As you can see, this is a very simple operation that you could do manually. What Toad does here for you is to easily switch back and forth from different tables and different groups of identifying fields to see the data in a very plain and straightforward way. The real value I think comes in Step 3 though.
Step 3: Remove the duplicates

Proceeding to the ‘Duplicate Data ‘Editable’ tab, we can now update or delete each of these rows to ‘trim’ the table down to the state we desire. I could on the fly change the HIRE_DATE or MANAGER_ID for each of the offending records. Instead of figuring out ANOTHER identifying field to use as a limiter in a WHERE clause for a manual DELETE or UPDATE, Toad does the heavy lifting for me by using the ROWID for each of the edits. I can now concentrate on the ‘3rd’ employee and change her MANAGER_ID on the fly and let Toad worry about how to figure out which record to issue the UPDATE for.
UPDATE HR.EMPLOYEES
SET
MANAGER_ID = :MANAGER_ID
WHERE
ROWID = :Old_ROWID
MANAGER_ID = 102
Old_ROWID = 'AAAMhEAAFAAAABYAAH'
What if you only want to see the ‘first’ or ‘last’ duplicate row for each identifying field? If you’ll notice on the toolbar, the first button labeled ‘View/Edit Query’ allows you to specify what rows you want to look at.

We can change the query that Toad issues to display the duplicate records by simply changing the ‘<>’ to a ‘>’ or ‘<’.
If you choose to do so, the button will be highlighted to let you know you have filtered the result set to not display all of the offending records. The concept of ‘First’ and ‘Last’ and how it applies to ROWIDs in a table is generally appropriate, but be aware an older record’s ROWID can change and cause it to appear as a ‘Newer’ record, so beware!

Conclusions
- Despite all precautions and best intentions, duplicate records WILL find their way into your database tables.
- Finding these can be fairly simple using some straightforward SQL statements.
- Toad will write these SQL statements for you and present the findings in a very easy to understand manner.
- Once found, Toad also will allow you to manually ‘fix’ each record without worrying about how to distinguish them if you were to write your own INSERT/UPDATE/DELETE statements.