Resolving “Subtle” Program Bugs
by Tom Mochal
In many cases, when you encounter a bug, you can look at its general characteristics and the error message it generates and quickly determine the cause. In other cases, nothing obvious points to where the failure happens. These are called “subtle” bugs. They are hard to find because they are generally caused by a combination of events that masks the original root cause, and they can take a long time to resolve.
Obviously, you should use any good automated debugging tools you have. However, those tools work best in combination with a good set of detective skills. Some useful debugging techniques include:
- Always recreate the error first. This seems obvious, but skipping this step causes a lot of frustration for inexperienced programmers. For the most part, you cannot solve what you cannot see. If a user tells you that he or she encountered an error on the screen, you could look forever for the cause. Is the user giving you the exact sequence of events? Usually not. You need to carefully recreate the scenario until the error occurs consistently. If you cannot recreate the error, a good approach is to ask the user to notify you if the problem occurs again. If it does, you will hopefully be in a position to capture enough information to recreate the problem.
- Display the interim values. If you do not have tools that let you interrogate the code flow, display interim variable values as the program executes. This builds a picture of what the program is actually doing and can often lead you to the program flaw.
- See if anything has changed. When stable programs suddenly go bad, the cause is usually something out of the ordinary. One of the first things to check is whether the program, or any program it interfaces with, has changed recently. If so, restore the prior versions in the test environment and try to recreate the error. If you can, you can probably rule out the recent changes as the source of the error. If you cannot, the error was probably introduced by the recent changes.
- Narrow down the code. Some programs do a lot of processing, which can make it difficult to see what is going on. Try commenting out large sections of the code and then recreating the error. This is a process of elimination. If you comment out a large block of code and the program runs fine, the offending code is probably in the commented block. Next, uncomment subsections, a small number of lines at a time, and rerun the program after each change to see if the bug occurs. When the bug reappears, you have found the code causing the error.
- Narrow down the data. This is similar to the prior technique, except that you narrow down the data instead of the code. If you have a general idea of which inputs cause the problem, focus on the logic paths that those inputs follow. If you are not sure which data is causing the problem, use a process of elimination. For instance, if the program uses large files or tables, start by cutting the data in half. If the error is still there, continue to reduce the input. If the error no longer occurs, work with the data you eliminated, since the offending combination is probably in that set. By repeatedly halving the files, you should in theory be able to isolate the single row (or record), or combination of rows, that is causing the problem.
- Look for patterns. In many cases, an error does not occur just once, but over and over. When this happens, it is important to understand the pattern. For instance, you may find that an error occurs on every other transaction, or only for people with certain characteristics and not for others. If you can detect a pattern, you typically have a head start on solving the problem.
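For the first technique, one way to "capture enough information to recreate the problem" is to record the inputs of any call that fails so the same call can be replayed later in a test environment. The Python sketch below is only an illustration: the decorator `capture_failures`, the `replay` helper, the in-memory `captured` list, and the `average` function are all hypothetical, and a real program would more likely write these records to a log file.

```python
import functools
import traceback

# Hypothetical in-memory store; a real program would write these records to a log.
captured = []

def capture_failures(func):
    """Record the inputs of any call that raises, then re-raise the original error."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            captured.append({
                "name": func.__name__,
                "function": func,  # the undecorated function, used for replay
                "args": args,      # stored by reference, not copied
                "kwargs": kwargs,
                "error": repr(exc),
                "trace": traceback.format_exc(),
            })
            raise
    return wrapper

def replay(record):
    """Re-run a captured call with the same inputs to recreate the error."""
    return record["function"](*record["args"], **record["kwargs"])

@capture_failures
def average(values):
    # Hypothetical example: raises ZeroDivisionError for an empty list.
    return sum(values) / len(values)
```

Because `replay` calls the undecorated function, recreating the error does not add a second record. Note that the arguments are stored by reference, so if the failing code mutates its inputs before raising, you would need to copy them first.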
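When displaying interim values, a logging call is often easier to manage than scattered print statements, because the output can be switched off by changing the log level instead of editing the code again. A minimal Python sketch using the standard `logging` module; the `apply_discount` function and its arithmetic are invented purely to have some interim values to show.

```python
import logging

# Show DEBUG messages, prefixed with the name of the function that logged them.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(funcName)s: %(message)s")
log = logging.getLogger(__name__)

def apply_discount(price, quantity, rate):
    """Hypothetical calculation instrumented to display its interim values."""
    subtotal = price * quantity
    log.debug("subtotal=%r (price=%r, quantity=%r)", subtotal, price, quantity)
    discount = subtotal * rate
    log.debug("discount=%r (rate=%r)", discount, rate)
    total = subtotal - discount
    log.debug("total=%r", total)
    return total
```

Raising the level to `logging.INFO` silences these messages without removing them, which is convenient if the same bug needs to be chased again later.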
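For the "see if anything has changed" technique, a simple check is to run the same inputs through the prior version and the recently changed version and list where they disagree. In this Python sketch, `old_tax` and `new_tax` are hypothetical stand-ins for the restored and the changed program (a 7% tax on amounts in cents, where the change accidentally switched from rounding to truncation).

```python
def old_tax(amount_cents):
    # Prior version (hypothetical): 7% tax, rounded half up to the nearest cent.
    return (amount_cents * 7 + 50) // 100

def new_tax(amount_cents):
    # Recently changed version (hypothetical): truncates instead of rounding.
    return amount_cents * 7 // 100

def compare_versions(old, new, inputs):
    """Return (input, old_result, new_result) for each input where the versions disagree."""
    differences = []
    for value in inputs:
        old_result, new_result = old(value), new(value)
        if old_result != new_result:
            differences.append((value, old_result, new_result))
    return differences
```

For example, `compare_versions(old_tax, new_tax, [100, 150, 1000])` reports only the input 150. If the prior version gives the correct result for the failing input, the recent change is the likely culprit; if both versions fail the same way, the change can probably be ruled out.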
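The "cut the data in half" approach from the narrow-down-the-data technique can be automated when rerunning the failing step is cheap. The Python sketch below assumes you can wrap that rerun in a hypothetical function `fails(rows)` that returns `True` when the error occurs for a given subset of the input. It keeps whichever half still reproduces the error and stops when neither half fails on its own, which signals that the error depends on a combination of rows spread across both halves.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def isolate_failing_rows(rows: Sequence[T], fails: Callable[[Sequence[T]], bool]) -> List[T]:
    """Repeatedly halve the input, keeping whichever half still reproduces the error.

    Returns a single row when one row alone triggers the error. If neither half
    fails by itself, the error needs rows from both halves, and the current
    subset is returned for closer inspection.
    """
    if not fails(rows):
        raise ValueError("the full input does not reproduce the error")
    current = list(rows)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if fails(first):
            current = first
        elif fails(second):
            current = second
        else:
            break  # the offending combination spans both halves
    return current
```

For example, with `rows = list(range(100))` and a failure triggered by the value 42, the search returns `[42]` after about seven reruns instead of one hundred. When a combination of rows is involved, the returned subset is still smaller than the original input and is a reasonable place to continue by hand.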
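To look for patterns, it can help to tabulate failures against each characteristic of the records involved and see which values stand out. The Python sketch below assumes each processed record is a dictionary with a boolean `failed` field; the attribute names `region` and `account_type` used in the usage example are made up for illustration.

```python
from collections import Counter

def failure_rates(records, attribute):
    """Return {value: (failures, total)} for one attribute across processed records."""
    totals = Counter()
    failures = Counter()
    for record in records:
        value = record.get(attribute)
        totals[value] += 1
        if record["failed"]:
            failures[value] += 1
    return {value: (failures[value], totals[value]) for value in totals}
```

Running this once per attribute (for example, `failure_rates(records, "account_type")` and then `failure_rates(records, "region")`) quickly shows whether failures cluster around one value, such as every failure involving a business account, or are spread evenly, which points the search elsewhere.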
Some bugs are obvious and easy to find and fix. Others require a more structured and logical approach. It is amazing how much time is spent searching for errors randomly or scouring code for faulty logic. Much of this chaotic time can be eliminated by using the structured techniques explained above to identify subtle bugs.
Each month, Tom Mochal presents techniques and processes for IT development projects. Tom is the recent winner of the 2005 PMI Distinguished Contribution Award. His company, TenStep, Inc., develops business methodologies, including a project management process called TenStep (www.TenStep.com) and a project lifecycle process called LifecycleStep (www.LifecycleStep.com).