Bugs. Errors. Exceptions. Problems. Issues. Whatever you call them in the moment, bugs are deeply associated with failure. Specifically, our own failure to write perfect code. These “failures” can lead to enormous amounts of confusion and frustration, but the resulting satisfaction of mending that code and righting a wrong is one of the best feelings in software development.
This emotional rollercoaster can make fixing a bug feel like an unforeseeable departure from “real work”. Here’s the thing, though: that your code will have bugs is very much foreseen. Bugs always happen.
That’s why it’s important to keep this process in perspective. Bugs are not just some random annoyance, but an integral part of the development lifecycle. You should approach bug fixing with the same rigor and introspection you bring to the rest of your software development! Design, implementation, testing, deployment — debugging is no less worthy of its own conversations, iteration and reflection.
In that spirit, here are three questions you should always ask yourself or your colleagues in order to treat each bug as a valuable datapoint and an opportunity to improve your software and process.
What is the general pattern behind this bug? Where else does it exist?
After an episode of skimming through source code, poring over logs, and following stack traces, it’s only natural to view a problem through a literal-minded lens. On this level, your bug was “just off by one”, “just a missing character”, or “just calling the wrong standard library method”. Widen your focus, and consider the connections this mistake had to other areas of your codebase. No line of code is an orphan. Think about which code is logically upstream and downstream of the execution path: how did it behave in relation to the bug, and what will happen now that this behavior has changed?
Where are this bug’s siblings? Where else does this pattern occur? All codebases contain self-similarity through abstraction and duplication. What parallel paths exist to the one where you found the error? Have these other paths made the same mistake?
You’re pointed to a reproducible scenario in which payment receipts are generated with the wrong dates. After some investigation, you find that the date being passed into the rendering function is undefined, but a utility library used to format the date helpfully interprets this input as a call to Date.now(). This means the documents always display the current date instead of the one from the actual data. Awesome.
After a short temple massage, you audit your codebase’s calls to this library method. A quick search reveals a set of 7 other callsites, and upon investigation, you find that one of those sites has an identical problem.
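A minimal sketch of how a bug like this can look in code. Everything here is hypothetical: `formatDate` is a stand-in for the third-party formatter (it only mimics the "undefined means now" behavior described above), and the `payment` record with its `paidAt` field is invented for illustration.

```javascript
// Hypothetical stand-in for a third-party date formatter. Like the library in
// the story, it silently treats a missing input as "now".
function formatDate(input) {
  const date = input === undefined ? new Date() : new Date(input);
  return date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
}

// Hypothetical payment record; the field that actually exists is `paidAt`.
const payment = { id: 42, paidAt: "2019-03-14T09:30:00Z" };

// Bug: `paidOn` does not exist, so the property access yields undefined and
// the receipt shows today's date instead of the payment date.
const buggy = formatDate(payment.paidOn);

// Fix: read the property that actually exists.
const fixed = formatDate(payment.paidAt);

console.log({ buggy, fixed });
```

Searching for every other call to the formatter, as in the audit above, is how you would find siblings of the `buggy` line.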
What was the impact?
What was the real user impact? Was there any fallout that might not be obvious? Is there any follow-up needed with users, team members, or other stakeholders? How costly was it in lost productivity to solve, and what was the impact on the users of the software?
This information can be useful in two directions:
Most bugs aren’t a big deal; they can be resolved easily and don’t impact critical business needs.
Some bugs are a really, really big deal.
Thinking holistically about the costs of a bug can provide an important piece of the equation when making informed decisions about what process is right for your organization.
In the case of the mysteriously un-aging timestamp, you were able to reproduce the problem easily and use your browser’s developer tools to track it back to the incorrect property access. The patch was a single-line diff applied to the two locations, and it was shipped to production with a regression test the same day.
You get back to the person on the finance team who raised the issue, and explain why the dates were incorrect. After a conversation, you find that these documents weren’t previously used for anything important, but there’s a new forecast being generated which needs them to be correct, so they’ll re-run it tomorrow, and double-check that the dates make sense.
What could we do to prevent bugs like this?
When the dust settles and a patch is in place, it’s tempting to immediately move on. The adrenaline is subsiding, and the next task is probably pulling at your attention. However, before you do, there’s one more question, and it’s the most important one to ask for the long-term health and correctness of your codebase.
Think through how the bug was introduced, why your existing process didn’t prevent it, and what could be done differently to catch it in the future. Are there steps you could take to remove this class of bug from your project’s development cycle?
With regard to our date example, the value was undefined because the wrong name was used when pulling the property off an object. Unfortunately, it was not something that was caught by your linter, but it would have been a run-time KeyError in Python, or prevented at the time of writing by a type system.
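One way to get KeyError-like strictness without changing languages is to route important property reads through a helper that refuses to return undefined for a missing name. This is only a sketch: `getRequired` and the `payment` record are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: read a property, but fail loudly if it is missing
// instead of quietly returning undefined.
function getRequired(obj, key) {
  if (!(key in obj)) {
    throw new Error(`Missing property "${key}"`);
  }
  return obj[key];
}

// Hypothetical payment record, invented for illustration.
const payment = { id: 42, paidAt: "2019-03-14T09:30:00Z" };

// Correct name: returns the stored timestamp.
const paidAt = getRequired(payment, "paidAt");

// Wrong name: throws at the point of the mistake rather than passing
// undefined further down the call chain.
let message = "";
try {
  getRequired(payment, "paidOn");
} catch (err) {
  message = err.message;
}

console.log(paidAt, message);
```

The tradeoff is extra ceremony at every read, which is part of why a type checker is often the more attractive option for this class of mistake.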
It slipped through manual testing because the current date looked correct enough to the implementer or anyone who happened to look. Perhaps better tools for mocking the current time would have helped make it more visible?
The code path ostensibly had 100% test coverage, but the assertions were only around the other values of the rendered results and the date was not checked.
One view of the problem:
The behavior of Moment.js was not appropriate for our use case, and it’s worth subverting this shorthand in the name of removing a footgun.
Inside the custom date rendering component which calls the third-party function, we can explicitly check for an undefined first argument and raise an exception. Change cases that were intentionally relying on this behavior to explicitly pass in Date.now(), and verify their new behavior is correct.
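The guard described above could be sketched roughly as follows. To keep the example self-contained, `libraryFormat` is a hypothetical stand-in for the third-party call rather than the real Moment.js API, and `formatDateStrict` is an invented name for the wrapper.

```javascript
// Hypothetical stand-in for the third-party formatter, which treats a
// missing input as "now".
function libraryFormat(input) {
  const date = input === undefined ? new Date() : new Date(input);
  return date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
}

// Wrapper that removes the footgun: an undefined date is an error,
// never an implicit request for the current time.
function formatDateStrict(input) {
  if (input === undefined) {
    throw new TypeError("formatDateStrict requires an explicit date");
  }
  return libraryFormat(input);
}

// Call sites that really do want the current date now say so explicitly.
const today = formatDateStrict(Date.now());

// Call sites with real data behave as before.
const paid = formatDateStrict("2019-03-14T09:30:00Z");

console.log({ today, paid });
```

With this in place, the original mistake would surface as an exception at the call site instead of a plausible-looking date on the receipt.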
All bugs are preventable, if you scope your thinking large or small enough.
At the same time, not all bugs are necessarily worth preventing, if the cost of doing so is not worthwhile when weighed against the bug’s negative impact.
Explicitly identifying a possible solution and deciding that it’s not a worthwhile tradeoff for your organization is absolutely a healthy exercise and a productive part of this process. The conversation is still worthwhile in its own right! There will be an inclination to write off every bug as an isolated accident, but following through with this mental exercise will encourage you to be creative and look at your problems through different lenses. The cumulative effect of these considerations and data points over time will help you to make informed, thoughtful choices for the long-term health of your codebase.
(With respect to http://multicians.org/thvv/threeq.html, Tom Van Vleck, July 1989)