A Comedy of Errors — Divydeep Agarwal of Branch Messenger

In A Comedy of Errors, we talk to engineers about the weirdest, worst, and most interesting application and infrastructure issues they’ve encountered (and resolved) over the years. Our second engineer is Divydeep Agarwal, the CTO of Branch Messenger. Branch Messenger is an employee self-service web and mobile platform that’s used by 600 million shift workers around the world.

This bug revealed itself right after we went live with our new web build. Naturally, we expected to see some bugs after our very first launch (editor’s note: always expect bugs after a launch), but what we didn’t expect was for a notable subset of our users to no longer be able to log in to their accounts. But that was exactly what happened. This obviously made us pretty anxious, and that anxiety was only amplified by having most of the bug reports come in from corporate enterprise environments.

The problem was a minor one at first glance: certain user groups were completely unable to close pop-ups and move on, regardless of whether they clicked cancel or the close icon in their browser (even if they clicked a hundred times in a row really fast while yelling at their monitors). There are few issues more frustrating than being able to glimpse the page you need sitting right beyond a pop-up, but being unable to reach that page because the pop-up won’t go away. Especially if you can’t even move past the login screen.

Complicating matters, we couldn’t replicate the issue on our end, and impacted users’ browser consoles weren’t showing any errors, making it difficult to diagnose the problem. All users were able to tell us was that they couldn’t log in but would really, really like to do so. Right now if possible!

Looking at the issue in Sentry, we were quickly able to work out that the error was related to Angular’s built-in date pipe. We were additionally able to see that everyone impacted by the error (or, at least, everyone who had encountered it at that point) was on IE (because of course they were). This didn’t make us totally confident that the problem was IE-only, but we were able to piggyback on that insight to research whether there were any known compatibility issues with Angular’s date pipe in IE. It will not surprise you to learn that there were many.

A look at Branch's mobile product interface

Once we discovered this, we felt confident the issue was IE-only, which didn’t make it any less pressing since many of our large enterprise users are still stuck with Windows 7 and running Internet Explorer 11. Most of these users are store managers who are sitting in front of a desktop managing the store’s day-to-day operations, making this sort of hiccup a huge impediment to doing their jobs.

Fixing IE bugs usually takes anywhere from a few hours to a few days, depending on how much time we need to get through each step of bug discovery, narrowing down the problem, and finding a fix. Thankfully, the information aggregated in Sentry (like being able to see exactly which browser versions were impacted) told us that this particular pop-up problem was specific to the Angular date pipe in IE. With that info in hand, it took us about four hours to find a fix. Had we not had access to this info, the cause of this problem would have taken significantly longer to track down since we couldn’t replicate it on our side, and our users couldn’t provide us with any info other than that they couldn’t log in.

To fix the problem, we made a tweak to our component that allowed it to load in a slightly different, simpler way. This change made it possible to do whatever it was IE was incapable of doing (which is not a small list). We celebrated by fixing a bunch of other bugs that looked similar and, suddenly, all of our IE users could log in again! A simple but opaque problem that had caused huge trouble was tackled with a simple fix. You never expect a minor error, do you?
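
For readers wondering what a "simpler" approach to dates can look like, here is a minimal sketch. It is not Branch's actual code, and the name `formatShiftDate` and the accepted input formats are assumptions made up for illustration. The idea is to sidestep the browser's date parsing and locale APIs, which are common sources of IE-specific trouble, and to return a fallback string instead of throwing, since an exception raised while a view is rendering can leave the UI stuck.

```typescript
// Hypothetical helper for illustration only; not the actual Branch fix.
// It formats an ISO-like timestamp without calling Date.parse() or Intl,
// and it never throws: bad input yields the fallback string instead.

const MONTHS = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

// Matches "YYYY-MM-DD", optionally followed by "T" or a space and "HH:mm[:ss]".
// Anything after that (such as a time zone suffix) is ignored, so the value
// is treated as wall-clock time. That is a simplifying assumption.
const ISO_LIKE = /^(\d{4})-(\d{2})-(\d{2})(?:[T ](\d{2}):(\d{2})(?::(\d{2}))?)?/;

function formatShiftDate(
  input: string | null | undefined,
  fallback: string = ""
): string {
  if (!input) {
    return fallback;
  }
  const m = ISO_LIKE.exec(input);
  if (!m) {
    return fallback;
  }

  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Coarse range check only; it does not validate days per month.
  if (month < 1 || month > 12 || day < 1 || day > 31) {
    return fallback;
  }
  const datePart = `${MONTHS[month - 1]} ${day}, ${year}`;

  // Date-only input: stop here.
  if (m[4] === undefined) {
    return datePart;
  }

  const hour = Number(m[4]);
  const minute = m[5];
  if (hour > 23 || Number(minute) > 59) {
    return fallback;
  }
  const suffix = hour < 12 ? "AM" : "PM";
  const hour12 = hour % 12 === 0 ? 12 : hour % 12;
  return `${datePart}, ${hour12}:${minute} ${suffix}`;
}
```

In an Angular app, a function like this could be wrapped in a custom pipe or called from the component before the value reaches the template. The trade-off is that you give up locale-aware formatting in exchange for behavior that is identical in every browser.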

That’s why using an error tracking system is critical to understanding how our application is actually behaving in production. It acts as an early-warning system that lets us spot and fix bugs before they affect functionality or the customer experience. And, as we saw in this situation, Sentry also enables us to fix the bugs that do slip through far faster and with significantly less frustration than we otherwise could.

Bug counts don’t really matter. Maintaining high quality and meeting users’ expectations do. For Branch, we define “winning” as giving our users an amazing and delightful experience and fixing issues before the user knows they even occurred.