Everything is Broken and I Don't Know Why

Matt Robenolt /

There are few things in life that we enjoy more than good, healthy, broken code. It’s inevitable that things are going to break, it’s inevitable that we’re going to need to debug those things, and it’s inevitable that we’ll need do whatever is necessary to fix them. No one ever ships 100% perfect code.

In this new series we’re delving into broken code and all the many ways we can do a better job fixing it. Why do we spend five hours fixing a bug, when we can spend five minutes fixing it instead? Today we’re kicking things off with browser JavaScript.

This happens all the time: we ship production code, all our tests pass, things seem fine, we celebrate… and then users start complaining. Sometimes they complain right away. Sometimes days later. We usually have no idea what happened to make them complain in the first place. No developer has ever intended their code to unexpectedly break whatever it broke.

So we end up scrambling to do post-mortem debugging. An exception has happened, we don’t have the info as to why, but we need to figure it all out real quick by debugging after the fact.

If an exception happens in production and nobody sees the Chrome debug console, did it really happen?

Ancient Proverb From long ago

But we want to catch these issues before users have the chance to complain to us. Ideally, this would be solved easily with tests. Why not just cover every scenario with a test? Then life would be perfect and fine and great.

Because here in reality, humans are pretty bad at writing tests. Not just because we’re all kinda lazy and maybe a little dumb, but also because:

  • We can’t anticipate every single way users are going to interact with our product. They might do something really, really, really stupid (or something really, really, really smart) that we didn’t think about. Or they might do something perfectly normal that we also didn’t think about.

  • We often don’t even write tests.

  • QA processes are faulty. For pretty much the same reasons as writing tests. No QA team can possibly test every single thing your users are going to do. Your users are your best QA team. They’re gonna figure out what’s broken by touching every single thing you didn’t think was possible.

  • Even if the above was perfect, bugs ARE going to get into production. There’s nothing you or me or anyone else is going to do to about that.

If your product has a customer base and one of them emails you to complain about something being broken, they probably send you a message like: “It doesn’t work for me. Please fix! I’m losing revenue.” People are always losing revenue, even if your app just hosts cool pictures of dogs.

It does not work for me.

Customer An angry one

If you’re fortunate enough to hear from a tech savvy user, then you can maybe somehow convince them to open up their browser console, reproduce the bug, and send you a screenshot.

You probably won’t be so fortunate. Which may lead you to throw up your hands. “It works for me and I’m not going to fix this cause I don’t know how to debug your problem. And I’m not getting enough info to fix.”

Obviously we don’t want to rely on users sending angry messages and faulty screenshots to tell us what happened. We want to see what happened in real time. We want to see the stack trace and the problem in context. How can we be more proactive about this?

It works for me. #wontfix

A developer Maybe you

Well, there’s always the most obvious way to go about doing this, my absolute old time favorite: a window.onerror.

If you’re not familiar, it’s essentially a global callback that throws a net over your entire application. I call it window.onNOPE. It’s literally the worst way to capture an exception other than mind melding with your code so that it makes you faint every time it breaks.

The callback signature used to look something like this:

function(message, url, lineNumber)

It took three arguments, the first one being a message, the url of the file that caused the exception, and the lineNumber

The problem here is that none of these are an actual Error object. They’re just a message, url, and a number. In practice these three arguments end up looking like this:

message: "TypeError: Cannot read property 'foo' of undefined" 
url: "http://example.com/foo.js"
lineNumber: 10

At a glance this looks really useful. We can drill into our code, look at line 10, and figure out what’s going on. The issue is that your code is probably minified. Line 1 is your entire application. You have a million characters on this one really (really) really long line.

Also, your JS Is likely hosted on another domain. A CDN or a subdomain or a cookie-less domain, something like that. If a script is on another domain, it’s subject to CORS (Cross Origin Resource Sharing). And if the page it’s on doesn’t have the proper CORS headers — and it probably doesn’t — then all window.onerror is gonna tell you is “Script Error”. So you have literally no idea what happened, just that something happened. It’s like if someone called the fire department to alert them there was a fire somewhere in a thousand mile radius and that they really need to get on that.

Over time browsers have expanded on window.onerror to include more information:

function(message, url, lineNumber, columnNumber, error)

Now we have our Error object. This is nice, cause we can (try to) get a stack trace from this. For those unaware, a stack trace is just a record of the function calls up to this point within the current call stack. A series of function calls: you call a, a calls b, b calls c, and on down the call stack.

When you’re debugging something, if you don’t have a stack trace it’s going to be very difficult to fix. You can maybe see that the exception happened at this line, but you don’t know how you actually got there: you don’t know which function called that function; you don’t know the order of events that preceded it.

Let’s explore by opening up our favorite executable to throw a new error:

$ node 
> throw new Error('1ol') 

I’m sure you’ve seen something like this:

Error: lol
    at repl:1:7
    at REPLServer.self.eval (repl.js.110:21) 
    at repl.js:249:20
    at REPLServer.self.eval (repl.js:122:7)
    at Interface.<anonymous> (repl.js:239:12) 
    at Interface.EventEmitter.emit (events.js:95:17)
    at Interface._onLine (readline.js:202:10) 
    at Interface._line (readline.js:531:8) 
    at Interface._ttyWrite (readline.js:760:14) 
    at ReadStream.onkeypress (readline.js:99:10) 

This is what comes back when you arbitrarily throw and do not catch an exception. What do each of these things mean?

  • At the beginning you have the name. In this case, we threw a generic error. This would also be a TypeError or ReferenceError or something else like that.

  • That’s followed by the message. That would be that “something is undefined” message or just “lol” as in our example.

  • Then we have the actual stack trace, which we can break down into frames. Each line gives us very useful information. Take Line 7:

    It gives us the caller, aka the function of the frame (Interface.EventEmitter.emit). The source the function was called in (events.js). The line number (95). And the column number (17).

We also now have this property off the error prototype “.stack”. A string. Probably the most useless thing you can get back from this.

If you want to start breaking this apart and extract the pieces out of it, you need to use a regular expression, something like:

/at (?:(.+)\s+)?\(?(?:(.+?):(\d+)|([^)]+))\)?/

What’s this look like in practice? Great question. Which is why we’ll cover it in our next edition of Everything is Broken and I Don’t Know Why.