Ship Code Smarter with Monitoring & Feedback
Ahoy there. Continuous shipping: a concept many companies talk about but never get around to implementing. The first post of this three-part series discussed the importance of continuous shipping, while Part 2 steered us into the depths of the process itself. We’re all hands on deck for part three, where we’ll wrap up the second half of the continuous shipping process.
Before we jump into steps three and four of the continuous shipping process, let’s quickly review what we already know (bear with us here). Continuous shipping and its reliance on quick iteration are essential to modernizing development, as they help teams minimize risk and increase developer productivity. Improved customer sentiment is another benefit enjoyed by all.
It’s worth noting that the continuous shipping process begins with integration — the frequent merging of changes assisted by automation. Equally important is the deployment phase, where — as the name suggests — code is shipped as soon as it’s ready.
Now that we have our change control process in place and are practicing self-serve deployment, how can we know when things (inevitably) go wrong? Ideally, we know from day one. This step shouldn’t derail teams, because monitoring is built in from the start.
Let’s begin with a few ground rules:
- Monitoring should be constructed from the ground up.
- Instrumentation should be built into your application and not bolted on top.
- Observability should also be as real-time as possible (as in: seen within seconds).
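What “built in, not bolted on” might look like in practice is a thin reporting layer at the point where code can fail. Here’s a minimal sketch — the `report_error` collector is a stand-in for a real monitoring SDK (such as Sentry’s), and the function names are illustrative:

```python
import functools
import traceback

# Stand-in for a real monitoring SDK client; in production this would
# ship events to a service like Sentry instead of a local list.
captured_events = []

def report_error(exc, context):
    """Record the exception with enough context to debug it later."""
    captured_events.append({
        "type": type(exc).__name__,
        "message": str(exc),
        "context": context,
        "stack": traceback.format_exc(),
    })

def instrumented(func):
    """Wrap a function so failures are reported, never silently lost."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            report_error(exc, {"function": func.__name__, "args": repr(args)})
            raise  # still surface the error to the caller
    return wrapper

@instrumented
def divide(a, b):
    return a / b
```

Because the instrumentation lives inside the application, every failure arrives with the function name and arguments attached — no separate log-scraping step bolted on afterward.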
If we know about a problem within seconds, we can automate the response. We can deploy a change to production, and a tool (like, say, Sentry) can tell us if there’s an increase in errors. From there, a decision can be made about whether that change is safe or needs to be rolled back.
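That decision itself can be automated. As a rough sketch (the thresholds and the shape of the error-rate inputs are hypothetical — in practice they’d come from your monitoring tool’s API in the minutes after a deploy), a pipeline might compare the post-deploy error rate against a pre-deploy baseline:

```python
def should_roll_back(baseline_errors_per_min, current_errors_per_min,
                     tolerance=2.0, min_errors=10):
    """Flag a deploy as unsafe if errors spike well past the baseline.

    tolerance: how many times the baseline rate we accept before acting.
    min_errors: ignore spikes below this absolute rate, so low-traffic
    services don't roll back over a handful of noisy errors.
    """
    if current_errors_per_min < min_errors:
        return False
    return current_errors_per_min > baseline_errors_per_min * tolerance
```

A deploy script could call this on a timer after each release and trigger the rollback automatically when it returns `True`.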
Graphs and logs, while useful, are simply not enough — they don’t point to the issue. Sure, graphs and logs may illustrate symptoms, but they don’t distinguish or really even hint at an immediate cause. If you have a complex application, at the very least you have a website or a mobile app and a server component. Now imagine that a user loads your app and clicks a button. That call hits an API service, which errors. You see an error in the app, and the API service sees an error, but is that enough context on its own? It sometimes is, but it would help to know the details and context of what happened in the app.
This type of high-level tracing is necessary. In complex applications, more than 100 different things could be communicating with each other to cause this error. The cause might be tied to clicking the button in the app, or it could be deep within the app or even the server side. The important point here is to uncover the actual cause and fix it. To get to the bottom of the problem, you’ll need to know when and where the bug started as well as when the code change was introduced.
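One common way to connect those dots — sketched here with an illustrative header name and helper functions, not any particular tracing standard — is to generate a trace ID when the user action begins and forward it with every request, so the error seen in the app and the error seen in the API service can be joined later:

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # illustrative header name

def start_trace():
    """Generate a trace ID when the user action begins (e.g., a button click)."""
    return uuid.uuid4().hex

def outgoing_headers(trace_id):
    """Attach the trace ID so the API service can log it with its own errors."""
    return {TRACE_HEADER: trace_id}

def tag_error(event, trace_id):
    """Tag an error event on either side with the shared trace ID."""
    tagged = dict(event)
    tagged["trace_id"] = trace_id
    return tagged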
Root cause analysis and high-level tracing are part of a more holistic train of thought, one that includes both application and systems monitoring. If you’ve ever looked at New Relic, much of its tooling is application-level monitoring that exposes a problem with the code but not what it looks like at the system level. Deeper insight, however, comes from tools that live inside the code itself and can see the full source context. For example, Sentry knows the exact user acting on a request, and it can often tell what variables were assigned — enough context to kickstart a meaningful investigation into the issue.
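To illustrate the kind of context that makes an error actionable — this is a generic sketch, not Sentry’s actual event format, and `lookup_plan` is a made-up failing function — an error report can capture the user on the request and the local variables in the frame where the failure happened:

```python
import sys
import traceback

def capture_with_context(user):
    """Build an error report from the active exception, including the
    acting user and the local variables in the failing frame."""
    exc_type, exc, tb = sys.exc_info()
    # Walk to the innermost frame, where the failure actually occurred.
    while tb.tb_next is not None:
        tb = tb.tb_next
    return {
        "error": f"{exc_type.__name__}: {exc}",
        "user": user,
        "locals": {k: repr(v) for k, v in tb.tb_frame.f_locals.items()},
        "stack": traceback.format_exception(exc_type, exc, tb),
    }

def lookup_plan(plans, plan_name):
    return plans[plan_name]  # raises KeyError if the plan doesn't exist

try:
    lookup_plan({"free": 0, "pro": 20}, "enterprise")
except KeyError:
    report = capture_with_context({"id": "user-42", "email": "a@example.com"})
```

A report like this answers the first investigative questions up front: who hit the bug, and what values were in play when it fired.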
Tools remain vital to the monitoring phase. It’s best to narrow down an issue and find a service that addresses that issue specifically, like Sentry does for errors. If you’d like to work with an open-source tool to look at metrics and logs, check out Prometheus, the Elastic stack, Zipkin, OpenTSDB, OpenTracing, or Sensu. If cloud services are more your thing, look into Datadog, Scout, New Relic, Papertrail, or Stackdriver.
Of course, 500,000 users think that Sentry is both a great open-source solution and a great cloud service.
Customer support, while not an incredibly innovative space, is a useful element in the continuous shipping process. The key here is to be both proactive, with automation and alerts, and reactive, with feedback requests. Nine out of 10 customers won’t complain, so waiting for your customers to make you aware of errors isn’t always a reliable strategy. Ideally, you’ll know about issues before your users tell you. Errors are the first sign of a customer-related issue, which is one reason why Sentry places a considerable emphasis on error monitoring.
Context is critical in this phase, as it expedites resolution and reduces friction both internally and externally. There is now enough technology to collect data without adding steps for customers, so depending on your project, you can likely minimize the engineering resources invested here. The goal of this phase is optimized resolution cycles: most problems aren’t difficult, and what matters is iterating quickly and efficiently.
Tools in this phase can also vary, depending on the types of issues encountered and appropriate feedback. Social media channels like Twitter are valid options. Ticketing systems like Zendesk may also prove valuable. Of course, there are tools focused on other issues that also include some feedback component, such as Fullstory, Intercom, and Sentry.
Transitioning from a long-form release cycle to the shortened cycle detailed in this series may challenge teams. But the results — especially quick, efficient iteration and improved sentiment — are worth the effort to modernize.
As you may have noticed, Sentry is for more than just tracking errors. By playing a key role in the continuous shipping process, Sentry helps teams improve productivity and focus on what they do best. We also play nice with the tools listed in the integration and deployment sections of Part 2 of this series. Want specifics on how Sentry can further improve your continuous shipping processes? Dive into this model for workflow optimization.