The Monitor — Stephen Boak, Senior Product Designer at Datadog
For the fifth edition of The Monitor we spoke to Stephen Boak, a Senior Product Designer at Datadog. With turn-key integrations, Datadog seamlessly aggregates metrics and events across the full DevOps stack, giving you full visibility into your application. Sentry also just so happens to integrate with Datadog.
Over seven years and four companies, I’ve seen a lot of change in the monitoring industry. What we’re looking at and how we’re looking at it is completely different from where it was when I started. The infrastructure we monitor moved from physical data centers to cloud computing. Physical hosts became virtual machines, which became containers.
Over that same time, not a lot has changed with the way users actually interact with monitoring products. But with the introduction of AI, all of that is going to change.
When it comes down to it, monitoring is ultimately about user experience. When we’re watching a backend system and when we’re looking at performance metrics to understand how the system is behaving, we’re really just trying to understand the interaction between our system and our users.
At Datadog, we make sure we never lose sight of that. Our objective isn’t to track CPU usage across a thousand different servers. The objective is to let our customers know whether their users are happy or not. Are their pages loading quickly? Are they able to do all of the things they want to do? And monitoring CPU usage is one way to try and answer those questions.
Our customers use Datadog to figure out how they can make their product work better for their users. And, as a Senior Product Designer, I’m working with engineering teams across the company, but I’m representing how the user is interacting with our product and what they’re getting from the experience.
And with tools like Sentry we’re able to monitor the user experience of our products. When we see latency metrics and page speed and when we know where and how errors are happening, we have a much greater understanding of the experience users are having and how we can improve it.
The basic rules of how we construct our monitoring have always been the same. A user tells the monitoring product to look at something specific, like the amount of disk space left on a machine. Then the user instructs it to take some step when it hits a certain threshold, like sending an alert when the disk is 90% full.
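That watch-a-metric, alert-at-a-threshold pattern can be sketched in a few lines. This is a minimal illustration, not how Datadog works internally; the function names and the 90% threshold are hypothetical, chosen to mirror the disk-space example above.

```python
import shutil

# Hypothetical threshold from the example above: alert when the disk is 90% full.
DISK_ALERT_THRESHOLD = 0.90

def disk_usage_fraction(path="/"):
    """Return the fraction of disk space used at the given path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def check_disk(path="/", threshold=DISK_ALERT_THRESHOLD):
    """Return an alert message if usage crosses the threshold, else None.

    In a real monitoring product this would page someone; here we just
    produce the message.
    """
    used = disk_usage_fraction(path)
    if used >= threshold:
        return f"ALERT: disk is {used:.0%} full on {path}"
    return None

message = check_disk("/")
if message:
    print(message)
```

Every classic monitoring rule follows this shape: a specific thing to watch, a threshold, and an action to take when the threshold is crossed.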
We create this extensive set of instructions for monitoring our systems so we can understand how they are working based on their external output. And that has just been the experience of monitoring. But our systems are getting more and more complex, with microservices and integrations and dependencies. As we introduce more parts into the system, we accumulate more and more things to monitor. These increasingly complex systems and the “if it moves, monitor it” mentality have made the job of understanding how a system is working much more difficult. It feels like an inevitable march toward watching more and more metrics, with more and more thresholds.
All of this has meant a lot of human effort going into watching. It has meant more people watching increasingly complex dashboards trying to diagnose the system in real time, and more and more people getting woken up in the middle of the night by alerts. And it’s gotten very expensive.
This is just a small part of what Stephen had to say. Go here to watch our video interview and read the rest of his post.