Supporting Rapid Business Growth with Scalable Solutions
When you have more than 150 engineers shipping new features daily, with 8 automatic deployments running at the top of every hour and teams regularly pushing new code, tuning out the noise takes on a whole new meaning.
Take monday.com. With over 1.2 million monthly users and more than a million collaborative actions per hour, the sheer volume of activity, coupled with monitoring tools that lacked advanced grouping functionality, created alert fatigue, false negatives, and an overall lack of trust in their existing solution.
The more complex our system becomes, with more microservices, routes, and integrations, the more important it is to correlate between sessions. Even if each request worked well on its own, when things get to production, the integration of services has the potential for problems.
Read on to learn how the team’s been able to:
- Reduce the time it takes to resolve an issue from 30-45 minutes to 10 minutes
- Reduce the number of false alerts by 50%, and
- Reduce client-side errors by more than 60%
Growing a business and scaling it might be two sides of the same coin, but one can’t be sustained without the other. For monday.com, supporting rapid business growth meant finding a scalable, custom solution for an increasingly distributed architecture, one that would also optimize error tracking and ultimately accelerate time to resolution.
This meant finding a tool that could jump between their services seamlessly, allowing them to continue growing as their architecture became more distributed and more services were added to each flow.
They also needed the ability to manage releases to see real-time performance, the engineers involved, and any relevant errors.
Breadcrumbs show a timeline of the actions that led to an error, reducing the time required to resolve it. monday.com took this a step further and added their own custom events, allowing for more granular investigations.
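To make the breadcrumb idea concrete, here is a minimal, standard-library-only sketch of a bounded event timeline that gets attached to an error report. It is not the Sentry SDK and not monday.com's implementation; the `Breadcrumb` and `BreadcrumbTrail` names, the `board.custom` category, and the sample events are all hypothetical and exist only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from time import time


@dataclass
class Breadcrumb:
    category: str  # e.g. "navigation", "http", or a custom category
    message: str
    data: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time)


class BreadcrumbTrail:
    """Keeps only the most recent breadcrumbs so memory stays bounded."""

    def __init__(self, max_crumbs: int = 100):
        # A deque with maxlen silently drops the oldest entry when full.
        self._crumbs = deque(maxlen=max_crumbs)

    def add(self, category: str, message: str, **data) -> None:
        self._crumbs.append(Breadcrumb(category, message, data))

    def report(self, error: Exception) -> dict:
        """Bundle the error with the timeline that led up to it."""
        return {
            "error": f"{type(error).__name__}: {error}",
            "breadcrumbs": [
                {"category": c.category, "message": c.message, "data": c.data}
                for c in self._crumbs
            ],
        }


trail = BreadcrumbTrail(max_crumbs=3)
trail.add("navigation", "opened board")
# Hypothetical custom event, analogous to the team's own events.
trail.add("board.custom", "column added", column_type="status")
trail.add("http", "PATCH /items", status=500)

try:
    raise RuntimeError("failed to save item")
except RuntimeError as exc:
    event = trail.report(exc)
```

The custom category sits in the same timeline as the built-in ones, so whoever investigates the error sees domain-specific actions in order alongside navigation and network activity.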
Metric alerts generate a notification as soon as new issues surface, essentially giving the team a real-time pulse on what’s going on. By adjusting alert thresholds, they filter out low-impact alerts so that only high-impact ones trigger notifications, which are then assigned to the relevant team.
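The threshold-then-assign flow can be sketched in a few lines. This is an illustration of the general technique only, not Sentry's alert rules or monday.com's configuration; the `Issue` fields, the `OWNERS` map, the service and team names, and the threshold values are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    service: str
    events_last_hour: int
    users_affected: int


# Hypothetical ownership map and thresholds; tune these per system.
OWNERS = {"boards-api": "boards-team", "notifications": "platform-team"}
MIN_EVENTS = 50
MIN_USERS = 10


def route_alerts(issues, min_events=MIN_EVENTS, min_users=MIN_USERS):
    """Return (team, issue title) pairs for issues that cross both thresholds."""
    routed = []
    for issue in issues:
        # Skip low-impact issues so they never page anyone.
        if issue.events_last_hour < min_events or issue.users_affected < min_users:
            continue
        # Fall back to a generic on-call rotation when no owner is mapped.
        team = OWNERS.get(issue.service, "on-call")
        routed.append((team, issue.title))
    return routed


issues = [
    Issue("TypeError in board render", "boards-api", 420, 95),
    Issue("Timeout sending digest", "notifications", 12, 3),
    Issue("Webhook signature mismatch", "integrations", 80, 25),
]
alerts = route_alerts(issues)
```

Raising `min_events` or `min_users` trades sensitivity for quiet: fewer notifications fire, but each one is more likely to matter to the team that receives it.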
Finally, monday.com designed and built out custom dashboards in Sentry where they’re able to aggregate and keep an eye on any uncaught errors and directly query issue data to get to the root of the problem.
From error monitoring to release health and ultimately resolution, monday.com personalized their Sentry experience to adapt to shifting business requirements, allowing them to reduce “noise” and engineer fatigue and to craft a streamlined process from start to finish.
Monitoring a production environment is always a challenge, with regular deployments, third party integrations, and the cloud resources we depend on - the number of issues that can fire from any one system increases the complexity of figuring out which service or team owns what. Because of this, it is important to stay one step ahead to provide the quality we demand for our users.
If you’d like to learn more about how the monday.com team manages scalability alongside business growth, check out our full conversation here.