How Droplr Uses Sentry to Debug Serverless Applications

Antoni Orfin /

Antoni Orfin, Chief Architect at Droplr, has years of hands-on experience building scalable web applications that serve traffic for millions of users across the globe. His journey started with bare-metal infrastructure. Then, he dove deep into the cloud. Today, he’s responsible for making things run smoothly on state-of-the-art serverless architecture at Droplr.

Last year was quite demanding for the Droplr engineering team: we made a huge migration to serverless architecture. Serverless is still a new approach, so it required solving emerging issues like monitoring and debugging in a fresh way. How can we ensure our migrated applications are working as expected 100% of the time? Ultimately, end users don’t care about the architecture, just a seamless experience.

Serverless: What & Why

Droplr has a small but experienced team of engineers. We prefer to focus on building product features and to avoid technical overhead that slows down the development process. From day one, all of our infrastructure has run on AWS, using EC2 instances and Docker containers orchestrated by ECS and load-balanced by an ALB.

Here comes serverless to the rescue! It is not a single technology; it is more like a way of thinking. It basically means going all-in on services over servers.

Antoni Orfin

At first, it may seem that we already outsourced most of our SysOps tasks to AWS. But EC2 instances still need to be managed by someone. You have to upgrade the OS, apply security patches, monitor microservice utilization, and work to make it highly available. That means hours spent working on infrastructure instead of the actual development of Droplr features. What a waste!

Here comes serverless to the rescue! It isn’t a single technology; it’s more like a way of thinking. It basically means going all-in on services over servers.

One of those services, announced in 2014, is AWS Lambda. It’s a technology that allows us to run code in response to a specific event, such as an HTTP request or SNS notification. The beauty of the solution is that we don’t need to maintain any underlying infrastructure. We just upload our code and select when it should be invoked. We’re billed for the actual execution duration of our functions (in 100ms increments), so there’s no more waste on underutilized servers.
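To make the model concrete, here is a minimal sketch of a Node.js Lambda handler of the kind described above (the event shape and names are hypothetical; Lambda passes whatever payload the trigger produces):

```javascript
// Minimal sketch of a Node.js Lambda handler. Lambda invokes this
// function in response to an event, e.g. an HTTP request routed
// through API Gateway or an SNS notification.
const handler = (event, context, callback) => {
  // `event` carries the trigger payload; `context` carries runtime metadata.
  const body = { message: `Hello, ${event.name || 'world'}` };

  // Signal success back to the Lambda runtime.
  callback(null, { statusCode: 200, body: JSON.stringify(body) });
};

exports.handler = handler;
```

There are no servers to provision here: the runtime spins up a container, calls `handler` per event, and bills only for the time spent inside it.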

Debugging Serverless Applications

When migrating to new technology, I always want to know how my application performs in the new environment. At a high level, I can graph metrics such as function duration and rate of errors. Then I can set up alerting based on thresholds. This all works quite smoothly with existing AWS services such as CloudWatch.

But how do we spot serverless errors in such huge log streams that span hundreds of functions invoked thousands of times every second?

Antoni Orfin

However, when diving into the application layer, things start to get complicated. By default, all AWS Lambda functions log to CloudWatch Logs. So when we do console.log in our Node.js application, it’ll be visible there. But how do we spot errors in such huge log streams that span hundreds of functions invoked thousands of times every second?

Why We Chose Sentry

Setting up alerts based on logs in AWS is not an easy task. At Droplr, we love finding smart ways to actually speed up our development instead of overcomplicating it. I knew that there had to be an easier solution.

When reviewing serverless monitoring and error tracking options, I focused on a few key requirements:

  1. Scale to debugging AWS Lambda across more than 100 different functions, composed into tens of microservices and deployed to various environments
  2. Seamless integration with existing Node.js functions, without code changes
  3. Ability to extend debug information with additional context
  4. Beautiful UI and accessible UX that developers would love and use daily

After testing a wide variety of AWS Lambda debugging tools, we found Sentry to be a perfect match for our serverless stack. First, pricing: some error tracking platforms bill by project, which drives up costs for microservice architectures even when projects aren’t sending any events. Sentry’s pricing, especially on-demand billing, helped us scale without an unnecessary cost burden. Sentry bills the same way Lambda functions are billed, which meant no more underutilization and waste.

Our Lambda functions are composed into higher-level microservices based on business domains. We also have different environments: production, staging, and dynamic environments created from a Git branch (each developer can deploy and test a whole microservice from a specific branch). Fortunately, Sentry fits that model perfectly. We have one project per microservice, with matching names, and each event is automatically tagged with its environment via an ENV parameter as it is processed.
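A small sketch of how an environment tag like this might be derived at deploy time. The variable names (`SERVERLESS_STAGE`, `GIT_BRANCH`) and the branch-naming scheme are assumptions for illustration, not Droplr’s actual configuration:

```javascript
// Pick the Sentry environment from the deploy stage, falling back to a
// per-branch name for dynamically created environments.
function sentryEnvironment(env) {
  if (env.SERVERLESS_STAGE === 'production' || env.SERVERLESS_STAGE === 'staging') {
    return env.SERVERLESS_STAGE;
  }
  // Dynamic environments deployed from a Git branch, e.g. "branch-feature-x".
  return `branch-${env.GIT_BRANCH || 'unknown'}`;
}

// The result would then be passed to the SDK, e.g. with the raven
// Node.js client of that era:
//   Raven.config(DSN, { environment: sentryEnvironment(process.env) });

console.log(sentryEnvironment({ SERVERLESS_STAGE: 'production' })); // production
console.log(sentryEnvironment({ GIT_BRANCH: 'feature-x' }));        // branch-feature-x
```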

[Editor’s note: Sentry’s new Environments feature that was launched with Sentry 9 enhances this capability even further.]

We just loved the interface and how much easier it is to debug Lambda errors in Sentry. Even working with a huge number of different Lambda functions did not cause any issues.

Antoni Orfin

We were also able to integrate Sentry’s SDK into our Lambda execution code so that it records metadata such as function name, version, and memory limit. This information is available through the Lambda context object, and it allows us to analyze different functions within a single Sentry project.
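A sketch of what pulling that metadata out of the Lambda context might look like. The helper name is hypothetical, but `functionName`, `functionVersion`, and `memoryLimitInMB` are real properties of the Lambda context object:

```javascript
// Copy Lambda runtime metadata into the tag shape sent with each
// Sentry event, so events can later be filtered by function or memory limit.
function tagsFromLambdaContext(context) {
  return {
    function_name: context.functionName,
    function_version: context.functionVersion,
    memory_limit: context.memoryLimitInMB,
  };
}

// Inside a handler, before capturing errors, e.g. with the raven SDK:
//   Raven.setContext({ tags: tagsFromLambdaContext(context) });

const tags = tagsFromLambdaContext({
  functionName: 'thumbnail-resize',   // illustrative values
  functionVersion: '42',
  memoryLimitInMB: '256',
});
console.log(tags.function_name); // thumbnail-resize
```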

We also found Sentry to be the leader in UX. We just loved the interface and how much easier it is to debug errors in Sentry. Even working with a huge number of different Lambda functions didn’t cause any issues. Multiple error events are nicely aggregated in Sentry’s UI into a single issue, and Sentry makes it really easy to filter events by metadata tags. That way, we can easily find errors by specific function name or even by the memory limit setting (to find out if lowering memory on a Lambda function caused issues).

Things to Remember

The execution model of AWS Lambda has some peculiarities that every developer needs to be aware of, especially when it comes to monitoring serverless architectures. The first is that you want to set the following tiny property on the Lambda context. Otherwise, the function waits for the Node.js event loop to empty before returning, and the in-flight request from sending a Sentry event can leave it hanging until it times out:

context.callbackWaitsForEmptyEventLoop = false;

However, with that setting, the traditional Node.js pattern of reporting errors to Sentry won’t work. Invoking the callback immediately freezes function execution, so the asynchronous send of the Sentry event never completes:

Raven.captureException(error);
callback(error);

The correct way to invoke a Lambda callback after sending an event to Sentry is:

Raven.captureException(error, (sendErr, eventId) => {
  callback(error);
});
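Putting the two pieces together, a full handler might look like the sketch below. The `Raven` object here is a stub standing in for the configured Sentry SDK client, so the snippet runs standalone; the point is only the ordering of `captureException` and the Lambda callback:

```javascript
// Stub standing in for the raven SDK client; the real captureException
// sends the event over HTTPS, then invokes cb(sendErr, eventId).
const Raven = {
  captureException(error, cb) {
    setImmediate(() => cb(null, 'stub-event-id'));
  },
};

const handler = (event, context, callback) => {
  // Don't wait for the event loop to drain (e.g. keep-alive sockets),
  // or the function may hang until it times out.
  context.callbackWaitsForEmptyEventLoop = false;

  try {
    throw new Error('something went wrong'); // simulate a failure
  } catch (error) {
    // Invoke the Lambda callback only after the event has been sent;
    // calling back first would freeze execution before the send finishes.
    Raven.captureException(error, (sendErr, eventId) => {
      callback(error);
    });
  }
};

exports.handler = handler;
```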

Conclusion

After a year of actively using Sentry, my team at Droplr considers it an essential tool for monitoring serverless applications at the error level. We have integrated Sentry into all of our projects, and our engineers use it daily to debug microservices as part of their development workflow. With Sentry, we have a clear overview of our big Lambda ecosystem, and we can investigate and triage issues without being pulled away from our larger goals. That means no more pesky, time-consuming log analysis. We get an email, we iterate on the code, and we roll forward. We love Sentry for AWS Lambda monitoring!