Performance Monitoring for Every Developer: Web Vitals & Function Regression Issues
Extracting relevant insights from your performance monitoring tool can be frustrating. You often get back more data than you need, making it difficult to connect that data back to the code you wrote. Sentry’s Performance monitoring product lets you cut through the noise by detecting real problems, then quickly takes you to the exact line of code responsible. The outcome: Less noise, more actionable results.
Web Vitals unify the measurement of page quality into a handful of useful metrics like loading performance, interactivity, and visual stability. We used these metrics to develop the Sentry Performance Score, which is a normalized score out of 100 calculated using the weighted averages of Web Vital metrics.
The Sentry Performance Score is similar to Google’s Lighthouse performance score, with one key distinction: Sentry collects data from real user experiences, while Lighthouse collects data from a controlled lab environment. We modeled the score to be as close to Lighthouse as possible while excluding components that were only relevant in a lab environment.
To improve your overall performance score, you should start by identifying individual key pages that need performance improvements. To simplify this and help you cut to the chase, we rank pages by Opportunity, which indicates the impact of a single page on the overall performance score.
In this example, the highest opportunity page in Sentry’s own web app is our Issue Details page. Since this is the most commonly accessed page in our product, improving its performance would significantly improve the overall experience of using Sentry.
After identifying a problematic page, the next step is to find example events where users had a subpar experience. Below, you’ll see events that represent real users loading our Issue Details page, with many experiencing poor or mediocre performance:
In the above screenshot, you’ll see a user that experienced a performance score of 9 out of 100 (Poor), primarily driven by a 10+ second Largest Contentful Paint (LCP). Ouch. These worst-case events highlight performance problems not evident during local development or under ideal conditions (e.g. when users have a fast network connection, a high-spec device, etc.).
You’ll notice some of these events have an associated ▶️ Replay. When available, these let you see a video-like reproduction of a user’s real experience with that page. When optimizing your app’s performance, these replays can help you understand where users have a subpar experience–for example, when they struggle with 10-second load times.
To find out what caused the slow LCP, you can click the event’s Transaction button, which provides a detailed breakdown of the operations that occurred during page load. We call these operations spans.
The most relevant spans are the ones that occur before the red LCP marker, as these spans are potentially LCP-blocking. Spans that occur after the LCP marker still impact overall page performance, but do not impact the initial page load.
The first span that looks like a clear performance bottleneck is the
The next clear opportunity is this
EChartsReactCore.prototype.componentDidMount function takes 558 ms to execute, which is over half the duration of the long task span. This function is linked to a React component that renders a chart, provided by the open-source ECharts visualization library. This looks exactly like where we should focus our attention if we hope to bring our Issues Details pageload times down.
Recently, we launched the ability to view the slowest and most regressed functions across your application. Now, we can help you debug function-level regressions with a new type of Performance Issue. Function Regression Issues notify you when a function in your application regresses, but they do more than just detect the regression— they use profiling data to give you essential context about what changed so you can solve the problem.
Function Regression Issues can be detected on any platform that supports Sentry Profiling. Below, we’ll walk through a backend example in a Python project.
The screenshot above is a real Function Regression Issue that identified a slowdown in Sentry’s live-running server code. It triggered because the duration of a function that checks customer rate limits stored in Redis regressed by nearly 50%.
The top chart shows how function duration changed over time, while the bottom chart shows the number of invocations (throughput). You’ll notice that throughput also appears to increase during the slowdown period. This suggests that increased load might be one of the causes of this regression.
In the screenshot above, you’ll see that the same issue gives us other essential information like which API endpoints were impacted by the regression and how much they regressed. This data reveals that the rate-limiting function was widely used and called by many of our endpoints, resulting in a significant regression in overall backend performance.
The Function Regression Issue makes it easy to see the example profiles captured before and after the regression — they’re displayed right under the “Most Affected” endpoints. Comparing these profiles gives the most crucial context, revealing (at a code level) the change in runtime behavior that caused the regression.
Here, we looked at an example profile event that was captured before the regression occurred. We noticed that our regressed function calls into two functions within the third-party
We compared this to a profile collected after the regression and noticed that one of those two functions,
ConnectionPool.get_connection, was taking significantly longer than before. Each function frame in a profile offers source context for where the function was defined and the executed line number. In this case, opening this source location in the
redis module yielded the following line:
This line attempts to acquire a lock — and the significant wall-time duration increase when executing this line suggests that multiple processes or threads are trying to acquire the lock simultaneously. This lock contention is the code-level reason for the regression.
This lock contention issue also aligns with what we saw earlier in the throughput graph – increased throughput makes contention more likely. Through additional investigation, we discovered that the increased throughput for this function corresponded to an increase in the number of Redis connections starting around the time of the regression; our next step is to isolate the source of the additional connections.
Using this example, we’ve illustrated how Function Regression Issues can help you link performance regressions directly to the code causing the regression with profiling data. While this specific example focused on a backend use case, this capability works on any platform that supports Sentry Profiling.
Function Regression issues are available today to Early Adopters and will become generally available over the next 2 weeks.
A high-performance bar is critical for building differentiated products that people want to use. With Web Vitals and Function Regression Issues, we’ve provided more ways for all developers to solve performance problems by connecting them to code.
Tune in for more exciting product announcements with Sentry Launch Week. To set up Sentry Performance today, check out this guide. You can also drop us a line on Twitter or Discord, or share your Web Vitals feedback with us on GitHub.