
Solve code-level bottlenecks with Profiling for Python

Profiling is an important tool in every developer’s toolkit because it provides a granular view into how your program executes in your production environment. This matters because performance bottlenecks can often be very hard, or even impossible, to reproduce locally due to external constraints or load patterns only seen in production.

Python is one of the most popular programming languages available, and it is one of the core technologies we use at Sentry. Today, we are excited to announce that the alpha release of Python profiling is available for WSGI applications in sentry-sdk==1.11.0 to provide code-level insights into your application’s performance in production. Profiling is free to use while in alpha — we’ll post updates on our progress on social and Discord when we get closer to GA.

To get started with Python profiling, set profiles_sample_rate in the _experiments options when initializing the SDK:

import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    # Enable tracing so transactions (and their profiles) are captured.
    traces_sample_rate=1.0,
    _experiments={
        # Profile 100% of sampled transactions while in alpha.
        "profiles_sample_rate": 1.0,
    },
)

Under the hood, the SDK runs a separate thread capturing samples at a sampling frequency of ~100Hz. At the end of a profile, the samples are sent to Sentry where the profile is visualized as a flame graph. Flame graphs allow the user to quickly spot hot code paths and optimize resource consumption (e.g. CPU) at endpoints or in slow traces.
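For intuition, here is a minimal sketch of a sampling profiler in pure Python. This is an illustration only, not the SDK’s actual implementation: a background thread walks every thread’s stack on a fixed interval (~100Hz here) and records the frames it sees, which is enough raw data to build a flame graph from.

import sys
import threading
import time

def sampler(stop_event, samples, interval=0.01):
    # Every `interval` seconds, capture the stack of every running thread.
    while not stop_event.is_set():
        timestamp = time.perf_counter()
        for thread_id, frame in sys._current_frames().items():
            stack = []
            while frame is not None:
                stack.append((frame.f_code.co_filename, frame.f_code.co_name, frame.f_lineno))
                frame = frame.f_back
            samples.append((timestamp, thread_id, stack))
        time.sleep(interval)

samples = []
stop_event = threading.Event()
threading.Thread(target=sampler, args=(stop_event, samples), daemon=True).start()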

Improving Sentry with Sentry Profiling

We have been running an internal version of the Python profiler on Sentry for some time now, and we’ve been able to optimize Sentry using it.

Working with Date Times

Everyone loves a good time series, and we do too. That’s why we have time series graphs in various parts of Sentry. But we’ve noticed that sometimes the time series take a long time to generate.

While investigating this delay, our suspect spans feature pointed to discover.discover - timeseries.transform_results as the most likely candidate. With further investigation, we realized that while this function is usually pretty fast, it can get really slow: hundreds of milliseconds slow in this case. That meant it was slower than the query that fetched the data!

[Image: Performance page for Discover]

The post-processing is actually slower than the query!

Looking at the code, we saw that it was just iterating through the results and applying simple transformations. It wasn’t immediately clear what was going on, but the profile revealed that almost half of the request was spent parsing dates inside the zerofill function.

[Image: Profile showing time spent in the date time parser]

It was the date time parser inside of the zerofill!

This finding surprised us. Naturally, we did some simple benchmarking to verify the findings and compared a few ways of parsing date time strings.

>>> from dateutil import parser
>>> from datetime import datetime
>>> from timeit import timeit
>>>
>>> ts = "2022-11-09T16:15:00+00:00"
>>>
>>> number = 1_000_000
>>>
>>> print(timeit("parser.parse(ts)", number=number, globals=globals()))
85.08067758600001
>>>
>>> print(timeit("datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S+00:00')", number=number, globals=globals()))
9.526757356999994
>>>
>>> print(timeit("datetime.fromisoformat(ts)", number=number, globals=globals()))
0.1656539809999913

The difference shocked us. Changing from dateutil.parser.parse to datetime.datetime.fromisoformat made it potentially hundreds of times faster!

Now, datetime.datetime.fromisoformat isn’t a complete ISO 8601 parser (not until Python 3.11, anyway), but it parses exactly what datetime.datetime.isoformat produces, and that was all we needed. After making this simple one-line change, we saw the worst case improve significantly.
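For illustration, the change amounted to swapping the parser used when filling in empty time buckets. A simplified sketch with a hypothetical helper name (the real zerofill does more than this):

from datetime import datetime

def parse_bucket_timestamp(value):
    # Before: dateutil.parser.parse(value), which accepts almost any format but is slow.
    # After: the strict built-in parser, which handles everything that
    # datetime.isoformat() produces and is orders of magnitude faster.
    return datetime.fromisoformat(value)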

[Image: Performance improvement after switching to datetime.fromisoformat]

Django: To prefetch or not to prefetch

Prefetching is a common technique employed in Django to prevent excessive database queries caused by accessing related objects. Instead of making one query per object when it’s accessed, prefetching allows us to make a single query to fetch all the related objects at once.
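For example, with the OrganizationMember and Team models used later in this post (a simplified sketch; the exact fields differ in practice):

# Without prefetching: accessing member.teams triggers one extra query per member.
for member in OrganizationMember.objects.filter(organization=organization):
    names = [team.name for team in member.teams.all()]  # N+1 queries

# With prefetching: Django fetches all related teams in a single extra query
# and attaches them to the members in Python.
members = OrganizationMember.objects.filter(
    organization=organization
).prefetch_related("teams")
for member in members:
    names = [team.name for team in member.teams.all()]  # no extra queries here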

We employ this very technique in various places in Sentry. However, on the users endpoint, we noticed that requests were still very slow even though we were doing the necessary prefetching on the various related objects.

Looking at some example transactions, we saw something interesting: there were a few periods of time where we had no instrumentation. Most interestingly, they were scattered in between some db spans.

[Image: Transaction with uninstrumented gaps between db spans]

Digging into the code, we realized that these db spans corresponded to the following queryset. The first query was for OrganizationMember, and the three queries that followed were for the three lookups specified in the prefetch_related.

(
    OrganizationMember.objects.filter(
        user__is_active=True,
        organization=organization,
        id__in=OrganizationMemberTeam.objects.filter(
            team_id__in=ProjectTeam.objects.filter(project_id__in=projects)
            .values_list("team_id", flat=True)
            .distinct(),
        ).values_list("organizationmember_id", flat=True),
    )
    .select_related("user")
    .prefetch_related(
        "teams", "teams__projectteam_set", "teams__projectteam_set__project"
    )
    .order_by("user__email")
)

What was happening between each of these queries? Looking into the profile, we were drawn toward prefetch_related_objects, an internal Django call, which was taking up a large portion of the request.

[Image: Flame chart showing Django’s internal prefetch_related_objects call]

It turned out that we were querying a many-to-many relationship between OrganizationMember and Team, and then from Team to Project. While the queries triggered by prefetch_related were fast, Django was performing the join in Python. This meant that even though we had only fetched a handful of teams and projects, Django was iterating over every OrganizationMember, every Team, and every Project, and then setting the related objects as attributes.

With this knowledge, we set out to rewrite the queries to avoid performing this join in Python. And we were able to accomplish this by removing the prefetch_related, manually fetching the related objects, and carefully constructing the response. This decreased the p95 by nearly 50%!
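For reference, a simplified sketch of that approach, with hypothetical variable names and the response construction omitted (the real queries are more involved):

from collections import defaultdict

# Explicit queries instead of prefetch_related, followed by plain dict lookups
# in Python, so Django never iterates over every member, team, and project.
team_ids = list(
    ProjectTeam.objects.filter(project_id__in=projects)
    .values_list("team_id", flat=True)
    .distinct()
)

member_team_pairs = list(
    OrganizationMemberTeam.objects.filter(team_id__in=team_ids).values_list(
        "organizationmember_id", "team_id"
    )
)

members = (
    OrganizationMember.objects.filter(
        user__is_active=True,
        organization=organization,
        id__in={member_id for member_id, _ in member_team_pairs},
    )
    .select_related("user")
    .order_by("user__email")
)

# Group the related objects up front so building the response is a dict lookup
# per member rather than a scan over every object.
projects_by_team = defaultdict(list)
for project_team in ProjectTeam.objects.filter(team_id__in=team_ids).select_related("project"):
    projects_by_team[project_team.team_id].append(project_team.project)

team_ids_by_member = defaultdict(list)
for member_id, team_id in member_team_pairs:
    team_ids_by_member[member_id].append(team_id)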

[Image: p95 improvement after removing prefetch_related]

Profiling complements Performance

Profiling is the perfect complement to Sentry Performance. Where Performance gives us a high-level view of the problem, profiling lets us drill down to the exact line number to fix it with ease.

To profile our code successfully, we first ensured that all of our services were instrumented with Sentry’s Performance product. Because Sentry traces your latency issues across services, you’ll be able to easily correlate profiles with transactions and identify the root cause of your slow requests. Going into our Performance dashboard also helped us establish a baseline for how our application was performing, since we could clearly track our p75 durations, throughput, and other metrics, and compare how they looked before we implemented profiling versus after.

Get started with profiling today. Pro-tip: make sure you have Performance instrumented first (it takes just 5 lines of code).

And, as we expand Profiling to more SDK communities and users, we want to hear your feedback. Reach out in Discord or even email us directly at profiling@sentry.io to share your experience.
