We like to think Sentry is an interesting and, in many ways, unique company. We started as an open source project which operated as a small business in our spare time for many years. While we look pretty similar to every other tech company these days, we’ve always maintained our commitment to building a viable business, and doing it with open source.
What does Open Source actually mean? It’s a frequent debate, and it boils down to the different aspects that make up the software as well as the surrounding community.
Often the first thing you think of when you hear Open Source is “free”. In the software world, free software is often associated with being the inferior choice. That’s especially true given that historically, many businesses have offered an inferior version of their product for free, with an unapproachable price tag for the “enterprise” version. At Sentry there is no difference between the two: everything runs off the same code, which is available on GitHub.
People – rather, business people – often ask why Sentry is open source. What they are actually asking is why we give it away for free, what our conversion funnel is, and how it affects value. We struggle to answer this question, because in our minds there is no other choice.
Take a look at the evolution of the software industry and the shift to open source becomes even more clear. Where Oracle was once king, people are now quickly adopting newer open source technologies like Cassandra and MongoDB. Microsoft, a giant of traditional proprietary software, has made big moves in open source with .NET Core, and Apple – not much different – with Swift. While we can’t claim to know the reasons for these companies’ decisions, these are moves you wouldn’t have seen a decade ago.
We feel that one of the critical reasons open source has become so relevant is the freedom of choice. We want to share our knowledge and discoveries, and open source gives us that. The Free Software Foundation describes the importance of this, which very much resonates here at Sentry:
To use free software is to make a political and ethical choice asserting the right to learn, and share what we learn with others. Free software has become the foundation of a learning society where we share our knowledge in a way that others can build upon and enjoy.
The world in which being a closed source product means more relevance, or more value, is simply not reality – in fact, we consider it just the opposite. The question we should be asking ourselves today is, why are we still using proprietary software?
The Business of Sentry
Everyone should be using Sentry. That’s the singular goal of both our company and the open source project. It means first and foremost the distribution, license, and business model have to support those efforts. Open source is key to letting us do that, and most importantly, letting us do it in a way that aligns with our ethics as a company.
Of the tens of thousands of companies already using Sentry, many use the open source version behind their firewall. It will be a long time before that is no longer true, but these days we’re seeing more and more adoption of our hosted service offering. Whether you’re the largest software company in the world or just a startup, your business has more important concerns than configuring, operating, and scaling services like Sentry.
A key part of Sentry is how we do product development, and that’s fairly different from most open source projects. Sentry has a top-down product roadmap that drives it forward, rather than an unclear set of community goals. This means we’ll happily accept contributions that align with our goals, but we’re not a traditional open source project. We don’t resort to telling people “Pull Requests Accepted”, and we don’t accept changes just because they’ve been requested. In a way, we think of ourselves as the BDFL of Sentry, a term coined in the Python community:
In the Python community, [Guido] Van Rossum is known as a “Benevolent Dictator For Life” (BDFL), meaning that he continues to oversee the Python development process, making decisions where necessary.
When it comes down to it, we want the community to help drive our direction just like any other users of the product would, but we’re not asking you to build it for us.
As part of that, we’re taking a strong stance on what it is that we do best. First and foremost, that’s building out features that evolve our hosted service, one which happens to be uniquely open source. We believe the days of running everything in your own datacenter are long gone, and our priorities should reflect that belief. As a business, this is an important stance for us, as it lets us continue to deliver on our mission: everyone should be using Sentry, whether it’s our open source project or our hosted service.
Most importantly, we’ll continue to build and iterate, and do whatever it takes to ensure that your teams can trust both Sentry the product, and Sentry the service.
Sentry processes over a billion errors every month. We’ve been able to scale most of our systems, but in the last few months, one component has stood out as a computational chokepoint: Python’s source map processing.
When we wrote the original processing pipeline almost four years ago, the source map ecosystem was just starting to take shape. As it grew into the complex and mature mapping process it is today, so did our Python processing times.
As of yesterday, we have dramatically cut down that processing time (and CPU utilization on our machines) by replacing our source map handling with a Rust module that we interface with from Python.
To explain how we got here, we first need to better explain source maps and their shortcomings in Python.
Source Maps in Python
As our users’ applications become more and more complex, so do their source maps. Parsing the JSON itself is fast enough in Python, since the files mostly contain just a few strings. The problem lies in objectification: each source map token yields a single Python object, and we had some source maps that expanded to a few million tokens.
The problem with objectifying source map tokens is that we pay an enormous price for a base Python object, just to get a few bytes from a token. Additionally, all these objects engage in reference counting and garbage collection, which contributes even further to the overhead. Handling a 30MB source map makes a single Python process expand to ~800MB in memory, executing millions of memory allocations and keeping the garbage collector very busy given the tokens’ short-lived nature.
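The per-token cost is easy to illustrate with a rough stand-in class (this is not Sentry’s actual token type, just a demonstration of Python object overhead):

```python
import sys

class Token(object):
    """Rough stand-in for a parsed source map token."""
    def __init__(self, dst_line, dst_col, src_line, src_col):
        self.dst_line = dst_line
        self.dst_col = dst_col
        self.src_line = src_line
        self.src_col = src_col

tok = Token(1, 2, 3, 4)
# The useful payload is four small integers, yet the object header plus
# its attribute dict dwarf the handful of bytes we actually need.
overhead = sys.getsizeof(tok) + sys.getsizeof(tok.__dict__)
print(overhead)
```

Multiply that per-object overhead by a few million tokens and the ~800MB figure stops being surprising.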
Since this objectification requires object headers and garbage collection mechanisms, we had very little room for actual processing improvement inside of Python.
Source Maps in Rust
After the investigation pointed us toward Python’s shortcomings, we decided to vet the performance of our Rust source map parser, previously written for our CLI tool. Applying the parser to a particularly problematic source map showed that parsing with this library alone could cut the processing time from more than 20 seconds to less than 0.5 seconds. This meant that even ignoring any optimizations, just replacing Python with Rust could relieve our chokepoint.
Once we proved that Rust was definitively faster, we cleaned up some Sentry internal APIs so that we could replace our original implementation with a new library. That Python library is named libsourcemap and is a thin wrapper around our own rust-sourcemap.
After deploying the library, the machines that were dedicated to source map processing instantly sighed in relief.
With all of the CPUs efficiently processing, our worst source map times diminished to a tenth of their original time.
More importantly, the slowest times were not the only maps to receive improvements. The average processing time reduced to ~400ms.
Embedding Rust in Python
There are various methods to expose a Rust library to Python and the other way round. We chose to compile our crate into a dylib and to provide some good ol’ C functions, exposed to Python through CFFI and C headers. With the headers, CFFI generates a tiny shim that can call out into Rust. From there, libsourcemap can open a dynamically shared library that is generated from Rust at runtime.
There are two steps to this process. The first is a build module that configures CFFI when setup.py runs:
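A minimal sketch of such a build module might look like this (the module name and the declarations are assumptions for illustration, not the actual libsourcemap header):

```python
from cffi import FFI

ffi = FFI()

# Declarations for the Rust-exported functions; in the real build these
# come from the preprocessed C header.
ffi.cdef("""
typedef struct lsm_view_s lsm_view_t;
lsm_view_t *lsm_view_from_json(const char *bytes, unsigned int len);
void lsm_view_free(lsm_view_t *view);
""")

# source=None selects CFFI's ABI mode: compile() emits a pure-Python shim
# module that later dlopens the Rust-built dylib.
ffi.set_source("_sourcemapnative", None)

if __name__ == "__main__":
    ffi.compile()
```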
After building the module, the header is run through the C preprocessor so that macros are expanded, something CFFI cannot do by itself. This step also tells CFFI where to put the generated shim module. All that needs to happen after that is loading the module:
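The loading side follows the usual CFFI ABI-mode pattern. Since the Rust dylib is not available here, this sketch dlopens the C library as a stand-in; the call against the real dylib would look the same:

```python
from cffi import FFI

ffi = FFI()
# With the real shim, these declarations come from the generated module;
# atoi stands in here for the lsm_* functions.
ffi.cdef("int atoi(const char *s);")

# dlopen(None) opens the process's own C library; the real code would
# instead dlopen the Rust-built dylib shipped next to the package.
lib = ffi.dlopen(None)

print(lib.atoi(b"42"))
```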
The next step is to write some wrapper code to provide a Python API to the Rust objects, and since we’re Sentry, we started with the ability to forward exceptions. This happens in a two-part process: first, we made sure that in Rust, we used result objects wherever possible. In addition, we set up landing pads for panics to make sure they never cross a DLL boundary. Second, we defined a helper struct that can store error information, and passed it as an out parameter to functions that can fail.
In Python, a helper context manager was provided:
We have a dictionary of specific error classes (special_errors) but if no specific error can be found, a generic SourceMapError will be raised.
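A sketch of what such a context manager can look like (the error struct here is a plain Python stand-in for the CFFI out-parameter; special_errors and SourceMapError follow the naming in the text, the specific error class is hypothetical):

```python
from contextlib import contextmanager

class SourceMapError(Exception):
    pass

class IndexedSourceMapError(SourceMapError):
    pass

# Maps error codes reported by the native library to specific exceptions.
special_errors = {2: IndexedSourceMapError}

class _Err(object):
    """Stand-in for the error struct the Rust side fills in on failure."""
    failed = False
    code = 0
    message = b''

@contextmanager
def sourcemap_errors():
    err = _Err()
    yield err
    if err.failed:
        # Fall back to the generic error when no specific class matches.
        exc = special_errors.get(err.code, SourceMapError)
        raise exc(err.message.decode('utf-8', 'replace'))
```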
From there, we can actually define the base class for a source map:
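The shape of that wrapper class can be sketched like this (with the CFFI library handle stubbed out, since the real calls go through the generated shim):

```python
class _StubLib(object):
    """Stand-in for the CFFI handle to the Rust dylib."""
    def lsm_view_from_json(self, buf, length):
        return object()  # opaque view pointer in the real library
    def lsm_view_free(self, ptr):
        pass

_lib = _StubLib()

class View(object):
    """Owns a pointer to a Rust-side source map view."""
    def __init__(self, ptr):
        self._ptr = ptr

    @classmethod
    def from_json(cls, buffer):
        # The real implementation wraps this call in the error-forwarding
        # machinery so Rust failures surface as Python exceptions.
        return cls(_lib.lsm_view_from_json(buffer, len(buffer)))

    def __del__(self):
        # Hand the pointer back to Rust exactly once.
        ptr, self._ptr = self._ptr, None
        if ptr is not None:
            _lib.lsm_view_free(ptr)
```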
Exposing a C ABI in Rust
We start with a C header containing some exported functions, so how do we export them from Rust? There are two tools: the special #[no_mangle] attribute, and the std::panic module, which provides a landing pad for Rust panics. We built ourselves some helpers to deal with this: a function to notify Python about an exception, and two landing pad helpers, a generic one and one that boxes up the return value. With these, it becomes quite nice to write wrapper methods:
The way boxed_landingpad works is quite simple. It invokes the closure, catches the panic with panic::catch_unwind, unwraps the result, and boxes up the success value in a raw pointer. If an error occurs, it fills out err_out and returns a NULL pointer. In lsm_view_free, one just has to reconstruct the box from the raw pointer.
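Put together, the helpers and a wrapped export can be sketched like this (function names follow the text, but the error struct and View contents are simplified assumptions):

```rust
use std::panic;
use std::ptr;

/// Simplified stand-in for the error out-parameter struct.
#[repr(C)]
pub struct CError {
    pub failed: bool,
}

/// Generic landing pad: run the closure and catch any panic before it
/// can cross the C ABI boundary, reporting failure through err_out.
fn landingpad<F, T>(f: F, err_out: *mut CError) -> Option<T>
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    match panic::catch_unwind(f) {
        Ok(rv) => Some(rv),
        Err(_) => {
            if !err_out.is_null() {
                unsafe { (*err_out).failed = true };
            }
            None
        }
    }
}

/// Landing pad that boxes up the success value into a raw pointer for C.
fn boxed_landingpad<F, T>(f: F, err_out: *mut CError) -> *mut T
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    match landingpad(f, err_out) {
        Some(rv) => Box::into_raw(Box::new(rv)),
        None => ptr::null_mut(),
    }
}

pub struct View {
    pub token_count: usize,
}

#[no_mangle]
pub extern "C" fn lsm_view_from_json(err_out: *mut CError) -> *mut View {
    boxed_landingpad(|| View { token_count: 0 }, err_out)
}

#[no_mangle]
pub extern "C" fn lsm_view_free(view: *mut View) {
    // Reconstruct the box from the raw pointer so Rust frees it.
    if !view.is_null() {
        unsafe { drop(Box::from_raw(view)) };
    }
}

fn main() {
    let mut err = CError { failed: false };
    let view = lsm_view_from_json(&mut err);
    assert!(!view.is_null() && !err.failed);
    lsm_view_free(view);
}
```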
Building the Extension
To actually build the extension, we have to run some less-than-beautiful steps inside of setuptools.
Thankfully, it did not take us much time to write it since we already had a similar set of steps for our DSYM handling library.
The handy part of this setup is that a source distribution invokes cargo at build time, while binary wheels ship the final dylib, so no end user ever has to navigate the Rust toolchain.
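The general shape of those steps can be sketched as a custom setuptools command (command and crate names here are assumptions, not Sentry’s actual build code):

```python
# Hypothetical sketch of wiring cargo into setuptools.
import subprocess
from setuptools.command.build_ext import build_ext

class CargoBuildExt(build_ext):
    """Builds the Rust dylib with cargo before the normal extension build."""

    def run(self):
        # Compile the Rust crate in release mode; the resulting dylib is
        # later copied into place so it ships inside the wheel.
        subprocess.check_call(['cargo', 'build', '--release'])
        build_ext.run(self)

# In setup.py this would be registered via:
#   setup(..., cmdclass={'build_ext': CargoBuildExt})
```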
What went well? What didn’t?
I was asked on Twitter what alternatives to Rust there would have been. Truth be told, Rust is pretty hard to replace for this. Unless you want to fully rewrite an entire Python component in a different codebase, your only option is a native extension, and that puts harsh requirements on the language: it must not have an invasive runtime, must not have a GC, and must support the C ABI. Right now, the only languages I think fit this bill are C, C++, and Rust.
What worked well:
Marrying Rust and Python with CFFI. There are some alternatives to this which link against libpython, but they make for a significantly more complex wheel build.
Using ancient CentOS versions to build somewhat portable Linux wheels with Docker. While this process is tedious, the difference in stability between Linux flavors and kernels makes Docker and CentOS an acceptable build solution.
The Rust ecosystem. We’re using serde for deserialization and a base64 module from crates.io, both working really well together. In addition, the mmap support uses another crate provided by the community, memmap.
What didn’t work well:
Iteration and compilation times really could be better. We recompile modules and headers every time we change a single character.
The setuptools steps are very brittle. We probably spent more time making setuptools cooperate than on any other roadblock that came up. Luckily, we had done this once before, so it was easier this time around.
While Rust is pretty great for what we do, without a doubt there is a lot that needs to improve. In particular, the infrastructure for exporting C ABIs (and making them useful to Python) could use plenty of improvement. Compile times are also not great at all. Hopefully incremental compilation will help there.
There is even more room for us to improve on this if we want. Instead of parsing the JSON every time, we can start caching in a more efficient format: a bunch of structs stored flat in memory. In particular, paired with a file system cache, we could almost entirely eliminate the cost of loading, since lookups bisect the index, and that can be done quite efficiently over mmap.
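That idea can be sketched with the standard struct and mmap modules (the entry layout is an illustrative assumption, not Sentry’s cache format):

```python
import mmap
import os
import struct
import tempfile

# Illustrative entry layout: (dst_line, dst_col, src_id, src_line, src_col)
# packed as five little-endian u32s.
ENTRY = struct.Struct('<5I')

entries = [(0, 0, 0, 10, 1), (0, 16, 0, 42, 5), (1, 4, 1, 7, 0)]

# Dump the sorted entries as one flat blob of fixed-size structs.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    for entry in entries:
        f.write(ENTRY.pack(*entry))

with open(path, 'rb') as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def lookup(line, col):
    # Bisect directly over the mapped structs: nothing is deserialized
    # until we unpack the single entry we are interested in.
    lo, hi = 0, len(buf) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        entry = ENTRY.unpack_from(buf, mid * ENTRY.size)
        if entry[:2] <= (line, col):
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return None
    return ENTRY.unpack_from(buf, (lo - 1) * ENTRY.size)

print(lookup(0, 20))
os.remove(path)
```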
Given these good results, we will most likely evaluate Rust in the future to handle other expensive, common code paths. However, no CPU-bound fruit currently hangs lower than source maps; for most of our other operations, we’re spending more time waiting for IO.
This project has been a tremendous success. It took us very little time to implement, it lowered processing times for our users, and it also will help us scale horizontally. Rust has been the perfect tool for this job because it allowed us to offload an expensive operation into a native library without having to use C or C++, which would not be well suited for a task of this complexity. While it was very easy to write a source map parser in Rust, it would have been considerably less fun and more work in C or C++.
We love Python at Sentry, and are proud contributors to numerous Python open-source initiatives. While Python remains our favorite go-to, we believe in using the right tool for the job, no matter what language it may be. Rust proved to be the best tool for this job, and we are excited to see where Rust and Python will take us in the future. If you feel the same way, we’re hiring in multiple positions and would love to hear from you.
Meredith recently completed App Academy, after which she spent time as a Teaching Assistant for App Academy’s Jumpstart program. Here at Sentry, she’ll be building tools and processes to enrich the support experience for our customers. When not coding, Meredith alternates between being a bed gnome and exploring the great outdoors.
We want to make it even easier to know when crashes are affecting your users, so we’ve created an add-on for JIRA Service Desk that allows you to see relevant Sentry issues without having to leave your JIRA issue page.
You can install the add-on from the Atlassian Marketplace, and then click configure to select which Sentry organization to connect. You’ll see the latest errors your user has experienced from within the JIRA Service Desk panel, making it easy to identify causes of user pain points.
You can also easily create JIRA issues with data pre-populated from Sentry using our existing JIRA integration.
If you want to use Sentry with all your favorite Atlassian products, check out our HipChat and Bitbucket integrations.
For more details on how to get started with these integrations, check out our docs.
If you have questions, feedback or feature requests for any of our integrations, let us know in our forum.
To smooth out our plugin workflow, we’ve migrated our GitHub, Pivotal Tracker, and GitLab plugins to React. Each of these integrations allows users to seamlessly create new issues, and users now have the ability to link to existing GitHub, Pivotal, or GitLab issues.
To configure your project’s issue tracking integration, go to your project’s settings, enable the plugin, and enter the credentials for that integration.
By moving to React, we’re able to provide cleaner issue creation interfaces for your favorite issue trackers that don’t force you to navigate away from Sentry issue detail pages. If you have any questions or suggestions about optimizing these integrations, head over to our forum to let us know.
Sentry is the web's check engine light.
Sentry notifies you when your users experience errors in your web and mobile apps.