Sentry Data Wash Now Offering Advanced Scrubbing
Over the past week, we rolled out access to Advanced Data Scrubbing for all users. If you were one of our Early Adopters, you’ve known about this for a couple of months. As the name implies, it’s an addition to our existing server-side data scrubbing features, meant to provide greater control and more tools to help you choose which data to redact from events.
What is data scrubbing?
One of Sentry’s main selling points as an error monitoring platform is the data it collects and aggregates. That includes not only error messages and stack traces but also things like the currently visited URL or the browser used. It’s not a stretch to imagine that a naive service provider collecting such information could quickly pose a threat to end-user privacy. Sentry is not naive.
Sentry implements a number of technical measures to limit the storage of sensitive data. We keep crash reports and other event data for a limited amount of time, and our newer SDKs don’t send known sensitive fields (such as certain HTTP headers or IP addresses) to Sentry by default. Most relevant to this blog post, our SDKs provide hooks to run your own code on event data before sending it to the server, and settings in the server UI redact (“scrub”) keyword-based data prior to saving. For a comprehensive overview, head over to our documentation about sensitive data.
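To make the SDK hook idea concrete, here’s a minimal sketch in Python. It assumes the Python SDK’s `before_send` option and an event shaped like a plain dictionary with `request` and `user` keys. The list of header names is made up for illustration and is not the list Sentry uses.

```python
import copy

# Header names treated as sensitive in this sketch (an assumption, not Sentry's list).
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def before_send(event, hint):
    """Return a scrubbed copy of the event before it leaves the process."""
    event = copy.deepcopy(event)

    # Replace the values of sensitive request headers with a placeholder.
    headers = event.get("request", {}).get("headers", {})
    for name in headers:
        if name.lower() in SENSITIVE_HEADERS:
            headers[name] = "[Filtered]"

    # Drop the client IP address if one was attached to the user context.
    event.get("user", {}).pop("ip_address", None)
    return event


# With the Python SDK installed, the hook would be registered roughly like this:
#   import sentry_sdk
#   sentry_sdk.init(dsn="<your DSN>", before_send=before_send)
```

Because the hook runs inside your application, anything it removes never reaches Sentry in the first place. Server-side scrubbing, covered below, works on events after they arrive.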
Historically, Sentry’s server-side data scrubbing addressed two concerns: removing data that seems sensitive (such as number patterns resembling credit card numbers), and removing or retaining data based on user-defined keywords.
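As a rough illustration of those two concerns, the sketch below redacts values under keyword-matching keys, leaves explicitly “safe” keys alone, and replaces digit runs that look like card numbers. The keyword lists and the card pattern are assumptions made up for this example. They are not Sentry’s actual defaults or implementation.

```python
import re

# Keywords whose values are always redacted (hypothetical list for this sketch).
SENSITIVE_KEYS = ("password", "secret", "token")
# Keys marked as safe are retained untouched, even if their values look sensitive.
SAFE_KEYS = ("order_number",)
# Loose pattern: 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub(data):
    """Recursively return a redacted copy of dicts, lists, and strings."""
    if isinstance(data, dict):
        out = {}
        for key, value in data.items():
            lowered = key.lower()
            if lowered in SAFE_KEYS:
                out[key] = value
            elif any(word in lowered for word in SENSITIVE_KEYS):
                out[key] = "[Filtered]"
            else:
                out[key] = scrub(value)
        return out
    if isinstance(data, list):
        return [scrub(item) for item in data]
    if isinstance(data, str):
        return CARD_RE.sub("[Filtered]", data)
    return data
```

A pattern this loose will also hit long numeric IDs, which is exactly the kind of overzealous removal that the safe-key list is there to counter.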
What’s new to data scrubbing?
Server-side data scrubbing settings now give you even more control over the detection and removal of sensitive data. Among other updates, you can now:
- Define custom regular expressions to match data
- Hash sensitive data rather than remove it
- Limit each individual “rule” to a subsection of the event, which helps prevent overzealous data removal
All of this new functionality is exposed via a new rule-based system. Any configuration created is applied in addition to existing data scrubbing settings.
A Simple Example
Here’s a quick example of how to use advanced data scrubbing.
Consider a stack trace containing this file path:
The user name may be enough to uniquely identify an end-user. To permanently delete this kind of data, we will:
- Configure a data scrubbing rule to redact the user name in new events
- Delete the issue to get rid of existing sensitive data
When you look at your project or organization settings, you’ll notice a new “Security & Privacy” sidebar tab.
It’s a basic reorg, more or less. Most of what’s now on that page used to be under general settings. The only thing we’ve added is the “Advanced Data Scrubbing” section:
Click on “Add Rule” at the bottom right of the page. You’ll see a dialog like this:
- Choose “Replace” to substitute the sensitive data with a placeholder. Or, you can choose “Remove” to replace it with the empty string, “Mask” to replace each individual character (preserving information about the length of the sensitive value), or “Hash” to replace the sensitive value with a hash of itself (preserving uniqueness and equality properties, which provides less anonymity but may help determine the number of unique affected users).
- For the purpose of this example, we’ll pick [user] for “Custom Placeholder”, but the choice of value doesn’t matter much. Now, you’ll see this value in your newer events, rather than username. We recommend picking something that provides context while representing the replaced value and is clearly distinguishable from the value itself. You can always leave this field blank for a default placeholder of [Filtered].
- Choose “Usernames in file paths” for “Data Type”. Think of data types as heuristics and patterns that help Sentry determine whether a value is sensitive. You can also provide your own regular expression here (by selecting “Regex matches”), and in fact, a lot of the options presented are just regular expressions under the hood.
- Write $frame.abs_path into “Source” to look into frame file paths for sensitive data and nothing else. Other choices like $string and ** generally work as well but are often too broad to be useful. For example, a file path in other parts of the event may contain usernames that don’t refer to the user device whatsoever. Context matters, and “Source” provides that context to Sentry.
- You can use the “Event ID” field for further assistance when filling out this form. This is merely a convenience feature that helps Sentry understand your objective and provide better autocompletion for “Source”. The “Event ID” refers to the value shown when viewing an error:
  If you copy-paste that value into the “Event ID” field, you should get tab-completion for $frame.abs_path, including examples taken from the event to help you understand what this configuration value represents.
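To compare the four methods offered in the dialog (“Replace”, “Remove”, “Mask”, and “Hash”), here’s a small Python sketch of what each one does to a matched value. The function name, the `*` mask character, and the choice of SHA-256 are assumptions for illustration. The dialog doesn’t tell you which mask character or hash algorithm Sentry uses.

```python
import hashlib


def redact(value, method, placeholder="[Filtered]"):
    """Apply one of the four redaction methods to a sensitive string."""
    if method == "replace":
        # Substitute the whole value with a placeholder.
        return placeholder
    if method == "remove":
        # Replace the value with the empty string.
        return ""
    if method == "mask":
        # Replace each character, which keeps the original length visible.
        return "*" * len(value)
    if method == "hash":
        # Equal inputs give equal outputs, so unique values stay countable.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"unknown redaction method: {method}")


for method in ("replace", "remove", "mask", "hash"):
    print(method, repr(redact("alice", method, placeholder="[user]")))
```

The trade-off is visible in the output: “Mask” leaks the length, and “Hash” lets anyone who can guess the original value confirm it, which is why it provides less anonymity than the other options.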
After doing all of that and hitting “Save Rule”, newer events will no longer contain sensitive data:
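If you want a mental model of what this rule does to an incoming event, the following Python sketch approximates it. It assumes an event with frames nested under `exception.values[].stacktrace.frames[]`, and it uses a hand-written regular expression as a stand-in for the “Usernames in file paths” data type. Sentry’s real heuristic may well differ.

```python
import copy
import re

# Assumed pattern for user directories in file paths (illustrative only).
USER_PATH_RE = re.compile(
    r"(?P<prefix>/Users/|/home/|\\Users\\)(?P<user>[^/\\]+)"
)


def scrub_frame_paths(event, placeholder="[user]"):
    """Replace usernames in frame abs_path values and touch nothing else."""
    scrubbed = copy.deepcopy(event)
    for exc in scrubbed.get("exception", {}).get("values", []):
        for frame in exc.get("stacktrace", {}).get("frames", []):
            path = frame.get("abs_path")
            if isinstance(path, str):
                frame["abs_path"] = USER_PATH_RE.sub(
                    lambda m: m.group("prefix") + placeholder, path
                )
    return scrubbed


event = {
    "extra": {"note": "copied from /home/bob/notes.txt"},
    "exception": {"values": [{"stacktrace": {"frames": [
        {"abs_path": "/Users/alice/project/app.py"},
    ]}}]},
}
result = scrub_frame_paths(event)
print(result["exception"]["values"][0]["stacktrace"]["frames"][0]["abs_path"])
# prints: /Users/[user]/project/app.py
print(result["extra"]["note"])
# prints: copied from /home/bob/notes.txt
```

The path under `extra` is left alone because the sketch only visits frame paths, mirroring how a “Source” of $frame.abs_path keeps the rule from touching unrelated parts of the event.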
However, the new rule doesn’t affect previously sent events. If you want to get rid of sensitive data that Sentry has already processed and stored, the best way to do that is to delete the entire issue:
After that, file paths won’t show usernames. Head over to our documentation about Advanced Data Scrubbing to learn about all available settings and options.
We’re Not Done Yet
We’ve still got a lot on the horizon. We’re building more features to complement server-side scrubbing, including a way to apply data scrubbing settings before event data reaches our servers and improvements to how you iterate on scrubbing rules. Stay tuned.