May 29, 2024

Removing ad trackers and cookies - the technical perspective

Sentry recently completed a multi-month project to remove all non-essential cookies and trackers from our public websites. For more context, see the blog posts that offer differing perspectives on the project: one from our marketing team, another from our legal team, and a third that explains our privacy values and our ultimate motivation. Today, we are going to focus on the more technical side of this project: how we identified and removed cookies and trackers, the difficulties (both expected and unexpected), and what we wish we had known before we started.

Why Are We Doing This?

Anyone who’s used the internet has been presented with a popup informing them that the site they’re visiting uses cookies. For many of us, this popup is a mere annoyance, but less discussed is the fact that cookies introduce security risks. The individual cookies themselves might not represent a high security risk, but in aggregate, they create compounding liabilities and more surface area for mayhem. Third-party advertising cookies provide benefits, of course, but if hosting them comes at the price of reduced security (not to mention user privacy and annoyance), we had to ask ourselves whether the benefit of these cookies truly outweighs their cost.

Different companies serving different markets will have their own answers to this question. But at Sentry, from the leadership on down, it was a clear No; removing advertising cookies and trackers aligned with our corporate privacy values. And for the security team, it’s quite straightforward: we want the Sentry experience to be as secure as possible, so this was a challenge we enthusiastically accepted.

Hosting third-party cookies typically means you also have third-party scripts, and from a security standpoint, those additional scripts also carry risks.
Scripts and cookies, it turns out, make sites vulnerable to all sorts of attacks, such as Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF), to name just two. In a perfect world, scripts that are added to your sites would go through rigorous security reviews to ensure there are no vulnerabilities, but it’s impossible for most companies to actually review every single line of code, let alone every future update. Removing even one third-party script from your website, and the cookies associated with it, means one less thing that the security team has to worry about.

What’s Actually on Our Website?

Before we started removing cookies, we had to figure out exactly which cookies were on our website. The difficulty of figuring out which cookies are on your site is directly related to the complexity of your site. For small sites, with only a handful of pages, this isn’t terribly difficult. However, things get tricky when you have a site that was built years ago, contains multiple subdomains and thousands of pages, and is managed by multiple teams across an organization.

Challenge #1: What Cookies are on Our Website?

We started by inspecting the Chrome developer console. You can see the cookies on a page by going to the Application tab and looking for Cookies under the Storage section. Because many modern browsers block third-party cookies by default, you’ll have to enable third-party cookies in your browser in order to see what’s there.

Unfortunately, following the steps above won’t display a comprehensive list of all of the cookies on your site. What you see in this console are the cookies on that single page. In order to identify all of the cookies across your entire website, you will need to go through every single page under every subdomain and inspect them individually. A blog post from four years ago, for instance, may have an embedded YouTube video that drops cookies. In other instances, there may also be cookies that won’t drop until a user clicks on a particular element.

Another important thing to note is that “cookies” is often used as a broad term for tracking technology in general. It is also possible to track users with items stored in the browser’s local storage and with tracking pixels. These are a lot harder to locate and identify than cookies, but they still count as tracking technology. Simply put, it’s important to look for all tracking technology, not just cookies.

It was at this point that we realized going through our site page by page was not scalable and that we would have to employ specialized tools.

The Good and the Bad of Cookie Scanners

A Google search for “cookie scanning” will return at least a dozen different cookie scanners. We tried a few, both free and paid, and while they all “worked,” none of them were perfect. Most of the scanners we tried were pretty straightforward: you enter a URL, the tool scans it, and you’re given a list of the cookies that were found. This is much more efficient than going to the developer console in a browser, and a good way to get a quick idea of the cookies on your websites. But that’s also where their limits begin to show: they only give you a surface-level understanding of the cookies on your sites.

While scanners have their advantages, there were three main problems we encountered. First, they don’t scan every page on your site. Some of them only scan the single page of the URL you provide, while others will scan a few extra pages automatically.
One of the paid scanners we tried will automatically crawl your site and look for all the pages under the domain, but it has a maximum number of pages it will scan. Second, the results were inconsistent. The results we received for a single page were often different depending on which scanner we used, and to make things worse, some of the scanners even have disclaimers pointing out that the results of the tool may not be 100% accurate. This eroded our trust in scanners altogether and made us question if they were worth the effort, to say nothing of the expense. And lastly, the scanners we reviewed only identified cookies, and not trackers in local storage or tracking pixels. So even if we did have a reliable tool for scanning cookies, we’d still fall short of our goal of removing all the tracking technologies that are on our sites.

Taking a Step Back: What’s on Your Website?

After spending some time working with cookie scanners, we decided to delve into which elements drop cookies on web pages: namely, scripts. Instead of only looking for cookies, we broadened our attention to the many scripts that are on our site, especially third-party scripts, to check if they were also dropping cookies and tracking users.

To find out which scripts were running on our sites, we used Content-Security-Policy (CSP), a feature that most modern browsers support. Mozilla gives a good explanation of CSP:

Content Security Policy (CSP) is an added layer of security that helps to detect and mitigate certain types of attacks, including Cross-Site Scripting (XSS) and data injection attacks. These attacks are used for everything from data theft to site defacement to malware distribution. CSP is designed to be fully backward compatible. Browsers that don’t support it still work with servers that implement it, and vice versa: browsers that don’t support CSP ignore it, functioning as usual, defaulting to the standard same-origin policy for web content. If the site doesn’t offer the CSP header, browsers likewise use the standard same-origin policy.

When it comes to detecting scripts, CSP has a reporting feature that alerts website admins when items violate the policy. If the policy allows only 'self' for everything, you will receive reports on everything on your site that doesn’t originate from the site’s own URL, in other words, third-party resources such as scripts. Here’s an example of the CSP setting:

Content-Security-Policy: default-src 'self'; report-uri csp-report.example.com

One of the major benefits of CSP is that you don’t need to click through all the pages on your site to test them or use third-party scanning software; every user who visits your site is essentially helping you collect the information. When we started this project, we were averaging around 200k page visits on our site every week, and with that, we were able to put together a fairly comprehensive picture of which scripts were on our sites in a matter of a few days.

A few things to note here:

The report-uri / report-to value will have to be set in order for the report to be sent.

You will likely want to use Report-Only mode (the Content-Security-Policy-Report-Only header); otherwise, your site will likely stop working. ⚠️ Be aware that with Report-Only mode enabled, CSP won’t block malicious scripts from loading, which is what CSP is designed for, so we don’t suggest using Report-Only mode for anything other than testing and collecting the information you need.

You will receive reports on items that are served from different domains, which may include first-party scripts that are hosted on a different domain, e.g. your CDN.

You will NOT receive reports on third-party scripts that are loaded from your site URL, such as self-hosted scripts or things behind your proxy. If you want to collect information on those, you can use default-src 'none', but from our experience that creates way too much noise.

How long you should monitor will depend on the number of active users you have: the fewer visitors you have, the longer you’ll want to monitor in order to receive a more accurate report.

CSP operates on the client side, which means it is possible that browser extensions or custom scripts that a user has on their machine might trigger violations too. For websites that have a good number of visitors, these usually get diluted and become unnoticeable.

For the report destination, Sentry supports security policy reporting; if you set the report-uri to your Sentry Project DSN, you will start receiving some CSP reports similar to the following:

Sentry then collects the CSP reports and aggregates them, which helps teams glean valuable insights from the data. Identical CSP violations are grouped into a single issue, with event data available for each. If you click on an issue, you will also get detailed information on which directive the scripts in question violate, on which pages the violations occurred, how many times they happened, and how many users were impacted, along with a host of other details.
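To make the grouping idea concrete, here is a minimal Python sketch of how CSP reports could be aggregated by hand. It is not how Sentry implements grouping; it only assumes the standard JSON body that browsers send to a report-uri endpoint (a "csp-report" object with fields such as document-uri, effective-directive, and blocked-uri). The function names and the sample reports are hypothetical and made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def blocked_origin(blocked_uri):
    """Reduce a blocked-uri to scheme://host; keep keywords such as 'inline' as-is."""
    parts = urlsplit(blocked_uri)
    if parts.scheme and parts.netloc:
        return f"{parts.scheme}://{parts.netloc}"
    return blocked_uri or "unknown"


def group_reports(reports):
    """Group report bodies by (directive, blocked origin), counting hits and pages."""
    groups = defaultdict(lambda: {"count": 0, "pages": set()})
    for body in reports:
        report = body.get("csp-report", {})
        directive = (
            report.get("effective-directive")
            or report.get("violated-directive")
            or "unknown"
        )
        key = (directive, blocked_origin(report.get("blocked-uri", "")))
        groups[key]["count"] += 1
        groups[key]["pages"].add(report.get("document-uri", "unknown"))
    return dict(groups)


# Hypothetical sample reports, invented for this example.
sample_reports = [
    {"csp-report": {
        "document-uri": "https://example.com/blog/post-1",
        "effective-directive": "script-src",
        "blocked-uri": "https://tracker.example.net/pixel.js",
    }},
    {"csp-report": {
        "document-uri": "https://example.com/pricing",
        "effective-directive": "script-src",
        "blocked-uri": "https://tracker.example.net/other.js",
    }},
    {"csp-report": {
        "document-uri": "https://example.com/blog/post-1",
        "effective-directive": "script-src",
        "blocked-uri": "inline",
    }},
]

summary = group_reports(sample_reports)
for (directive, origin), info in sorted(summary.items()):
    print(directive, origin, info["count"], sorted(info["pages"]))
```

Grouping by origin rather than by full URL keeps the list short: two different files from the same third-party host collapse into one entry, which is usually the level at which you decide whether a vendor stays or goes.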