Easy A/B Testing with PlanOut
So you want to A/B test your web app.
The idea is simple, but the details can get messy, and you don’t want to re-invent the wheel. Services like Optimizely are pretty good, but they can be expensive and full of features you don’t need immediately.
In this post, we’ll share how Sentry wrote an experimentation system with minimal work.
A year ago, when we began running experiments, our requirements were the following:
- It should be easy to write new experiments and assign Users or Organizations to different variants.
- Results of an experiment should be measurable easily and accurately.
- There should be negligible performance impact.
We also wanted the following, but they weren’t priorities:
- Running simultaneous experiments with mutual exclusion should be easy.
- In the future, we shouldn’t have completely different systems for experiments and feature flags (used for controlled rollouts of new features) as they’re conceptually related.
We eventually settled on Facebook’s PlanOut library, which makes writing simple and complex experiments equally easy and has been battle tested by a company that runs thousands of experiments daily.
If you’re unfamiliar with A/B testing, here’s a good summary. To describe our system, let’s dive into a real experiment we ran (results at the end).
Getting started with Sentry requires, at a minimum, installing the SDK and sending an error. This can be a high-friction step, so to get users over the hump, we let them view a sample event to check out Sentry’s features. Historically, this option appeared much later in the new-user flow, and we wanted to see what would happen if we moved it earlier.
We decided to have two treatment groups. One would be shown a See Sample Event button at the top of the installation-instructions page.
The second group would see a more subtle link with the same text at the bottom of the page. The control group would continue to see the call-to-action (CTA) later in the flow.
PlanOut allows you to write an experiment by defining a method, assign(), that determines which user should see which variant.

    from planout.experiment import DefaultExperiment
    from planout.ops.random import WeightedChoice

    class SampleEventExperiment(DefaultExperiment):
        def assign(self, params, user):
            # Split users 1:1:2 between treatment 1, treatment 2, and control
            params.variant = WeightedChoice(
                choices=['button', 'link', 'control'],
                weights=[0.25, 0.25, 0.5],
                unit=user.id,
            )
That’s it. With just a few lines of code, we have a new experiment.
Internally, PlanOut hashes the user ID and assigns the user a variant based on the specified weights. Note the use of the WeightedChoice operator; there are several other built-in operators, like RandomInteger, that you can use. You can also define your own operators.
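To make the hashing idea concrete, here is a minimal standard-library sketch of deterministic weighted assignment. It is not PlanOut’s implementation: the function name weighted_choice, the salt argument, and the choice of SHA-1 are assumptions made for illustration only.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def weighted_choice(choices, weights, unit, salt):
    """Deterministically pick one of `choices` for `unit`, in proportion to `weights`."""
    # Hash the salt together with the unit so that the same unit always gets
    # the same result for this experiment, but may differ across experiments.
    digest = hashlib.sha1(f"{salt}.{unit}".encode("utf-8")).hexdigest()
    # Turn the first 13 hex digits (52 bits) into a float in [0, 1).
    point = int(digest[:13], 16) / float(16 ** 13)
    # Find which cumulative-weight bucket the point falls into.
    cumulative = list(accumulate(weights))
    index = bisect_right(cumulative, point * cumulative[-1])
    return choices[index]


variant = weighted_choice(
    choices=['button', 'link', 'control'],
    weights=[0.25, 0.25, 0.5],
    unit=12345,              # hypothetical user ID
    salt='sample_event',     # hypothetical experiment name
)
```

Because the result depends only on the salt and the unit, calling the function again for the same user returns the same variant, so there is no assignment table to store or look up.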
When we want to determine the assignment for a given user, we create an instance of the experiment class and call get() on the parameter that was assigned, in this case variant:

    experiment = SampleEventExperiment(user=user)
    variant = experiment.get('variant')
get() internally calls assign() and caches the result for future use. This code goes into our request handler, and the performance cost is negligible because the assign() method is simple.
When running an experiment, there are generally two types of events you want to log:
- Exposure: the point at which the user is subjected to the experiment, along with the variant they were shown, e.g., a user seeing the See Sample Event button.
- Events: the actions you record to judge whether your experiment was successful, e.g., clicks on the CTA.
When get() is called, PlanOut automatically logs an exposure event to the default logger. At Sentry, we override this logging to send the exposure event to our data warehouse instead. Similarly, we also log clicks on the CTA.
Sentry is a single-page React app with a Django backend. This means that there is one initial page load, after which navigation in the app uses AJAX requests for page-specific information. Our experiments are defined on the backend and are cheap to compute, so we compute all assignments for the current user and send them to the React app with the initial page load. Because get() is therefore called before the user actually sees the experiment, we disable PlanOut’s auto-logging feature using set_auto_exposure_logging(False) and log exposure manually when the user visits a page with an experiment.
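The separation between computing assignments and logging exposure can be sketched roughly as follows. This is an illustration under stated assumptions, not Sentry’s code or PlanOut’s API: ExposureLogger, initial_page_payload, the assigners mapping, and the once-per-user deduplication are all hypothetical.

```python
import json


class ExposureLogger:
    """Records at most one exposure per (user, experiment) pair."""

    def __init__(self, sink):
        self.sink = sink      # callable that receives each exposure record
        self._seen = set()    # (user_id, experiment) pairs already logged

    def log_exposure(self, user_id, experiment, variant):
        key = (user_id, experiment)
        if key in self._seen:
            return False      # already logged, nothing to do
        self._seen.add(key)
        self.sink({
            "event": "exposure",
            "user_id": user_id,
            "experiment": experiment,
            "variant": variant,
        })
        return True


def initial_page_payload(user_id, assigners):
    """Compute every assignment up front and serialize it for the first page load.

    `assigners` maps an experiment name to a function of the user ID.
    Nothing is logged here: computing an assignment is not an exposure.
    """
    assignments = {name: assign(user_id) for name, assign in assigners.items()}
    return json.dumps({"experiments": assignments})
```

In this sketch, the backend would call initial_page_payload() once per page load, and log_exposure() would only be called later, when the user actually reaches a page that renders the experiment.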
Once the exposure and events are recorded, we can write a simple SQL query to tell us the click-through rate (CTR) for each variant: the fraction of users exposed to the experiment who clicked on the CTA. If you use an analytics tool like Amplitude, this analysis is even easier, and it includes automatic computation of statistical significance.
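As a rough idea of what such a query might look like, here is a sketch that uses Python’s built-in sqlite3 module as a stand-in for a data warehouse. The exposures and clicks tables and their columns are hypothetical, and the query assumes each user is exposed to only one variant.

```python
import sqlite3

# Hypothetical schema: one row per logged exposure and one row per CTA click.
SCHEMA = """
CREATE TABLE exposures (user_id INTEGER, variant TEXT);
CREATE TABLE clicks (user_id INTEGER);
"""

# Count distinct users so repeated exposures or clicks are not double-counted.
# The LEFT JOIN keeps exposed users who never clicked and ignores clicks from
# users who were never exposed.
CTR_QUERY = """
SELECT e.variant,
       COUNT(DISTINCT e.user_id) AS exposed,
       COUNT(DISTINCT c.user_id) AS clicked,
       1.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT e.user_id) AS ctr
FROM exposures AS e
LEFT JOIN clicks AS c ON c.user_id = e.user_id
GROUP BY e.variant
ORDER BY e.variant
"""


def click_through_rates(conn):
    """Return {variant: ctr} for every variant with at least one exposure."""
    return {variant: ctr for variant, _exposed, _clicked, ctr in conn.execute(CTR_QUERY)}
```

Dividing by exposed users rather than by all assigned users matters here, because users who were assigned a variant but never reached the page could not have clicked.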
While measuring the CTR is simple, it can be a vanity metric not tied to actual outcomes. Therefore, we measured the activation rate instead: the percentage of sign-ups that successfully send an error to Sentry. The results were very positive, with the Button variant improving upon the control by 12%.
By having clear requirements and expectations, we simplified our A/B process and made it much less messy without reinventing the wheel. Interested in helping us with our next project? We’re hiring on the Growth team!