The Monitor — Valentino Volonghi, CTO of AdRoll

Richard Huffaker - February 1, 2018 · 3 min read

For the fourth edition of The Monitor, we spoke to Valentino Volonghi, the CTO of AdRoll and a member of its founding team. AdRoll is the most widely used prospecting and retargeting platform in the world. They have 100,000 customers across 35 countries and process right around 70 billion requests a day.

As CTO of AdRoll, I oversee all of the technology that goes into running our business on thousands of globally distributed machines. It’s my job to make sure that when the unexpected happens (and it will), it is isolated and managed before it sets off a domino effect. That makes real-time monitoring of our system a critical aspect of our business.

Here at AdRoll we help thousands of companies drive people to their websites through retargeting and prospecting ads. They use AdRoll to target people who visited their site but didn’t convert, showing them an ad to get them to come back. Or they can bring new visitors to their site by targeting people who are similar to their existing customers.

If an issue spreads and creates just a 1% error rate, that amounts to 700 million errors and could easily cost us well over a million dollars a day.
— Valentino Volonghi, CTO, AdRoll

But to do that, every day AdRoll needs to handle over 70 billion requests from all over the internet and all across the globe. It’s an almost unfathomable number of requests that need to be processed. And since each one of those requests needs to be handled in 100ms or less, our infrastructure needs to be globally distributed.

Today we have AdRoll deployed on as many as 3,000 different machines across the globe. At the scale of AdRoll, if an issue spreads through our system and creates just a 1% error rate, that amounts to 700 million errors a day. An error rate like that could easily cost us well over a million dollars a day.
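The arithmetic behind those figures is easy to check. The following is a minimal sketch that uses only the numbers quoted above (70 billion requests a day, a 1% error rate, up to 3,000 machines); the function and constant names are illustrative, and the per-machine figure assumes traffic is spread evenly, which the article does not state.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# All names here are illustrative; only the three input numbers come from the text.

REQUESTS_PER_DAY = 70_000_000_000   # "over 70 billion requests" a day
ERROR_PERCENT = 1                   # the hypothetical 1% error rate
MACHINES = 3_000                    # "as many as 3,000 different machines"
SECONDS_PER_DAY = 24 * 60 * 60


def errors_per_day(requests: int, error_percent: int) -> int:
    """Number of failed requests per day at a given whole-percent error rate."""
    return requests * error_percent // 100


def requests_per_second(requests: int) -> int:
    """Average request rate, rounded down to a whole request."""
    return requests // SECONDS_PER_DAY


daily_errors = errors_per_day(REQUESTS_PER_DAY, ERROR_PERCENT)
fleet_rps = requests_per_second(REQUESTS_PER_DAY)
# Assumption: load is spread evenly across the fleet at its maximum size.
per_machine_rps = fleet_rps // MACHINES

print(f"errors per day at {ERROR_PERCENT}%: {daily_errors:,}")
print(f"average requests per second: {fleet_rps:,}")
print(f"average requests per second per machine: {per_machine_rps:,}")
```

At this volume the fleet averages roughly 810,000 requests every second, which is why even a small error rate adds up to hundreds of millions of failures in a day.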

With that many requests happening on that many machines, all around the world, every machine needs to be monitored for the unexpected, so when that first domino falls, we know and can react.

Each machine works through a complicated network of decisions as it buys and delivers ads, and we log that entire flow. That translates into about seven trillion events every day. These events are the core of our monitoring. They tell us how AdRoll is operating.

A pillar of our monitoring and incident response philosophy is that instances are not to be coddled. A machine can just be killed and rebooted and nobody is going to cry over it. That philosophy, and the ability to act on it, are crucial in stopping the bleeding as soon as an issue presents itself, as soon as that first domino falls. This allows us to isolate an issue before it has the chance to set off a domino effect across the system.
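As a rough illustration of that philosophy, the sketch below picks out instances whose error rate crosses a threshold so they can be killed and replaced rather than debugged in place. It is not AdRoll's tooling: the `InstanceStats` record, the `instances_to_replace` function, and both thresholds are hypothetical, chosen only to show the shape of the decision.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InstanceStats:
    """Hypothetical per-instance counters for one monitoring window."""
    instance_id: str
    requests: int
    errors: int


def instances_to_replace(
    stats: Iterable[InstanceStats],
    max_error_rate: float = 0.01,   # assumed threshold, not from the article
    min_requests: int = 1_000,      # ignore instances with too little traffic to judge
) -> List[str]:
    """Return the IDs of instances that should be killed and replaced."""
    doomed = []
    for s in stats:
        if s.requests < min_requests:
            continue  # not enough data in this window to make a call
        if s.errors / s.requests > max_error_rate:
            doomed.append(s.instance_id)
    return doomed


fleet = [
    InstanceStats("i-healthy", requests=10_000, errors=5),
    InstanceStats("i-failing", requests=10_000, errors=500),
    InstanceStats("i-idle", requests=10, errors=10),
]
print(instances_to_replace(fleet))
```

The point of the design is that the decision is cheap and mechanical: an instance that looks unhealthy is simply removed from service, and investigation happens afterwards, once the issue is contained.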

Knowing that an issue is controlled gives our engineering team breathing room to approach any issue a little bit more calmly. This is the basis for our Blue-Green deployment strategy.
