Signal: The key to Self-Healing Software

More code is being written right now than at any point in our industry’s history. A year’s worth of software is now created every month, and most of it is no longer written by people. GitHub COO Kyle Daigle recently said, “There were 1 billion commits in 2025. Now, it’s 275 million per week, on pace for 14 billion this year if growth remains linear (spoiler: it won’t.)”

Coding agents from Anthropic, Cursor, and OpenAI can read a stack trace, reason through tangled code, and produce a real fix faster than a person can find the file. That speed is the story everyone is telling. The one that matters more is what happens after the code ships.

More code, shipped faster, means more breakage reaching real users. And while broken software is a technical problem, it is also a business outcome problem. Customers leave. They churn. They tell other people why. Sentry processes 594 trillion events a year across 200,000 organizations. This is the exclusive context self-healing runs on, the moments when software actually broke. That is the raw signal of software meeting the real world.

This is why software is now crossing a line it has never crossed before. It is starting to fix itself in production. Not flagging an error for a human to triage, not suggesting a fix and leaving you the work, but noticing it is broken, understanding why, and writing the fix. Not everything, and not yet. The simple, well-understood failures resolve on their own, while engineers still own the architecture calls and the messy incidents, and that line moves every month. For the entire history of our industry, reliability was something people supplied. It is becoming a property of the software itself.

The industry calls this self-healing software, and it is no longer a thesis. At Sentry and across a growing number of the teams we work with, it is getting more real in production every day. It is the shift that will sort the teams who understand it from the teams who don’t.

What self-healing looks like in practice: a new release deploys to production; issues surface as errors, signals, and outliers; an agent picks one up, fixes it, and ships the fix to production. This eliminates tickets, sprints, story points, PRs sitting in a queue, separate QA phases, and release trains. What drives it is context: the broken software signal.

What it takes

Getting here took three pieces. One you buy. One you build. And one you have to earn, which is the piece this quietly hinges on.

The three ingredients of self-healing software: LLM/Models — the LLM layer that can help get the answers when served with the right context; Data Context and Reasoning — the data and reasoning engine powering self-healing with root cause analysis, automated fixes, and agentic investigation; Harness/Workflows — meet developers where they are with in-app UX, automation rules, MCP, CLI, Slack, and their own AI agents.

Agent. The models got good, fast, and increasingly you simply buy them. On their own they are not enough: doing something quickly doesn’t always mean it’s done right or done the best way. A model only gets the answer when it is served the right context, and the model itself is rarely the thing that holds an agent back.

Harness. The plumbing that carries a fix from diagnosed to merged, and the work of meeting developers where they already are, in the editor, in CI, in Slack, in their own agents. Standing it up is real engineering. The hardest part is not writing the fix, it is verifying it, confirming the failure your users actually hit is gone, not just that the code compiles and the tests pass. And none of it works without something true to point it at.
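To make "verifying it" concrete, here is a minimal sketch of one possible check: compare how often the issue occurs per session before and after the fix ships. Everything in it is an assumption for illustration, including the `ReleaseStats` shape, the `min_sessions` floor, and the `max_residual` threshold. It is not Sentry's data model or anyone's production gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseStats:
    """Hypothetical per-release counts for a single issue."""
    release: str
    sessions: int      # user sessions observed on this release
    issue_events: int  # occurrences of the issue being fixed


def failure_rate(stats: ReleaseStats) -> float:
    """Issue occurrences per session; 0.0 when there is no traffic."""
    if stats.sessions == 0:
        return 0.0
    return stats.issue_events / stats.sessions


def fix_is_verified(
    before: ReleaseStats,
    after: ReleaseStats,
    min_sessions: int = 1000,    # assumed floor for "enough real traffic"
    max_residual: float = 0.05,  # assumed: at most 5% of the old rate may remain
) -> bool:
    """Return True only when production traffic shows the failure is gone."""
    if after.sessions < min_sessions:
        return False  # passing tests is not evidence; wait for real users
    baseline = failure_rate(before)
    if baseline == 0.0:
        return False  # nothing was failing before, so nothing to verify
    return failure_rate(after) <= baseline * max_residual
```

The design choice worth noting is that the function refuses to say "verified" until enough real sessions have run on the new release, which is the difference between "the tests pass" and "the failure users hit is gone."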

Signal. For software to fix itself, it first has to know it is broken. Not in theory, but out in the real world, where code that passed every test still falls apart on a browser that updated overnight, behind an ad blocker that flagged your domain, on the one device you never tested. That break is not in the code. No model staring at your repository will ever find it, because it is not there to be found. It is in the gap between what you built and what your users are actually living through. Agents do not act on instinct. They act on signal. Without the right context an agent guesses, trying a fix, missing, trying again, burning time and money on attempts that were never going to land. With it, the agent stops gambling and executes against a real diagnosis of what broke, for whom, since which release.
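As a rough illustration of "what broke, for whom, since which release," the sketch below folds raw production events into a small diagnosis an agent could act on. The `Event` and `Diagnosis` shapes and the `diagnose` helper are hypothetical names invented for this example, and the release list is assumed to be ordered oldest first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical production error event."""
    error: str
    user_id: str
    release: str
    browser: str


@dataclass(frozen=True)
class Diagnosis:
    error: str                        # what broke
    affected_users: int               # for whom
    first_bad_release: Optional[str]  # since which release
    top_browser: Optional[str]        # where it shows up most


def diagnose(
    error: str,
    events: Iterable[Event],
    release_order: Sequence[str],  # assumed ordered oldest to newest
) -> Diagnosis:
    matching = [e for e in events if e.error == error]
    users = {e.user_id for e in matching}
    seen_releases = {e.release for e in matching}
    # The earliest release in which the error appears at all.
    first_bad = next((r for r in release_order if r in seen_releases), None)
    browsers = Counter(e.browser for e in matching)
    top = browsers.most_common(1)[0][0] if browsers else None
    return Diagnosis(error, len(users), first_bad, top)
```

None of these fields can be read out of the repository. They exist only in what production reported, which is the point of the paragraph above.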

Signal is the scarce ingredient. Intelligence you buy. Harness you build. The signal you have to capture, by watching the real thing in the real world the instant it breaks. It is what makes the model useful and the harness honest. It is the piece Sentry has spent over a decade building.

What it looks like in practice

This is not a forecast. Three companies you know are already running the loop.

Cursor builds the AI editor a lot of engineers now keep open all day. Holding a desktop client stable across that many processes and that many long sessions is hard, and when it breaks, it breaks in production. So Cursor’s client infrastructure team built a daily automation on top of Sentry’s crash data. It pulls that signal from Sentry, hands it to an agent, and asks which feature broke and whether the stack points to a real, fixable root cause or just code that happened to be running nearby. When the agent is confident enough, it dispatches a second agent to propose the fix. Nothing is auto-merged. Every PR gets a human review.
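The shape of that loop can be sketched in a few lines. This is not Cursor's implementation. The `Crash`, `Triage`, and `ProposedFix` types, the two agent callables, and the 0.8 confidence threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Crash:
    """Hypothetical crash record pulled from monitoring."""
    crash_id: str
    stack_trace: str


@dataclass(frozen=True)
class Triage:
    """Hypothetical verdict from the first (triage) agent."""
    feature: str              # which feature broke
    fixable_root_cause: bool  # real root cause vs. code that was merely nearby
    confidence: float         # 0.0 to 1.0


@dataclass(frozen=True)
class ProposedFix:
    crash_id: str
    feature: str
    needs_human_review: bool = True  # nothing is auto-merged


def run_daily_triage(
    crashes: Iterable[Crash],
    triage_agent: Callable[[Crash], Triage],
    fix_agent: Callable[[Crash, Triage], ProposedFix],
    threshold: float = 0.8,  # assumed cutoff for "confident enough"
) -> List[ProposedFix]:
    proposals: List[ProposedFix] = []
    for crash in crashes:
        verdict = triage_agent(crash)
        if not verdict.fixable_root_cause:
            continue  # stack only points at bystander code
        if verdict.confidence < threshold:
            continue  # not confident enough to spend a fix attempt
        # Only now is the second agent dispatched to propose a fix.
        proposals.append(fix_agent(crash, verdict))
    return proposals
```

Both gates sit in front of the fix agent, so the expensive step only runs against crashes with a credible diagnosis, and every proposal still lands in front of a human reviewer.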

On their most common failure, out-of-memory crashes, that loop helped cut the rate by up to 80% from its peak, with 30% of their app-stability gains attributed to Sentry.

Factory takes it a step further. Its product is a fleet of autonomous agents, called Droids, that do real engineering work, and it wires Sentry in as the production context those agents run on. When an error fires, a Droid pulls the issue, checks how many users are affected, replays the session, and proposes a fix, without a human ever opening a browser. The same works inside its customers’ own Sentry environments, where a Droid reads the error, traces the root cause, and often opens the pull request on its own. Across its platform Factory runs hundreds of millions of events a month across eight projects.

What that looks like on a bad day: after one release, Sentry surfaced a missing-index error from a code path that had just gone live. The fix, the exact index the query needed, was carried inside the error itself, so Factory’s incident agent created it and resolved the issue within minutes. “Errors are easily accessible to our Droids,” says Alvin Sng of Factory, “which can autonomously handle them end-to-end.”

Ramp built its own background coding agent, Inspect, and was deliberate about the hard part: verification. Inspect does not just write code; it closes the loop on proving the change holds, using the same context and tools a Ramp engineer would. Sentry is one of those tools, wired in so the agent can review production telemetry rather than guess.

Thousands of companies using Sentry are seeing the same pattern. Signal carries the diagnosis. The agent acts on it instead of guessing. The human reviews a fix instead of hunting for one. As Factory puts it, monitoring stops being a dashboard an engineer opens and becomes infrastructure the agents consume.

What it changes

For as long as we’ve built software, reliability has been something people supply. Code breaks, a human notices, drops what they’re doing, and repairs it. That arrangement is ending. When the signal is good enough, the loop closes on its own: the break is caught the moment it reaches a real user, understood in the context of what actually broke and for whom, fixed by an agent acting on a real diagnosis instead of a guess, and the most that reaches a person is a fix to approve. No ticket, no sprint, no story points, no PR aging in a queue, no separate QA pass, no release train. The tax every team has quietly paid, its best people spending their best hours chasing bugs instead of building what’s next, becomes optional. And the capacity that frees doesn’t vanish. It compounds into the only work that matters: deciding what to build, and building it.

None of this works without Signal. An autonomous break-fix loop is only as good as its understanding of what broke in production, and that understanding is the scarce ingredient, the one you can’t buy off the shelf or generate from the codebase alone. It has to be captured from the real world the instant software meets it. That is what Sentry has spent its existence building, and it is why agents from Cursor, Factory, and Ramp already lean on Sentry to verify and resolve their own work. Sentry owns the signal layer. Self-healing software is what that signal makes possible, and it is already here.
