
AI Application Insights with Sentry LLM Monitoring


Ben Peven


The data you need to monitor AI-powered applications differs from the rest of your tech stack. Whether you're debugging an error, improving performance, or trying to understand costs, the context you need for an app that calls a large language model (LLM) is different from the context for one that doesn't.

To help developers understand how their AI-powered applications perform in production, we built Sentry LLM Monitoring (currently in beta), our newest addition to Insights, now available on all Business and Enterprise plans.

Whether you're running a chatbot, recommendation system, or any LLM-powered application, Sentry LLM Monitoring helps you debug issues fast and control your LLM token costs.


Monitoring LLM applications for cost, performance, and errors

Token costs for LLM-powered applications can add up fast. Sentry LLM Monitoring helps keep token usage and cost in check with dashboards and alerts that cover all your models and break usage down by AI pipeline.

When your app experiences a slowdown, you can view performance by AI pipeline and trace the slowdown back to the sequence of events leading to the LLM call. This means if your app is taking longer than expected to respond, you can look at all the spans and child spans within related services and even view the prompt sent to the model.

Like any other part of your tech stack, your LLM integration is bound to produce errors. Sentry aggregates error events related to your LLM project into a single issue, just as it does everywhere else. With metadata like the user prompt, model version, and tags, you can efficiently debug production issues like API errors and invalid model inputs.

Getting started

To get started, install the latest version of the Sentry Python SDK and enable its integration for your LLM provider: OpenAI, Anthropic, Cohere, Hugging Face, or LangChain.
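As a minimal sketch, assuming the OpenAI integration (the other providers follow the same pattern, and the DSN below is a placeholder), initializing the SDK with tracing enabled looks something like this:

```python
import sentry_sdk
from sentry_sdk.integrations.openai import OpenAIIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder: use your project's DSN
    traces_sample_rate=1.0,  # capture all transactions; lower this in production
    integrations=[OpenAIIntegration()],
)
```

With tracing on, calls made through the OpenAI client are captured as spans, which is what LLM Monitoring uses to calculate token usage and cost.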

If you're using an LLM orchestrator such as LangChain to create pipelines for one or more LLMs, Sentry picks up the pipeline names from the orchestrator. This lets Sentry show a table of your AI pipelines and pull token usage from your LLMs; a sketch of a named pipeline follows below.
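As an illustration, here is a hedged sketch of a named LangChain pipeline. `run_name` is LangChain's standard config option for naming a runnable, and `summarize-pipeline` is a hypothetical name we would expect to appear in the pipelines table:

```python
import sentry_sdk
from sentry_sdk.integrations.langchain import LangchainIntegration
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=1.0,
    integrations=[LangchainIntegration()],
)

# Name the chain via LangChain's run_name config option; the orchestrator-level
# name is what Sentry references for the pipeline (an assumption for this sketch).
prompt = ChatPromptTemplate.from_template("Summarize this text: {text}")
llm = ChatOpenAI(model="gpt-4o-mini")
chain = (prompt | llm).with_config({"run_name": "summarize-pipeline"})

chain.invoke({"text": "LLM Monitoring tracks token usage per pipeline."})
```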

If you're not using an orchestrator, you need to mark your pipelines with the @ai_track decorator and (optionally) send prompts to Sentry by passing send_default_pii=True to the sentry_sdk.init call. Learn more about the configuration process in our docs.
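Here is a minimal sketch of the manual setup; the pipeline name and the `rank_results` function are hypothetical:

```python
import sentry_sdk
from sentry_sdk.ai.monitoring import ai_track

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=1.0,
    send_default_pii=True,  # optional: include prompts and responses in the data sent to Sentry
)

# The decorator argument becomes the pipeline name in LLM Monitoring.
@ai_track("Rank search results")
def rank_results(query):
    # Hypothetical pipeline body: calls to your LLM provider go here
    # and are grouped under the named pipeline above.
    ...
```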

How Sentry uses LLM Monitoring to debug Autofix

Sentry's Autofix feature is like having a junior developer on demand. It analyzes linked git repositories and issue details, summarizes its findings and the root cause, and then drafts a pull request for you to review. Behind the scenes, a number of AI pipelines handle tasks like indexing the codebase, analyzing the issue, generating the root cause analysis, and drafting the potential fix.

To make sure we're providing results quickly and accurately, we use LLM Monitoring to know when errors happen, when costs increase, or when there's a slowdown. For example, the root cause analysis pipeline recently experienced a slowdown, and the dashboard showed that one span was taking significantly longer than the others. By viewing example chat completion child spans, which contain the LLM prompts and responses, the team identified changes to how data was passed to the LLM that reduced response iterations and improved performance.

Understand what your AI application is thinking

Sentry LLM Monitoring gives developers the debugging context, and the visibility into cost and performance, that they need to make their AI-powered applications more effective and efficient. Automatic token cost and usage calculation, combined with context like the model version, user prompts, and the sequence of calls to the LLM, helps you resolve issues fast and keep costs under control.

Speaking of costs, Sentry LLM Monitoring is included for free with the Business and Enterprise plans. If you experience issues or have general feedback on LLM Monitoring, please share it with us on Discord.
