Guest Post: Sentry at Opera Software
Michał is an engineering manager at Opera, responsible for web services such as sync, add-ons, the forms autofill back end, and the push notification service. His team works mostly with Python and NoSQL databases and focuses primarily on scalability.
sentrycli is a community-driven tool, separate from sentry-cli, which is maintained by Sentry.
Advanced analysis of issues with sentrycli
If you’re still one of the people professionally grepping through server logs to discover issues with your app, you could have saved tons of coffee beans over the last couple of years just by using the right tool for the job.
We decided to use Sentry at Opera based on our experience using it on various pet projects. Its seamless integration quickly made it a requirement for our web services such as sync and add-ons. While some issues can be resolved just by looking at the stack trace or at ubiquitous attributes like the User-Agent header, there are whole classes of more subtle problems:
- Sentry’s issue UI shows an error happening on browsers A and B and on operating systems X and Y. However, the issue might actually affect only the combinations (A, X) and (B, Y), never (A, Y) or (B, X).
- Yesterday’s release may have fixed the problem in a single browser, even though the issue remains unresolved in other browsers, and you need to verify that.
- An issue may be triggered only by certain values of a variable in the stack trace, and you need to identify the problematic value.
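To see why a per-attribute breakdown can hide the pattern described in the first bullet, here is a small Python sketch with made-up events. The flat dictionaries are a hypothetical, simplified stand-in for real Sentry events: counting each attribute separately shows A, B, X, and Y, while counting (browser, os) pairs shows that only two combinations ever occur.

```python
from collections import Counter

# Hypothetical events, reduced to the two attributes we care about.
events = [
    {"browser": "A", "os": "X"},
    {"browser": "A", "os": "X"},
    {"browser": "B", "os": "Y"},
    {"browser": "B", "os": "Y"},
]

# Per-attribute (marginal) counts, like a per-tag breakdown in the UI.
by_browser = Counter(e["browser"] for e in events)
by_os = Counter(e["os"] for e in events)

# Joint counts over (browser, os) pairs reveal which combinations actually occur.
by_pair = Counter((e["browser"], e["os"]) for e in events)

print(by_browser)  # Counter({'A': 2, 'B': 2})
print(by_os)       # Counter({'X': 2, 'Y': 2})
print(by_pair)     # Counter({('A', 'X'): 2, ('B', 'Y'): 2})
```

Both marginal views look identical to a case where all four combinations are affected; only the joint count tells them apart.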
To address these scenarios, we decided to create a common tool. When we started, there was no API to retrieve the information we needed, so our initial tool integrated with Sentry through its Postgres database. Despite being somewhat ad hoc, it quickly became indispensable to the team. Since then, Sentry has released an official API in Sentry 7, so we refactored the code to use it and open sourced the result as sentrycli.
Installation
You’ll need pip and then simply run:
pip install -U sentrycli
The same command is used to upgrade to the latest version.
Fun part
Getting events
The first step is to cache the events associated with an issue locally. To do that, we need the issue ID, which is part of the URL when you’re viewing a Sentry issue.
The next step is to use sentrycli query to fetch the events related to the Sentry issue. When running query for the first time, you’ll need to pass an API key (available from your Sentry organization’s settings) and your Sentry URL (https://sentry.io if you’re using hosted Sentry):
> sentrycli query 77268 --api-key xxxxxxxxxxxxxxxxxx --host https://sentry.io
...
INFO:sentrycli.query:295 events saved to /Users/mlowicki/projects/sync/77268.json
After the first run, the API key and host are saved, so you can omit them afterwards and rely on the cached values:
> sentrycli query 77268
...
INFO:sentrycli.query:295 events saved to /Users/mlowicki/projects/sync/77268.json
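The idea behind this caching step can be sketched in a few lines of Python. This is not sentrycli's code: `fetch_events` is a hypothetical callable standing in for whatever talks to the Sentry API, and the sketch only shows the shape of the step, pulling events (optionally capped at a limit) and writing them to `<issue_id>.json` so later analysis never touches the network.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def cache_events(
    issue_id: int,
    fetch_events: Callable[[int], Iterable[dict]],
    directory: Path = Path("."),
    limit: Optional[int] = None,
) -> Path:
    """Save an issue's events to <directory>/<issue_id>.json and return the path.

    fetch_events is a hypothetical stand-in for an API client; it should
    yield event dictionaries.
    """
    events = []
    for event in fetch_events(issue_id):
        if limit is not None and len(events) >= limit:
            break  # stop early instead of downloading everything
        events.append(event)

    path = directory / f"{issue_id}.json"
    path.write_text(json.dumps(events))
    print(f"{len(events)} events saved to {path.resolve()}")
    return path


if __name__ == "__main__":
    # Demo with a fake fetcher that yields five dummy events.
    def fake_fetch(issue_id):
        return ({"id": str(n)} for n in range(5))

    cache_events(77268, fake_fetch, Path(tempfile.mkdtemp()), limit=3)
```

Keeping the fetcher injectable makes the caching logic testable without a Sentry instance.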
query supports filtering options such as the maximum number of events to download (--limit) or the time range of interest (--since and --to):
> sentrycli query 77268 --limit 100
...
INFO:sentrycli.query:100 events saved to /Users/mlowicki/projects/sync/77268.json
> sentrycli query 77268 --since 20160430
...
INFO:sentrycli.query:21 events saved to /Users/mlowicki/projects/sync/77268.json
See sentrycli query --help for more info.
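A date-range filter like --since/--to boils down to comparing each event's timestamp with day boundaries. The Python sketch below assumes each event carries an ISO 8601 `dateCreated` field and treats `to` as inclusive of the whole day; both are assumptions for illustration, not a description of how sentrycli interprets its flags.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def parse_day(value: str) -> datetime:
    """Parse a YYYYMMDD string such as '20160430' as midnight UTC."""
    return datetime.strptime(value, "%Y%m%d").replace(tzinfo=timezone.utc)


def event_time(event: dict) -> datetime:
    """Read an event's creation time from an assumed ISO 8601 'dateCreated' field."""
    created = datetime.fromisoformat(event["dateCreated"].replace("Z", "+00:00"))
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assume UTC when no offset
    return created


def filter_by_date(
    events: Iterable[dict], since: Optional[str] = None, to: Optional[str] = None
) -> List[dict]:
    """Keep events created on or after `since` and no later than the end of `to`."""
    start = parse_day(since) if since else None
    # Treating `to` as inclusive of the whole day is an assumption.
    end = parse_day(to) + timedelta(days=1) if to else None
    kept: List[dict] = []
    for event in events:
        created = event_time(event)
        if start is not None and created < start:
            continue
        if end is not None and created >= end:
            continue
        kept.append(event)
    return kept
```

Using a half-open interval (`start <= t < end`) avoids off-by-one problems at midnight.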
Grouping
Now that we have the events, we can use sentrycli group to group them by attribute and see their distribution.
For instance, you can group by creation time (ctime):
> sentrycli group 77268.json --ctime daily
+------------+-------+-----+
| day | count | % |
+------------+-------+-----+
| 2016-04-09 | 25 | 8.5 |
| 2016-04-10 | 27 | 9.2 |
| 2016-04-11 | 14 | 4.7 |
| 2016-04-12 | 8 | 2.7 |
| 2016-04-13 | 11 | 3.7 |
...
This is especially useful for older problems, since Sentry’s UI only shows the distribution for the last 30 days.
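Daily bucketing like the table above amounts to counting events per calendar day and converting counts to percentages of the total. A minimal Python sketch, again assuming a hypothetical ISO 8601 `dateCreated` field on each cached event:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def event_day(event: dict) -> str:
    """Return the UTC day (YYYY-MM-DD) of an event's assumed 'dateCreated' field."""
    created = datetime.fromisoformat(event["dateCreated"].replace("Z", "+00:00"))
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assume UTC when no offset
    return created.astimezone(timezone.utc).date().isoformat()


def group_daily(events: Iterable[dict]) -> List[Tuple[str, int, float]]:
    """Return (day, count, percent) rows sorted chronologically."""
    counts = Counter(event_day(e) for e in events)
    total = sum(counts.values())
    return [
        (day, count, round(100 * count / total, 1))
        for day, count in sorted(counts.items())
    ]
```

Because ISO dates sort lexicographically in chronological order, a plain `sorted()` on the day strings is enough.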
The real power comes from the ability to aggregate by attributes that Sentry doesn’t group by out of the box (e.g. most HTTP headers):
> sentrycli group 79762.json --header Content-Type
+--------------+-------+------+
| Content-Type | count | % |
+--------------+-------+------+
| text/xml | 16 | 50.0 |
| text/plain | 16 | 50.0 |
+--------------+-------+------+
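Grouping by a header has one subtlety worth handling: HTTP header names are case-insensitive, so `Content-Type` and `content-type` should land in the same bucket. The Python sketch below assumes a simplified event layout where `event["headers"]` is a list of `[name, value]` pairs; real Sentry payloads nest request data differently, so treat the layout as hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def header_value(event: dict, name: str) -> Optional[str]:
    """Look up an HTTP header in an event, ignoring case as HTTP does.

    Assumes a simplified layout: event["headers"] is a list of [name, value] pairs.
    """
    wanted = name.lower()
    for key, value in event.get("headers", []):
        if key.lower() == wanted:
            return value
    return None  # header absent from this event


def group_by_header(
    events: Iterable[dict], name: str
) -> List[Tuple[Optional[str], int, float]]:
    """Return (value, count, percent) rows, most common first."""
    counts = Counter(header_value(e, name) for e in events)
    total = sum(counts.values())
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]
```

Events without the header are counted under `None` rather than dropped, so the percentages still add up to the whole issue.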
or by combinations of several properties:
> sentrycli group 77268.json --tag browser server_name --header Content-Type
+--------------------------+-----------------+----------------------+-------+------+
| Content-Type | browser | server_name | count | % |
+--------------------------+-----------------+----------------------+-------+------+
| application/octet-stream | Opera Mini 14.0 | front3.sync.lati.osa | 55 | 18.6 |
| application/octet-stream | Opera Mini 14.0 | front4.sync.lati.osa | 53 | 18.0 |
| application/octet-stream | Opera Mini 14.0 | front5.sync.lati.osa | 52 | 17.6 |
| application/octet-stream | Opera Mini 14.0 | front6.sync.lati.osa | 48 | 16.3 |
| application/octet-stream | Opera Mini 14.0 | front1.sync.lati.osa | 45 | 15.3 |
| application/octet-stream | Opera Mini 14.0 | front2.sync.lati.osa | 42 | 14.2 |
+--------------------------+-----------------+----------------------+-------+------+
The results above show that the error occurred only in a single browser (Opera Mini 14.0) and in one datacenter (*.lati.osa).
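With only a few attributes, this conclusion is easy to read off the table. For issues with many attributes, a small helper that reports which attributes never vary across an issue's events can do the same check. The Python sketch below assumes flat `{attribute: value}` event dictionaries (a simplification); a property like the datacenter would need to be derived first, for example from the `server_name` suffix.

```python
from typing import Dict, Iterable

_MISSING = object()  # sentinel so a missing key never matches a real value


def constant_attributes(events: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Return attributes that have exactly one value across all events.

    An attribute missing from any event is not considered constant.
    """
    events = list(events)
    if not events:
        return {}
    common = dict(events[0])
    for event in events[1:]:
        for key in list(common):  # copy keys so we can delete while iterating
            if event.get(key, _MISSING) != common[key]:
                del common[key]
    return common
```

Applied to the rows above, `browser` and `Content-Type` would survive, while `server_name` would not.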
Checking stack trace
Suppose you’ve looked at the code and narrowed the problem down to the case where a variable (validation_needed in our example) has a certain value. To prove your hypothesis, just run…
> sentrycli group 62456.json --variable validation_needed
+-------------------+-------+-------+
| validation_needed | count | % |
+-------------------+-------+-------+
| False | 295 | 100.0 |
+-------------------+-------+-------+
Total: 295
…and suddenly you’re one step closer to solving the mystery.
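Grouping by a local variable means looking up that variable in the frames captured with each event. The Python sketch below assumes a simplified, hypothetical layout: `event["frames"]` is ordered outermost to innermost, and each frame may carry a `vars` dict of captured locals. It takes the value from the innermost frame that defines the variable and normalizes it to a string, since captured locals are typically serialized.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def variable_value(event: dict, name: str) -> Optional[str]:
    """Return local variable `name` from the innermost frame that captured it."""
    for frame in reversed(event.get("frames", [])):
        frame_vars = frame.get("vars") or {}
        if name in frame_vars:
            return str(frame_vars[name])
    return None  # variable not captured in any frame


def group_by_variable(
    events: Iterable[dict], name: str
) -> List[Tuple[Optional[str], int, float]]:
    """Return (value, count, percent) rows for a local variable, most common first."""
    counts = Counter(variable_value(e, name) for e in events)
    total = sum(counts.values())
    return [(v, c, round(100 * c / total, 1)) for v, c in counts.most_common()]
```

If the same name appears in several frames, preferring the innermost one matches where the exception was actually raised.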
Inspecting custom logging data
Some logging libraries, such as Python’s logging module, let you attach additional metadata to a log message:
logger.error('Conflict', extra={'is_folder': entity.folder})
Because Sentry stores everything passed in the extra dictionary, sentrycli can also group events by the values of the keys in that dictionary:
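In the standard library, keys passed via `extra` become plain attributes on the `LogRecord`, which is where any handler, including an error-reporting one, can read them. The sketch below demonstrates this with a tiny in-memory handler; the logger name and the `SimpleNamespace` stand-in for `entity` are hypothetical.

```python
import logging
from types import SimpleNamespace


class CollectingHandler(logging.Handler):
    """Keep emitted records in memory so we can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("sync.demo")  # hypothetical logger name
handler = CollectingHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo out of the root logger

entity = SimpleNamespace(folder=False)  # stand-in for the entity in the example
logger.error("Conflict", extra={"is_folder": entity.folder})

record = handler.records[-1]
# Keys from `extra` are set as attributes on the LogRecord itself.
print(record.getMessage(), record.is_folder)  # Conflict False
```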
> sentrycli query 79548 --limit 100
...
> sentrycli group 79548.json --context is_folder
+-------+-------+-------+
| flag | count | % |
+-------+-------+-------+
| False | 100 | 100.0 |
+-------+-------+-------+
Total: 100
Verifying fixes
At the very beginning, I mentioned the case where a release resolves an issue for a single browser (but not for all browsers). To verify such a fix, we have two options. The first is to group by release and browser:
> sentrycli group 77268.json --tag release browser
+---------+------------+-------+------+
| release | browser | count | % |
+---------+------------+-------+------+
| fb0145d | Opera 38.0 | 51 | 44.3 |
| fb0145d | Other | 30 | 26.1 |
...
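The release-by-browser table can also be turned into a yes/no check. The Python sketch below builds a release → browser count table and reports a fix when the given (presumably newest) release has events, none from the target browser, while other releases still show that browser. The flat `release`/`browser` event keys and the release name `abc1234` in the test are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def browser_counts_per_release(events: Iterable[dict]) -> Dict[str, Counter]:
    """Map each release to a Counter of browsers seen in its events."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        table[event["release"]][event["browser"]] += 1
    return dict(table)


def fixed_for(events: Iterable[dict], release: str, browser: str) -> bool:
    """True if `release` has events, none from `browser`, while other releases do."""
    table = browser_counts_per_release(events)
    if release not in table:
        return False  # no events for this release yet: no evidence either way
    seen_elsewhere = any(c[browser] for r, c in table.items() if r != release)
    return table[release][browser] == 0 and seen_elsewhere
```

Absence of events is only meaningful once the release has served enough traffic from that browser, so a `True` here is a hint rather than proof.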
The second option helps when the issue is older and your team releases very frequently (multiple times a day), which makes grouping by release noisy. In that case, use query with date filters so that only events within the specified time range are saved:
> sentrycli query 77268 --since 20160430
...
To get the list of possible grouping options for the saved events, use sentrycli group 77268.json --options.
Future
There is no roadmap or plan for how sentrycli should evolve, but we’re open to new ideas. If you have ideas for new features, just file an issue or send us a pull request.
Have fun while tracking down your bugs.