Internationalization and React
It's always nice when a project outgrows your initial vision. This first happened with Sentry a long time ago, when translations kept rolling in for languages none of us spoke. This was enabled by the excellent gettext-based internationalization support in Django, and by the ability to collaborate through Transifex, an online tool where people can contribute translations, discuss the strings, and raise issues on them.
This worked well enough for many years, but the core infrastructure never changed. What has changed however is the shift in Sentry 8 from server-side rendering to a rich React-powered user interface rendered purely on the client side.
Now that we're reasonably confident in the direction of our client-side application, we decided to solve the client-side i18n problem. We knew roughly what we wanted in terms of behavior, and we also had a strong goal to integrate it with our existing translation pipeline.
The Land of i18n in Python / JavaScript
In the Open Source community (and earlier in the free software community), gettext has established itself as the standard solution for translations. Gettext has many flaws and limitations, such as a very complex system for pluralization rules, the inability to have multiple numbers in one string, and the inability to express genders and cases. Even so, it became the standard in the Python world and gained a lot of support through better tooling over the years. Both Django's integrated translation system and the Babel project (not to be confused with the JavaScript project of the same name, which is a transpiler) are based on gettext, and as such this is what Sentry started out with.
In the JavaScript world there have been other influences, and gettext has played less of a strong role. In particular, the CLDR project and ICU left their mark there by implementing more novel solutions to old problems.
Thankfully however there are also implementations for many of the gettext style operations which allow interoperability.
Formats, Formats, and more Formats
One of the fundamental problems with gettext is that it only really provides solutions for simple string-to-string translations and barebones support for pluralization. Gettext does not even specify how strings are formatted with regards to placeholders.
For instance, on both the client and server we use "Hello %(name)s" to indicate named placeholders or "Hello %s" to indicate anonymous ones. This is Python-style string formatting, not gettext. While the lack of standards around this can cause a bunch of issues in more complex environments, it works well enough for us. Separately, gettext's lack of support for cases and disambiguation led to it adding "context" support, which allows you to provide context or instructions around an individual string, and allows duplicate strings that are translated differently. This system is also used to handle different genders in languages that need it.
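To make the "context" idea concrete, here is a minimal sketch of a context-aware lookup. It assumes a flat catalog object in which a string with a context is stored under the key "context" + U+0004 + "msgid", which is the separator gettext uses in compiled catalogs. The catalog contents and the function bodies are illustrative assumptions, not Sentry's code.

```javascript
// Separator gettext places between a context and a msgid in compiled catalogs.
const CONTEXT_SEPARATOR = "\u0004";

// Hypothetical German catalog: the same English string translated two ways.
const catalog = {
  "Open": "Öffnen", // verb, no context
  ["issue status" + CONTEXT_SEPARATOR + "Open"]: "Offen", // adjective
};

function lookup(key, fallback) {
  return Object.prototype.hasOwnProperty.call(catalog, key) ? catalog[key] : fallback;
}

// Plain lookup: falls back to the untranslated string.
function gettext(msgid) {
  return lookup(msgid, msgid);
}

// Context lookup: same msgid, but disambiguated by the context string.
function pgettext(context, msgid) {
  return lookup(context + CONTEXT_SEPARATOR + msgid, msgid);
}

console.log(gettext("Open")); // "Öffnen"
console.log(pgettext("issue status", "Open")); // "Offen"
console.log(pgettext("unknown context", "Open")); // "Open"
```

The same source string can therefore appear twice in the catalog without the two entries colliding.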
MessageFormat is a much more advanced concept that can deal with multiple pluralizations per string, genders, cases, and much more. The downside is that it would not fit into our process, and many tools do not support it either. For instance, you cannot currently feed MessageFormat strings into Transifex. Because our goal is collaboration, the inability to use our existing tools and community was a non-starter.
To understand the complexity behind message format, look at this example from the official documentation:
"{gender_of_host, select, "
  "female {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to her party.}"
      "=2 {{host} invites {guest} and one other person to her party.}"
      "other {{host} invites {guest} and # other people to her party.}}}"
  "male {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to his party.}"
      "=2 {{host} invites {guest} and one other person to his party.}"
      "other {{host} invites {guest} and # other people to his party.}}}"
  "other {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to their party.}"
      "=2 {{host} invites {guest} and one other person to their party.}"
      "other {{host} invites {guest} and # other people to their party.}}}}"
So because we cannot use MessageFormat, we need to match the client and server behavior of gettext. On the server we are already married to Python string formats: not the new type, but the old type. This means that strings currently look like %(this)s. Thankfully there are implementations of exactly this formatting syntax for JavaScript on npm. In particular, we ended up using sprintf-js. For invoking gettext with the right plural forms, we chose Jed.
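The two pieces described above, old-style Python formatting and plural-form selection, can be sketched without any dependencies. This is a hand-rolled illustration, not the sprintf-js or Jed API: the format and ngettext functions and the catalog shape are assumptions made for this example, and only %(name)s, %s, and %% are supported. The Polish plural rule is the one given in the gettext manual.

```javascript
// Minimal stand-in for Python's old-style "%" formatting.
// `args` is an object for named placeholders or an array for anonymous ones.
function format(template, args) {
  let positional = 0;
  return template.replace(/%(?:\((\w+)\))?s|%%/g, (match, name) => {
    if (match === "%%") return "%"; // escaped percent sign
    if (name !== undefined) return String(args[name]);
    return String(args[positional++]);
  });
}

// Hypothetical catalog for a language with three plural forms (Polish).
const polish = {
  pluralIndex: (n) =>
    n === 1 ? 0 : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2,
  messages: {
    "%(count)s file": ["%(count)s plik", "%(count)s pliki", "%(count)s plików"],
  },
};

// Pick the plural form for n, falling back to the English strings.
function ngettext(catalog, singular, plural, n) {
  const forms = catalog.messages[singular];
  if (!forms) return n === 1 ? singular : plural;
  return forms[catalog.pluralIndex(n)];
}

console.log(format("Hello %(name)s", { name: "World" })); // "Hello World"
console.log(format(ngettext(polish, "%(count)s file", "%(count)s files", 5), { count: 5 })); // "5 plików"
```

Keeping the placeholder syntax identical on client and server means one translated string works in both places.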
Complex UI Interactions
The basic gettext calls get us quite far when translating basic strings. Unfortunately the problem is a lot more complex than that in React land, for example when you need to include a link in a translatable string. On the server we could just include the HTML in the string itself and fill in the URL as a variable, like so:
gettext('Click <a href="%(link)s">here</a>') % {
    'link': url_for('.activation_link')
}
So how do we do this in React? Our solution is a component-aware helper:
<p><small>{tct('Already have things setup? [link:Get your DSN].', {
  link: <a onClick={this.toggleDsn} />
})}</small></p>
Here tct is our marker function. It stands for "translate component template", and the string passed is the component template. The syntax is very simple: brackets enclose components, and the name of the component and its content are separated by a colon. So in the example above, link is the name of the component and "Get your DSN" is the content. The components are then provided separately and modified appropriately.
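As a rough illustration of the template syntax just described, the sketch below parses a component template into a tree and fills in the provided components. Plain objects of the shape { type, props, children } stand in for React elements so the example runs without React. The helper names parseTemplate and renderNodes are hypothetical, and this is not Sentry's actual implementation.

```javascript
// Parse "text [name:content] text" into strings and { name, children } nodes.
function parseTemplate(template) {
  let pos = 0;

  function parseNodes(insideGroup) {
    const nodes = [];
    let text = "";
    const flush = () => {
      if (text) nodes.push(text);
      text = "";
    };
    while (pos < template.length) {
      const ch = template[pos];
      if (ch === "[") {
        const nameMatch = /^\[(\w+):/.exec(template.slice(pos));
        if (nameMatch) {
          flush();
          pos += nameMatch[0].length;
          nodes.push({ name: nameMatch[1], children: parseNodes(true) });
          continue;
        }
      } else if (ch === "]" && insideGroup) {
        pos += 1; // consume the closing bracket
        flush();
        return nodes;
      }
      text += ch;
      pos += 1;
    }
    flush();
    return nodes;
  }

  return parseNodes(false);
}

// Replace each named node with a copy of the matching component.
function renderNodes(nodes, components) {
  return nodes.map((node) => {
    if (typeof node === "string") return node;
    const base = components[node.name] || { type: "span", props: {} };
    return { ...base, children: renderNodes(node.children, components) };
  });
}

function tct(template, components) {
  // A real implementation would first look up the translation of `template`.
  return renderNodes(parseTemplate(template), components);
}

console.log(tct("Already have things setup? [link:Get your DSN].", {
  link: { type: "a", props: { href: "#" } },
}));
```

Because the component names live inside the translatable string, a translator can move [link:...] to wherever it belongs in their language without touching the markup.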
In addition, our basic gettext function can accept React components as placeholders as well as basic values. Whenever we detect that any of the arguments to the format string is actually a React component, we switch our handling over to a system that produces an array of elements for React. This means that it's completely valid to do this:
return t('%(author)s assigned this event to %(assignee)s', {
  author: <Author value={author} />,
  assignee: <em>{assignee.email}</em>
});
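The switch between string output and array output could look roughly like the sketch below. It is a simplified assumption of the mechanism: plain objects with a type property stand in for React elements, only %(name)s placeholders are handled, and the catalog lookup is omitted.

```javascript
// Stand-in check for "is this a React element?" (assumption: objects with a `type`).
function isElement(value) {
  return value !== null && typeof value === "object" && "type" in value;
}

// Split the template at each %(name)s and interleave the argument values.
function formatToParts(template, args) {
  const parts = [];
  const pattern = /%\((\w+)\)s/g;
  let last = 0;
  let match;
  while ((match = pattern.exec(template)) !== null) {
    if (match.index > last) parts.push(template.slice(last, match.index));
    parts.push(args[match[1]]);
    last = pattern.lastIndex;
  }
  if (last < template.length) parts.push(template.slice(last));
  return parts;
}

function t(template, args = {}) {
  // A real implementation would translate `template` before formatting it.
  const parts = formatToParts(template, args);
  // With any element among the arguments, return an array for the renderer.
  if (Object.values(args).some(isElement)) return parts;
  return parts.join("");
}

console.log(t("Hello %(name)s", { name: "World" })); // "Hello World"
console.log(t("%(author)s assigned this event to %(assignee)s", {
  author: { type: "Author", props: { value: "jane" } },
  assignee: "joe@example.com",
}));
```

Callers that pass only plain values still get a string back, so existing call sites keep working.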
Identifying Unmarked Translations
It's very easy to miss translations, so we wanted a way to visualize the translations within the UI. Because we already render a lot of things directly into React elements, we decided that we could just return React components from the translation function instead of a string. In that case we can easily style them with bright colors so that it's absolutely clear that an element is translated.
Unfortunately we cannot do that in all cases, as we sometimes process the string to send it into a jQuery component or something similar. For those cases we end up cloning the React component and attaching a custom toString() method to it that adds an Austrian flag to both sides of the string (which might or might not have something to do with the author of that feature living in Austria). Because React elements are immutable, we can't actually use React.createElement directly; instead we manually construct the React structure so we can attach our custom toString function.
We also have a custom ESLint plugin (eslint-plugin-sentry) which tries to find missing translations.
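The toString trick can be illustrated with a plain object standing in for the hand-built React structure. This is only a sketch of the idea: the mark function, the object shape, and the class name are hypothetical, and a real React element needs additional internal fields.

```javascript
// Regional indicator symbols A + T, which render as the Austrian flag.
const MARKER = "\u{1F1E6}\u{1F1F9}";

// Wrap a translated string in an element-like object that can still be
// coerced to a string by code that expects plain text.
function mark(translated) {
  const element = {
    type: "span",
    props: { className: "translation-wrapper", children: translated }, // hypothetical class name
  };
  // When something stringifies the element, flag the text on both sides.
  element.toString = () => MARKER + translated + MARKER;
  return element;
}

const marked = mark("Save changes");
console.log(marked.props.children); // "Save changes"
console.log(String(marked));
console.log(`Label: ${marked}`);
```

Rendered through the component tree, the wrapper can be styled. Coerced to a string, the flags make the translated text stand out instead.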
Extracting Translations
So now that strings are marked, how do we extract them so we can add them to our gettext catalogs and into Transifex? We ended up writing a custom Babel plugin for this, as we're already running everything through webpack. Truth be told this process isn't the prettiest, but if you are curious about this sort of thing, you can take a look at our plugin on GitHub. The nice thing about going through Babel is that we can automatically run it as part of our webpack build process, which we need anyway because we use JSX and other features that require transpiling.
It supports "// Translators:" comments (comments that allow us to leave notes for translators) and custom function names if you want to change them, so you are not forced to use gettext, ngettext, etc. (we're actually using t and tn as shortcuts for those ourselves). In addition, it's quite permissive in ignoring extra arguments, so you can do the string formatting in one go.
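To show what extraction produces, here is a deliberately simplified sketch. The real plugin walks Babel's AST. This version instead scans source lines with a regular expression, handles only single-quoted literals without escapes, and records only the first argument of each call. The function names extractMessages and toPo are made up for this example.

```javascript
// Collect marked strings, their locations, and preceding translator comments.
function extractMessages(source, filename, functionNames = ["t", "tn", "tct"]) {
  const callPattern = new RegExp(
    "\\b(?:" + functionNames.join("|") + ")\\(\\s*'([^']*)'",
    "g"
  );
  const entries = new Map();
  const lines = source.split("\n");
  lines.forEach((line, index) => {
    const previous = index > 0 ? lines[index - 1] : "";
    const note = /^\s*\/\/\s*Translators:\s*(.*)$/.exec(previous);
    for (const match of line.matchAll(callPattern)) {
      const msgid = match[1];
      const entry = entries.get(msgid) || { msgid, locations: [], comments: [] };
      entry.locations.push(`${filename}:${index + 1}`);
      if (note && !entry.comments.includes(note[1])) entry.comments.push(note[1]);
      entries.set(msgid, entry);
    }
  });
  return [...entries.values()];
}

// Serialize entries as PO text: "#." is an extracted comment, "#:" a reference.
// JSON.stringify is used as an approximation of PO string quoting.
function toPo(entries) {
  return entries
    .map((entry) =>
      [
        ...entry.comments.map((comment) => `#. ${comment}`),
        `#: ${entry.locations.join(" ")}`,
        `msgid ${JSON.stringify(entry.msgid)}`,
        'msgstr ""',
      ].join("\n")
    )
    .join("\n\n") + "\n";
}

const example = [
  "// Translators: shown on the login page",
  "const title = t('Sign in');",
  "const again = t('Sign in');",
].join("\n");
console.log(toPo(extractMessages(example, "app.jsx")));
```

The reference lines are what later make it possible to tell which strings the frontend actually uses.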
Once extracted, we end up with a .po file which also comes with location markers so we know where strings are used. We then use a custom merge script based on Python Babel (did we mention this is not to be confused with the JavaScript Babel :D) to merge our backend strings and frontend strings into one combined catalog. Location markers are also merged together, and we end up with something we can ingest into Transifex.
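The merge step itself is done with Python Babel as described above. Purely to illustrate the logic of combining two catalogs, the sketch below does the equivalent on plain entry objects of the assumed shape { msgid, msgstr, locations }. The mergeCatalogs name and the entry shape are assumptions for this example.

```javascript
// Merge catalogs by msgid, taking the union of their location markers.
function mergeCatalogs(...catalogs) {
  const merged = new Map();
  for (const catalog of catalogs) {
    for (const entry of catalog) {
      const existing = merged.get(entry.msgid);
      if (!existing) {
        // Copy so the input catalogs are not mutated.
        merged.set(entry.msgid, { ...entry, locations: [...entry.locations] });
        continue;
      }
      for (const location of entry.locations) {
        if (!existing.locations.includes(location)) existing.locations.push(location);
      }
      // Keep the first non-empty translation we come across.
      if (!existing.msgstr && entry.msgstr) existing.msgstr = entry.msgstr;
    }
  }
  return [...merged.values()];
}

const backend = [{ msgid: "Save", msgstr: "Speichern", locations: ["views.py:10"] }];
const frontend = [
  { msgid: "Save", msgstr: "", locations: ["app.jsx:5"] },
  { msgid: "Cancel", msgstr: "", locations: ["app.jsx:9"] },
];
console.log(mergeCatalogs(backend, frontend));
```

A string used on both sides then appears once in the combined catalog, with references to both places.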
Shipping the Translations
So now that we've managed to generate .po files and can sync them with the translations from Transifex, we still need to figure out how to get the translations into the Sentry UI. For this, the extracted location references come in handy. We wrote a custom webpack loader that fetches our .po files from the main Python app and skips over all the strings which are not referenced by the JavaScript app. We then serialize the PO catalogs into JSON, which gets shipped with our frontend app.
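The filtering idea can be sketched as follows, again on plain entry objects rather than real .po parsing. Deciding that a reference belongs to the frontend by its .js or .jsx extension is an assumption made for this example, as are the function names and the flat JSON shape.

```javascript
// Assumption: frontend references point at .js or .jsx files.
function isFrontendReference(location) {
  return /\.jsx?:\d+$/.test(location);
}

// Keep translated strings the frontend references and serialize them to JSON.
function buildFrontendCatalog(entries) {
  const catalog = {};
  for (const entry of entries) {
    if (!entry.msgstr) continue; // untranslated, the UI falls back to the msgid
    if (!entry.locations.some(isFrontendReference)) continue; // backend-only string
    catalog[entry.msgid] = entry.msgstr;
  }
  return JSON.stringify(catalog);
}

console.log(buildFrontendCatalog([
  { msgid: "Save", msgstr: "Speichern", locations: ["views.py:10", "app.jsx:5"] },
  { msgid: "Delete", msgstr: "Löschen", locations: ["views.py:20"] },
  { msgid: "Cancel", msgstr: "", locations: ["app.jsx:9"] },
])); // {"Save":"Speichern"}
```

Dropping backend-only strings keeps the shipped JSON limited to what the UI can actually display.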
Where to go from here?
Making stuff translatable from a technical point of view does not necessarily mean that people can actually translate it. Because the new UI was rapidly iterated on and internationalization was not always considered, there are definitely some places where a string cannot yet be translated into other languages. Most of the time this happens because individual words were translated instead of correctly localizing the surrounding context in which they appear.
Going forward we plan to improve our context support, as well as investigate tooling to ensure that all strings get correctly marked for translation. If you're interested in helping get Sentry in your language, take a look at the project on Transifex.
Ultimately, whether you want to debug JavaScript, do Python error tracking, or handle an obscure PHP exception, we'll be working hard to provide the best possible experience for you and your team, no matter the location or language!