How a Hack Week Project Encourages People to Be Nice on the Internet

Early this year, we hosted an internal hack week to make, well, anything, like a hot dog VR game. With their project, Data Engineer Syd Ryan, Support Engineer Maggie Bauer, and Head of Product Dave Hayes wanted to encourage more participation in open-source software, and they saw a big problem to tackle: as you may know, sometimes people on the internet aren’t nice!

I think the biggest challenge is that online Trust & Safety as a field is not a solved problem, anywhere. We [at GitHub] have best practices based on research and experience, and we’re constantly iterating and improving, but the question of “how do we build healthy online communities” is still an area of active research across the internet.

Lexi Galantino
GitHub

Even though GitHub already has a feature to “temporarily lock down some or all of your repos so that only certain folks can leave comments” (via Lexi Galantino), there’s still a lot of work to be done.

Be Kind Bot

At a hack week brainstorming session, Syd, Maggie, and Dave, along with Technical Writer Mimi Nguyen, reflected on Sentry’s recent lunch-and-learn series on allyship, which featured a talk by Valerie Aurora that sparked questions about why more people from marginalized groups don’t contribute to open source.

Thus, the Be Kind Bot was born! It’s a GitHub bot that helps commenters to… well, to be kind!

The early prototype helps in a few different ways:

  • Using the Perspective API, the bot will reply to comments identified as “rude” or “very rude,” call them out, and link to our values page.
  • Based on a list of stop words, the bot suggests alternatives; for example, it’ll suggest “inefficient” or “inaccurate” instead of “stupid.”
  • Once a user has three of their messages flagged, the bot will ping the owner of the repository so that they can intervene if necessary.
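
The second and third behaviors above can be sketched in a few lines of plain JavaScript. This is only an illustration, not the bot's actual code: the `SUGGESTIONS` map, the `FLAG_LIMIT` constant, and the function names are hypothetical, and only the “stupid” entry comes from the example above. The sketch also assumes flags are counted in memory and that the owner is pinged on every flag at or above the limit.

```javascript
// Hypothetical stop-word list; only the "stupid" entry comes from the post.
const SUGGESTIONS = {
  stupid: ['inefficient', 'inaccurate'],
};

// Hypothetical limit mirroring the "three flagged messages" rule.
const FLAG_LIMIT = 3;

// Return the suggested alternatives for each stop word found in a comment.
function suggestAlternatives(commentBody) {
  const words = commentBody.toLowerCase().match(/[a-z']+/g) || [];
  const found = {};
  for (const word of words) {
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(SUGGESTIONS, word)) {
      found[word] = SUGGESTIONS[word];
    }
  }
  return found;
}

// Flag counts per user, kept in memory only (an assumption for this sketch).
const flagCounts = new Map();

// Record one flagged comment and report whether to ping the repository owner.
function recordFlag(username) {
  const count = (flagCounts.get(username) || 0) + 1;
  flagCounts.set(username, count);
  return { count, notifyOwner: count >= FLAG_LIMIT };
}
```

A real bot would need to persist the counts somewhere, since an in-memory map is lost whenever the app restarts.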

Analyzing and identifying rude comments

How does one interpret a comment and identify it as rude? Syd asked for recommendations in the Out in Tech community Slack group, and many kind souls suggested looking up the term “sentiment analysis.”

Also known as “opinion mining,” sentiment analysis is often applied to customers’ opinions of a product by extracting polarity (positive or negative opinions) and subject matter (hot dogs or three-wolf-moon shirts). But could this type of analysis work in the context of comments on pull requests?

To find out, Dave scraped about 1,000 comments from a few repos.

Using these sample comments and their own made-up sentences, Syd and Maggie spent the majority of hack week researching and testing multiple APIs, including Google’s natural language API, Aylien’s text analysis API, and an “emotion-polarity classifier” from a research group at the University of Milan.

Unfortunately, none of those APIs reached the heart of the issue: identifying abusive comments or harassment requires more fine-tuned analysis than simply checking the negativity of a comment. For example, “this isn’t working yet” would be classified as negative, but, in the context of a pull request, it’s definitely not abusive.

An even bigger challenge is that comments are very context-sensitive. “This is really bad” is OK when referring to a bug (“This bug is really bad”), but definitely problematic when talking about a person (“You’re really bad at coding”).

The Perspective API gave better results, as it was designed specifically to identify “bad comments” by ranking their toxicity. This worked for very rude comments (“this is complete garbage and you are awful”), but it didn’t flag many other rude comments (“this sucks”). Lowering the rudeness threshold resulted in way too many false positives.
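
To make the threshold trade-off concrete, here is a minimal sketch of how a toxicity score from the Perspective API might be turned into a label. It is not the team's code. The request and response shapes follow the API's public `comments:analyze` method, but the `THRESHOLDS` values, the function names, and the “ok” label are assumptions made for illustration, and `analyzeComment` assumes a runtime with a global `fetch` (for example, Node.js 18 or later).

```javascript
// Endpoint of the Perspective API's comments:analyze method.
const ANALYZE_URL =
  'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze';

// Hypothetical cut-offs, chosen only for illustration.
const THRESHOLDS = { veryRude: 0.95, rude: 0.85 };

// Build the JSON body that asks for a TOXICITY score on one comment.
function buildAnalyzeRequest(text) {
  return {
    comment: { text },
    languages: ['en'],
    requestedAttributes: { TOXICITY: {} },
  };
}

// Pull the overall toxicity score (0 to 1) out of an API response.
function toxicityScore(response) {
  return response.attributeScores.TOXICITY.summaryScore.value;
}

// Map a score to a label using the (assumed) thresholds.
function classify(score, thresholds = THRESHOLDS) {
  if (score >= thresholds.veryRude) return 'very rude';
  if (score >= thresholds.rude) return 'rude';
  return 'ok';
}

// Send one comment to the API and classify the result.
// Assumes a global fetch and a valid API key.
async function analyzeComment(text, apiKey) {
  const res = await fetch(`${ANALYZE_URL}?key=${encodeURIComponent(apiKey)}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildAnalyzeRequest(text)),
  });
  if (!res.ok) throw new Error(`Perspective API returned ${res.status}`);
  return classify(toxicityScore(await res.json()));
}
```

Passing a lower `rude` value to `classify` flags more comments, which is exactly the lever that produced too many false positives in testing.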

As it turns out, this is a hard problem. Despite lots of research, existing tools for identifying problematic comments still have a long way to go.

Building the Be Kind Bot on GitHub

In the end, building the actual bot was the easiest part of Syd and Maggie’s whole project. In less than a day, Dave created the prototype with Probot, an open-source framework for building GitHub apps with Node.js.

Here’s a quick peek at how simple the code is:

// Listen for GitHub's issue_comment event (when anyone comments on an issue)
app.on('issue_comment', async context => {
    context.log("Got a comment!")
    context.log(context.payload.comment.body)

    // Call Perspective API with a given toxicity threshold
    const key = process.env.PERSPECTIVE_APIKEY
    const toxicity_threshold = 0.85
    // Based on the response, classify the comment (is it rude?)
    // Then decide how Be Kind Bot should respond

    // Be Kind Bot posts a comment to the GitHub issue thread.
    // The reply text is a placeholder; the real wording depends on the
    // classification step elided above.
    const params = context.issue({ body: 'Please be kind!' })
    await context.github.issues.createComment(params)
});

The future of kindness on the web

There’s still a lot of work to be done to improve online communities, and the Be Kind Bot is just one early prototype to help open-source project maintainers develop more inclusive, welcoming communities. Maggie, Syd, and Dave learned a bit more about the challenges of applying natural language processing to real-world problems, and we all look forward to following the development of more research and more open-source tools in this space.

Want to help us build cool things? Here are the open roles at Sentry.
