Google’s Hate Speech-Detecting AI is Biased Against Black People

On Aug 13, 2019

Self-Defeating

Artificial intelligence algorithms meant to detect and moderate hate speech online, including the Perspective algorithm built by Google, have built-in biases against black people.

Scientists from the University of Washington found alarming anti-black bias in the AI tools that are supposed to protect marginalized communities from online abuse, according to New Scientist. The finding shows how a well-intentioned attempt to make the internet safer could end up discriminating against the very communities it was meant to protect.

Built-In Bias

According to the not-yet-published research, the scientists examined how humans had annotated a database of more than 100,000 tweets used to train anti-hate speech algorithms. They found that the people responsible for labeling whether a tweet was toxic tended to flag tweets written in African-American Vernacular English (AAVE) as offensive, a bias that then propagated into the algorithms themselves.

The team confirmed that bias by training several AI systems on the database and found that the resulting algorithms associated AAVE with hate speech.

Downstream Effects

The team then tested algorithms, including Perspective, on a database of 5.4 million tweets whose authors had disclosed their race. Depending on the algorithm, posts by people who identified as African-American were one and a half to two times as likely to be flagged as toxic, New Scientist reports.

That means automated content moderation tools will likely take down many benign posts because of their authors' ethnicity, silencing and suppressing certain communities online.