Google’s Hate Speech-Detecting AI is Biased Against Black People
Scientists from the University of Washington found alarming anti-black bias in the AI tools that are supposed to protect marginalized communities from online abuse, according to New Scientist. The finding shows how a well-intentioned attempt to make the internet safer could end up discriminating against already-marginalized communities.
According to as-yet-unpublished research, the scientists examined how humans had annotated a database of more than 100,000 tweets that had been used to train anti-hate speech algorithms. They found that the people responsible for labeling whether or not a tweet was toxic tended to flag tweets written in African-American Vernacular English (AAVE) as offensive, a bias that then propagated down into the algorithms themselves.
The team confirmed that bias by training several AI systems on the database, finding that the algorithms associated AAVE with hate speech.
The team then tested algorithms, including Google's Perspective, on a database of 5.4 million tweets whose authors had disclosed their race. Depending on the algorithm, posts written by people who identified as African-American were one-and-a-half to two times as likely to be flagged as toxic, New Scientist reports.
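A figure like "one-and-a-half to twice as likely" is a ratio of flag rates between two groups of authors. The sketch below shows that arithmetic only. The function name, group labels, and counts are made up for illustration and are not the researchers' data or code.

```python
from collections import Counter


def flag_rate_ratio(records, group, reference):
    """Return how many times as likely posts from `group` are to be
    flagged as toxic compared with posts from `reference`.

    `records` is an iterable of (author_group, was_flagged) pairs.
    """
    totals = Counter()
    flagged = Counter()
    for author_group, was_flagged in records:
        totals[author_group] += 1
        if was_flagged:
            flagged[author_group] += 1

    rate_group = flagged[group] / totals[group]
    rate_reference = flagged[reference] / totals[reference]
    return rate_group / rate_reference


# Hypothetical toy data: 100 posts per group, with 30 flagged in
# group_a and 15 flagged in group_b. These numbers are invented.
records = (
    [("group_a", True)] * 30 + [("group_a", False)] * 70
    + [("group_b", True)] * 15 + [("group_b", False)] * 85
)

ratio = flag_rate_ratio(records, "group_a", "group_b")
print(f"group_a posts are {ratio:.1f}x as likely to be flagged")
```

With these invented counts the flag rates are 30% and 15%, so the ratio comes out at 2.0, the upper end of the range reported above.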
That means that automated content moderation tools will likely take down a lot of benign posts based on the ethnicity of their posters, leading to silencing and suppression of certain communities online.