Yahoo Creates Hate Speech-Detecting Algorithm

New Yahoo algorithm can spot online abuse in context, not just content

By Lulu Chang July 29, 2016


There’s a lot of trash on the internet, and while humans may not have the emotional capacity to comb through it all, a new algorithm from Yahoo does. That’s right: spotting online abuse just got a lot easier, and it’s all thanks to a “machine learning-based method to detect hate speech on online user comments.” Promising to “outperform a state-of-the-art deep learning approach,” the new algorithm can spot abusive messages with an accuracy rate of around 90 percent.

How did they do it? It began with a novel data set Yahoo built itself, composed entirely of hateful or otherwise offensive article comments previously flagged by Yahoo editors (yes, human beings). Then, the team applied a process known as “word embedding,” which represents each word as a numerical vector so that words can be examined together, in strings, rather than one at a time. That means that even if no single word is inherently offensive, the algorithm can determine whether the phrase those words make up is ultimately hurtful. This differs from most other systems available, which generally look for keywords and may miss more sophisticated sorts of hate speech or abusive content.
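To make the contrast concrete, here is a minimal Python sketch of a keyword filter next to an embedding-style check. It is an illustration under loud assumptions, not Yahoo's method: the blocklist, the two-dimensional word vectors, and the example comments are all invented, and the "classifier" is just a nearest-centroid comparison over averaged vectors. Real embeddings are learned from large amounts of text and have far more dimensions.

```python
import math

# Hypothetical blocklist for the keyword approach (invented for this sketch).
BLOCKLIST = {"idiot", "stupid"}

# Hypothetical 2-d word vectors, hand-picked for illustration only.
# Real word embeddings are learned from data, not written by hand.
TOY_VECTORS = {
    "people": (0.2, 0.1), "like": (0.1, 0.3), "you": (0.3, 0.1),
    "should": (0.3, 0.1), "disappear": (0.6, 0.0),
    "thanks": (0.0, 0.8), "for": (0.1, 0.1), "the": (0.1, 0.1),
    "helpful": (0.0, 0.9), "article": (0.1, 0.4),
    "nobody": (0.4, 0.1), "wants": (0.2, 0.2), "here": (0.2, 0.1),
    "great": (0.0, 0.9), "point": (0.1, 0.5),
}

# Tiny invented "labeled" set standing in for editor-flagged comments.
ABUSIVE_EXAMPLES = ["people like you should disappear"]
BENIGN_EXAMPLES = ["thanks for the helpful article"]


def keyword_flag(comment):
    """Flag a comment only if it contains a blocklisted word."""
    return any(word in BLOCKLIST for word in comment.lower().split())


def mean_vector(comment):
    """Average the vectors of the known words in a comment."""
    vectors = [TOY_VECTORS[w] for w in comment.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def centroid(comments):
    """Average the comment vectors of a list of comments."""
    vectors = [v for v in map(mean_vector, comments) if v is not None]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


ABUSIVE_CENTROID = centroid(ABUSIVE_EXAMPLES)
BENIGN_CENTROID = centroid(BENIGN_EXAMPLES)


def embedding_flag(comment):
    """Flag a comment whose averaged vector sits closer to the abusive centroid."""
    vec = mean_vector(comment)
    if vec is None:
        return False
    return cosine(vec, ABUSIVE_CENTROID) > cosine(vec, BENIGN_CENTROID)


print(keyword_flag("nobody wants you here"))
print(embedding_flag("nobody wants you here"))
print(embedding_flag("great point"))
```

In this toy setup, “nobody wants you here” contains no blocklisted word, so the keyword filter lets it through, while the vector comparison places it nearer the abusive example. Averaging vectors ignores word order, so this is only a rough stand-in for the phrase-level analysis the article describes.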

“Automatically identifying abuse is surprisingly difficult,” researcher Alex Krasodomski-Jones of the U.K.-based Centre for Analysis of Social Media told the MIT Technology Review. “The language of abuse is amorphous — changing frequently and often used in ways that do not connote abuse, such as when racially or sexually charged terms are appropriated by the groups they once denigrated.”

He continued, “Given 10 tweets, a group of humans will rarely all agree on which ones should be classed as abusive, so you can imagine how difficult it would be for a computer.”
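One way to put a number on that kind of disagreement is to measure how often annotators give the same label. The sketch below is a hypothetical example in Python: the labels are invented, and simple agreement rates are used here rather than any particular measure from the researchers quoted.

```python
from itertools import combinations

# Hypothetical labels, invented for illustration.
# Each row is one item (say, a tweet); each column is one annotator.
# 1 = "abusive", 0 = "not abusive".
LABELS = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]


def unanimous_fraction(labels):
    """Share of items on which every annotator gave the same label."""
    unanimous = sum(1 for row in labels if len(set(row)) == 1)
    return unanimous / len(labels)


def pairwise_agreement(labels):
    """Share of annotator pairs, pooled over all items, that agree."""
    agree = 0
    total = 0
    for row in labels:
        for a, b in combinations(row, 2):
            agree += int(a == b)
            total += 1
    return agree / total


print(unanimous_fraction(LABELS))
print(pairwise_agreement(LABELS))
```

With these made-up labels, the annotators are unanimous on only two of the five items, which is the sort of gap that makes a single “correct” label hard to define for a machine.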

Still, having a machine’s assistance in the process seems like a helpful step forward, especially given the sheer volume of content now available on the web.