Researchers teach computer to understand dialects by reading Twitter

Computers don’t harbor the prejudices that are unfortunately still found in parts of society, but that isn’t to say they’re without faults. One task machines frequently struggle with is understanding dialects other than standardized English, such as an English dialect considered to originate in some African-American communities. (Researchers term the dialect “African-American English,” which we realize may be regarded as inaccurate by African Americans who don’t share it.) Now, researchers are training AI to recognize this dialect.

There is a logical reason why computers understand some dialects worse than others: computer scientists who have spent the past 30 years teaching machines to read have frequently trained them on readily available data, such as back issues of the Wall Street Journal. Reliance on such formal written language has left many natural language processing (NLP) systems less adept at understanding language that doesn’t conform to that narrow standard.
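
The training-data mismatch described above can be illustrated with a toy sketch. This is not the researchers' method; it simply counts how many words in a sentence were never seen during training. The sentences, the function names `build_vocabulary` and `unknown_word_rate`, and the tiny "formal" corpus are all hypothetical stand-ins, not real Wall Street Journal or Twitter data.

```python
from collections import Counter

# Hypothetical stand-in for formal training text (invented, not real newspaper data).
FORMAL_TRAINING_TEXT = [
    "the company reported higher quarterly earnings on tuesday",
    "analysts said the market is expected to recover next year",
    "the board of directors approved the merger on friday",
]


def build_vocabulary(sentences):
    """Count every word that appears in the training sentences."""
    vocab = Counter()
    for sentence in sentences:
        vocab.update(sentence.lower().split())
    return vocab


def unknown_word_rate(sentence, vocab):
    """Return the fraction of words in `sentence` never seen in training."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocab)
    return unknown / len(words)


vocab = build_vocabulary(FORMAL_TRAINING_TEXT)

# Invented examples: one sentence in the formal register, one in an informal one.
formal_sample = "the company said the market is expected to recover"
informal_sample = "lol the market gonna bounce back tbh"

formal_rate = unknown_word_rate(formal_sample, vocab)
informal_rate = unknown_word_rate(informal_sample, vocab)

print(f"formal:   {formal_rate:.2f} of words unseen in training")
print(f"informal: {informal_rate:.2f} of words unseen in training")
```

A system that has only ever seen the formal register has little to go on when most of a sentence is unfamiliar, which is one plausible reason such text can end up misjudged. Real NLP systems are far more sophisticated than a word list, so treat this only as an intuition aid.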

“If you think about traditional media that have existed for a long time — things like books or, more recently, newspapers — you’re seeing a very standardized dialect of language, associated with elite education and the like,” Brendan O’Connor, a natural language processing expert at the University of Massachusetts Amherst, told Digital Trends. “That’s not specific to English: you see it in every language in the world.”

As O’Connor noted, this no longer has to be the case. The internet, and particularly social media, has opened up a rich stream of data in different dialects that can be used to train the next wave of NLP systems. In a new paper, O’Connor and other researchers describe what they say is the largest dataset for studying African-American English in online communication, composed of 59 million tweets from 2.8 million users.

“The African-American English dialect has … millions of speakers and is different from standard English in several interesting ways,” O’Connor said. “It’s different enough that our artificial intelligence tools — which are designed for standardized English — perform worse with them; they’re less intelligent at understanding that dialect. African-American English is often incorrectly characterized as ‘not English’ by current classifiers.”

In their paper, O’Connor and his colleagues showed that properly fine-tuned NLP systems are capable of understanding African-American English. The authors plan to release their new model in the next year to help better identify English written in this dialect.

“The future next step is to make systems that can do deeper analysis of sentences that are written in different types of English dialects,” he said. “Embracing linguistic diversity is certainly something that needs to be focused on. We highlight the importance of engineering systems that are better at handling different forms of dialect.”

Because, ultimately, making AI systems that can understand everyone equally will be the best possible outcome for all.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…