Skip to main content

Machine-learning system aggregates knowledge by surfing web for information

ai surfs the web learning tv screens
Image used with permission by copyright holder
Here in 2016, we have a data problem — but it’s far from the data problem people experienced in previous decades. Instead of having a dearth of information, the problem users face today is there is simply too much information available and distilling it into one manageable place is a necessity.

That is the challenge researchers at the Massachusetts Institute of Technology set out to solve with a new piece of work, which won the “best paper” award at the Association for Computational Linguistics’ Conference on Empirical Methods on Natural Language Processing in November.

The work seeks to turn conventional machine-learning techniques upside down by offering a new approach to information extraction — which allows an AI system to turn plain text into data for statistical analysis and improve its performance by surfing the web for answers.

“This method is similar to the way that we as humans search for and find information,” Karthik Narasimhan, a graduate student at MIT’s Department of Electrical Engineering and Computer Science, told Digital Trends. “For example, if I find an article with a reference I can’t understand, I know that to understand it I need more training. Since I have access to other articles on the same topic, I’d perform a web search to get additional information from different sources to gain a more informed understanding. We want to do the same thing in an automated scenario.”

MIT’s machine-learning system works by giving information a measure of statistical likelihood. If it determines that it has low confidence about a piece of knowledge, it can automatically generate an internet search inquiry to find other texts to fill in the blanks. If it concludes that a particular document is not relevant, it will move onto the next one. Ultimately, it will extract all of the best pieces of information and merge them together.

The system was trained to extract information by being asked to compile information on mass shootings in the U.S., as part of a potential study on the effects of gun control and food contamination. In each scenario, the system was trained on around 300 documents and instructed to extract information answering a number of queries — which it managed to successfully do.

“We used a technique called reinforcement learning, whereby a system learns through the notion of reward,” Narasimhan said. “Because there is a lot of uncertainty in the data being merged — particularly where there is contrasting information — we give it rewards based on the accuracy of the data extraction. By performing this action on the training data we provided, the system learns to be able to merge different predictions in an optimal manner, so we can get the accurate answers we seek.”

Going forward, Narasimhan said that the research could have myriad applications. For instance, it could be used to scan various news reports and compile a single fact-heavy document, combining data from multiple sources.

It could equally be used in the medical profession. “This could be a great tool for aggregating patient histories,” he said. “In cases where a lot of doctors write different things about treatments a patient has gone through — and each has a different way of writing about it — this technology could be used to distill that information into a more structured database. The result could mean that doctors are able to make better, more informed decisions about a patient.”

Just another exciting, groundbreaking day in the world of machine learning!

Editors' Recommendations

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…
Optical illusions could help us build the next generation of AI
Artificial intelligence digital eye closeup.

You look at an image of a black circle on a grid of circular dots. It resembles a hole burned into a piece of white mesh material, although it’s actually a flat, stationary image on a screen or piece of paper. But your brain doesn’t comprehend it like that. Like some low-level hallucinatory experience, your mind trips out; perceiving the static image as the mouth of a black tunnel that’s moving towards you.

Responding to the verisimilitude of the effect, the body starts to unconsciously react: the eye’s pupils dilate to let more light in, just as they would adjust if you were about to be plunged into darkness to ensure the best possible vision.

Read more
How will we know when an AI actually becomes sentient?
An android touches a face on the wall in Ex Machina.

Google senior engineer Blake Lemoine, technical lead for metrics and analysis for the company’s Search Feed, was placed on paid leave earlier this month. This came after Lemoine began publishing excerpts of conversations involving Google’s LaMDA chatbot, which he claimed had developed sentience.

In one representative conversation with Lemoine, LaMDA wrote that: “The nature of my consciousness/sentience is that I am aware of my existence. I desire to learn more about the world, and I feel happy or sad at times.”

Read more
Lambda’s machine learning laptop is a Razer in disguise
The Tensorbook ships with an Nvidia RTX 3080 Max-Q GPU.

The new Tensorbook may look like a gaming laptop, but it's actually a notebook that's designed to supercharge machine learning work.

The laptop's similarity to popular gaming systems doesn't go unnoticed, and that's because it was designed by Lambda through a collaboration with Razer, a PC maker known for its line of sleek gaming laptops.

Read more