
Yahoo just released a ton of user data in the name of academia

Yahoo just released a ton of data in the name of academia. In what is purported to be the largest cache of Internet data ever granted to researchers, the company is giving universities access to the online behavior of some 20 million anonymous users, including their clicks, hovers, and scrolls across a myriad of Yahoo’s pages. The sheer volume of information, Yahoo says, should allow scientists to further their work on machine learning and deep learning.

“Our goals are to promote independent research in the fields of large-scale machine learning and recommender systems, and to help level the playing field between industrial and academic research,” the Internet giant said in a blog post about the recent release. “The dataset is available as part of the Yahoo Labs Webscope data-sharing program, which is a reference library of scientifically useful datasets comprising anonymized user data for noncommercial use.” 

The decision comes as Yahoo goes through an alarmingly stagnant stretch in its two-decade history, even as chief competitors like Google, along with social media companies, make huge strides across the tech industry. So in an effort to innovate, Yahoo is investing heavily in artificial intelligence and letting researchers see exactly how people behave online.

Although all of the data is fully anonymized, users might be alarmed by how much Yahoo is telling these institutions (and only these institutions). “In addition to the interaction data,” Yahoo says, “we are providing categorized demographic information (age range, gender, and generalized geographic data) for a subset of the anonymized users. On the item side, we are releasing the title, summary, and key-phrases of the pertinent news article.” According to the company, the dataset also includes “the relevant local time” and “contains partial information about the device on which the user accessed the news feeds, which allows for interesting work in contextual recommendation and temporal data mining.”

The release is a huge boon to researchers, who often lack enough data to fully realize their projects. “Data is not easy to come by for folks not inside companies,” said Gert Lanckriet, a professor of electrical and computer engineering at the University of California, San Diego, at an event announcing the data release.

“We hope that this data release will similarly inspire our fellow researchers, data scientists, and machine learning enthusiasts in academia, and help validate their models on an extensive, ‘real-world’ dataset,” Yahoo concluded. “We strongly believe that this dataset can become the benchmark for large-scale machine learning and recommender systems, and we look forward to hearing from the community about their applications of our data.”


Lulu Chang
Former Digital Trends Contributor