Facebook and Search Help Researchers Use Us as Guinea Pigs

Vast amounts of data and insane computational power are transforming what we know about how people interact with each other – but beyond that, social scientists have no idea what the hell is going on.

That’s the mad state of modern sociology, according to an array of researchers speaking Monday during an open house at Microsoft Research’s New York offices. While disciplines such as anthropology, communication, media studies, psychology, and sociology have seen markedly little improvement over the past hundred years, experts in these fields now see social networks like Facebook and Twitter as their great hope.

Duncan Watts, a principal researcher at Microsoft Research, explained sociology’s stunted growth simply: Observing the interactions of hundreds of millions of people isn’t easy, and doing experiments on that scale was simply inconceivable. Until a few years ago, that is.

“It’s possible to look at the interactions of a billion people these days – we call it Facebook,” Watts said. “We have the potential for a revolution … not just in society but in our ability to study society.”

He likened it to the particle collider – a heckuva comparison to be sure, but for a research field that hasn’t been tweaked in a century, the advent of enormous databases that researchers can sift through and do experiments on in real time is a game-changer. Field research in many of these disciplines previously involved months or years of work. Popularly cited databases include the U.S. Census and Gallup polls.

Today, Watts said researchers have “virtual labs” that allow them to test a hypothesis in days or even hours, using a hundred thousand or even a million test subjects. And it’s enabling all sorts of neat new studies.

Take, for example, a recent finding by Principal Researcher Kate Crawford, who studies crisis informatics (“they’re paying us by the buzzword,” joked her colleague Justin Rao). She studied Twitter following Hurricane Sandy to see how people’s relationship to private information changes during a crisis.

“Yes, at the aggregate level there is a marked change,” she said, teasing the conclusion to a forthcoming research paper. “Simply put, when people are suffering, the amount of effort they’re going to put into protecting their data is much lower.”

Others are looking at how disease spreads, what leads to moral corruption, what makes us happy. But for a society wary of NSA prying and companies aggregating their lives for commercial purposes, the potential for researchers peering at their dirty laundry and drawing conclusions is scary.

After all, most people can’t even do basic probability; how are they supposed to understand that researchers aren’t piling on and further eroding their privacy? Social scientists say they’re aware of the issue.

“We, as a scientific community, don’t even have a way of giving people even an inkling of what we’re talking about. We have a challenge as a community, and if we don’t work to get it right, we’re all guilty of something we don’t want to be guilty of,” said Dan Huttenlocher, dean and vice provost of Cornell Tech.

There’s an extra ingredient in the complex stew: Companies like Facebook and Bing aren’t creating the ideal raw data stream you might think they are.

“A lot of the data [scientists] are collecting is being increasingly interfered with by the companies that collect it,” Watts said, a problem he termed algorithmic conflict. Think of a Facebook post you make that the company only shares with a select group of people it thinks will be most likely to like it, rather than the pool of followers at large.

A 2012 Microsoft study of Xbox users’ voting plans underscores the problem: It failed to predict the results. David Rothschild, an economist with Microsoft wearing the requisite bow tie, compared it to an election-prediction fail from Literary Digest in 1936. Back then, the magazine’s flawed polling methods skewed toward Americans with high incomes, botching its predictions and eventually causing the magazine to fold. Likewise, Microsoft’s polling of Xbox users skewed toward younger gamers, largely male, and not the population at large.

“There’s been a lot of changes in the last 75 years, including the Internet and computers,” he joked. Yet traditional polling hasn’t changed, and we haven’t overcome issues we faced way back then.

Rao and Rothschild proposed yet another type of database, which Rao called “medium data” rather than big data or small data. (He may or may not have been serious about the idea; Rao joked before beginning that “we didn’t have anything planned, so here goes.”)

Medium data would combine the best parts of big data and small data: giant data sets that can be mined for interrelations quickly and accurately. It could be the solution all of these social scientists are hoping for; or it could be a big joke.

“And again, you can forget this all after the talk if you want,” he concluded.

Social science is clearly being shaken up, likely in a good way, by the advent of machine learning and the vast data analysis it brings with it. It’s a long-overdue improvement. And one that clearly requires work.

“I really think this is a challenge we owe ourselves to spend some time thinking about,” Huttenlocher said.
