The problems with your socially-sourced election predictions


Finally, November 6 is here. Election Day has come and that means the campaigns we’ve watched unfold over what feels like a lifetime are coming to an end. And this has been an election season the likes of which we’ve never seen: It’s infiltrated every last pore of our existence thanks to the deluge of social media outreach and coverage.

And it’s not over yet. While the votes have yet to be counted, we apparently already have an outcome: Obama will be our president for another four years.

How do we know this? Social media. Statistically, candidates with more Facebook Likes and Twitter followers tend to win; three recent gubernatorial races went to the candidate with the largest Twitter follower count.

But wait – there’s also an argument that @mentions indicate who our winner is going to be. The candidate everyone’s talking about will be the next president. In that case, Mitt Romney takes the lead.

[Chart: Topsy mention volume for the candidates]

And if you think Web buzz in general is going to predict our next president, then once again it’s Romney for the win.

[Chart: Google Trends search interest for the candidates]

Social network-sourced election predictions are everywhere, but whether they mean anything is unclear. There’s no shortage of infographics using eye-catching visuals to colorfully chart out how we’re using Facebook, Twitter, and Instagram to talk about our candidates, but not enough analysis about whether this content is signal or noise.

What we do know is this:

  • People increasingly use social networks; as of May, eight percent of online adults were using Twitter on a typical day.
  • People increasingly get their news from social media and digital outlets; according to a Pew Research Center Report, “the percentage of Americans saying they saw news or news headlines on a social networking site [from the day before] has doubled from nine percent to 19 percent – since 2010.”
  • 88 percent of social media users are registered voters.

The big-picture takeaway is that social media matters in an election; politicians should be creating content there, because constituents react to it. But the data raises more questions than it answers. What about sarcasm, mood, and sentiment? Or spam accounts? Or malicious posts meant to manipulate a social network? We just saw how completely fabricated information can take over a platform as well as the news cycle, so what’s to keep election predictions safe from similar effects?

For all the revealing bite-sized data we can squeeze out of Facebook posts and followers and Twitter mentions and hashtags, there are plenty of glaring inaccuracies as well. According to Twitter, Ron Paul should have been the Republican nominee, and the Iowa primary would have ended quite differently.

For every insightful social media analysis, there are 10 terribly shallow ones. Countless news outlets launch tools or republish infographics that count “buzz” – and if that’s all we’re looking for, Herman Cain would probably be our next president.

What’s missing, of course, is context. So many of the endlessly circulated infographics and charts we’ve seen this year are heavy on data, pie charts, and graphs, but offer nothing to put those numbers into perspective.

As a rule, these prediction tools can’t determine whether tweets or Facebook posts are coming from people who are even eligible to vote. According to a recent study, 73.7 percent of Twitter users are between the ages of 15 and 25, so there’s plenty of opportunity for users who can’t vote to be sharing their opinions. To further complicate things, only 0.45 percent of Twitter users disclose their ages. And while 46 percent of Facebook users are 45 or older, given how large Facebook is, that still leaves plenty of underage users on the site who could very well be endorsing candidates they can’t vote for.

Demographics in general are largely ignored – most notably the fact that the people who use social networks to talk about politics are already politically motivated. That means those who keep their political ideals to themselves, or don’t hold beliefs strongly enough to broadcast them – but will still vote – aren’t being counted. Traditional pollsters specifically reach out to this subset to build a more accurate picture of an election, but since the data pulled from social networks is entirely self-selected, that group simply gets left out.
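To make the contrast with traditional polling concrete, here’s a minimal sketch of post-stratification, the reweighting trick pollsters use: opinions from each demographic group are weighted by that group’s real share of the electorate rather than its share of the (self-selected) sample. Every number below is hypothetical, chosen only to illustrate the mechanism.

```python
# Post-stratification sketch: reweight a self-selected online sample so
# its age mix matches the electorate. All figures are hypothetical.

# Share of each age group in the social media sample (skews young)
sample_share = {"15-25": 0.70, "26-44": 0.20, "45+": 0.10}

# Share of each age group among registered voters (hypothetical)
voter_share = {"15-25": 0.15, "26-44": 0.40, "45+": 0.45}

# Observed support for Candidate A within each group (hypothetical)
support_a = {"15-25": 0.60, "26-44": 0.50, "45+": 0.40}

# Raw estimate: dominated by the over-represented young group
raw = sum(sample_share[g] * support_a[g] for g in sample_share)

# Post-stratified estimate: each group weighted by its real share
weighted = sum(voter_share[g] * support_a[g] for g in voter_share)

print(f"raw estimate:      {raw:.2f}")       # 0.56 -- looks like a win
print(f"weighted estimate: {weighted:.2f}")  # 0.47 -- actually a loss
```

The point isn’t the specific numbers; it’s that raw social media counts have no such correction applied, so any skew in who posts flows straight into the “prediction.”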

German researchers recently looked at Twitter’s predictive abilities, saying: “While we find evidence of a lively political debate on Twitter, this discussion is still dominated by a small number of users: Only four percent of all users accounted for more than 40 percent of the messages.” That study dates to 2009, but a skew that extreme points to a persistent problem: a loud minority can drown out everyone else.

“So long as you understand the data and the demographics of social users, you can measure sentiment in perspective and target relevant messages to those groups,” says 140Proof founder and CTO John Manoogian III. “The problem comes when pundits and analysts mistakenly assume that social media aggregates the tastes and preferences of all voters — it does not. Sites like Facebook and Twitter have political leanings. But social media does pretty strongly indicate what young, educated, and/or professional voters are saying.”

A much bigger factor is the failure of sentiment and mood analysis. The mechanisms we use to measure social media talk aren’t sophisticated enough to know if you’re slamming Obama or endorsing him. Some people can barely read sarcasm, so don’t expect applications pulling in tweets and posts to do so consistently. If you write “Oh sure, I totally believe Romney has women’s interests at heart” in a Facebook post, it will likely be interpreted as a pro-Romney statement.

There’s also the fact that we want this to work; we want to be able to use social media to tell us how we all really feel. “This is the so-called file-drawer effect: researchers do not publish negative results, hence, only positive results are known and it seems that something is possible most of the time when it could be due to pure chance,” says researcher Daniel Gayo-Avello. “A paper [I wrote with] Eni Mustafaraj and Takis Metaxas showed that, in fact, results could be guessed roughly half of the time; that is not an impressive predictive ability, isn’t it?”

“Thus, by not knowing (because they are seldom published) about negative results, everybody assumes that simple methods simply work; and, indeed, they ‘work’ but half of the time, akin to a broken watch which ‘works’ twice a day.”
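Gayo-Avello’s “broken watch” point is easy to check for yourself: in a two-way race, a predictor that just guesses at random is right about half the time, so a method with roughly 50 percent accuracy has no real predictive power. A quick simulation (with a fixed seed so it’s reproducible):

```python
# "Broken watch" sketch: a coin-flip predictor calls a two-way race
# correctly about half the time, which is the baseline any real
# prediction method has to beat.
import random

random.seed(42)  # fixed seed so the illustration is reproducible
races = 10_000
# Each "prediction" is a coin flip; it matches the true outcome ~50% of the time
correct = sum(random.random() < 0.5 for _ in range(races))
accuracy = correct / races
print(f"coin-flip accuracy over {races} races: {accuracy:.3f}")
```

Any Twitter-based method that lands near that baseline is, as he says, working like a watch that’s right twice a day.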

This doesn’t mean all this information isn’t valuable – it absolutely is, and interesting to boot. There’s no argument that politicians should be using social networks, and this type of outreach is only going to become more common and more sophisticated.

The big lesson is that social networking can be used as an indicator of opinion, but not necessarily of decision. There are just too many loose ends involved, and until they can be tied up, you’re going to have to learn who wins our elections the old-fashioned way: by waiting up all night for someone on TV to announce it.
