Twitter’s priorities are becoming clearer. The platform has long been tiptoeing into real-time search, and in a blog post today it made clear how big a priority this is becoming and how much the team is investing in developing the feature.
Since the shelf life of a trend is mere hours (if that), engineers have to get creative in how they recognize and surface emerging trends in search results. With this in mind, the social network’s updated search engine will now use crowdsourced human feedback to power its results. The problem with Twitter’s search engine before this change was that it couldn’t respond fast enough. When Obama’s re-election photo was first tweeted, or when the breaking story about Osama bin Laden’s death first hit, its algorithm wasn’t able to recognize trending hashtags relevant to these events when they first appeared on the site.
For instance, in an example described by Twitter data scientist Edwin Chen and Senior Software Engineer Alpa Jain, the moment Mitt Romney infamously said “binders full of women,” the Internet-connected world flocked to Twitter to find out more about the phrase. Before Romney’s remark, a Twitter search for “binders full of women” would have returned results about “binders” and “women.” Immediately after Romney’s flub, though, the phrase had taken on an entirely new meaning. Unfortunately, Twitter’s engine wasn’t able to recognize this change in context quickly enough when it happened.
To solve this contextual problem, Twitter engineers turned not only to machines but also to humans.
Simply put, the new human-powered system works in three steps. The first is to recognize trending topics. Twitter does this with Storm, its real-time processing system, which mines the deluge of incoming information and detects spikes in search traffic around particular keywords.
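Twitter’s post doesn’t spell out its spike-detection rule, so the following is only a minimal Python sketch of the general idea, assuming a simple heuristic: bucket query counts into fixed time windows and flag a keyword when its count in the current window jumps well above its recent average. The class `SpikeDetector` and its `history`, `ratio` and `min_count` parameters are hypothetical, and nothing here uses Storm’s actual API.

```python
from collections import defaultdict, deque


class SpikeDetector:
    """Toy keyword spike detector (hypothetical; not Twitter's Storm code).

    Query counts are bucketed into fixed time windows. A keyword is flagged
    as spiking when its count in the current window is at least `min_count`
    and exceeds `ratio` times its average over the previous `history` windows.
    """

    def __init__(self, history=6, ratio=3.0, min_count=50):
        self.history = history
        self.ratio = ratio
        self.min_count = min_count
        self.current = defaultdict(int)  # counts for the window being filled
        # keyword -> counts from its most recent closed windows
        self.past = defaultdict(lambda: deque(maxlen=history))

    def record(self, query):
        # Normalize case and outer whitespace so variants count together.
        self.current[query.strip().lower()] += 1

    def close_window(self):
        """End the current window and return the keywords that spiked in it."""
        spikes = []
        for kw in set(self.current) | set(self.past):
            count = self.current.get(kw, 0)
            prior = self.past[kw]
            baseline = sum(prior) / len(prior) if prior else 0.0
            # The +1 keeps brand-new phrases (baseline 0) from triggering on
            # tiny counts; min_count is the real floor for those.
            if count >= self.min_count and count > self.ratio * (baseline + 1):
                spikes.append(kw)
            prior.append(count)  # this window becomes part of the history
        self.current = defaultdict(int)
        return sorted(spikes)


if __name__ == "__main__":
    detector = SpikeDetector(history=3, ratio=3.0, min_count=50)
    for _ in range(3):  # three quiet windows of ordinary "binders" searches
        for _ in range(10):
            detector.record("binders")
        detector.close_window()
    for _ in range(10):
        detector.record("binders")
    for _ in range(500):  # the phrase suddenly explodes
        detector.record("binders full of women")
    print(detector.close_window())  # ['binders full of women']
```

A production system would spread this counting across many machines, which is the kind of stream-processing job Storm is built for, and would likely use a more robust baseline than a fixed ratio; the ratio is used here only because it is the simplest rule to illustrate.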
The next step is where it gets interesting: Twitter sends topics that have spiked to “human evaluators,” who provide more details about the subject matter, including photos, whether it refers to a person or an event, and which category the topic falls under. All of this is done through Amazon’s Mechanical Turk, a crowdsourcing platform that lets programmers call on human workers to perform tasks an algorithm can’t complete. Twitter’s evaluators, though, aren’t just any everyday Joe: according to the blog post, they are a custom pool of Mechanical Turk judges. In fact, the judges are located around the world (ensuring that someone is always awake, whatever the hour) and are paid on a full-time basis to identify these trends. They have even set up their own network to collaborate and discuss tasks with one another. These anonymous fact finders, a select few, carry a heavy responsibility.
Finally, Twitter teaches its engine the context of the term so that the knowledge is retained going forward. Now if you search for “binders full of women” or any variation of it, you’ll see tweets about politics, Obama, Romney, and of course the phrase itself. The blog post adds that this human-powered system for identifying a topic’s context will also help Twitter target ads more effectively. To use the same example, a search for “binders full of women” would surface ads relevant (or even remotely relevant) to politics instead of ads for back-to-school binders.
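As a rough illustration of this last step, here is a minimal Python sketch assuming the judges’ answers are stored per query and then reused both to broaden search results and to choose an ad category. `TopicAnnotation`, `learn`, `expand_query` and `ad_category` are hypothetical names invented for this example, as is the `"general"` fallback category; the blog post doesn’t describe how Twitter actually stores or applies the annotations.

```python
from dataclasses import dataclass, field


@dataclass
class TopicAnnotation:
    """Hypothetical record of what human judges said about a spiking query."""
    query: str
    kind: str      # e.g. "person" or "event" (assumed label set)
    category: str  # e.g. "politics" (assumed label set)
    related_terms: list = field(default_factory=list)


# Hypothetical in-memory store of judged queries, keyed by normalized text.
ANNOTATIONS = {}


def normalize(query):
    # Lowercase and collapse runs of whitespace so variants share one key.
    return " ".join(query.lower().split())


def learn(annotation):
    """Store a judged annotation so later searches can reuse it."""
    ANNOTATIONS[normalize(annotation.query)] = annotation


def expand_query(query):
    """Return the normalized query plus any human-supplied related terms."""
    ann = ANNOTATIONS.get(normalize(query))
    terms = [normalize(query)]
    if ann:
        terms.extend(ann.related_terms)
    return terms


def ad_category(query, default="general"):
    """Pick an ad category from the annotation, else fall back to a default."""
    ann = ANNOTATIONS.get(normalize(query))
    return ann.category if ann else default


if __name__ == "__main__":
    learn(TopicAnnotation("binders full of women", "event", "politics",
                          ["politics", "obama", "romney"]))
    print(expand_query("Binders  Full of Women"))
    # ['binders full of women', 'politics', 'obama', 'romney']
    print(ad_category("binders full of women"))  # politics
    print(ad_category("binders"))  # general
```

Keying the store on normalized text is one simple way to let “any variation” of a phrase hit the same annotation; a real system would presumably need fuzzier matching than lowercasing and whitespace cleanup.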
Using humans to identify trends might sound like a step backward, but since the technology in this area has yet to catch up, a human-powered solution is arguably the most efficient strategy for engineers, especially given the ever-growing volume of content users are flooding Twitter with. For everyone else, you can rest assured that Twitter is now probably the fastest (and relatively reliable) real-time social search engine for current events.