With every answer, search reshapes our worldview


If you asked Google “Did the Holocaust happen?” earlier this month, the search engine first directed you to Stormfront, a white supremacist organization that denies the genocide ever occurred.

After initially defending its algorithm, Google eventually tweaked it to bury the offending result. But the enduring lesson is a lot more complicated than a single errant search. Every time search engines change how they measure results, they change our ideas about how the world fits together – or at least the world as reflected in the information that stands in for it.

Precision or recall? Choose one

It was the late 1980s when I first encountered full-text search technology. I was working at an early electronic publishing software company, Interleaf, whose clients needed to be able to search the large document sets they were creating. The available search engines varied, but the metrics were the same. In addition to the obvious speed-per-page metric for returning results, there were the paired terms “precision” and “recall.” Precision quantifies how many false positives show up in the results list: If you search for conference papers about Oingo Boingo, does the engine also give you reviews of their albums and photos of their recording sessions? Worse, does it give you conference papers about Bingo and Bongos? There goes your precision!

Recall, on the other hand, measures how many appropriate pages your search did not find: Are there a hundred conference papers about Oingo Boingo in the index that do not show up anywhere in the results list? That’s going to drive down the engine’s recall score.
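
For readers who like to see the arithmetic, here is a minimal sketch of the two scores using their standard definitions: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. The function name and the document IDs are made up for illustration; nothing here reflects how any particular search engine computed them.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one search.

    returned: IDs the engine listed in its results (hypothetical data)
    relevant: IDs a human judged appropriate for the query (hypothetical data)
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant  # relevant documents that were actually found

    # Precision falls as false positives crowd the results list.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall falls as relevant documents go missing from the results list.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative only: four relevant papers exist, the engine returns four
# results, and two of those are false positives.
relevant = {"paper1", "paper2", "paper3", "paper4"}
returned = ["paper1", "paper2", "album_review", "bongo_paper"]

p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case both scores come out to one half: half of what was returned is noise, and half of what should have been found is missing.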

The common wisdom was that if you optimized for one metric, you’d pay a price with the other: If you wanted to get every reference, you’d have to put up with some false positives, and if you wanted no false positives you wouldn’t be able to get every reference. This very real constraint showed us an information space — the set of searchable terms and their relationships — characterized by imprecision and unreliability because the space consisted of text that was created without any care for its findability when stirred into a cauldron with thousands of other texts. Our search engines tried to impose structure and find relationships using mainly unintentional clues. You therefore couldn’t rely on them to find everything that would be of help, and not because the information space was too large. Rather, it was because the space was created by us slovenly humans.

The rise of relevancy

By the mid to late 1990s, precision and recall were not enough. When you have millions and then billions of pages in your index, finding every instance of a term doesn’t much matter because users won’t know or care that your list of 100,000 hits really should have 100,001 entries. Likewise, precision becomes less important, so long as the false positives don’t show up in the first few pages of results, because, frankly, hardly anyone gets past those pages.


Instead, an existing term took on a new importance: relevancy. When a search finds 100,000 results, the user needs to be shown the most useful hits first. Of course, what’s useful depends on what the user is trying to do. And that makes judging relevancy a dark art practiced by mages and wizards. That modern Web search engines have gotten so good at it is a testament to the skill of their developers … as well as to how much data search engines have gathered about us.

When relevancy reigns, the information space shows itself as super-abundant and ambiguous. But the ambiguity has a different source than in the old precision-and-recall days, for it results not from the vice of slovenliness but from the collective human virtue of overloading language with rich and inextricably linked meanings. The information space is made up not of language-as-information but of language-as-poetry, that is, of words that are enriched by their layers of meanings and permeable borders.

But is it interesting?

As the body of the Web began to scale up from the millions to the billions and now conceivably to the trillions of pages — That’s a lot of typing, people! Good job! — relevancy began to suffer from the same problems of scale as precision and recall. When 100,000 pages are relevant, you need to provide another way of sorting. For example, after the photo-sharing site Flickr had been entrusted with its first couple of billion photos, it let us sort on interestingness. If you search for, say, “keys” at Flickr and sort by relevancy, you’ll see a collection of photos of keys. But if you sort by interestingness, you’ll see striking photos, many of which are not as clearly relevant to the search terms: sunset over Key Bridge, or a supporting column on a bridge with a gap that resembles a keyhole.

Flickr gauges interestingness by looking at a variety of factors, many of which are metrics of the community’s reaction to the photos. While Flickr doesn’t spell out its exact algorithms, a photo is judged to be more interesting if it gets lots of clicks and links especially from people who are not in the photographer’s social circle, if it gets printed out more frequently, if lots of people leave comments, and so forth. The result is a set of photos that the community likes, with a lowered requirement for relevancy.

If you’re simply trying to illustrate what a lock and key look like, sort by relevancy. If you’re using locks and keys as a metaphor for a false sense of security or for the dangers of a police state, go for interestingness.

Back in the 1980s, interestingness wouldn’t have been as useful because there wasn’t today’s super-abundance of content. But once we’re confident that we can find what’s relevant, we also want to find what is striking in its expression, or that is relevant in a non-literal meaning. We’ve always wanted that. We just didn’t know it because we couldn’t have it. Search engines in this way have become instruments of our non-literal use of language — a use that is more essential to language than mere literalism is.

Digging deeper

Now we’re hearing a cry for a fifth metric of searching: serendipity. Serendipitous results are meant to do the opposite of what traditional searches have done, for they show you what you were not asking to see. But mere surprise isn’t enough. If you search for “key” and are shown pages about clown makeup or elephant toenails, the serendipity is unlikely to be useful because these have absolutely nothing to do with your search terms. To be usefully serendipitous, there should be a subtle but meaningful relationship. Perhaps you’ll be shown a page about Harry Houdini, or a paper about biological models of DNA based on lock and key relationships, or a feminist history of chastity belts. Serendipity requires an extended sense of relevancy that hits the mark between too literal and too far afield.


Search engines can deliver on serendipity now because they have more semantic information — information about meanings — to play with. Much of that information is being assembled into graphs in which relationships among ideas are analyzed and represented in terms of distances or degrees, as in six degrees of Kevin Bacon. Or we could just look lower in the relevancy stack, although that will give less meaningful results.
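
The idea of measuring relatedness in degrees can be sketched with a breadth-first search over a small graph of concepts. This is only an illustration under stated assumptions: the graph, the function names, and the two-to-three-hop window are all hypothetical, and real semantic graphs are far larger and weight their links.

```python
from collections import deque


def degrees_from(graph, start):
    """Breadth-first search: hops from start to every reachable concept."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def serendipitous(graph, start, min_hops=2, max_hops=3):
    """Concepts neither too literal (1 hop) nor too far afield.

    The hop window is an assumed tuning knob, not a known industry value.
    """
    dist = degrees_from(graph, start)
    return sorted(n for n, d in dist.items() if min_hops <= d <= max_hops)


# Hypothetical toy graph of related ideas.
graph = {
    "key": ["lock", "keyboard"],
    "lock": ["key", "escape artistry", "security"],
    "escape artistry": ["lock", "Harry Houdini"],
    "Harry Houdini": ["escape artistry"],
    "security": ["lock"],
    "keyboard": ["key"],
    "clown makeup": ["circus"],  # not connected to "key" at all
    "circus": ["clown makeup"],
}

print(serendipitous(graph, "key"))
```

In this toy graph, "lock" is one hop from "key" and so counts as too literal, "Harry Houdini" sits three hops away and makes the cut, and "clown makeup" is never reached at all.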

Serendipity turns noise into signal: Results that had been filtered out now get filtered in. Serendipity signals that we rejoice in living in a world that we cannot fully know.

The future of search

More recently, especially because of the prominence of fake news, there’s been an increasing demand for two additional ways of searching.

The first is for serendipitous results to counteract the effect of echo chambers and filter bubbles that only show us what we already believe. If this is to be effective, it will require yet another type of search result: not just serendipity but results that are just different enough from what we believe that we’ll read them, understand them, and perhaps be nudged by them. Librarians are often superb at making this sort of suggestion, but machine learning could also get good enough at the task.

“Just different enough” searches would reveal our information space as being a more human space than ever. If the early search engines revealed us humans as slovenly, disorganized scavengers lacking the discipline that information management requires, and if in the ages of interestingness and serendipity we looked like contributors to an undisciplined super-abundance of meaningful connections, the call for content that pierces our echo chambers recognizes that our new information space does not consist only of information. Rather, it reflects the biases and evil inclinations of human thought, from outright racism and sexism to the subtler ways that privilege distorts our views.

That has led us to a sort criterion that has been surprisingly lacking: truth, or what Google quite reasonably referred to as quality when it lowered the ranking of Holocaust-denial sites. Sorting by truth-quality has shown up late because when full-text indexing began, the information space had been manually curated. When the Web began to scale up, search engines like Google optimistically assumed that analyzing the Web community’s use of pages, particularly the network of links, would be a sufficient guide to quality. But, thanks to expert gamers of the system and the possibility that the Crowd isn’t as Wise as we’d hoped, filtering by truth-quality yields more reliable results than ranking by usage.

And so our information space is being revealed not as a jumble of words and phrases sorted by algorithms but as an instrument of power that reflects our biases, assumptions, ambitions, and blindnesses. Like the Internet itself, bit by bit, search technology is revealing us in our fullness.

