With every answer, search reshapes our worldview

If you asked Google “Did the Holocaust happen?” earlier this month, the search engine first directed you to Stormfront, a white supremacist organization that denies the genocide ever occurred.

After initially defending its algorithm, Google eventually tweaked it to bury the offending result. But the enduring lesson is a lot more complicated than a single errant search. Every time search engines change how they measure results, they change our ideas about how the world fits together – or at least the world as reflected in the information that stands in for it.

Precision or recall? Choose one

It was the late 1980s when I first encountered full-text search technology. I was working at Interleaf, an early electronic publishing software company, whose clients needed to be able to search the large document sets they were creating. The available search engines varied, but the metrics were the same. In addition to the obvious speed-per-page metric for returning results, there were the paired terms “precision” and “recall.” Precision quantifies how many false positives show up in the results list: If you search for conference papers about Oingo Boingo, does the engine also give you reviews of their albums and photos of their recording sessions? Worse, does it give you conference papers about Bingo and Bongos? There goes your precision!

Recall, on the other hand, measures how many appropriate pages your search did not find: Are there a hundred conference papers about Oingo Boingo in the index that do not show up anywhere in the results list? That’s going to drive down the engine’s recall score.
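
For readers who like to see the arithmetic, here is a minimal sketch of the two scores in Python, using the standard definitions: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. The function name and the toy document IDs are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one search.

    returned: the document IDs the engine gave back
    relevant: the document IDs that actually answer the query
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned

    # Precision falls as false positives crowd the results list.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall falls as relevant documents go missing from the results list.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical index: four conference papers about Oingo Boingo exist.
relevant = {"paper1", "paper2", "paper3", "paper4"}
# The engine returns two of them, plus an album review and a paper about bingo.
returned = ["paper1", "paper2", "album_review", "bingo_paper"]

precision, recall = precision_recall(returned, relevant)
print(precision, recall)
```

In this toy case both scores come out to one half: half of what came back was noise, and half of what mattered never came back.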

The common wisdom was that if you optimized for one metric, you’d pay a price with the other: If you wanted to get every reference, you’d have to put up with some false positives, and if you wanted no false positives you wouldn’t be able to get every reference. This very real constraint showed us an information space — the set of searchable terms and their relationships — characterized by imprecision and unreliability because the space consisted of text that was created without any care for its findability when stirred into a cauldron with thousands of other texts. Our search engines tried to impose structure and find relationships using mainly unintentional clues. You therefore couldn’t rely on them to find everything that would be of help, and not because the information space was too large. Rather, it was because the space was created by us slovenly humans.

The rise of relevancy

By the mid to late 1990s, precision and recall were not enough. When you have millions and then billions of pages in your index, finding every instance of a term doesn’t much matter because users won’t know or care that your list of 100,000 hits really should have 100,001 entries. Likewise, precision becomes less important, so long as the false positives don’t show up in the first few pages of results, because, frankly, hardly anyone gets past those pages.

Instead, an existing term took on a new importance: relevancy. When a search finds 100,000 results, the user needs to be shown the most useful hits first. Of course, what’s useful depends on what the user is trying to do. And that makes judging relevancy a dark art practiced by mages and wizards. That modern Web search engines have gotten so good at it is a testament to the skill of their developers … as well as to how much data search engines have gathered about us.

When relevancy reigns, the information space shows itself as super-abundant and ambiguous. But the ambiguity has a different source than in the old precision-and-recall days, for it results not from the vice of slovenliness but from the collective human virtue of overloading language with rich and inextricably linked meanings. The information space is made up not of language-as-information but of language-as-poetry, that is, of words that are enriched by their layers of meanings and permeable borders.

But is it interesting?

As the body of the Web began to scale up from the millions to the billions and now conceivably to the trillions of pages — That’s a lot of typing, people! Good job! — relevancy began to suffer from the same problems of scale as precision and recall. When 100,000 pages are relevant, you need to provide another way of sorting. For example, after the photo-sharing site Flickr had been entrusted with its first couple of billion photos, it let us sort on interestingness. If you search for, say, “keys” at Flickr and sort by relevancy, you’ll see a collection of photos of keys. But if you sort by interestingness, you’ll see striking photos, many of which are not as clearly relevant to the search terms: sunset over Key Bridge, or a supporting column on a bridge with a gap that resembles a keyhole.

Flickr gauges interestingness by looking at a variety of factors, many of which are metrics of the community’s reaction to the photos. While Flickr doesn’t spell out its exact algorithms, a photo is judged to be more interesting if it gets lots of clicks and links especially from people who are not in the photographer’s social circle, if it gets printed out more frequently, if lots of people leave comments, and so forth. The result is a set of photos that the community likes, with a lowered requirement for relevancy.
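
Since Flickr doesn’t publish its formula, the following is only a guess at the general shape of such a score, not Flickr’s actual method. It assumes a simple weighted sum of community signals; every signal name and every weight here is invented for illustration.

```python
# Hypothetical weights: attention from outside the photographer's social
# circle counts for more than attention from inside it.
ASSUMED_WEIGHTS = {
    "clicks_outside_circle": 2.0,
    "clicks_inside_circle": 0.5,
    "comments": 1.5,
    "prints": 3.0,
}


def interestingness(photo, weights=ASSUMED_WEIGHTS):
    """Weighted sum of community-reaction counts; missing signals count as 0."""
    return sum(weight * photo.get(signal, 0) for signal, weight in weights.items())


photos = [
    {"id": "literal_keys", "clicks_inside_circle": 40,
     "clicks_outside_circle": 5, "comments": 2, "prints": 0},
    {"id": "key_bridge_sunset", "clicks_inside_circle": 10,
     "clicks_outside_circle": 30, "comments": 12, "prints": 3},
]

# Sort the most "interesting" photo first.
ranked = sorted(photos, key=interestingness, reverse=True)
print([photo["id"] for photo in ranked])
```

Under these made-up weights the sunset over Key Bridge outranks the literal photo of keys, even though the latter drew more clicks from the photographer’s own circle.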

If you’re simply trying to illustrate what a lock and key look like, sort by relevancy. If you’re using locks and keys as a metaphor for a false sense of security or for the dangers of a police state, go for interestingness.

Back in the 1980s, interestingness wouldn’t have been as useful because there wasn’t today’s super-abundance of content. But once we’re confident that we can find what’s relevant, we also want to find what is striking in its expression, or that is relevant in a non-literal meaning. We’ve always wanted that. We just didn’t know it because we couldn’t have it. Search engines in this way have become instruments of our non-literal use of language — a use that is more essential to language than mere literalism is.

Digging deeper

Now we’re hearing a cry for a fifth metric of searching: serendipity. Serendipitous results are meant to do the opposite of what traditional searches have done, for they show you what you were not asking to see. But mere surprise isn’t enough. If you search for “key” and are shown pages about clown makeup or elephant toenails, the serendipity is unlikely to be useful because these have absolutely nothing to do with your search terms. To be usefully serendipitous, there should be a subtle but meaningful relationship. Perhaps you’ll be shown a page about Harry Houdini, or a paper about biological models of DNA based on lock and key relationships, or a feminist history of chastity belts. Serendipity requires an extended sense of relevancy that hits the mark between too literal and too far afield.

Search engines can deliver on serendipity now because they have more semantic information — information about meanings — to play with. Much of that information is being assembled into graphs in which relationships among ideas are analyzed and represented in terms of distances or degrees, as in six degrees of Kevin Bacon. Or we could just look lower in the relevancy stack, although that will give less meaningful results.
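
To make “degrees” concrete, here is a small sketch that counts the hops between two ideas with a breadth-first search. The tiny graph is invented for illustration and is not drawn from any real search engine; a serendipity feature might, under this assumption, favor results that sit a few hops from the query rather than right next to it or nowhere near it.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical graph of related ideas, stored as adjacency lists.
concept_graph = {
    "key": ["lock", "keyboard"],
    "lock": ["key", "escape artistry", "security"],
    "escape artistry": ["lock", "Harry Houdini"],
    "Harry Houdini": ["escape artistry"],
    "keyboard": ["key"],
    "security": ["lock"],
}

print(degrees_of_separation(concept_graph, "key", "Harry Houdini"))
print(degrees_of_separation(concept_graph, "key", "clown makeup"))
```

In this toy graph Houdini sits three hops from “key,” by way of locks and escape artistry, while clown makeup has no path to it at all.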

Serendipity turns noise into signal: Results that had been filtered out now get filtered in. Serendipity signals that we rejoice in living in a world that we cannot fully know.

The future of search

More recently, especially because of the prominence of fake news, there’s been an increasing demand for two additional ways of searching.

The first is for serendipitous results to counteract the effect of echo chambers and filter bubbles that only show us what we already believe. If this is to be effective, it will require yet another type of search result: not just serendipity but results that are just different enough from what we believe that we’ll read them, understand them, and perhaps be nudged by them. Librarians are often superb at making this sort of suggestion, but machine learning could also get good enough at the task.

“Just different enough” searches would reveal our information space as being a more human space than ever. If the early search engines revealed us humans as slovenly, disorganized scavengers lacking the discipline that information management requires, and if in the ages of interestingness and serendipity we looked like contributors to an undisciplined super-abundance of meaningful connections, the call for content that pierces our echo chambers recognizes that our new information space does not consist only of information. Rather, it reflects the biases and evil inclinations of human thought, from outright racism and sexism to the subtler ways that privilege distorts our views.

That has led us to a sort criterion that has been surprisingly lacking: truth, or what Google quite reasonably referred to as quality when it lowered the ranking of Holocaust-denial sites. Sorting by truth-quality has shown up late because when full-text indexing began, the information space had been manually curated. When the Web began to scale up, search engines like Google optimistically assumed that analyzing the Web community’s use of pages (particularly the network of links) would be a sufficient guide to quality. But, thanks to expert gamers of the system and the possibility that the Crowd isn’t as Wise as we’d hoped, filtering by truth-quality yields more reliable results than ranking by usage.

And so our information space is being revealed not as a jumble of words and phrases sorted by algorithms but as an instrument of power that reflects our biases, assumptions, ambitions, and blindnesses. Like the Internet itself, bit by bit, search technology is revealing us in our fullness.

David Weinberger
Dr. Weinberger is a senior researcher at the Berkman Center. He has been a philosophy professor, journalist, strategic…