Smarter search: Why ‘semantic search’ will finally let Google understand you


The Wall Street Journal’s Amir Efrati has raised eyebrows with an article (subscription required) saying Google is working to stay ahead of its rivals in Internet search by introducing more so-called “semantic search” technology. The idea is that Google’s search box wouldn’t just be a place for users to type keywords or specifically formed queries, but a box with an actual understanding of many of the terms, names, verbs, and references people type in, one that could apply that knowledge to users’ searches. In theory, semantic search should be able to return results that reflect a searcher’s intent, and in some cases improve Google’s ability to give an answer right away without referring users off to another site.

But wait — is this anything new? Doesn’t Google already put some answers right up front? And how could semantic search potentially help Google maintain its lead in the Internet search business?

What is semantic search?

In a nutshell, semantic search has much more in common with Watson, the IBM supercomputing application that handily defeated humans at Jeopardy!, than it does with the Find dialog in Microsoft Word.

Loosely speaking, the world of computerized searching breaks down into two types:

Literal search (sometimes called navigational search) looks for exact matches for some or all of the terms entered, and returns matching items, whether files, Web pages, products, or some other discrete unit of information. Literal search can be augmented with techniques like stem-matching, conjugations, and word association that expand or restrict the search in useful ways, so searching for “fly” might also hit “flight.” Literal search is what we’re most familiar with today, in part because it’s the easiest for computers to perform.
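To make the distinction concrete, here is a minimal Python sketch of literal search. It matches query terms as exact strings and, optionally, consults a small expansion table standing in for stem-matching and association. The `EXPANSIONS` table and the sample documents are hypothetical; real engines use far more sophisticated stemming and synonym handling.

```python
from __future__ import annotations

import re

# Hypothetical expansion table standing in for stem-matching and word
# association; real engines handle this far more thoroughly.
EXPANSIONS = {
    "fly": {"flies", "flying", "flight", "flights"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_matches(term: str, words: set[str], expand: bool) -> bool:
    """True if the term appears literally, or (optionally) via an expansion."""
    if term in words:
        return True
    return expand and bool(words & EXPANSIONS.get(term, set()))


def literal_search(query: str, documents: dict[str, str], expand: bool = False) -> list[str]:
    """Return IDs of documents that contain every query term."""
    terms = tokenize(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if all(term_matches(t, tokenize(text), expand) for t in terms)
    ]


docs = {
    "a": "Cheap flights to Tokyo",
    "b": "How birds fly south for the winter",
    "c": "Fishing tackle and fly rods",
}
print(literal_search("fly", docs))               # ['b', 'c']
print(literal_search("fly", docs, expand=True))  # ['a', 'b', 'c']
```

Document “c” also shows literal search’s blind spot: it matches the fishing sense of “fly” just as happily as the aviation sense, because it compares strings, not meanings.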

Semantic search differs from literal search in two ways. First, semantic search tries to understand what a user is asking in a query by placing it in context through analysis of the query’s terms and language. This analysis is conducted against tightly structured, pre-compiled pools of knowledge, potentially including knowledge about the user. Second, instead of returning a set of files, Web pages, products, or other items, semantic search tries to provide a direct answer to a question. If you ask a semantic search engine “When was Pluto discovered?” it might answer “Pluto was discovered on February 18, 1930, by Clyde Tombaugh*,” where a literal search engine would most likely return links to Web pages that contain the words “discovered” and “Pluto.”
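The following toy sketch illustrates the second difference: returning an answer instead of a list of pages. It assumes a tiny hand-built knowledge base and a single question pattern (`KNOWLEDGE` and `WHEN_DISCOVERED` are invented for illustration); a real semantic engine would need genuine language analysis rather than one regular expression.

```python
from __future__ import annotations

import re

# Toy knowledge base: (subject, relation) -> structured fact.
# Contents and layout are illustrative assumptions, not any real engine's data model.
KNOWLEDGE = {
    ("pluto", "discovered"): {"date": "February 18, 1930", "by": "Clyde Tombaugh"},
}

# A single hypothetical question pattern: "When was <subject> discovered?"
WHEN_DISCOVERED = re.compile(
    r"^when was (?P<subject>[\w\s]+?) discovered\??$", re.IGNORECASE
)


def answer(question: str) -> str | None:
    """Return a direct answer if the question fits a known pattern and fact."""
    match = WHEN_DISCOVERED.match(question.strip())
    if match is None:
        return None  # question not understood; an engine could fall back to links
    subject = match.group("subject").strip().lower()
    fact = KNOWLEDGE.get((subject, "discovered"))
    if fact is None:
        return None  # question understood, but no fact on file
    return f"{subject.title()} was discovered on {fact['date']} by {fact['by']}."


print(answer("When was Pluto discovered?"))
# Pluto was discovered on February 18, 1930 by Clyde Tombaugh.
```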

It turns out literal search and semantic search are good for different tasks. Literal search is great when a user is looking for a specific thing, whether that be a file, Web page, document, product, album, or other discrete item. Semantic search, on the other hand, turns out to be more useful when a user is looking for specific information — like a date, number, time, place, or name.

Thanks in part to the proliferation of literal search technology in everything from word processors to Web search engines, we’re most accustomed to literal search. Most of us already know how to manipulate literal search to get us closer to what we want on the first try. However, according to Efrati’s WSJ article, Google believes semantic search technology could provide direct answers to between 10 and 20 percent of Web searches. According to comScore, Google handled 11.7 billion searches in the United States alone in February 2012. With semantic search capabilities, somewhere between roughly 1.2 billion and 2.3 billion of those searches could have been answered directly, without sending people off to other Web pages and sites.
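The size of that range is simple arithmetic on the two figures cited above. This short Python sketch applies the 10 and 20 percent bounds to 11.7 billion searches, using integer math so the results are exact.

```python
# Inputs are the two figures quoted in the article: comScore's 11.7 billion
# U.S. searches for February 2012 and the 10 to 20 percent share Google
# reportedly expects, per the WSJ.
US_SEARCHES_FEB_2012 = 11_700_000_000


def directly_answerable(total_searches: int, share_percent: int) -> int:
    """Searches covered by a given percentage share, using exact integer math."""
    return total_searches * share_percent // 100


for share in (10, 20):
    print(f"{share}%: {directly_answerable(US_SEARCHES_FEB_2012, share):,}")
# 10%: 1,170,000,000
# 20%: 2,340,000,000
```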

Doesn’t Google already do this?

If you’ve used Google Web search at all, you’re probably thinking “But wait, Google already does this!” Type “current time in Tokyo” or “how tall is Mount Everest” and Google will put its best guess at a precise answer at the top of its search results. Google even cites sources for its response, and some of those sources will be in the classic “ten blue links” below the answer. (Google reports Mount Everest is 8,848 meters tall, by the way.)

To be fair, this is just one of many useful capabilities Google has built into its search bar: It’ll do (sophisticated) math, perform unit and currency conversions, and pull up things like flight information and local movie show times, with no need to type out a complicated query. It can also tap into some public data sources. For instance, typing “population Mexico” into the search box will display data from the World Bank. The response today is 113,423,047 people.

However, Google’s efforts to provide direct answers to some types of questions fall down pretty quickly, because those features are largely implemented as special cases bolted onto Google’s literal search engine, rather than as a semantic search that tries to understand what the user wants. Type “how tall is mt everest” (note the spelling) into the search box, and Google doesn’t even attempt to provide an answer: Google search doesn’t know “mt” means “mount.” Similarly, if Google has determined that your current location is not in Mexico (if Google doesn’t have your location, it guesses from your IP address, and no, you can’t opt out), searching for “population mexico city” might return some unexpected results. Surely Mexico City is home to more than 10,852 people, right?
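The brittleness of special-case answers is easy to reproduce. In this hypothetical Python sketch (the `SPECIAL_CASES` and `ALIASES` tables are assumptions for illustration, not a description of how Google actually works), an answer fires only when the query text matches exactly, so “mt” defeats it; normalizing abbreviations first recovers the answer.

```python
from __future__ import annotations

# Hypothetical special-case answers keyed on exact query text. This is an
# illustration of the brittleness described above, not Google's actual code.
SPECIAL_CASES = {
    "how tall is mount everest": "Mount Everest is 8,848 meters tall.",
}

# Hypothetical alias table that maps abbreviations to canonical words.
ALIASES = {"mt": "mount", "mt.": "mount"}


def exact_lookup(query: str) -> str | None:
    """Answer only if the lowercased, whitespace-collapsed query matches verbatim."""
    return SPECIAL_CASES.get(" ".join(query.lower().split()))


def normalized_lookup(query: str) -> str | None:
    """Expand known abbreviations before the lookup, so "mt" reads as "mount"."""
    words = [ALIASES.get(word, word) for word in query.lower().split()]
    return SPECIAL_CASES.get(" ".join(words))


print(exact_lookup("how tall is mt everest"))       # None
print(normalized_lookup("how tall is mt everest"))  # Mount Everest is 8,848 meters tall.
```

An alias table only patches one symptom; it gives the engine no understanding of what Everest is, which is exactly the gap semantic search is meant to address.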

