Inside Knowledge Graph: Google’s deep-diving semantic search

Google Knowledge Graph

Google is starting to roll out its new Knowledge Graph technology to its English-speaking users in the United States. Although the new service will be popping up as an adjunct to Google’s normal Web search results, rather than as a separate service in its own right, it represents a fundamentally different way of approaching search. Instead of returning ranked search results based on literal search terms (or some search terms, or possibly-corrected versions of some of the search terms), Knowledge Graph essentially attempts to associate search queries with stuff it knows about: places, people, books, movies, events, you name it. Knowledge Graph is an effort to achieve semantic search, attempting to return results based on the meaning of what users search for, instead of just literal matches.

Can the Knowledge Graph change the way we search? And what might it mean for Google’s fundamental business, and for the sites that rely on Google to bring them traffic?

Knowledge Graph under the hood

Google Knowledge Graph (Curie)

Although Knowledge Graph is a fundamentally new kind of search offering from Google, it follows well-trodden paths Google has been pursuing for years with its mainstream search service. And Google is being careful to introduce it in a way that isn’t terribly disruptive to its market-dominating search.

For years, Google has been able to answer a selection of simple factual queries directly from the search bar, and even do some math — handy for people who are more likely to have a Web browser running than a calculator. Try it: Google should provide direct answers to things like “capital of suriname” or “square root 3952.”

With Knowledge Graph, Google will also be dropping search queries into complex databases of interrelated information about…well, things, for lack of a better term. In some ways these databases function much like a traditional lookup: they return records with important bits of information about a particular thing. For a person, that might be something like their birth date (and maybe death date), their nationalities, titles or offices they may have held, full legal name, and more.

For a building, these datasets might include things like its location, when it was built, its overall size, and its type (say, monument, retail space, commercial space, residence, um…space station?). However, in addition to what amount to a few bare facts and some keywords, these database entries also collect together direct links to related objects in the database (which in turn link to other related objects, and so on). In all probability, the nature of those links is defined too. For instance, an entry on a person might contain links to that person’s parents, spouse(s), children, and other significant relationships, and be able to distinguish between family members and other types of relationships. The database wouldn’t be doing its job if a dataset on George H. W. Bush (the 41st President of the United States) didn’t link to the dataset on George W. Bush (the 43rd President), and both would link to Condoleezza Rice, but in different ways. A dataset on the Great Pyramid should include links to Khufu (also known as Cheops) and the Sphinx, but also to the Mausoleum at Halicarnassus. (Can you guess why?)
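To make the idea of typed links concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how Google stores anything: the class names (Entity, TinyGraph) and the relation labels (parent_of, had_nsc_staffer, and so on) are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One 'thing' in the graph: a few bare facts plus typed links."""
    name: str
    kind: str
    facts: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (relation, target name) pairs


class TinyGraph:
    """Hypothetical toy graph; every link records what kind of link it is."""

    def __init__(self):
        self.entities = {}

    def add(self, name, kind, **facts):
        self.entities[name] = Entity(name, kind, facts)

    def link(self, source, relation, target):
        # Refuse links that point at entities the graph does not know about.
        if target not in self.entities:
            raise KeyError(target)
        self.entities[source].links.append((relation, target))

    def related(self, name, relation=None):
        # With no relation given, return every linked entity.
        return [t for r, t in self.entities[name].links if relation in (None, r)]


graph = TinyGraph()
graph.add("George H. W. Bush", "person", office="41st President of the United States")
graph.add("George W. Bush", "person", office="43rd President of the United States")
graph.add("Condoleezza Rice", "person")

# Relation labels below are invented for illustration.
graph.link("George H. W. Bush", "parent_of", "George W. Bush")
graph.link("George W. Bush", "child_of", "George H. W. Bush")
graph.link("George H. W. Bush", "had_nsc_staffer", "Condoleezza Rice")
graph.link("George W. Bush", "had_secretary_of_state", "Condoleezza Rice")

print(graph.related("George W. Bush"))
print(graph.related("George H. W. Bush", "parent_of"))
```

Both presidents end up linked to Condoleezza Rice, but the relation label on each link differs, which is the distinction a plain list of "related pages" would lose.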

These datasets make up the heart of semantic search, and they don’t come cheap. First of all, they’re huge: The sum of human knowledge may be but a tiny speck in the face of all the information in the universe, but just scratching the surface can easily produce hundreds of millions (or billions) of datasets. (In comparison, the English version of Wikipedia has a scant 4 million or so articles.) These datasets aren’t easy to get: they have to be painstakingly compiled from reliable sources. Furthermore, they have to be organized and designed in such a way that the information can be accessed and manipulated in useful ways (and in real time, for Google’s purposes). And the datasets have to be able to cope with the malleable nature of “knowledge.” After all, just a few years ago, Pluto was a planet and Vioxx was an FDA-approved osteoarthritis treatment.

Google is apparently building its databases using technologies and methods acquired with Metaweb back in 2010, although Metaweb’s Freebase semantic database remains available to anyone. Google is using Freebase for data, along with information culled from Wikipedia and the CIA World Factbook. Google claims its Knowledge Graph database already has entries for some 500 million objects (please note these objects can’t be directly compared to Wikipedia articles) and some 3.5 billion “facts.” We put “fact” in quotes because it was once a “fact” that the Earth was flat and humans couldn’t fly. Knowledge is slippery.

Knowledge Graph on the screen

Google’s initial implementation of Knowledge Graph is designed to augment the company’s existing search results listings, rather than replace them. Much as Google sometimes shows previews of pages in a panel to the right side of search results in a standard Web browser window, Knowledge Graph results will appear in panels next to search results. Not all search terms will produce Knowledge Graph panels: Queries will have to match well-defined objects in the Knowledge Graph. (Don’t worry if you don’t see Knowledge Graph results just yet; Google is still rolling the feature out, and right now it’s limited to English-speaking users in the United States.)

The Knowledge Graph panels seek to display a summary of key and most-sought information about a query without requiring users to read through two-line summaries of a Web page or click through to another site. For a person, these key facts might include birth and death dates, significant people associated with them, and quick highlights of titles, accomplishments, or what else makes that person significant. For other entities, Google will try to surface key information, statistics, and associations. The Knowledge Graph panel will also handle disambiguation. If more than one Knowledge Graph entity matches a search query, Google provides access to them all.
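The disambiguation step can be pictured as a lookup that is allowed to return more than one entity. The sketch below assumes a simple label index; the function names, entity IDs, and the three sample entries are hypothetical, and real query matching is certainly far more involved.

```python
from collections import defaultdict


def build_index(entities):
    """Map each normalized label to every entity that carries it."""
    index = defaultdict(list)
    for entity_id, label, description in entities:
        index[label.casefold()].append((entity_id, description))
    return index


def lookup(index, query):
    """Return all entities whose label matches the query (possibly none)."""
    normalized = " ".join(query.split()).casefold()
    return index.get(normalized, [])


# Illustrative entries only; the IDs are invented for this example.
entities = [
    ("taj_mahal_monument", "Taj Mahal", "mausoleum in Agra, India"),
    ("taj_mahal_musician", "Taj Mahal", "American blues musician"),
    ("stonehenge", "Stonehenge", "prehistoric monument in England"),
]
index = build_index(entities)

for entity_id, description in lookup(index, "taj mahal"):
    print(entity_id, "-", description)
```

A query that matches two entities yields two candidates to offer the user, and a query that matches nothing yields an empty list, which corresponds to a search that shows no panel at all.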

Perhaps more significantly, once users are interacting with a Knowledge Graph entity they can, within some limits, surf the links of relationships to other entities. For instance, pulling up a Knowledge Graph entry on Dashiell Hammett ought to let users immediately jump to a Knowledge Graph summary of The Thin Man and The Maltese Falcon, and perhaps to summaries about Lillian Hellman and post-World War II anti-Communist witch hunts.

Knowledge Graph won’t be restricted to browser-based searches: Google is currently rolling out Knowledge Graph search results to most devices running Android 2.2 or higher (again, U.S.-only in English) in the Quick Search box and browser-based searches. Knowledge Graph search results will also be introduced to forthcoming versions of Google’s search app for iOS devices. Users can navigate through information in Knowledge Graph by tapping or swiping back and forth through the content.

Google Knowledge Graph (mobile)

It’s important to note these are just the first places Knowledge Graph is surfacing in Google’s services. Behind the scenes, you can expect Knowledge Graph search results to begin informing a broad variety of Google services, particularly as its corpus of datasets and “facts” grows. Knowledge Graph searches will likely never replace Google’s traditional keyword-based search — semantic search and literal search are kind of two different tools good at two separate tasks — but, in theory, it wouldn’t be surprising if Knowledge Graph one day contributed to as much as a quarter of Google’s interactions with search users.

Crowdsourcing…or Google-colored glasses?

So, how does Knowledge Graph pick information for its summaries? So far, Google hasn’t been very explicit about the methodology behind Knowledge Graph’s presentation. In my (limited) sampling, a good portion of the data Google prioritizes for its summaries seems to be pretty consistent: dates, relations, and a single “significant accomplishment” field for people (which could be labeled something like “Discoveries” or “Occupation” or “Title”). Places get locations and dates, and a selection of other fields that could be exactly what someone wants or completely inappropriate. For instance, if you’re looking at The Empire State Building, providing the street address seems appropriate…but it’s not quite as appropriate for, say, Stonehenge. Similar oddities can happen with phone numbers: how many people need instant access to a phone number for the Taj Mahal?

Google Knowledge Graph (Taj Mahal)

Google says it prioritizes the information it presents in Knowledge Graph summaries using “human wisdom.” And by that, Google doesn’t actually mean things that humans tell them or that subject experts or database curators collect — it means making indirect assumptions about users’ intentions by logging search behaviors and keeping tabs on what they click, don’t click, and look for after doing a search. In a nutshell, Google is using crowdsourcing to try to determine which “facts” are the best ones to present in a Knowledge Graph summary.

For example, Google says the Knowledge Graph summary information it presents for Tom Cruise answers 37 percent of Google search users’ follow-up queries about the actor when they search for him. That 37 percent number sounds reassuringly scientific and precise, but there is absolutely no way to assess whether Google’s assessment of search users’ aggregate behavior has anything to do with what a particular user (like you) wants to know. Since Google seems so proud of that 37 percent figure, let’s turn it on its head: by Google’s own measure, 63 percent of the time the summary doesn’t answer what its search users go on to ask about a topic.
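Google hasn’t published how it arrives at a figure like that, but one plausible reading is simple counting: tally what people ask next, show the most-asked facts, and report the share of follow-ups those facts cover. The sketch below works through that arithmetic; the function name, the fact labels, and every count are invented for illustration and are not Google’s data.

```python
from collections import Counter


def pick_summary_facts(followup_counts, slots):
    """Choose the most-requested facts and report the share of follow-ups covered."""
    counts = Counter(followup_counts)
    chosen = [fact for fact, _ in counts.most_common(slots)]
    total = sum(counts.values())
    covered = sum(counts[fact] for fact in chosen)
    coverage = covered / total if total else 0.0
    return chosen, coverage


# Hypothetical tallies of follow-up queries, out of 100 logged searches.
followups = {"films": 40, "spouse": 25, "height": 15, "age": 10, "religion": 10}

chosen, coverage = pick_summary_facts(followups, slots=2)
print(chosen, f"{coverage:.0%} covered, {1 - coverage:.0%} not covered")
```

However many slots the panel has, whatever falls outside the chosen facts is simply not answered, and an aggregate share says nothing about which group any individual user lands in.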

Google’s position is easy to understand: Whenever possible, it wants to immediately present the information its users are seeking. The only way Google can really assess that is by looking at how people use its search engine and trying to do some guesswork.

Crowdsourcing has its dangers. Just as Google is treading in murky waters when it chooses to prioritize search results from Google+ in Search Plus Your World, there are hazards to relying on crowdsourcing to prioritize the presentation of information and “facts.” Just because Google’s search audience may not know (or particularly care) about certain information doesn’t mean it’s not important or relevant. There are plenty of cases where “the crowd’s” perception of the facts is wrong. Most people think schizophrenia means having multiple personalities, drinking milk or eating ice cream increases mucus production, and Marie Antoinette said “Let them eat cake.” Yet none of these things are true.

Relying on crowdsourcing to assess the importance of information also creates potential for abuse. What if a government wanted to seed misinformation about dissidents, a political campaign wanted to smear an opponent, or hackers wanted to play with search results just for laughs? In much the same way Google search results have been “Googlebombed,” crowdsourcing could be used to manipulate Knowledge Graph. Sensible people won’t believe everything they read; similarly, “facts” presented by semantic search engines will not always be reliable, and in some cases crowdsourcing will make them even less so.

Making Google stickier

On the practical side, Google’s Knowledge Graph will have one immediate impact: It will make Google’s search results stickier. Whenever Knowledge Graph can provide a direct answer to a search user’s question — or let them navigate to it quickly via related topics — users will be staying on Google services. That means Google collects more data about users’ searches and behaviors (regardless of whether they’re signed in to a Google account or not). That, in turn, lets Google further refine its targeted advertising platform.

It also means that services like Wikipedia that often answer the same sorts of knowledge-specific queries targeted by Knowledge Graph will see a decline in the amount of Web traffic they receive from Google. In Wikipedia’s case, that directly corresponds to fewer opportunities to solicit community support; for other services, that will translate directly to a lower number of ad impressions and (hence) lower revenues. For folks who offer sites and services based on providing discrete facts and information (and that includes everything from Wikipedia to IMDb to online retailers to phone books and business directories to, conceivably, crowd-sourced services like Yelp and even public records), Knowledge Graph could slowly erode their businesses.

