Shutterstock Reverse Image Search Yields Greater Accuracy

Shutterstock’s visual search engine could make browsing photos less of a chore

By Les Shu March 14, 2016

Perhaps somewhat ironically, we are used to searching for photos through text. Looking for a photo of a cat? Just type in “cat” in the Google image search bar, and it will return relevant photos (provided they were tagged as such, of course). Keywords will do the job at the most basic level, but what if you are looking for a specific type of photo? You could type “yellow cat” or some sort of generic description, but things become difficult as the description becomes more complex.

To address this, photo agency Shutterstock has just launched a new tool, called Reverse Image Search, that allows customers to upload a photo (up to 5MB) and find images that are similar. Using computer vision, Shutterstock says the tool breaks through the limiting ceiling of metadata.

Besides keywords, “the technology now relies instead on pixel data within images,” wrote Kevin Lester, who is Shutterstock’s vice president of engineering, in a blog post. “It has studied our 70 million images and 4 million video clips, broken them down into their principal features, and now recognizes what’s inside each and every image, including shapes, colors, and the smallest of details; this visual and conceptual data is represented numerically.”

Although this kind of computer vision-based search technology has existed for years (Google’s image search lets you do the same thing), when compared with similar tools offered by other stock photo services, Shutterstock says its technology is the most refined.

“It isn’t the first, but it’s the best on the market,” says Lawrence Lazare, Shutterstock’s product director for search and discovery.

The big benefits of using computer vision are accuracy and speed, and for Shutterstock’s customer base, it solves a major problem with search. It cuts down on the amount of time spent on searching for an image. If you are looking for inspiration or something generic, metadata (keywords) is easier, Lazare says. But if you are creating something specific — an ad campaign, for example — and you have specific requirements for what needs to be in that photo, then words aren’t as successful.

“Words are fallible — some pictures are hard to describe,” Lazare adds. “Some photos would require a short story to describe, and people don’t search like that.”

For example, typing in “sunset” into the search bar will result in 14,394 pages encompassing 1,439,383 photos, illustrations, and vector art that depict a sunset. And the photos are dependent on whether the photographer added keywords properly (sometimes a photographer will use a bunch of keywords to tag a batch of photos, say a wedding, but then may include photos that aren’t related).

The image on the left shows varied results when searching by keywords. The image on the right shows visually similar results when using visual search. Image used with permission by copyright holder

You could narrow down the search results by adding additional keywords, like “city and architecture,” but, as it turns out, you’ll still have 140,330 options to browse through. It’s even more difficult when the photo in your head has nuances like the angle of a building or the color of a sunset.

Which is why visual similarity is more useful than keyword similarity, Lazare says, but this type of search requires a significant amount of machine learning, and it is not an easy task. When an image is uploaded, the computer breaks it down numerically — in a manner that it can understand — so that it can compare and contrast the important aspects of the image. The computer has to compare it against the millions of photos in Shutterstock’s archive, and do so incredibly quickly; it takes less than 20 milliseconds for the algorithms to compare and contrast 70 million images in real time. For the computer, some photos are easier to decipher, but when you have things like abstract art or colors, it’s a bit harder, and the computer is more likely to return “false positives.”

To achieve its success rates, the neural network utilized by Shutterstock’s computers required a lot of training. At the beginning, the first attempts weren’t good, but over time, the responses — reflecting the learning they were doing on their own — improved. Lester, who oversees search as well as the computer vision team, told us that in about a year’s time, the company managed to go from having nothing to having something that works well.

From our own experiments (the feature is live, and anyone can try it out by uploading an image), we can say the visual search tool is pretty good. Although it has trouble with complicated photos, it’s more successful with simpler ones. But Shutterstock, of course, isn’t the only company to develop a visual search engine: We noticed equally good results via Google’s image search, and many of Shutterstock’s competitors offer visual search as well (although Shutterstock showed us similar technology from competitors, and claims they aren’t as successful, hence one reason why they decided to build it from scratch).

We uploaded a fairly complicated photo, and threw the computer off. However, it does recognize that it's some type of architecture. — We uploaded a fairly complicated photo, and threw the computer off. However, it does recognize that it’s some type of architecture. Image used with permission by copyright holder

This all shows just how far along computer vision and machine learning have come in a relatively short time. And it’s only going to get better: Shutterstock is adding new tools to its network that would allow users to give its computers feedback about the quality of the search results, and will soon unveil visual search for its four million video footage assets, which is an even greater challenge than static photos.

Editors' Recommendations