MIT's Pic2Recipe A.I. Can Predict Food Ingredients By Analyzing a Photo

Scrolling through food photography can bring on the desire to recreate a dish at home, but what if the ingredients aren’t listed? Could there be a way to find out just by analyzing the image? That’s what researchers at the Massachusetts Institute of Technology asked when they set out to create a deep learning algorithm that could predict a recipe from nothing but a photo. The research, published on July 20, resulted in a program called Pic2Recipe that could predict a dish’s recipe from a photo with a 65 percent success rate.

Earlier attempts to turn photos into recipes were limited by smaller datasets, although “small” is relative to all the possible recipes available. One study used 65,000 recipes, but it only included traditional Chinese cuisine; another achieved only about 50 percent accuracy in initial testing. Because deep learning algorithms “learn” from being fed large quantities of data, the resulting programs had large gaps in the ingredients they could recognize, which hurt their accuracy.

To create a larger database, the researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) knew the software would have to be based on a wide-ranging set of data. To get around the narrow datasets of earlier work, the team turned to large sets of photos and recipes that already exist: food websites. Compiling data from places like Food.com and All Recipes, the team created Recipe1M, a dataset of over one million recipes.

Using those recipes and the associated images, the team was able to train the software to use object recognition to pick up on what each dish’s ingredients might be. With a list of ingredients, the system then selected the recipe that best matched the list. Pic2Recipe was able to recognize ingredients like flour, eggs, and butter.

The program doesn’t actually identify a recipe from the photo — it creates a list of ingredients. With that list, the program can then go through that one-million-recipe database and choose the one with ingredients that match the list from the photo.
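The matching step described above can be illustrated with a small sketch. This is not the researchers' code: the article does not say how Pic2Recipe scores a match, so the Jaccard overlap used here, the function names, and the three-recipe toy database are all assumptions made for illustration.

```python
# A toy sketch of "pick the recipe whose ingredients best match the list".
# Assumption: similarity is plain set overlap (Jaccard); the article does not
# state how the real system compares ingredient lists.

def jaccard(a, b):
    """Return the share of ingredients two lists have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_recipe(predicted, recipes):
    """Return the name of the recipe whose ingredients overlap most with `predicted`."""
    return max(recipes, key=lambda name: jaccard(predicted, recipes[name]))


# Hypothetical stand-in for the one-million-recipe database.
recipes = {
    "pancakes": ["flour", "eggs", "milk", "butter", "sugar"],
    "omelet": ["eggs", "butter", "salt"],
    "garlic bread": ["bread", "butter", "garlic cloves", "salt"],
}

# Ingredients that an image model might have predicted from a photo.
predicted = ["flour", "eggs", "butter"]

print(best_recipe(predicted, recipes))
```

With these made-up entries, the pancake recipe shares three of five distinct ingredients with the predicted list, which is more overlap than either of the other two recipes.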

“In computer vision, food is mostly neglected because we don’t have the large-scale datasets needed to make predictions,” said Yusuf Aytar, a postdoctoral associate who co-wrote the paper with MIT professor Antonio Torralba. “But seemingly useless photos on social media can actually provide valuable insight into healthy habits and dietary preferences.”

Since the computer already has that large dataset, it is also able to pick up on a number of different patterns: the average recipe has nine ingredients, and the most popular are salt, butter, sugar, olive oil, water, eggs, garlic cloves, milk, flour, and onion.
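Patterns like these come down to simple counting over the recipe collection. The sketch below shows one way to compute them; the function name and the tiny sample data are hypothetical and have nothing to do with the actual Recipe1M files.

```python
from collections import Counter

def ingredient_stats(recipes, top_n=3):
    """Return (average ingredients per recipe, the top_n most common ingredients)."""
    # Count each ingredient once per recipe it appears in.
    counts = Counter(ing for ings in recipes.values() for ing in set(ings))
    average = sum(len(ings) for ings in recipes.values()) / len(recipes)
    return average, counts.most_common(top_n)


# Hypothetical sample data standing in for a real recipe collection.
sample = {
    "pancakes": ["flour", "eggs", "milk", "butter", "sugar"],
    "omelet": ["eggs", "butter", "salt"],
    "garlic bread": ["bread", "butter", "garlic cloves", "salt"],
}

average, most_common = ingredient_stats(sample, top_n=1)
print(average, most_common)
```

Run over the full dataset instead of three recipes, the same two calculations would produce the kind of figures quoted above.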

The software could have a number of different real-world uses. A person could snap a photo at a restaurant to learn how to make the dish at home, or to track her personal nutrition.

The program, while it draws on a wider dataset than earlier attempts, still has a few gaps. The researchers said the program has trouble with dishes that are a bit more ambiguous, like smoothies and sushi rolls. Similar recipes with a number of different variations, like lasagna, also tended to confuse the program.

The group plans to continue developing the program and even hopes to give the system the ability to tell how something is cooked, like picking up the difference between stewed and diced. Future work could also expand the program’s ability to recognize specific ingredients, like determining the type of onion instead of just listing onion.

You don’t have to wait until Pic2Recipe becomes a full-fledged app to try it out. An online version already lets users upload their own images.