
Scientists develop program that matches doodles with photographs

As FindFace has shown, image recognition has made incredible strides over the past few years. Despite this, artificial intelligence still struggles with certain types of images, such as drawings and doodles. A team of scientists is looking to change that with a new program that could match doodles and sketches to similar photographs.

Scientists believe the solution to bridging this gap is deep learning, a type of artificial intelligence that uses neural networks to analyze and process images. What are neural networks, you ask? Neural networks are brain-like computer systems designed to pick out and create patterns based on inputs.


In an effort to get around the current limitations of image recognition techniques, scientists have started work on a program that can take a doodle or sketch of an image and find a matching photograph – a visual search engine, if you will.

One of the main reasons doodles have stumped image recognition programs in the past is that the drawings often aren’t perfect representations. Rather than being lifelike recordings of what we’re doodling, sketches and drawings are merely how we visualize something.

These less-than-accurate representations, whether intentional or due to a lack of drawing skill, often lead to exaggerated proportions and missing elements that might not prevent humans from recognizing the image, but cause problems for computers. For example, the stick figure you drew in third grade likely wouldn’t be recognized as a human by an image recognition program, as its body is nothing more than a few lines placed below a circle – not the lifelike proportions of the human form.

While still in its infancy, the program has already proven viable, New Scientist notes. Working with more than 600 participants, the scientists displayed a random image for just two seconds and asked each participant to sketch the pictured object from memory. The resulting sketches were fed into one neural network while the images shown to the participants were fed into another. The two networks then worked together to analyze the sketches and match them to their original pictures.
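The researchers’ actual architecture isn’t detailed here, but the two-network idea can be illustrated with a toy sketch: two embedding functions map sketches and photos into a shared vector space, and a sketch is matched to the photo whose embedding is most similar. Everything below – the random linear “networks,” the dimensions, the noise level – is an illustrative placeholder, not the team’s real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two trained networks: each maps a flattened
# image vector into a shared 32-dimensional embedding space.
# (In the real system these would be learned neural networks.)
D_IN, D_EMB = 64, 32
sketch_net = rng.normal(size=(D_IN, D_EMB))
photo_net = sketch_net + 0.01 * rng.normal(size=(D_IN, D_EMB))

def embed(x, net):
    """Project an input into the shared space and unit-normalize it."""
    v = x @ net
    return v / np.linalg.norm(v)

# Fake "photos" plus noisy "sketches" derived from them,
# mimicking imperfect hand-drawn renderings.
photos = rng.normal(size=(5, D_IN))
sketches = photos + 0.2 * rng.normal(size=(5, D_IN))

photo_embs = np.array([embed(p, photo_net) for p in photos])

def match(sketch):
    """Return the index of the photo closest to the sketch (cosine similarity)."""
    s = embed(sketch, sketch_net)
    return int(np.argmax(photo_embs @ s))

matches = [match(s) for s in sketches]
```

Because the embeddings are unit-normalized, the dot product in `match` is exactly cosine similarity, so ranking photos reduces to a single matrix-vector product.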

In the initial testing phase, the program matched the doodles to their original images 37 percent of the time. That may seem unimpressive, but it’s important to note that humans matched the sketches and images correctly only 54 percent of the time. While not conclusive, these preliminary numbers suggest that with a little tweaking and more input data, the program could very well surpass human ability to match sketches to photos.

Where would this technology be useful though? The possibilities are limitless, really. Looking at it from a consumer standpoint, this tech could be used to create an app that lets you search for images and artwork based on doodles of your own. Imagine being able to draw up a little sketch, scan it with your phone, and find similar artwork to be inspired by or to compare your own work to.

From a more commercial standpoint, one of the most useful applications of this technology would be helping police identify criminals by comparing sketch artists’ renderings to mugshot databases.

More thorough findings and information on the system are set to be released at the SIGGRAPH conference in Anaheim, California, in July.

Gannon Burgett
Former Editor
DeepSeek: everything you need to know about the AI that dethroned ChatGPT

A year-old startup out of China is taking the AI industry by storm after releasing a chatbot that rivals the performance of ChatGPT while using a fraction of the power, cooling, and training expense that OpenAI, Google, and Anthropic's systems demand. Here's everything you need to know about DeepSeek's V3 and R1 models and why the company could fundamentally upend America's AI ambitions.
What is DeepSeek?
DeepSeek (technically, "Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.") is a Chinese AI startup that was originally founded as an AI lab for its parent company, High-Flyer, in April 2023. That May, DeepSeek was spun off into its own company (with High-Flyer remaining on as an investor) and also released its DeepSeek-V2 model. V2 offered performance on par with other leading Chinese AI firms, such as ByteDance, Tencent, and Baidu, but at a much lower operating cost.

The company followed up with the release of V3 in December 2024. V3 is a 671 billion-parameter model that reportedly took less than two months to train. What's more, according to a recent analysis from Jefferies, DeepSeek's “training cost of only US$5.6m (assuming $2/H800 hour rental cost)... is less than 10% of the cost of Meta’s Llama.” That's a tiny fraction of the hundreds of millions to billions of dollars that US firms like Google, Microsoft, xAI, and OpenAI have spent training their models.

I’ve experienced the next era of AI, and I’m never going back

Ever since ChatGPT arrived on the scene, the hype around AI has only intensified. As talk of artificial general intelligence (AGI) and “superintelligence” — yes, OpenAI chief Sam Altman is now talking about that — heats up, we have another buzzword to deal with.

Say hello to agentic AI: in simpler terms, AI agents that are supposed to automate a chunk of our digital chores, things like OpenAI's custom GPTs.

OpenAI opens up developer access to the full o1 reasoning model

On the ninth day of OpenAI's holiday press blitz, the company announced that it is releasing the full version of its o1 reasoning model to select developers through the company's API. Until Tuesday's news, devs could only access the less-capable o1-preview model.

According to the company, the full o1 model will begin rolling out to folks in OpenAI's "Tier 5" developer category. Those are users who have had an account for more than a month and have spent at least $1,000 with the company. The new service is especially pricey (on account of the added compute resources o1 requires), costing $15 for every (roughly) 750,000 words analyzed and $60 for every (roughly) 750,000 words generated by the model. That's three to four times the cost of performing the same tasks with GPT-4o.
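Those per-word figures make it easy to estimate what a given o1 request would cost. The helper below is a back-of-the-envelope illustration built only on the numbers quoted above, not an official OpenAI pricing calculator.

```python
# Back-of-the-envelope o1 API cost from the figures quoted above:
# ~$15 per 750,000 words analyzed, ~$60 per 750,000 words generated.
PRICE_IN_PER_WORD = 15 / 750_000
PRICE_OUT_PER_WORD = 60 / 750_000

def o1_cost(words_in: int, words_out: int) -> float:
    """Estimated cost in USD for a single o1 request."""
    return words_in * PRICE_IN_PER_WORD + words_out * PRICE_OUT_PER_WORD

# Example: analyzing a 3,000-word document and generating a 1,500-word reply.
cost = o1_cost(3_000, 1_500)
print(f"${cost:.2f}")  # → $0.18
```

At these rates, generated words dominate the bill: output costs four times as much per word as input.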
