
Microsoft’s new bot can draw a photo-realistic bird based on text descriptions


Microsoft’s research labs created a new artificial intelligence, or bot, that draws images based on simple descriptions. The company says the bot can render anything in pixel form from the caption-like text you provide. And although text-to-image creation isn’t anything new, Microsoft’s “drawing bot” uses captions as image descriptors to produce an image quality that is claimed to be three times better than other state-of-the-art technologies.

“The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus,” Microsoft states. “Each image contains details that are absent from the text descriptions, indicating that this artificial intelligence contains an artificial imagination.” 

Microsoft’s drawing bot merges two components of artificial intelligence: Natural-language processing and computer vision. The research project started with a bot that could generate text captions from photos. The researchers then advanced the project to answer human-generated questions about images, such as identifying a location, the object in focus, and so on. 

But actually drawing an image is a huge step. While the bot can generate components based on text descriptors, it must “imagine” all the other missing pieces of the picture. Thus, if you tell the bot to draw a yellow bird with black wings, it has four descriptors to work with, but must pull the remaining parts from data it acquired from previous drawings, photos, and more. In other words, it fills the gaps with knowledge obtained through machine learning.

Microsoft’s bot relies on a generative adversarial network (GAN), which pits two machine-learning models against each other. Just imagine two teams of computers: one side, the generator, must render an image to fool the other side, the discriminator, into believing it’s an actual photograph. Both teams go back and forth, with the first claiming the image is real and the second saying “nuh-uh,” rejecting the claim. The goal is to render an image that finally fools the second team.

In this case, the first team renders an image derived from text-based descriptions, and the second team rejects its “authenticity” as an actual photograph until the first team renders a convincing image. Microsoft first fed its GAN paired images and captions so that it could learn, for example, to draw a bird when the caption contains that single word.
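The back-and-forth between the two teams can be sketched with the standard GAN loss functions. This is a minimal illustration of the general technique, not Microsoft’s code: the function names and the probability values are assumptions made up for the example, and the real AttnGAN adds text-conditioned terms on top of this basic game.

```python
import math

def discriminator_loss(p_real, p_fake):
    """Loss for the second team (the discriminator).

    p_real: probability it assigns to a genuine photo being real.
    p_fake: probability it assigns to a generated image being real.
    The loss is low when p_real is near 1 and p_fake is near 0.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Loss for the first team (the generator), non-saturating form.

    The loss is low when the discriminator is fooled (p_fake near 1).
    """
    return -math.log(p_fake)

# Hypothetical values: early in training the fake is easy to spot,
# later the discriminator can only guess (p_fake = 0.5).
early = generator_loss(0.1)
late = generator_loss(0.5)
print(f"generator loss early: {early:.3f}, later: {late:.3f}")
```

Each side is trained to lower its own loss, so an improvement for one team raises the loss of the other. That tension is what pushes the rendered images toward looking like photographs.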

From there, Microsoft continued to build the knowledge base with paired images and captions consisting of multiple traits, such as black wings and a red belly. Microsoft says it’s not using just any GAN, however, but one that targets tiny details so the bot can produce photo-realistic results. Microsoft dubs it an attentional GAN, or AttnGAN.

“As humans draw, we repeatedly refer to the text and pay close attention to the words that describe the region of the image we are drawing,” the company says. “[AttnGAN] does this by breaking up the input text into individual words and matching those words to specific regions of the image.” 
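The word-to-region matching described in that quote can be sketched as a small attention computation. This is a toy illustration under stated assumptions, not the AttnGAN implementation: the two-dimensional word vectors and region features are hypothetical numbers chosen by hand, whereas a real model learns them from data.

```python
import math

def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(region_features, word_vectors):
    """For each image region, weight every word by its similarity to
    that region and blend the word vectors into one context vector."""
    dim = len(word_vectors[0])
    results = []
    for region in region_features:
        weights = softmax([dot(region, w) for w in word_vectors])
        context = [sum(wt * w[d] for wt, w in zip(weights, word_vectors))
                   for d in range(dim)]
        results.append((weights, context))
    return results

# Hypothetical inputs for the caption "yellow bird black wings".
words = ["yellow", "bird", "black", "wings"]
word_vectors = [[4.0, 0.0], [1.0, 1.0], [0.0, 4.0], [0.5, 3.0]]
region_features = [[1.0, 0.0],   # a region on the bird's body
                   [0.0, 1.0]]   # a region on a wing

attention = attend(region_features, word_vectors)
top_words = [words[w.index(max(w))] for w, _ in attention]
print(top_words)
```

With these made-up numbers, the body region attends most strongly to one word and the wing region to another, which is the behavior the quote describes: each part of the image is drawn with the most relevant words in focus.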

You can read Microsoft’s research paper describing its AttnGAN here. 

Kevin Parrish
Former Digital Trends Contributor
Kevin started taking PCs apart in the 90s when Quake was on the way and his PC lacked the required components. Since then…