Microsoft’s new bot can draw a photo-realistic bird based on text descriptions

Microsoft’s research labs have created a new artificial intelligence, or bot, that can draw images based on simple descriptions. The company says the bot can render anything as pixels from the caption-like text descriptions you provide. Text-to-image creation isn’t new, but Microsoft’s “drawing bot” focuses on captions as image descriptors to produce an image quality that Microsoft claims is three times better than other state-of-the-art technologies.

“The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus,” Microsoft states. “Each image contains details that are absent from the text descriptions, indicating that this artificial intelligence contains an artificial imagination.” 

Microsoft’s drawing bot merges two components of artificial intelligence: natural-language processing and computer vision. The research project started with a bot that could generate text captions from photos. The researchers then advanced the project to answer human-generated questions about images, such as identifying a location or the object in focus.

But actually drawing an image is a huge step. While the bot can generate components based on text descriptors, it must “imagine” all the other missing pieces of the picture. Thus, if you tell the bot to draw a yellow bird with black wings, it has four descriptors to work with but must pull everything else from what it learned from previous drawings, photos, and more. In other words, it relies on knowledge obtained through machine learning.

Microsoft’s bot relies on a generative adversarial network (GAN). Just imagine two teams of computers: The first team, the generator, must render an image convincing enough to fool the second team, the discriminator, into believing it’s an actual photograph. Both teams go back and forth, with the first saying the image is real, and the second saying “nuh-uh,” disproving the claim. The goal is for the first team to render an image that finally fools the second team.
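The back-and-forth between the two teams can be sketched as a pair of loss functions. This is a minimal illustration of the standard GAN objective, not Microsoft’s code: the function names and the sample scores are assumptions, and each score stands for the second team’s estimated probability, strictly between 0 and 1, that an image is a real photograph.

```python
import math


def discriminator_loss(real_scores, fake_scores):
    """Loss for the second team: low when real photos score near 1
    and rendered images score near 0."""
    real_term = -sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores):
    """Loss for the first team: low when its rendered images are
    scored as real (near 1) by the second team."""
    return -sum(math.log(s) for s in fake_scores) / len(fake_scores)


# Hypothetical scores, chosen only for illustration.
real_photos = [0.90, 0.95]   # second team is confident these are real
early_fakes = [0.10, 0.20]   # early renders are easy to reject
late_fakes = [0.80, 0.90]    # later renders start to fool the second team

print("generator loss, early:", generator_loss(early_fakes))
print("generator loss, late: ", generator_loss(late_fakes))
print("discriminator loss, early:", discriminator_loss(real_photos, early_fakes))
print("discriminator loss, late: ", discriminator_loss(real_photos, late_fakes))
```

As the rendered images become more convincing, the first team’s loss falls and the second team’s loss rises, which is the tug-of-war that training repeats until the renders pass as photographs.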

In this case, the first team renders an image derived from text-based descriptions, and the second team keeps rejecting it as an actual photograph until the first team renders a convincing image. Microsoft first fed its GAN paired images and captions so that it could learn, for example, that it needs to draw a bird when given the single word “bird.”

From there, Microsoft continued to build the knowledge base with paired images and captions consisting of multiple traits, such as black wings and a red belly. But Microsoft says it’s not using just any GAN; it’s using one that targets tiny details so the bot can produce photo-realistic results. Microsoft calls it an attentional GAN, or AttnGAN.

“As humans draw, we repeatedly refer to the text and pay close attention to the words that describe the region of the image we are drawing,” the company says. “[AttnGAN] does this by breaking up the input text into individual words and matching those words to specific regions of the image.” 
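The word-to-region matching described in that quote can be sketched with a simple dot-product attention step. This is a toy illustration under stated assumptions, not AttnGAN itself: the two-dimensional vectors, the region names, and the `attend` helper are all hypothetical, whereas the real model learns its word and region features during training.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(word_vectors, region_vectors):
    """For each image region, return attention weights over the words."""
    return [
        softmax([dot(region, word) for word in word_vectors])
        for region in region_vectors
    ]


# Hypothetical embeddings for the caption "yellow bird black wings".
words = ["yellow", "bird", "black", "wings"]
word_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.1, 0.9]]

# Hypothetical features for two regions of the image being drawn.
regions = {"body": [2.0, 0.0], "wing": [0.0, 2.0]}

weights = attend(word_vectors, list(regions.values()))
for name, region_weights in zip(regions, weights):
    best_word = words[region_weights.index(max(region_weights))]
    print(name, "attends most to", best_word)
```

With these made-up vectors, the body region weights “yellow” most heavily and the wing region weights “black” most heavily, mirroring how a person drawing would look back at the relevant word for each part of the picture.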

You can read Microsoft’s research paper describing its AttnGAN here. 

Kevin Parrish
Former Digital Trends Contributor