Skip to main content

'Stylized Facial Animation' project turns video into realistic-looking hand-drawn art

Example-Based Synthesis of Stylized Facial Animations (results)
In November of 2016, we spoke with Adobe about
Recommended Videos
Project StyLit, a research project that used artificial intelligence to apply an artistic style to a three-dimensional model. It was an intriguing technology, but it had many limitations and an apparently limited usefulness. Building off of what the team learned with Project StyLit, Adobe Research has now taken a big step forward with a new project that, for now, is known simply as “Stylized Facial Animation.” It can apply an input style (or “style exemplar”) to video with very natural results, doing a much better job of imitating hand-drawn animation than other techniques. While it is currently limited to faces, it shows promise for how animators will one day be able to automate the rotoscoping process, or at least aspects of it.

The technology will be publicly demonstrated on August 3 at SIGGRAPH, but Digital Trends had the chance to speak with Adobe’s David Simons about the project ahead of the presentation. Simons holds the role of Senior Principal Scientist within Adobe’s Advanced Product Development Group and is a 25-year veteran of After Effects, Adobe’s special effects program for video. After Effects seems a likely candidate for where the technology demonstrated in Stylized Facial Animation could one day end up. At this time, Adobe has not announced any plans for a commercial release of the tech, but it is the mission of Simons and the Advanced Product Development Group to turn research projects into marketable products, so it may not be that far off from finding its way into a future Creative Cloud update (it’s highly likely that it will be demonstrated at Adobe’s upcoming MAX conference).

Simons opened by explaining some key differences between Project StyLit and Stylized Facial Animation. StyLit relied on the information channels available in 3D models in order to appropriately map the style to the model, and styles were based off of simple “sphere study” sketches that clearly laid out tonality and color. “The idea was to make similar stylized synthesis, driven by your choice of style, but be able to apply it to a photograph or video,” Simons told Digital Trends.

In this case, rather than relying on sketches of spheres, the style exemplars are portraits of actual faces, and the qualities of the style — textures, lines, brush strokes — are appropriately redrawn in the shape of the subject’s face in the target video. In the resulting output animation, the human subject is clearly recognizable, yet appears to be hand drawn in the technique used in the style exemplar, even when the exemplar face is a different shape, age, or gender from the person in the video.

Neural networks to the rescue

As detailed in a research paper, Stylized Facial Animation is based on previous research on neural network computing, conducted by both Adobe and others. One of the earliest and most popular consumer applications of neural network technology was the Prisma app, which applied an artistic style to an image and, later, to video. An important distinction between the likes of Prisma and simple image filters is that apps based on neural networks actually redraw the image from scratch, rather than overlay the style on top of the photograph. But while Prisma could do this very quickly, it did a relatively poor job of maintaining details present in the source style.

While it still struggles in some specific areas, like forehead wrinkles, Adobe’s approach significantly increases the amount of artistic detail that makes it through to the final output. It accomplishes this by analyzing both the style exemplar and the target video, creating “guiding channels” that ensure the style is appropriately applied to the target. One channel is used to align the position of the exemplar and the target, another maintains the shading present in the target subject, and a third matches semantically similar regions (eyes to eyes, hair to hair, etc.).

The underlying technology is actually based on the Content-Aware Fill feature in Adobe Photoshop. “It looks at nearby areas and takes patches that fit together. It’s like trying to fill in a jigsaw puzzle with pieces from the rest of the area that match up,” Simons said.

Example-Based Synthesis of Stylized Facial Animations

The result is, essentially, just how you would expect such a technology to work, but it is considerably different from what we’ve seen in apps like Prisma as it maintains so much more detail from the source style. But when it comes to applying the effect to multiple frames for an animation, things get even trickier. With Project StyLit, animations could only be created by applying a style one frame at a time, then manually connecting the frames in Adobe Premiere Pro or another video-editing application.

“A lot of it is about avoiding the uncanny valley. You want a level of abstraction.”

One of the key components of Stylized Facial Animation is that it can apply a style to an entire video at once, but this can lead to the issue of temporal coherence. This is somewhat ironic, as temporal coherence is typically what such programs aim for — that is to say, where each frame logically matches up with the previous frame, leading to smooth motion — but this actually has the effect of making the resulting animation look less like it was done by hand. Hand drawn animations have a natural amount of decoherence between frames, as the pencil lines or brush strokes won’t perfectly match from one frame to the next. Existing methods inherently achieved very high temporal coherence, so Adobe researchers knew they had to develop their own way to reduce coherence in Stylized Facial Animation in order to get more natural results.

“We want it to look redrawn on every frame,” Simons explained, “But not so flickery that you can’t see what’s going on. A lot of it is about avoiding the uncanny valley. You want a level of abstraction.”

The solution came in allowing control over “temporal noise” by using a separate guiding channel to determine coherence. As each frame is synthesized from the preceding one, controlling the amount of blur in a frame’s temporal guiding channel will raise or lower coherence in the next frame, making it possible to find a balance between an animation that is too choppy and too smooth.

Applying the technology

While Simons was quick to reaffirm that the project is definitely “still research at this time,” there are two possible applications for it in the future. One would be for animating live-action footage, as demonstrated in the research paper, and the other would be for creating puppets to use in Adobe’s Character Animator, which automatically animates a character based on video of an actor.

Any portrait — a sketch, painting, or even a photograph — can be used to style the video.

Compared to Project StyLit, another advantage with Stylized Facial Animation is that any portrait can be used as a style exemplar. This doesn’t just give artists greater control over creating their own styles compared to a basic sphere study, but it also means non-artists can grab source images from virtually anywhere and use them as style exemplars. The sources need not even be drawings or paintings; photographs (even of statues) will also work.
Image used with permission by copyright holder

Going forward, one of the main goals will be getting to the point where the program can run in real time, or at least close to it. Currently, it takes about 30 seconds just to compute the guiding channels and three minutes to perform full synthesis of a single, 1-megapixel frame using a quad-core processor running at 3 gigahertz, according to the research paper. However, the team has already demonstrated vast performance improvements possible by using a graphics processing unit (GPU), which reduces the time required for single-frame synthesis down to just 5 seconds. Given that most video runs at 24 or 30 frames per second, this is still a ways off from achieving real time synthesis, but the project is also still in very early development.

Whatever the eventual outcome of Stylized Facial Animation, it is clear that Adobe is very interested in continuing to develop the technology. The paper’s lead author, Jakub Fiser, started out as an intern at Adobe. This project was his Ph.D. thesis, and was enough for Adobe to bring him on full time. Fiser joins an experienced group of individuals, of whom Simons spoke highly, even while keeping modest with regard to his own role. “Everyone else is a Ph.D.,” he said. “They’re the real researchers.”

Topics
Daven Mathies
Former Digital Trends Contributor
Daven is a contributing writer to the photography section. He has been with Digital Trends since 2016 and has been writing…
Best drone deals: Get a cheap drone for $47 and more
The DJI Mini 3 Pro in flight with spring flowers in the background.

You don't have to be a YouTuber or Twitch streamer to find a lot of use for drones, especially if you're the sort of person who enjoys photography and filmography. Even better, a lot of modern drones, especially the ones targeted to consumers, have a lot of automation in them, so you don't need to be incredibly skilled in drone flight to use one. Of course, drones can still be quite expensive, especially if you want something that's a step above the basic budget-oriented drones. That's why we've collected some of our favorite drone deals, including some DJI alternatives, so you can find what works best for you.

Of course, if you prefer a more traditional experience, you could always check out these GoPro deals and camera deals instead.
Radclo Mini Drone -- $50, was $230

Read more
Astronaut’s stunning photo shows ‘flowing silver snakes’
A photo of Earth at night taken by NASA astronaut Don Pettit.

Over his three previous missions to the International Space Station (ISS), NASA astronaut Don Pettit earned a reputation for having a keen eye when it comes to photographing Earth and beyond.

Since arriving at the ISS on his fourth orbital mission earlier this month, Pettit, who at 69 is NASA’s oldest active astronaut, has wasted little time in grabbing the station’s cameras to capture and share fresh dazzling imagery shot from 250 miles above Earth.

Read more
SpaceX recreates iconic New York City photo with Starship workers
SpaceX engineers high above the company's Starbase facility in Boca Chica, Texas.

SpaceX has given a shout-out to some of its engineers as the company prepares for its first attempt at "catching" a first-stage Super Heavy booster as it returns to Earth.

In a message accompanying two images that recreate the iconic Lunch Atop a Skyscraper photo taken in New York City in 1932, SpaceX said on X (formerly Twitter) that the engineers have spent “years” preparing for the booster catch, a feat that it’s planning to try for the first time with the upcoming fifth test flight of the Starship. It also included a photo of how the first-stage Super Heavy booster will look when clasped between the tower’s giant mechanical arms after launching the upper-stage Starship spacecraft to orbit.

Read more