Exclusive: YouTube reveals how it can make you speak languages you don’t know 

YouTube wants your lips to play nice with AI-translated audio. It's more technical to pull off than it looks.

YouTube audio dub feature in action.
Nadeem Sarwar / Digital Trends
This story is part of Tech for Change: an ongoing series in which we shine a spotlight on positive uses of technology, and showcase how they're helping to make the world a better place.

It would be an understatement to say that the video content industry is at an inflection point. On one hand, AI is supercharging the creative potential of content creators; on the other, the problem of AI slop and misinformation lingers. The sheer potential of AI, however, can’t be ignored.

The folks over at YouTube are putting it to good use with a focus on accessibility and realism. So, what’s next? Making the lips move naturally to the tune of any language, even if the speaker in the video doesn’t speak it. Building on the auto-dubbing feature launched last year, the team has now come up with a new AI-powered lip sync feature.

Machine-translated audio has improved dramatically over the past few quarters, and it now sounds almost natural. Audio overviews in Google’s NotebookLM are a great example. But in videos, translated audio falls flat because the speaker’s lip movements simply don’t match the translated version of the script.

It’s pretty jarring and off-putting. The AI-powered lip sync feature aims to overcome that audio-visual dissonance. And the samples that I’ve seen so far feel uncannily natural. I sat down with YouTube Product Lead, Autodubbing, Buddhika Kottahachchi, to understand how lip sync was developed, its impact, and the road ahead.

Digging into the technical side

In less than a year since its launch, YouTube’s auto-dubbing feature has been used to dub over 60 million videos across 20 languages. But preserving a natural tone with all the nuances of conversational speech, and then matching it with realistic lip movements, is a whole new challenge.

On the surface level, Kottahachchi tells me that the lip sync system “modifies the pixels on the screen to match the translated speech.” It’s a custom tech stack, the Google executive tells me, adding that they needed to develop a 3D understanding of the world, lip shapes, teeth, posture, and face. 

For now, the tech is suited for full-HD (1080p) video but is not tuned for 4K. “But generally, it should work with the video resolutions that you upload,” he points out. As far as language support goes, YouTube’s AI-powered lip sync feature supports English, Spanish, German, Portuguese, and French.

That’s a pretty restricted pool, but Kottahachchi tells me that the team is scaling up and lip sync will eventually support the same set of languages as the auto-dubbing feature (which currently stands at over 20 languages). For comparison, Meta’s AI-fueled lip sync feature for Facebook and Instagram supports only English, Spanish, Hindi, and Portuguese.

Now, AI-powered lip syncing is not entirely an alien concept. Adobe already offers auto lip sync functionality. Then there are third-party options such as HeyGen, which claim to do it for free. But when it comes to YouTube, we are talking about a built-in system at a massive scale on a platform where 20 million videos are uploaded daily.

The AI Babel fish for your face

So, what’s next in terms of availability? “We are not ready to make any broad statements about how broadly we will make it available, but we do want to make it available to more creators and understand the compute constraints and the quality,” Kottahachchi tells me. And that brings us to the crucial cost question. 

When I enquired about it, the YouTube executive told me that they can’t make predictions about the fee involved, if there is one at all. That also explains why the feature is still part of a pilot project among a small pool of trusted testers to understand the market and compute costs. Recall that this is a complex vision-based implementation of AI.

So, just like AI-generated videos, where you can create a few clips for free but need to pay up for higher resolution or more attempts, YouTube will have to factor in the compute costs and decide on the rollout. But from a creator’s perspective, if I am chasing a wider reach, I’d likely pay the subscription fee.

The AI dilemma

Ever since AI visuals started flooding the internet, the debate around authenticity and fair disclosure has heated up. “What is even real?” Social media users have been asking that question with more fervor since the uncannily realistic videos generated by OpenAI’s Sora app started popping up.

These videos have a visible watermark, but there are already free and paid tools out there that will remove the Sora label from the AI-generated clips, or the label of any other AI content generator, for that matter. Google, one of the biggest developers and adopters of AI, knows that all too well.

The company was one of the early leaders in the AI fingerprinting race with its SynthID system, and also launched a SynthID Detector tool earlier this year to help users check the origins of multimedia content.

The YouTube videos that rely on Google’s AI-powered lip sync feature will take an even more cautious approach. “We will have a proper disclosure saying that both the audio and video in this video have been synthetically created or altered,” Kottahachchi tells me. “The video content, itself, gets fingerprinted, as well.” 

The text disclosures will appear in the description box underneath the title of YouTube videos, just as they appear for videos that have used the auto dub system. But how are other platforms going to treat the AI-dubbed, lip-synced YouTube videos if a creator posts them on Instagram or TikTok? 

Will the algorithms warm up?

TikTok recently announced that it would label videos that were “made or edited” using AI tools, and would also fingerprint them so that users can check their origins using C2PA’s Verify tool. Meta has a similar system in place. So, what’s the fate of AI-edited videos that are cross-posted to other social video platforms? 

Will they be algorithmically downranked, or blocked from appearing in certain feeds? The situation is a bit tricky and unpredictable. “It’s something we are monitoring carefully, but it’s a little early because platforms have made statements, but we haven’t seen how they are effectively implemented,” he tells me. “Generally, we are translating translations, but not new content.”

I also brought up the issue of bad actors taking creators’ videos without their consent, translating the audio, and pushing them from a different channel or platform. Auto-dubbing and AI lip syncing technically make that unscrupulous act easier to execute, but it likely won’t devolve into total chaos.

“If your likeness is being used somewhere else on the platform, you can tell us about it and ask us to take it down,” Kottahachchi told me. It will be interesting to see how auto-dubbing, expressive audio, and lip-synced videos make the YouTube experience more diverse. On the surface, it seems like a win.

I can’t wait to see myself speak in Spanish, though I abandoned my Duolingo streak years ago. 

Nadeem Sarwar
Nadeem is the Managing Editor at Digital Trends.