
Google’s new AI generates audio soundtracks from pixels

DeepMind showed off the latest results from its generative AI video-to-audio (V2A) research on Tuesday. It's a novel system that combines what it sees on-screen with the user's written prompt to create synced audio soundscapes for a given video clip.

The V2A AI can be paired with video-generation models like Veo, DeepMind's generative audio team wrote in a blog post, and can create soundtracks, sound effects, and even dialogue for the on-screen action. What's more, DeepMind claims that its new system can generate "an unlimited number of soundtracks for any video input" by steering the output with positive and negative prompts that encourage or discourage the use of a particular sound, respectively.
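DeepMind does not say exactly how its positive and negative prompts are applied. One common approach in diffusion models, shown here purely as an assumption, is the negative-prompt form of classifier-free guidance: the model's prediction under the negative prompt serves as the baseline, and the output is pushed toward the prediction under the positive prompt. The function name `guided_prediction` and the `guidance_scale` parameter are illustrative, not part of any DeepMind API.

```python
from typing import List


def guided_prediction(pred_positive: List[float],
                      pred_negative: List[float],
                      guidance_scale: float) -> List[float]:
    """Negative-prompt variant of classifier-free guidance (an assumed
    technique, not confirmed for V2A): start from the prediction made
    under the negative prompt and move toward the prediction made under
    the positive prompt, scaled by guidance_scale."""
    if len(pred_positive) != len(pred_negative):
        raise ValueError("predictions must have the same length")
    return [neg + guidance_scale * (pos - neg)
            for pos, neg in zip(pred_positive, pred_negative)]


# Example with made-up model predictions for two audio values.
print(guided_prediction([1.0, 0.0], [0.0, 0.5], 2.0))  # [2.0, -0.5]
```

With `guidance_scale=1.0` the result equals the positive-prompt prediction; larger values push the output further away from whatever the negative prompt describes.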

The system works by first encoding and compressing the video input. A diffusion model then uses that encoded video, along with the user's optional text prompt, to iteratively refine the audio starting from random noise. The resulting audio is finally decoded and exported as a waveform that can then be recombined with the video input.
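As a rough illustration of that data flow (encode the video, refine audio from random noise while conditioning on the video, then decode to a waveform), here is a toy sketch. Every function in it is hypothetical: the "encoder" just averages pixel values, `refine_step` is a hand-written stand-in for a trained diffusion denoiser, and the decoder simply repeats values. It does not reflect V2A's actual architecture, and it leaves out text-prompt conditioning.

```python
import random
from typing import List


def encode_video(frames: List[List[float]]) -> List[float]:
    # Hypothetical encoder: compress each frame (a list of pixel values)
    # down to a single number, its mean brightness.
    return [sum(frame) / len(frame) for frame in frames]


def refine_step(latent: List[float], video_code: List[float],
                rate: float) -> List[float]:
    # Hypothetical stand-in for a trained denoiser: move each latent value
    # a fraction `rate` of the way toward the video code, removing part of
    # the remaining noise on every step.
    return [x + rate * (c - x) for x, c in zip(latent, video_code)]


def decode_to_waveform(latent: List[float],
                       samples_per_frame: int) -> List[float]:
    # Hypothetical decoder: hold each latent value for samples_per_frame
    # audio samples, so every frame maps to a fixed block of audio.
    return [value for value in latent for _ in range(samples_per_frame)]


def video_to_audio(frames: List[List[float]], steps: int = 50,
                   rate: float = 0.2, samples_per_frame: int = 4,
                   seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    video_code = encode_video(frames)
    # Start from pure random noise, one value per video frame.
    latent = [rng.gauss(0.0, 1.0) for _ in video_code]
    # Iteratively refine the noise, conditioned on the encoded video.
    for _ in range(steps):
        latent = refine_step(latent, video_code, rate)
    return decode_to_waveform(latent, samples_per_frame)


# Three tiny "frames" of two pixels each.
waveform = video_to_audio([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
print(len(waveform))  # 12
```

Because each frame owns a fixed block of output samples in this toy, the audio stays aligned with the video by construction; a real system has to learn that alignment from training data.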

The best part is that the user doesn't have to go in and manually (read: tediously) sync the audio and video tracks, as the V2A system does it automatically. "By training on video, audio and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts," the DeepMind team wrote.

The system is not yet perfected, however. For one, output audio quality depends on the fidelity of the video input, and the system gets tripped up when video artifacts or other distortions are present in the input. According to the DeepMind team, syncing generated dialogue with characters' lip movements remains an ongoing challenge.

"V2A attempts to generate speech from the input transcripts and synchronize it with characters' lip movements," the team explained. "But the paired video-generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn't generate mouth movements that match the transcript."

The system still needs to undergo "rigorous safety assessments and testing" before the team will consider releasing it to the public. Every video and soundtrack generated by this system will carry DeepMind's SynthID watermark. This system is far from the only audio-generating AI currently on the market: Stability AI dropped a similar product just last week, while ElevenLabs released its sound effects tool last month.

Andrew Tarantola
Former Computing Writer
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…