Google’s new AI generates audio soundtracks from pixels

An AI-generated wolf howling
Google DeepMind

DeepMind showed off the latest results from its generative AI video-to-audio (V2A) research on Tuesday. It’s a novel system that combines what it sees on-screen with the user’s written prompt to create synced audio soundscapes for a given video clip.

The V2A AI can be paired with video-generation models like Veo, DeepMind’s generative audio team wrote in a blog post, and can create soundtracks, sound effects, and even dialogue for the on-screen action. What’s more, DeepMind claims that its new system can generate “an unlimited number of soundtracks for any video input” by tuning the model with positive and negative prompts that encourage or discourage the use of a particular sound, respectively.
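The blog post does not say how V2A applies positive and negative prompts internally. One common approach in diffusion-based generators, shown here purely as an assumption, is to run the model once per prompt and push the result toward the positive prediction and away from the negative one. The function name `guided_prediction` and the `guidance_scale` parameter are hypothetical.

```python
def guided_prediction(pos_pred, neg_pred, guidance_scale=2.0):
    """Blend two model predictions (assumed technique, not confirmed for V2A).

    pos_pred: prediction conditioned on the positive prompt
    neg_pred: prediction conditioned on the negative prompt
    A scale of 1.0 returns the positive prediction unchanged; larger values
    move further away from whatever the negative prompt describes.
    """
    return [n + guidance_scale * (p - n) for p, n in zip(pos_pred, neg_pred)]
```

In this sketch, raising `guidance_scale` makes the output follow the positive prompt more strongly at the cost of variety, which is the usual trade-off with this kind of guidance.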

V2A Cars

The system works by first encoding and compressing the video input. A diffusion model then uses that compressed representation, along with the user’s optional text prompt, to iteratively refine the desired audio from random noise. The audio output is finally decoded and exported as a waveform that can be recombined with the video input.
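The encode, refine, decode flow described above can be sketched in a few lines of Python. This is a toy illustration only, not DeepMind's implementation: every function name here is hypothetical, the "encoders" are simple averaging tricks, and the "denoiser" is a stand-in that nudges random noise toward the conditioning signal rather than a trained neural network.

```python
import random

def encode_video(frames, dims=4):
    """Toy encoder: average-pool each frame into `dims` buckets, then average
    across frames. Assumes each frame's length is a multiple of `dims`."""
    pooled = []
    for frame in frames:
        step = len(frame) // dims
        pooled.append([sum(frame[i * step:(i + 1) * step]) / step
                       for i in range(dims)])
    return [sum(column) / len(pooled) for column in zip(*pooled)]

def encode_prompt(prompt, dims=4):
    """Toy text encoder: a deterministic pseudo-embedding from character codes.
    An empty prompt yields all zeros, mirroring the 'optional' text prompt."""
    vec = [0.0] * dims
    for i, ch in enumerate(prompt):
        vec[i % dims] += (ord(ch) % 32) / 32.0
    scale = max(len(prompt), 1)
    return [v / scale for v in vec]

def denoise_step(latent, condition, strength):
    """Stand-in for a learned denoiser: move the latent a fraction of the way
    toward the conditioning signal."""
    return [x + strength * (c - x) for x, c in zip(latent, condition)]

def generate_audio_latent(video_code, prompt_code, steps=50, seed=0):
    """Start from random noise and refine it iteratively, conditioned on both
    the video code and the prompt code."""
    rng = random.Random(seed)
    condition = [v + p for v, p in zip(video_code, prompt_code)]
    latent = [rng.gauss(0.0, 1.0) for _ in condition]  # pure noise
    for _ in range(steps):
        latent = denoise_step(latent, condition, strength=0.2)
    return latent, condition

def decode_to_waveform(latent, samples_per_value=4):
    """Toy decoder: upsample each latent value and clamp samples to [-1, 1]."""
    wave = []
    for value in latent:
        wave.extend([max(-1.0, min(1.0, value))] * samples_per_value)
    return wave
```

A call such as `decode_to_waveform(generate_audio_latent(encode_video(frames), encode_prompt("wolf howling"))[0])` walks through all three stages. Because the conditioning comes from the video itself, the output is tied to the footage by construction, which is the intuition behind the automatic syncing the article describes.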

The best part is that the user doesn’t have to go in and manually (read: tediously) sync the audio and video tracks, as the V2A system does it automatically. “By training on video, audio and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” the DeepMind team wrote.

V2A Wolf

The system is not yet perfected, however. For one, output audio quality depends on the fidelity of the video input, and the system gets tripped up when video artifacts or other distortions are present. According to the DeepMind team, syncing generated dialogue with characters’ lip movements also remains an ongoing challenge.

V2A Claymation family

“V2A attempts to generate speech from the input transcripts and synchronize it with characters’ lip movements,” the team explained. “But the paired video-generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn’t generate mouth movements that match the transcript.”

The system still needs to undergo “rigorous safety assessments and testing” before the team will consider releasing it to the public. Every video and soundtrack generated by this system will be affixed with DeepMind’s SynthID watermarks. This system is far from the only audio-generating AI currently on the market. Stability AI dropped a similar product just last week, while ElevenLabs released its sound effects tool last month.

Andrew Tarantola
Andrew has spent more than a decade reporting on emerging technologies ranging from robotics and machine learning to space…