Nvidia’s new AI model makes music from text and audio prompts

By Andrew Tarantola Published November 25, 2024

Nvidia has released a new generative audio AI model that is capable of creating myriad sounds, music, and even voices, based on the user’s simple text and audio prompts.

Dubbed Fugatto (short for Foundational Generative Audio Transformer Opus 1), the model can, for example, create jingles and song snippets based solely on text prompts, add or remove instruments and vocals from existing tracks, modify both the accent and emotion of a voice, and “even let people produce sounds never heard before,” per Monday’s announcement post.

“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at Nvidia. “Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

The company notes that music producers could use the AI model to rapidly prototype and vet song ideas in various musical styles with varying arrangements, or add effects and additional layers to existing tracks. The model could also be leveraged to adapt and localize the music and voiceovers of an existing ad campaign, or adjust the music of a video game on the fly as the player plays through a level.

The model is even capable of generating previously unheard sounds like barking trumpets or meowing saxophones. In doing so, it uses a technique called ComposableART to combine the instructions it learned during training.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” Nvidia AI researcher Rohan Badlani wrote in the announcement post. “In my tests, the results were often surprising and made me feel a little bit like an artist, even though I’m a computer scientist.”

The Fugatto model itself has 2.5 billion parameters and was trained on 32 H100 GPUs. Audio AI models like this are becoming increasingly common. Stability AI unveiled a similar system in April that can generate tracks up to three minutes in length, while Google’s V2A model can generate “an unlimited number of soundtracks for any video input.”

YouTube recently released an AI music remixer that generates a 30-second sample based on the input song and the user’s text prompts. Even OpenAI is experimenting in this space, having released an AI tool in April that needs just 15 seconds of sample audio to fully clone a user’s voice and vocal patterns.

