
Nvidia’s new AI model makes music from text and audio prompts


Nvidia has released a new generative audio AI model that is capable of creating myriad sounds, music, and even voices, based on the user’s simple text and audio prompts.


Dubbed Fugatto (aka Foundational Generative Audio Transformer Opus 1), the model can, for example, create jingles and song snippets based solely on text prompts, add or remove instruments and vocals from existing tracks, modify both the accent and emotion of a voice, and “even let people produce sounds never heard before,” per Monday’s announcement post.

“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at Nvidia. “Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

The company notes that music producers could use the AI model to rapidly prototype and vet song ideas in various musical styles with varying arrangements, or add effects and additional layers to existing tracks. The model could also be leveraged to adapt and localize the music and voiceovers of an existing ad campaign, or adjust the music of a video game on the fly as the player plays through a level.

The model is even capable of generating previously unheard sounds like barking trumpets or meowing saxophones. To do so, it uses a technique called ComposableART to combine instructions it learned separately during training.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” Nvidia AI researcher Rohan Badlani wrote in the announcement post. “In my tests, the results were often surprising and made me feel a little bit like an artist, even though I’m a computer scientist.”

The Fugatto model itself has 2.5 billion parameters and was trained on 32 H100 GPUs. Audio AI models like this are becoming increasingly common. Stability AI unveiled a similar system in April that can generate tracks up to three minutes in length, while Google’s V2A model can generate “an unlimited number of soundtracks for any video input.”

YouTube recently released an AI music remixer that generates a 30-second sample based on the input song and the user’s text prompts. Even OpenAI is experimenting in this space, having released an AI tool in April that needs just 15 seconds of sample audio in order to fully clone a user’s voice and vocal patterns.

Andrew Tarantola
Former Computing Writer