
Google’s AI just got ears

The Google Gemini AI logo.
Google

AI chatbots are already capable of “seeing” the world through images and video. Now, Google has announced speech-to-text functionality as part of its latest update to Gemini Pro. In Gemini 1.5 Pro, the chatbot can “hear” audio files uploaded into its system and extract text information from them.

The company has made this version of the LLM available as a public preview on its Vertex AI development platform, allowing more enterprise-focused users to experiment with the feature. When the model was first announced in February, it was offered only to a limited group of developers and enterprise customers in a private rollout.


1. Breaking down + understanding a long video

I uploaded the entire NBA dunk contest from last night and asked which dunk had the highest score.

Gemini 1.5 was incredibly able to find the specific perfect 50 dunk and details from just its long context video understanding! pic.twitter.com/01iUfqfiAO

— Rowan Cheung (@rowancheung) February 18, 2024


Google shared the details of the update at its Cloud Next conference, which is currently taking place in Las Vegas. After calling Gemini Ultra, the LLM that powers its Gemini Advanced chatbot, the most powerful model in its Gemini family, Google is now calling Gemini 1.5 Pro its most capable generative model. The company added that this version is better at learning new tasks without additional fine-tuning of the model.

Gemini 1.5 Pro is multimodal in that it can transcribe many types of audio into text, including TV shows, movies, radio broadcasts, and conference call recordings. It’s also multilingual, able to process audio in several different languages. The LLM can generate transcripts from videos as well, though their quality may be unreliable, as TechCrunch noted.
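For developers curious what that looks like in practice, here is a minimal sketch of sending an audio file to Gemini 1.5 Pro through the Vertex AI Python SDK. The project ID, region, bucket path, and preview model name are placeholder assumptions for illustration, not details confirmed by Google.

    # A minimal sketch of audio transcription with Gemini 1.5 Pro on Vertex AI.
    # The project ID, region, model ID, and gs:// path are placeholder
    # assumptions; substitute your own values.
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project="your-gcp-project", location="us-central1")

    # Hypothetical preview model name; check Vertex AI for the current ID.
    model = GenerativeModel("gemini-1.5-pro-preview-0409")

    # Reference an audio file in Cloud Storage instead of inlining raw bytes.
    audio = Part.from_uri("gs://your-bucket/earnings-call.mp3",
                          mime_type="audio/mpeg")

    response = model.generate_content([audio, "Transcribe this recording."])
    print(response.text)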

When first announced, Google explained that Gemini 1.5 Pro used a token system to process raw data. A million tokens equate to approximately 700,000 words or 30,000 lines of code. In media form, it equals an hour of video or around 11 hours of audio.
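To put those equivalences in perspective, here is a quick back-of-the-envelope calculation. The conversion ratios come straight from Google's stated figures; the helper function itself is purely illustrative.

    # Back-of-the-envelope math using Google's stated equivalences:
    # 1,000,000 tokens ~ 700,000 words ~ 30,000 lines of code
    #                  ~ 1 hour of video ~ 11 hours of audio
    TOKENS_PER_WORD = 1_000_000 / 700_000      # ~1.43 tokens per word
    TOKENS_PER_AUDIO_HOUR = 1_000_000 / 11     # ~90,909 tokens per audio hour

    def audio_hours_that_fit(context_tokens: int = 1_000_000) -> float:
        """Estimate how many hours of audio fit in a given context window."""
        return context_tokens / TOKENS_PER_AUDIO_HOUR

    print(f"{audio_hours_that_fit():.1f} hours of audio per 1M-token window")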

Some private preview demos of Gemini 1.5 Pro have shown how the LLM can find specific moments within a long video. For example, AI enthusiast Rowan Cheung got early access and detailed how his demo found an exact action shot in a sports contest and summarized the event, as seen in the tweet embedded above.

However, Google noted that other early adopters, including United Wholesale Mortgage, TBS, and Replit, are opting for more enterprise-focused use cases, such as mortgage underwriting, automating metadata tagging, and generating, explaining, and updating code.

Fionna Agomuoh
Google has some ‘good ideas’ for putting ads in Gemini

Google is exploring adding ads to Gemini AI. CEO Sundar Pichai floated the idea on an earnings call but did not commit to a timeline, according to The Verge. He also noted that the company has "very good ideas" about how ads could appear in the future.

This year's focus remains on enhancing user experience features and broadening subscription offerings. Pichai noted that advertising has been essential in scaling other Google services, such as YouTube, possibly hinting that ads will eventually come to Gemini. However, he did not say how Google plans to integrate ads into Gemini once they arrive, only that the company is committed to making its products work well and delivering them to a vast audience.

Google puts military use of AI back on the table

On February 4, Google updated its “AI principles,” a document detailing how the company would and wouldn’t use artificial intelligence in its products and services. The old version was split into two sections: “Objectives for AI applications” and “AI applications we will not pursue,” and it explicitly promised not to develop AI weapons or surveillance tools.

The update was first noticed by The Washington Post, and the most glaring difference is the complete disappearance of any “AI applications we will not pursue” section. In fact, the language of the document now focuses solely on “what Google will do,” with no promises at all about “what Google won’t do.”

Google’s “Ask for Me” will have an AI schedule your next oil change

Google announced a new experimental AI feature being made available to select users on Thursday. Dubbed "Ask for Me," this AI agent will look up pricing and appointment availability at local businesses and automatically make reservations on your behalf -- though it currently works only for nail salons and auto repair shops.

Accessible through Google Search Labs, Ask for Me initiates when users search for either nail salons or auto repair centers on Google Search. If, for example, you need a mechanic, the feature will pepper you with questions about the service you need, the make and model of your car, and your availability to bring it in for work, before reaching out to the shop. You'll also need to enter your contact information (phone number and email, specifically) so the AI can keep you apprised of its efforts.
