Google strikes back with an answer to OpenAI's Sora launch

Google’s DeepMind division unveiled its second-generation Veo video generation model on Monday. Veo 2 can create clips up to two minutes long at resolutions up to 4K, which is six times the length and four times the resolution of the 20-second, 1080p clips Sora can generate.

Of course, those are Veo 2’s theoretical upper limits. The model is currently available only on VideoFX, Google’s experimental video generation platform, where its clips are capped at eight seconds and 720p resolution. VideoFX also has a waitlist, so not just anyone can log on to try Veo 2, though the company says it will expand access in the coming weeks. A Google spokesperson also noted that Veo 2 will be made available on the Vertex AI platform once the company can scale the model sufficiently.


“Over the coming months, we’ll continue to iterate based on feedback from users,” Eli Collins, VP of product at Google DeepMind, told TechCrunch, “and [we’ll] look to integrate Veo 2’s updated capabilities into compelling use cases across the Google ecosystem … We expect to share more updates next year.”

Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥

We’re also releasing an improved version of our text-to-image model, Imagen 3 – available to use in ImageFX through… pic.twitter.com/h6ejHaMUM4

— Google DeepMind (@GoogleDeepMind) December 16, 2024

Veo 2 reportedly holds several advantages over its predecessor, including a better understanding of physics (think more realistic fluid dynamics, lighting, and shadows) and the capacity to generate “clearer” video clips, meaning that generated textures and images are sharper and less prone to blurring in motion. The new model also offers improved camera controls, letting the user position the virtual camera with greater precision than before.

As TechCrunch notes, Veo 2 has not yet perfected video generation, though it does appear to hallucinate far less than rivals such as Sora, Kling, Movie Gen, and Gen 3 Alpha. “Coherence and consistency are areas for growth,” Collins said. “Veo can consistently adhere to a prompt for a couple minutes, but [it can’t] adhere to complex prompts over long horizons. Similarly, character consistency can be a challenge. There’s also room to improve in generating intricate details, fast and complex motions, and continuing to push the boundaries of realism.”

Google also announced improvements to Imagen 3 on Monday, enabling the commercial image generation model to create “brighter, better-composed” outputs. The model, available on ImageFX, will also offer additional descriptive suggestions based on keywords in the user’s prompt, with each keyword spawning a drop-down menu of related terms.