The fact that we can now create our own clips with AI video generators such as Luma AI's Dream Machine and OpenAI's Sora is both unsettling and fascinating.
AI-produced videos can be fun to watch for their unpredictable themes and styles, but they often share one shortcoming: they lack suitable audio that creators can use when posting them to video platforms.
Fortunately, Google DeepMind has a tool that could solve this problem.
Introducing Google DeepMind's Video-to-Audio Tool
According to TechRadar, Google DeepMind has unveiled a groundbreaking video-to-audio (V2A) tool that uses video pixels and text prompts to create soundtracks and soundscapes for AI-generated videos. This development marks a significant step toward fully automated movie scene creation.
The V2A technology can work with AI video generators, including Google's Veo, to produce atmospheric scores, sound effects, and dialogue that match the video's characters and tone. Picking the right audio is tricky, but if the tool does it consistently, it could become a powerful tool for next-generation filmmaking.
Unlimited Soundtrack Generation with Text Prompts
It's hard to find sound that shares the same vibe as a movie's main theme. Google's AI research lab says its V2A tool can take on much of that work.
DeepMind's V2A tool can generate an unlimited number of soundtracks for any video input. Creators can fine-tune the audio output using simple text prompts, which gives them greater creative flexibility.
Unlike its competitors, this tool can generate audio purely from video pixels, making text prompts optional rather than necessary.
Ensuring Safety and Preventing Misuse
DeepMind acknowledges that this powerful technology could be misused, for example to create deepfakes.
As a precaution, the V2A tool is currently confined to research purposes. Before opening it to the public, DeepMind plans to conduct rigorous safety assessments and extensive testing. This cautious approach aims to mitigate risks and ensure the technology's responsible use.
Huge Potential for Filmmaking and Animation
The potential applications of the V2A tool are vast, especially in amateur filmmaking and animation.
DeepMind's demo clips include a Blade Runner-inspired scene scored with electronic music and a cartoon featuring a baby dinosaur, hinting at how the tool could significantly reduce production costs.
The technology still has limitations, particularly with dialogue, but it shows considerable promise.
The Future of AI-Generated Media
Pairing AI-generated videos with AI-created soundtracks and sound effects is a notable leap for the industry.
OpenAI has announced plans to add audio to Sora, which is set to launch later this year. In May, director Paul Trillo released the first music video made with Sora.
DeepMind's V2A tool is already demonstrating advanced capabilities, generating audio based solely on video content without extensive prompting.
How the V2A Tool Works
DeepMind's V2A tool employs a diffusion model that synthesizes information from video pixels and user text prompts to generate compressed audio, which is then decoded into an audio waveform.
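To make that flow more concrete, here is a toy Python sketch: visual features (plus an optional text prompt) condition a diffusion-style denoising loop over a small compressed audio latent, and a decoder then turns that latent into waveform samples. Everything in it is an assumption for illustration only. The encoders, the "oracle" denoiser standing in for a trained network, the linear noise schedule, and the tone-based decoder are all hypothetical and are not DeepMind's implementation.

```python
import math
import random

# Toy sketch of a V2A-style pipeline. Every function is hypothetical and
# stands in for a trained neural network; this is not DeepMind's code.

def encode_video(frames):
    """Hypothetical visual encoder: one feature (mean brightness) per frame."""
    return [sum(frame) / len(frame) for frame in frames]

def encode_text(prompt, dim):
    """Hypothetical text encoder: a small per-prompt offset, or zeros if no prompt."""
    if prompt is None:
        return [0.0] * dim
    rng = random.Random(prompt)  # str seeds are hashed deterministically
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

def conditioning(frames, prompt=None):
    """Hypothetical: the clean audio latent the denoiser is steered toward."""
    feats = encode_video(frames)
    text = encode_text(prompt, len(feats))
    return [f + t for f, t in zip(feats, text)]

def alpha_bars(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    out, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def oracle_denoiser(x_t, abar, target):
    """Stand-in for a trained network: returns the noise that x_t would contain
    if it were built as sqrt(abar) * target + sqrt(1 - abar) * noise."""
    s, n = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [(x - s * x0) / n for x, x0 in zip(x_t, target)]

def sample_latent(cond, steps=1000, seed=0):
    """Deterministic DDIM-style reverse loop over the compressed audio latent."""
    rng = random.Random(seed)
    abars = alpha_bars(steps)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from Gaussian noise
    for t in reversed(range(steps)):
        eps = oracle_denoiser(x, abars[t], cond)
        # Estimate the clean latent from the current noisy one.
        x0_hat = [(xi - math.sqrt(1.0 - abars[t]) * e) / math.sqrt(abars[t])
                  for xi, e in zip(x, eps)]
        if t == 0:
            return x0_hat
        # Step to the previous (less noisy) level of the schedule.
        prev = abars[t - 1]
        x = [math.sqrt(prev) * a + math.sqrt(1.0 - prev) * e
             for a, e in zip(x0_hat, eps)]
    return x

def decode(latent, samples_per_step=80, sample_rate=8000, freq=440.0):
    """Hypothetical decoder: each latent value sets the loudness of a tone segment."""
    wave = []
    for i, v in enumerate(latent):
        amp = min(abs(v), 1.0)
        for k in range(samples_per_step):
            n = i * samples_per_step + k
            wave.append(amp * math.sin(2 * math.pi * freq * n / sample_rate))
    return wave

def video_to_audio(frames, prompt=None):
    return decode(sample_latent(conditioning(frames, prompt)))

frames = [[0.2, 0.4, 0.6], [0.9, 0.8, 1.0], [0.0, 0.1, 0.2]]
audio = video_to_audio(frames)                             # pixels only
scored = video_to_audio(frames, "tense electronic music")  # optional prompt
print(len(audio))  # 240 samples: 3 frames x 80 samples each
```

In this sketch the text prompt only nudges the conditioning, so the pipeline still runs with `prompt=None`, mirroring the article's point that prompts are optional. A real system would replace the oracle with a learned network and the tone generator with a learned audio decoder.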
Although the specifics of the training data remain unclear, Google's access to YouTube provides a considerable advantage. Some YouTube creators have contracts that allow their content to be used for training AI models, potentially contributing to the tool's development.
A Game-Changer for Content Creation
While there are still challenges to overcome, particularly in producing Hollywood-ready dialogue, DeepMind's V2A tool is a powerful asset for storyboarding and amateur filmmakers.
The rapid advancements and intense competition in the AI space suggest that these tools will continue to improve, offering even greater capabilities in the near future.
In other news, the Chinese short-video app Kuaishou teased an AI-powered video generator that can produce 1080p videos up to two minutes long.