Meta has unveiled Emu Video, an evolution of its image generation tool, Emu, suggesting that the tech giant may be edging closer to the realm of AI-generated movies. Emu Video is designed to generate four-second animated clips from a text caption, an image, or an image paired with a text description.
The newly revealed Emu Edit complements Emu Video by offering an AI model for editing generated content. Users describe the changes they want in natural language, and Emu Edit executes those modifications, making for a smooth, user-friendly editing experience for anyone exploring AI-generated content.
Meta's Emu Video
Meta's foray into generative AI has evolved rapidly from image generation to video generation.
At Meta Connect, the company unveiled Emu as its foundational model for image generation, powering various generative AI experiences, including AI image editing tools for Instagram and the Imagine feature within Meta AI.
Emu Video utilizes the Emu model and relies on a text-to-video generation approach based on diffusion models. This unified architecture responds to a variety of inputs, including text only, image only, and a combination of text and image.
The process is divided into two steps: generating images based on a text prompt and then generating videos based on both the text and the generated image. This "factorized" or split approach enhances the efficiency of training video generation models.
In contrast to previous models that require complex cascades, Meta's approach employs just two diffusion models to generate four-second, 512x512 videos at 16 frames per second.
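The factorized two-step pipeline described above can be sketched as follows. This is a minimal illustration, not Meta's actual API: the function names and NumPy stand-ins are assumptions, with random arrays substituting for real diffusion models. Emu Video's stated output is 512x512 frames at 16 fps over four seconds; the demo uses a smaller frame size to stay lightweight.

```python
import numpy as np

def generate_image(prompt: str, size: int, rng: np.random.Generator) -> np.ndarray:
    """Stage 1: text -> image (placeholder for the image diffusion model)."""
    return rng.random((size, size, 3))

def generate_video(prompt: str, first_image: np.ndarray,
                   n_frames: int, rng: np.random.Generator) -> np.ndarray:
    """Stage 2: (text, image) -> video (placeholder for the video diffusion model)."""
    # Start every frame from the stage-1 image, conditioning the video on it.
    frames = np.repeat(first_image[None, ...], n_frames, axis=0)
    # A real model would denoise toward coherent motion; we just perturb frames.
    return frames + 0.01 * rng.standard_normal(frames.shape)

def factorized_generation(prompt: str, size: int = 64,
                          fps: int = 16, seconds: int = 4) -> np.ndarray:
    rng = np.random.default_rng(0)
    image = generate_image(prompt, size, rng)                 # step 1: text -> image
    return generate_video(prompt, image, fps * seconds, rng)  # step 2: text + image -> video

video = factorized_generation("a corgi surfing a wave")
print(video.shape)  # (64, 64, 64, 3): 64 frames of 64x64 RGB
```

Splitting generation this way means each stage solves an easier problem than direct text-to-video, which is the efficiency gain the article attributes to the "factorized" design.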
Human evaluations indicate strong preferences for Meta's video generations, with users favoring the model over prior work for its quality and faithfulness to the text prompt, the company says.
Emu Edit
Additionally, Meta introduced Emu Edit, a novel approach that streamlines image manipulation and brings greater capability and precision to image editing. Emu Edit allows free-form editing through instructions, covering tasks such as local and global editing, background removal and addition, color and geometry transformations, and more.
Notably, Emu Edit focuses on precise alterations, ensuring pixels unrelated to the instructions remain untouched.
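That pixel-preservation property can be illustrated with a simple masked blend. This is only a sketch of the behavior described, assuming a hypothetical binary mask marking the region the model decides to edit; Emu Edit's actual mechanism is not public in this form.

```python
import numpy as np

def apply_local_edit(image: np.ndarray, edited: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Keep original pixels outside the mask; take edited pixels inside it."""
    return np.where(mask[..., None], edited, image)

# Toy 4x4 RGB images: an all-black original and an all-white "edited" version.
image = np.zeros((4, 4, 3))
edited = np.ones((4, 4, 3))

# Hypothetical mask covering only the central 2x2 region of the instruction.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

out = apply_local_edit(image, edited, mask)
print(out[0, 0])  # [0. 0. 0.]: pixel outside the mask is untouched
print(out[1, 1])  # [1. 1. 1.]: pixel inside the mask takes the edit
```

The point of the example is the invariant, not the blending itself: pixels the instruction does not target come out bit-identical to the input.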
Meta's Emu Edit is built on a dataset containing 10 million synthesized samples, making it one of the largest datasets of its kind. The model exhibits superior performance over current methods, achieving state-of-the-art results in both qualitative and quantitative evaluations for a range of image editing tasks, according to Meta.
While Meta emphasizes that the current work is fundamental research, the potential use cases are diverse. The technologies, including Emu Video and Emu Edit, could enable users to generate animated stickers, GIFs, or enhance their social media content without requiring advanced technical skills.
"While certainly no replacement for professional artists and animators, Emu Video, Emu Edit, and new technologies like them could help people express themselves in new ways, from an art director ideating on a new concept or a creator livening up their latest reel to a best friend sharing a unique birthday greeting. And we think that's something worth celebrating," Meta said in a statement.