OpenAI has unveiled its latest innovation, the Sora model, capable of producing minute-long videos from text prompts. However, the model will not be made available to the public until thorough evaluations of its potential for misuse have been conducted.
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Generating Minute-Long Videos from Text Prompts
Named after the Japanese word for "sky," Sora represents a significant advancement in AI-generated video technology, Engadget reported.
Rather than releasing Sora to the general public immediately, OpenAI has opted to offer access to a select group of academics and researchers. Their task is to evaluate the model's capabilities and assess any risks associated with its use.
Sora demonstrates remarkable proficiency in generating intricate scenes featuring multiple characters, dynamic motions, and detailed environments. The model not only interprets text prompts accurately but also understands the spatial relationships within the depicted scenarios.
Sora's Limitations
According to OpenAI, Sora operates based on a "deep understanding of language," allowing it to accurately interpret text prompts. However, like most AI image and video generators, Sora is not flawless.
In one instance, requested elements, including a Dalmatian looking through a window and people "walking and cycling along the canal streets," were omitted entirely from the generated video.
Additionally, OpenAI cautions that the model may struggle to understand cause and effect, as evident in videos where a person takes a bite of a cookie that afterward shows no bite mark.
Joining Companies with Text-to-Video Models
Sora joins a growing landscape of text-to-video models developed by companies such as Meta, Google, and Runway. While other tools have been introduced or hinted at, none match Sora's capability to produce videos up to 60 seconds in length.
Moreover, unlike its counterparts, Sora generates complete videos in one go, ensuring consistency of subjects throughout the video, even if they briefly exit the frame.
The emergence of text-to-video tools has ignited apprehension regarding their potential to produce highly realistic fake content.
Oren Etzioni, a professor at the University of Washington specializing in artificial intelligence and the founder of True Media, expressed deep concerns, particularly regarding the potential influence on closely contested elections.
Moreover, The New York Times reported that the broader adoption of generative AI has triggered a backlash from artists and creative professionals who fear the technology will displace their jobs.
In response to these concerns, OpenAI has emphasized its collaboration with experts in various fields, including misinformation, hateful content, and bias, to thoroughly assess the tool's implications before its public release.
Additionally, the company is developing detection tools capable of identifying videos generated by Sora, along with embedding metadata in the videos to facilitate detection.
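OpenAI has not disclosed the format of this embedded metadata, but as a rough illustration of how provenance information attached to a video file might be inspected, the sketch below shells out to the exiftool utility (assumed to be installed) and flags any metadata fields whose names suggest provenance data. The field-name hints are hypothetical placeholders, not OpenAI's actual schema or detection tooling.

```python
import json
import subprocess
import sys

# Hypothetical placeholders for provenance-related field names;
# not OpenAI's actual metadata schema.
PROVENANCE_HINTS = ("c2pa", "contentcredentials", "provenance", "generator")


def inspect_metadata(path: str) -> None:
    # `exiftool -j <file>` prints all readable metadata as a JSON array,
    # with one object per input file.
    raw = subprocess.run(
        ["exiftool", "-j", path], capture_output=True, text=True, check=True
    ).stdout
    tags = json.loads(raw)[0]

    flagged = {
        key: value
        for key, value in tags.items()
        if any(hint in key.lower() for hint in PROVENANCE_HINTS)
    }
    if flagged:
        print("Possible provenance metadata found:")
        for key, value in flagged.items():
            print(f"  {key}: {value}")
    else:
        print("No provenance-related fields detected (absence proves nothing).")


if __name__ == "__main__":
    inspect_metadata(sys.argv[1])
```

Note that metadata of this kind can be stripped when a file is re-encoded or re-uploaded, which is why OpenAI pairs it with separate detection tools rather than relying on embedded tags alone.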
While OpenAI did not disclose specific details about Sora's training process, it noted the utilization of both publicly available videos and licensed content from copyright holders.