Sora's New Realistic AI-Generated Videos Mean We Can't Trust Our Eyes Anymore

And this is just the beginning of the AI video revolution

A photo shows a frame of a video generated by a new artificial intelligence tool, dubbed "Sora," unveiled by the company OpenAI, in Paris on February 16, 2024.
Photo by STEFANO RELLANDINI/AFP via Getty Images

OpenAI's new Sora AI model creates realistic videos from text prompts; the results are amazing and terrifying.

The company has already impressed us with its ChatGPT chatbot and the DALL-E text-to-image engine. Now it's here with the Sora text-to-video tool, which can generate videos up to one minute long from text instructions. As you will see from the examples, the results are quite astonishing, although, like the output of other so-called AI models, they are also full of weird giveaway glitches.

"It is hard not to be in awe of how far these systems have progressed and what they are now able to do - from whole-cloth generation to more nuanced examples like extending the runtime or changing a video's setting," artist and designer Nick Heer says on his Pixel envy blog.

Don't Trust Your Eyes

Look at the weird salt-flats-astronaut video in this official Sora showreel, or check the individual version over at the Sora site. Unless you have real experience analyzing this kind of footage, it's going to look awfully convincing. The only giveaway on first viewing is the dead-stare look of the virtual "actors," which is also something of a signature in still images created by OpenAI rival Stability AI's text-to-image generator, Stable Diffusion. It really feels like a trailer for a pretty cool-looking indie sci-fi flick.

https://www.youtube.com/watch?v=HK6y8DAPN_0

But Sora's output isn't always so good. It suffers from the same odd glitches as other AI image generators, which still have trouble with hands and feet. It's not on the YouTube showreel, but on the site, you can see a video made from the prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots." The video looks realistic until you check out the woman's legs, which seem to switch places at some points. And in the video of people in a future Lagos, the figures look like giants until the swooping camera settles to the ground.

But this is Sora's first public outing, and it's already good enough to fool a casual viewer. If you saw the fake footage of San Francisco during the Gold Rush in any other context, would you really doubt it was real? Deepfakes and AI-generated photos are one thing, but video is a whole other game. We literally cannot trust video "evidence" anymore, and with the US presidential election and the UK general election both just around the corner, that's a terrifying prospect.

"The biggest dangers are misinformation and manipulation, hands down. Imagine videos that can perfectly mimic public figures saying things they've never said or events that never happened. It's a double-edged sword. On one hand, it's a marvel of technology; on the other, it's a potential nightmare for misinformation," Brian Prince,
 founder and CEO of AI educational platform Top AI Tools, told Tech Times in an interview.

Verifying Is Hard

So, what can be done to protect the integrity of video in the face of powerful generative AI tools like Sora? On the face of it, not much. OpenAI promises at the beginning of that showreel that safeguards will be in place before the tool is released to the public.

"While advances in AI have made it easier to manipulate videos, there are also technologies, such as blockchain, that can help maintain the authenticity and integrity of footage," Jared Floyd, founder and executive producer at Ajax Creative, told Tech Times.

The problem is that the average viewer probably doesn't know what the blockchain is, let alone how to use it to verify videos. Provenance records and machine-readable watermarks may help prove, or disprove, a video's integrity. However, that counts for nothing if a bad actor spreads compromising AI-generated videos on TikTok or Facebook.
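To make the idea of integrity checking concrete, here is a minimal sketch of its simplest form, assuming nothing beyond Python's standard library: comparing a video file's cryptographic fingerprint against one published by a trusted source. The file name and published hash are hypothetical, and this illustrates only the general principle, not how any particular watermarking or blockchain scheme actually works.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 fingerprint of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical: a fingerprint the original publisher posted alongside the video.
PUBLISHED_HASH = "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b"

if sha256_of_file("clip.mp4") == PUBLISHED_HASH:
    print("File matches the published fingerprint; it has not been altered.")
else:
    print("Mismatch: the file differs from what the publisher released.")
```

Even a single changed frame produces a completely different fingerprint, which is why schemes like this can flag tampering; the harder problem, as the experts quoted here note, is getting publishers to post fingerprints and getting viewers to check them.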

One fact about these generative models, whether for text, photos, or video, is that they need huge resources to create. Trawling the internet for existing media and then feeding it into a machine-learning model is extremely expensive in terms of money, computing power, and the environmental impact of the electricity needed to run the systems.

The costs involved have an upside for accountability. Right now, only a few companies can afford such expenses, which means they can be regulated. The FTC is about to enact a new rule that would make the creators of AI tools liable for deepfakes created with those tools. That's a start, but given the speed at which this technology is moving, coupled with the harm it may cause in the upcoming elections, experts say the government needs to move fast on regulating generative AI, or the results could be disastrous.

Charlie Sorrel has been writing about technology, and its effects on society and the planet, for almost two decades. Previously, you could find him at Wired's Gadget Lab, Fast Company's CoExist, Cult of Mac, and Mac Stories. He also writes for his own site, StraightNoFilter.com, Lifewire Tech News, and iFixit.

© 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.