OpenAI Sued by YouTuber for AI Data Scraping

OpenAI is reportedly facing another lawsuit over its alleged artificial intelligence training methods, this time from YouTuber David Millette.

Lawyers for David Millette, a YouTube user from Massachusetts, claim in a lawsuit filed on Friday in the U.S. District Court for the Northern District of California that OpenAI secretly transcribed videos from Millette and other creators to train the models behind its AI-powered chatbot platform, ChatGPT, and its other generative AI tools and products.

The complaint reportedly alleges that OpenAI benefited significantly from the creators' work by gathering this data, while violating both copyright law and YouTube's terms of service, which forbid using videos for applications not connected to its platform.

A photo taken on November 23, 2023 shows the logo of the ChatGPT application, developed by US artificial intelligence research organization OpenAI, on a laptop screen (R) and the letters AI on a smartphone screen in Frankfurt am Main, western Germany. (Photo: KIRILL KUDRYAVTSEV/AFP via Getty Images)

According to the complaint, OpenAI increased the value of its products by improving its language models with the plaintiffs' videos. Subscribers who paid for access to these products benefited from the improvement, while the plaintiffs and class members received no compensation.

The lawsuit claims that OpenAI's retention of the benefits conferred by the plaintiff and prospective class members is unfair and unjust, and that the company should be required to pay restitution.

The proposed class action asserts a claim for unjust enrichment or restitution, as well as a claim for unfair competition under California law.

In addition to class certification, the complaint seeks damages, equitable monetary relief, injunctive relief, reasonable attorney fees, and costs. OpenAI did not immediately respond to a request for comment.

Training AI

Generative AI models, such as those developed by OpenAI, do not possess genuine intelligence. Fed a vast array of examples, from videos to essays, they learn patterns and estimate how likely a given piece of data is to occur based on the context of the surrounding data.
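To make the idea of estimating likelihood from surrounding context concrete, here is a minimal sketch in Python. It is a toy word-pair (bigram) counter, not how OpenAI's models work; the function names and the sample sentence are made up for illustration. It shows the basic principle only: count which words follow which in the training text, then convert those counts into probabilities.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, context_word):
    """Convert raw follow-up counts into probabilities for one context word."""
    followers = counts.get(context_word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the context word never appeared with a follower
    return {word: n / total for word, n in followers.items()}


# Hypothetical training text, chosen only to keep the example small.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

In this sample text, "the" is followed by "cat" twice and "mat" once, so the sketch rates "cat" as the more likely next word. Production models condition on far more context and far more data, which is why large collections of text such as video transcripts are in demand.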

Most models are trained on data taken from public websites and various datasets found on the internet. Companies argue that the doctrine of fair use protects their ability to gather data indiscriminately and use it to develop commercial models.

However, many copyright holders disagree and are taking legal action to stop the practice. Video transcripts have become an important source of training data as other sources dwindle.

OpenAI's Legal Troubles

Lawsuits challenging OpenAI's extensive use of internet data to train ChatGPT continue to accumulate, posing an ongoing legal problem for the company.

Part of the problem stems from publishers, who increasingly accuse AI companies of scraping proprietary data and expect to be compensated for their work. Meta and OpenAI have argued to the US Copyright Office that their use of copyrighted content found online falls under fair use because it is "publicly available."

Nevertheless, the companies will have to make their case in court, as they face lawsuits from multiple parties over their use of copyrighted material.