Authors File Lawsuit Against OpenAI For Using Copyrighted Books for Training Without Permission

OpenAI, the organization behind the artificial intelligence tool ChatGPT, is facing a lawsuit filed by authors Mona Awad and Paul Tremblay. They claim that OpenAI breached copyright law by training its model on their novels without obtaining permission.

OpenAI Faces New Lawsuit

Two authors have taken legal action against OpenAI over allegations that their books were used to train ChatGPT. (Photo: Andrew Neel from Unsplash)

According to a CNBC report, OpenAI is the latest target of a complaint from two book authors who believe that the company used their content without consent.

ChatGPT, a chatbot that generates human-like responses, is trained using publicly available data from the internet. The authors assert that ChatGPT produced accurate summaries of their copyrighted books, which is why those works are named in the lawsuit.

The class-action lawsuit filed by Awad and Tremblay in a San Francisco federal court marks the first copyright complaint against ChatGPT. The case will explore the legal boundaries within the generative AI field, raising questions about the use of copyrighted material in training language models.

Books are a common source of the text found on the internet, and their high-quality prose makes them ideal for training large language models (LLMs).

In the lawsuit, OpenAI's alleged use of copyrighted works has led to accusations of "stolen writing and ideas." The authors' lawyers argue that OpenAI should be held accountable for profiting from the unauthorized use of their work.

Even so, proving financial losses directly attributable to ChatGPT's training on copyrighted material may be challenging. ChatGPT might function much the same even without the books themselves, since it relies on a vast range of internet information, including discussions about the books.

OpenAI's Secrecy and Training Data

OpenAI's increasing secrecy surrounding its training data has raised concerns. According to The Guardian's report, the lawyers suggest that part of ChatGPT's training material, referred to as "Books2," likely includes books obtained from shadow libraries such as Library Genesis (LibGen) and Z-Library.

The outcome of the case will likely depend on whether the court accepts "fair use" as a defense for the use of the copyrighted material. A similar case in the UK could end in a very different ruling, since the "fair use" defense is treated differently there.

Protecting Authors From Dangers of AI

The publishing industry has been discussing how to safeguard authors from potential AI-related harms. The Society of Authors (SoA) published guidelines for authors to protect themselves and their work.

The lawsuit against OpenAI is seen as a positive step in addressing concerns regarding the unauthorized use of authors' work.

Despite efforts to regulate AI, the current regulations are fragmented and struggling to keep pace with technological advancements. Policymakers are urged to consult principles that protect the value of human authorship and consider the recommendations put forth by organizations like the Authors' Licensing and Collecting Society (ALCS).

Last month, a Georgia-based radio host sued OpenAI after ChatGPT allegedly provided false information about a legal case.