ChatGPT Trained on Copyrighted Books! Memorized Harry Potter, Other Novels, New Study Claims

Will this lead to new lawsuits?

ChatGPT was allegedly trained on copyrighted books, including the "Harry Potter" novels.

Teachers are seen behind a laptop during a workshop on the ChatGPT bot organised by the School Media Service (SEM) of the Public education of the Swiss canton of Geneva, on February 1, 2023. FABRICE COFFRINI/AFP via Getty Images

The claim comes from a new study, which found that the AI chatbot had memorized numerous popular novels. These include the "Harry Potter" children's books, "A Game of Thrones," "Dune," "The Hitchhiker's Guide to the Galaxy," and "Fahrenheit 451."

The study, titled "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4," was posted on arXiv, the preprint server hosted by Cornell University.

ChatGPT Allegedly Trained on Copyrighted Books!

According to a New Scientist report, David Bamman, an associate professor in the School of Information at the University of California, Berkeley, examined whether ChatGPT had memorized copyrighted books.

Copies of the new Harry Potter and the Half Blood Prince, by author J. K. Rowling are seen at the Amazon.com shipping facility July 11, 2005 in Fernley, Nevada. Amazon.com is packing up over 800,000 orders of the new Harry Potter book for shipment in the United States on July 16th. Over 1.2 million orders have been received worldwide. Justin Sullivan/Getty Images

The findings of the study by Bamman and his colleagues indicated that it had.
Bamman said that ChatGPT and GPT-4 had memorized a wide collection of copyrighted materials.

Beyond the memorization itself, Bamman and his colleagues also found that both AI models skewed toward the fantasy and science fiction genres.

To some people, this may seem a harmless finding. However, Bamman highlighted the issues with training AIs so heavily on sci-fi and fantasy texts.

"With the bias toward sci-fi/fantasy, we should be thinking about whose narrative experiences are encoded in these models and how that influences other behaviors," the UC Berkeley professor said in a Twitter post.

Other Issues of AI Being Trained on Books

Of course, one of the main issues with ChatGPT and GPT-4 being trained on novels is copyright.

After the new study was published, Tyler Ochoa, a law professor at Santa Clara University in California, said that lawsuits over large language models (LLMs) are likely to happen.

Ochoa said that the copyright issues facing AI image generators are similar to those facing AI text generators.

He further explained that if an AI generates output that is very similar to its input, that output would infringe copyright.

Tech Times
Article owned by Tech Times | Written by Griffin Davis Photo owned by Tech Times
ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.