A recent study by Cornell researchers found that ChatGPT can memorize poems, raising ethical concerns about privacy and data usage.
The study suggests that ChatGPT and other proprietary AI models may be trained on data scraped from the internet, potentially including private information.
"It's generally not good for large language models to memorize large chunks of text, in part because it's a privacy concern," first author Lyra D'Souza said in a statement.
"We don't know what they're trained on, and a lot of times, private companies can train proprietary models on our private data," she added.
Can AI Models Memorize Poems?
The researchers tested ChatGPT and three other language models: PaLM from Google AI, Pythia from EleutherAI, and GPT-2, an earlier OpenAI model that preceded ChatGPT. They prompted each model to provide the text of selected poems by various American poets.
ChatGPT retrieved 72 of the 240 poems, or 30%. The other models fared worse: PaLM produced only 10 complete poems, and Pythia and GPT-2 struggled to generate entire poems.
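The article does not say how the researchers decided that a model had "retrieved" a poem. As a rough illustration only, the Python sketch below scores a model's output against a reference text by word-level similarity and applies a cutoff. The function names, the 0.9 threshold, and the sample line are assumptions made for this example, not the study's method or data.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def similarity(reference: str, output: str) -> float:
    """Return a score from 0 to 1 comparing the two texts word by word."""
    ref_words = normalize(reference).split()
    out_words = normalize(output).split()
    return difflib.SequenceMatcher(None, ref_words, out_words).ratio()


def is_memorized(reference: str, output: str, threshold: float = 0.9) -> bool:
    """Treat the output as a reproduction if it clears the threshold.

    The 0.9 cutoff is an arbitrary assumption chosen for illustration.
    """
    return similarity(reference, output) >= threshold


def memorization_rate(memorized: int, total: int) -> float:
    """Share of prompted poems that were reproduced."""
    return memorized / total


if __name__ == "__main__":
    # Sample line chosen for illustration only.
    reference = "Two roads diverged in a yellow wood,"
    print(is_memorized(reference, "two roads diverged in a yellow wood"))
    print(is_memorized(reference, "I am not sure that poem exists."))
    print(memorization_rate(72, 240))  # the ChatGPT figure reported above: 0.3
```

A word-level comparison is used here so that differences in punctuation and line breaks do not count against a match. A real evaluation would need to choose its own matching rule and threshold.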
The study identified a poem's place in the poetry canon as a crucial factor in whether the chatbot had memorized it. The most reliable predictor of memorization was whether a poem appeared in an edition of The Norton Anthology of Poetry, particularly the 1983 edition.
One of the researchers' primary concerns is privacy: because the training data behind proprietary models is not disclosed, D'Souza said, there is no way to rule out that private companies trained them on users' private data.
ChatGPT's Behavior Changes Over Time
The study also observed changes in ChatGPT's behavior over time. Initially, the model would fabricate or recycle poems when it did not know the one requested.
However, by July 2023, it began questioning whether the requested poem existed, shifting responsibility to the user. This change raised additional concerns about transparency and the responsible use of powerful AI tools like ChatGPT.
While the study focused on American poets, the researchers intend to expand their investigation to see how chatbots respond to requests in different languages. They also plan to explore whether factors such as the poem's length, meter, and rhyming pattern influence the likelihood of memorization.
The study highlights the ethical questions raised by AI models' ability to memorize and potentially reproduce copyrighted content, and the need for scrutiny and responsible deployment of such technologies.
"ChatGPT is a really powerful new tool that's probably going to be part of our lives moving forward," D'Souza said. "Figuring out how to use it responsibly and use it transparently is going to be really important."