Right to be Forgotten (RTBF) laws, also known as Right to Erasure laws, are data protection regulations that grant individuals the right to request the removal or deletion of their personal information from online platforms and search engine results.
These laws are primarily aimed at enhancing individuals' privacy rights and giving them more control over their personal data on the internet. With the rapid growth of generative AI technology, concerns are mounting over its potential impact on user privacy.
The Data61 Business Unit at the Australian National Science Agency recently highlighted a crucial concern: large language models (LLMs) may violate existing RTBF laws, the Science X Network reported.
Right to be Forgotten Laws in the Age of AI
Dawen Zhang and six colleagues delve into the potential consequences of the widespread use of LLMs in their paper titled "Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions."
At the core of their concerns is the unique nature of LLMs compared to traditional search engines. While RTBF laws have primarily focused on search engines, LLMs cannot be overlooked when it comes to privacy regulations.
According to Zhang, "LLMs store and process information in a completely different way" in contrast to search engine indexing approaches. The data sources used to train LLMs further highlight the extent of the issue.
A staggering 60% of training data for models like GPT-3 is obtained from public resources, and both OpenAI and Google have heavily relied on Reddit conversations to bolster their LLMs.
The implications of this reliance on public data are significant. Zhang warned that LLMs might end up memorizing personal information, which could inadvertently appear in their output.
This opens the door to potential privacy violations. Instances of hallucination (the spontaneous output of false information) could also spread damaging or incorrect information that follows private individuals around.
Adding to the complexity, the researchers said, is the lack of transparency surrounding the data sources of many generative AI systems.
'Machine Unlearning'
To address these growing concerns, the researchers proposed extending existing RTBF laws to include LLMs. They advocate for developing processes that enable the removal of personal data from LLMs, such as "machine unlearning" using SISA (Sharded, Isolated, Sliced, and Aggregated) training and Approximate Data Deletion.
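To give a rough sense of how SISA-style unlearning works, here is a minimal Python sketch. It is not the researchers' implementation: the `ShardedEnsemble` class, the toy per-label-mean "model," and the `(record_id, feature, label)` record format are all hypothetical, and the slicing (checkpointing) step is left out for brevity. The idea it illustrates is that each shard trains its own isolated model, predictions are aggregated by vote, and deleting one person's record means retraining only the shard that held it.

```python
from collections import Counter


def train_shard_model(records):
    """Toy stand-in for a real model: the mean feature value per label."""
    sums, counts = {}, Counter()
    for _, feature, label in records:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_with_model(model, feature):
    """Return the label whose mean is closest to the given feature."""
    return min(model, key=lambda label: abs(model[label] - feature))


class ShardedEnsemble:
    """Hypothetical SISA-style ensemble (sharding, isolation, aggregation)."""

    def __init__(self, records, num_shards):
        # Sharded: each record is assigned to exactly one shard by its ID.
        self.shards = [[] for _ in range(num_shards)]
        for record in records:
            self.shards[record[0] % num_shards].append(record)
        # Isolated: each shard's model sees only that shard's records.
        self.models = [train_shard_model(shard) for shard in self.shards]
        self.retrained_shards = 0

    def predict(self, feature):
        # Aggregated: majority vote across the non-empty shard models.
        votes = Counter(
            predict_with_model(model, feature) for model in self.models if model
        )
        return votes.most_common(1)[0][0]

    def unlearn(self, record_id):
        # Only the shard that held the record is retrained.
        index = record_id % len(self.shards)
        self.shards[index] = [r for r in self.shards[index] if r[0] != record_id]
        self.models[index] = train_shard_model(self.shards[index])
        self.retrained_shards += 1


records = [
    (0, 1.0, "a"), (1, 1.2, "a"), (2, 0.8, "a"),
    (3, 9.0, "b"), (4, 9.5, "b"), (5, 8.5, "b"),
]
ensemble = ShardedEnsemble(records, num_shards=3)
ensemble.unlearn(3)  # erase record 3; the other two shard models are untouched
```

In a real system each shard model would be a neural network and retraining even one shard could be costly, which is why the full SISA scheme also slices each shard and keeps checkpoints to shorten retraining.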
In response to the evolving landscape, OpenAI has recently taken a step towards privacy protection by accepting data removal requests. However, the researchers stress that more comprehensive measures are required to ensure privacy rights are safeguarded amid rapidly advancing technology.
Zhang underscored the importance of preserving privacy as a fundamental human right, asserting that "the principle of privacy should not be changed, and people's rights should not be compromised due to technological advancements."
The research team's findings were recently posted on the arXiv preprint server.