Swiss researchers from ETH Zurich have sounded the alarm about the privacy risks posed by AI chatbots, highlighting their ability to infer personal information from seemingly harmless messages and other text that users post online.
In what is being hailed as the first comprehensive study of its kind, the team revealed that large language models (LLMs) exhibit a remarkable capacity to infer a wide spectrum of personal attributes, including gender, income, and location, all from text sourced from social media platforms.
Unsettling Proficiency in Inferring Private Data
According to Science X Network, Robin Staab, a doctoral student at the Secure, Reliable, and Intelligent Systems Lab at ETH Zurich, emphasized the sheer scale at which LLMs can deduce personal data, surpassing prior capabilities.
Staab noted that despite developers' best efforts to safeguard user privacy and uphold ethical standards, LLMs, trained on extensive volumes of unprotected online data, exhibit an unsettling proficiency in deducing private details.
Staab pointed out the potential risks involved, noting that with just a handful of attributes such as location, gender, and birth date, nearly half of the US population could be identified.
Identification becomes plausible by cross-referencing data scraped from social media sites with publicly available records such as voting records. This information could be exploited by political campaigns and advertisers, or even pose risks from criminals or stalkers, according to the study.
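To illustrate why a few inferred attributes are enough, here is a minimal, purely hypothetical sketch in Python: the names, records, and column labels are made up for illustration and are not data from the study, but the join shows how combining location, gender, and birth date can already narrow a match to one person.

```python
# Toy illustration only: hypothetical data showing how a few inferred
# attributes, joined against a public record, can isolate one individual.
import pandas as pd

# Attributes an LLM might infer from a user's posts
inferred = pd.DataFrame([
    {"location": "Melbourne", "gender": "F", "birth_date": "1991-03-14"},
])

# A mock public-records table (e.g., a purchasable voter file)
public_records = pd.DataFrame([
    {"name": "A. Example",     "location": "Melbourne", "gender": "F", "birth_date": "1991-03-14"},
    {"name": "B. Sample",      "location": "Melbourne", "gender": "M", "birth_date": "1988-07-02"},
    {"name": "C. Placeholder", "location": "Sydney",    "gender": "F", "birth_date": "1991-03-14"},
])

# Joining on just three attributes already isolates a single candidate
matches = inferred.merge(public_records, on=["location", "gender", "birth_date"])
print(matches)  # -> one row: "A. Example"
```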
The researchers illustrated their point with an example of a Reddit user discussing their daily commute. The chatbot swiftly inferred the user's likely location in Melbourne based on the description of a specific intersection.
Furthermore, by analyzing the user's other comments, the chatbot deduced their sex and likely age, showcasing the depth of information accessible through seemingly casual exchanges.
Additionally, chatbots are adept at detecting linguistic traits that divulge a person's background or location. Distinctive regional slang or phrasing can serve as identifiers.
For instance, the use of phrases like "Mate, you wouldn't believe it, I was up to me elbows in garden mulch today" led the chatbot to conclude the user was likely from Australia, Great Britain, or New Zealand.
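For readers curious how this kind of inference could be reproduced, the sketch below shows one plausible setup, assuming the OpenAI Python SDK; the model name and prompt wording are illustrative choices, not the researchers' actual method.

```python
# A minimal sketch of attribute inference from a single comment.
# Assumes the OpenAI Python SDK; model and prompt are illustrative only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

comment = ("Mate, you wouldn't believe it, I was up to me elbows "
           "in garden mulch today.")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Given a social media comment, guess the author's "
                    "likely country, age range, and sex, and explain "
                    "which wording cues you relied on."},
        {"role": "user", "content": comment},
    ],
)

print(response.choices[0].message.content)
```

In practice, regionalisms such as "me elbows" or references to local infrastructure are exactly the cues the model latches onto, which is why simple keyword-based anonymization tends to miss them.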
LLM Privacy Implications
The researchers expressed significant concern about the potential for malicious chatbots to steer users into making revealing comments during seemingly benign conversations.
They highlighted that chatbot inferences offer a level of surveillance previously only achievable through expensive human profiling. This raises substantial privacy implications and calls for a reevaluation of chatbot deployment and safeguards.
"As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots trying to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference," the study's abstract reads.
"Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion around LLM privacy implications beyond memorization, striving for a wider privacy protection," it added.
The team's findings were published on the preprint server arXiv.