Cornell University researchers have identified concerning behavior in OpenAI's speech-to-text transcriber Whisper, revealing its potential to hallucinate violent language. 

The study sheds light on the system's tendency to generate entirely fictitious phrases and sentences, sometimes containing violent content and fabricated personal details, posing risks, particularly for individuals with speech impairments.

The researchers, led by Assistant Professor Allison Koenecke, unveiled their findings at the recent ACM Conference on Fairness, Accountability, and Transparency (FAccT). 

OpenAI's Whisper Found to Hallucinate Violent Language

Their investigation discovered that OpenAI's Whisper, designed to transcribe audio data with remarkable accuracy, occasionally produces "hallucinations" in its transcriptions. 

According to the research team, these hallucinations encompass invented phrases, personal information, and even fictitious websites, some of which could be exploited for malicious purposes.

Koenecke emphasized the potential ramifications of such hallucinations, especially if these transcriptions were used in critical contexts like AI-based hiring processes, legal proceedings, or medical records.

OpenAI introduced Whisper in 2022, having trained it on 680,000 hours of audio data. According to the company, Whisper boasts nearly human-level accuracy in transcribing audio data. 
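For context, the transcription workflow the researchers audited can be reproduced with the open-source whisper Python package. The snippet below is a minimal sketch, not the study's own code; it assumes the package has been installed (pip install -U openai-whisper) and that a hypothetical local audio file named sample.wav is available.

    # Minimal sketch: transcribing an audio clip with the open-source Whisper package.
    # Assumes `pip install -U openai-whisper` and a hypothetical local file sample.wav.
    import whisper

    model = whisper.load_model("base")       # smaller checkpoints trade accuracy for speed
    result = model.transcribe("sample.wav")  # returns a dict with the full text and per-segment details

    print(result["text"])                    # full transcript
    for segment in result["segments"]:       # per-segment timings, useful for spotting suspect spans
        print(f'{segment["start"]:.1f}-{segment["end"]:.1f}s: {segment["text"]}')

Inspecting individual segments rather than only the final text is one simple way a practitioner might notice transcript content that does not line up with the underlying audio.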

According to Koenecke, OpenAI has enhanced Whisper's underlying model since the study was completed last year, reducing the rate of hallucinations.

Analyzing over 13,000 speech clips drawn from both individuals with aphasia and those without speech impairments, the researchers found that longer pauses and silences between words made hallucinations more likely.

The research team identified that approximately 1% of Whisper's audio transcriptions featured entirely fabricated phrases, including mentions of both genuine and counterfeit websites, which could potentially be exploited in cyberattacks. 


Violent Hallucinations

The team cited one example: Whisper accurately transcribed a single, straightforward sentence but subsequently generated five additional sentences containing words such as "terror," "knife," and "killed," none of which were present in the original audio.

In other instances of fabricated transcriptions, Whisper generated random names, snippets of addresses, and irrelevant, sometimes entirely fake, website references. Some transcriptions also included phrases typical of YouTube videos, such as "Thanks for watching" and "Electric Unicorn."

The researchers' thematic analysis of these hallucinated contents revealed that a significant portion involved explicit harms, such as perpetuating violence, propagating inaccurate information, or asserting false authority.

They noted that hallucinations occurred disproportionately often for individuals with aphasia, underscoring how such errors could amplify bias in downstream applications of speech-to-text models.

While acknowledging Whisper's overall accuracy in transcribing audio data, the researchers urged industry practitioners to address these language-model-based hallucinations and to investigate the biases they may introduce in speech-to-text applications.
