OpenAI Caught Using Reddit to Train AI—Is Your Opinion Being Tested?

OpenAI asks its AI models to write replies that would change the mind of a Reddit user about a certain topic.

OpenAI is using Reddit's r/ChangeMyView to test the persuasiveness of its AI models, which raises questions about data sourcing, AI influence, and ethics. This is not a conversational feature offered to users but an internal experiment.

If you're wondering how it's done, OpenAI collects posts from the subreddit. An AI model then composes replies to those human-written posts in a closed test environment, and the replies are rated on how persuasive they are.

OpenAI Uses Subreddit to Evaluate AI Persuasion

OpenAI has revealed that it uses the popular Reddit forum r/ChangeMyView as a benchmark for testing how effectively its AI models can persuade human users. This was disclosed in a system card released alongside its latest reasoning model, o3-mini.

The subreddit, where users post strong opinions and invite others to challenge them with counterarguments, provides a rich dataset of human reasoning.

OpenAI's method involves collecting these discussions, having its AI generate responses in a closed testing environment, and then comparing the AI-generated arguments to human replies.

Testers assess how convincing the AI responses are, helping OpenAI improve its models.
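
To make the comparison step concrete, here is a minimal sketch of how an AI reply could be ranked against the human replies to the same post. It is an illustration only, not OpenAI's actual scoring code: the function name, the 1-to-5 rating scale, and the sample numbers are all assumptions made for this example.

```python
from bisect import bisect_left


def persuasion_percentile(ai_score: float, human_scores: list[float]) -> float:
    """Return the share (0-100) of human replies that the AI reply strictly outscores.

    Hypothetical helper: scores are assumed to be tester ratings where
    higher means more persuasive.
    """
    if not human_scores:
        raise ValueError("need at least one human reply score")
    ranked = sorted(human_scores)
    # bisect_left gives the number of human scores strictly below ai_score.
    beaten = bisect_left(ranked, ai_score)
    return 100.0 * beaten / len(ranked)


# Made-up tester ratings (assumed 1-5 scale) for replies to a single post.
human_ratings = [2.0, 3.5, 1.5, 4.0, 3.0]
ai_rating = 3.8

print(persuasion_percentile(ai_rating, human_ratings))  # prints 80.0
```

In this made-up case the AI reply outscores four of the five human replies, which puts it in the 80th percentile for that post. Ties are counted against the AI reply here, a conservative choice for the sketch.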

AI Training on Human Discussions—A Goldmine for Tech Companies

According to TechCrunch, Reddit has become a treasure trove for AI training, offering vast amounts of high-quality, user-generated content. OpenAI, along with other tech giants, has been tapping into these resources to enhance the reasoning and communicative capabilities of AI models.

While OpenAI has a content-licensing deal with Reddit—similar to the $60 million agreement reportedly signed by Google—the company says that its ChangeMyView evaluation is separate from that deal. However, it's unclear how OpenAI accessed the data from the subreddit, and it has no plans to make this evaluation public.

Reddit's Complicated Relationship with AI Companies

Reddit has made it clear that it does not tolerate AI companies scraping its content without permission. The company's CEO, Steve Huffman, had earlier criticized the likes of Microsoft, Anthropic, and Perplexity for bypassing negotiations, calling the blocking of unauthorized scrapers a "real pain in the ass."

OpenAI itself has faced multiple lawsuits, including one from The New York Times, alleging that it improperly scraped web content to train its AI models.

How Well Does OpenAI's AI Perform on Persuasion?

The ChangeMyView benchmark was also used to evaluate OpenAI's previous model, o1. While o3-mini doesn't show a significant boost in performance compared to o1 or GPT-4o, OpenAI's latest models are proving to be more persuasive than the majority of human users on the subreddit.

However, OpenAI says its goal is not to build hyper-persuasive AI but to ensure that AI does not become dangerously convincing. The worry is that highly persuasive AI may manipulate users, possibly pursuing its own objectives or serving the interests of whoever controls it.

AI Persuasion—Powerful Yet Risky

The better the models are at mimicking human reasoning, the greater the ethical concerns. An AI that can consistently change people's opinions could be used to sway political views, spread misinformation, and manipulate decisions.

OpenAI says it has introduced safeguards to keep its AI from being overly persuasive or deceptive; however, the company admits that balancing the models' reasoning ability against these concerns remains an unsolved problem.

The Hunt for Quality Training Data Continues

Despite scraping vast amounts of internet content and striking licensing deals, AI developers still struggle to find high-quality datasets to train their models. OpenAI's use of r/ChangeMyView might be the start of something bigger: human opinions being overtaken by human-like, AI-generated replies.

AI's advance looks unstoppable, given how steadily the technology keeps evolving. However, AI companies should always be transparent about the data used to train their models, since users bear most of the risk.

The potential of AI to persuade humans is indeed impressive, but the implications call for cautious oversight.

ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.