Reddit Warns AI Companies, Scrapers Against Crawling Its Data Without Permission

Reddit made it clear that it has rules for those accessing its data.

Among Reddit's many recent changes are its API access rules, which drew attention again after the company warned AI companies and scrapers against accessing the platform's data without consent. Reddit is now taking a hard line against anyone who accesses its data, especially user-contributed content, without permission or an agreement with the company.

Gone are the days when Reddit allowed anyone to access its API and site data; the company has since moved to licensing its content for AI training.

Reddit Warns AI Companies, Scrapers Over Data Access

In its latest update, Reddit warns AI companies and scrapers that are accessing its data and information without proper permission or consent. The company said the warning is rooted in its recent Public Content Policy, which it shared several weeks ago and which, it says, does not allow third-party crawlers access to Reddit's data.

The company added that it will soon update its Robots Exclusion Protocol file, better known as robots.txt, particularly in light of the heavy access some parties have despite having no agreement with Reddit.

According to Reddit, it will block or limit access for parties found to be accessing its content and data, particularly those that have no agreement or collaboration with the company.
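For readers unfamiliar with the mechanism, robots.txt is a plain-text file that well-behaved crawlers consult before fetching pages; it is advisory, which is why Reddit pairs it with blocking and limiting access. The short Python sketch below shows how a crawler might check such a file using the standard library's `urllib.robotparser`. The file contents, the bot names, and the URL are hypothetical and are not taken from Reddit's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration only.
# This is NOT Reddit's actual file; the bot names are made up.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExamplePartnerBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, assumed for the example.
    url = "https://www.example.com/r/some_community/"
    for agent in ("ExamplePartnerBot", "SomeOtherCrawler"):
        print(agent, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

In this sketch, the named partner crawler is permitted while every other crawler falls under the catch-all rule and is refused, which mirrors the partner-versus-everyone-else distinction the company describes.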

Reddit's Third-Party Crawling Access

Reddit does not allow third parties to crawl its data without the company's consent or permission, and it will soon begin blocking those it catches.

Some Reddit partners and collaborators, such as The Internet Archive, are already able to crawl the site under their agreements, and the upcoming change will not affect them.

Reddit's AI Training Licensing

Last February, reports sparked a major controversy around Reddit, with claims that the company was selling its platform's data, including human-made content, to AI companies for model training. The controversy centered on a $60 million per year licensing deal, later confirmed to be with Google, which established licensing as the legitimate route for those who want to train models on Reddit's content.

This was also a crucial time in Reddit's operations, as the company was on the verge of its initial public offering (IPO). After the AI licensing deal was confirmed, the company filed for its IPO.

Many of the changes at Reddit were also attributed to its plans to go public, including stricter API access, which now requires substantial payments, and content licensing for AI.

Fast-forward to the recent news: Reddit is now enforcing stricter rules on its platform's content, particularly after reports and testimonies that its data was being accessed without the company's permission. The company now points anyone wishing to use its data to its Public Content Policy and its robots.txt file, and violators face being blocked once found.

Isaiah Richard
Tech Times
ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.