Wikimedia Complains About AI Bots Scraping as It Strains Servers, Causing Bandwidth to Surge by 50%

AI scraping is getting worse and Wikimedia is now feeling its effects.


Ever since artificial intelligence began its rise, AI scraping has been a massive problem, as scrapers have operated without licenses and without asking permission to access data from web sources. The Wikimedia Foundation is now facing that same problem.

The non-profit organization is complaining that AI scraping across its websites is placing a massive strain on its servers.

Wikimedia Complains About AI Bots Scraping, Straining Its Servers

The Wikimedia Foundation shared a post detailing how massive AI scraping activity is hurting operations across its websites. According to the organization, AI bots scraping data from its platforms have already put significant strain on its servers, and while many organic users still head to its websites for information, scraping bots account for most of the recent increase in traffic.

"But with the rise of AI, the dynamic is changing: We are observing a significant increase in request volume, with most of this traffic being driven by scraping bots collecting training data for large language models (LLMs) and other use cases," said the Foundation.

Overall, Wikimedia said that since January 2024, the bandwidth used for downloading multimedia content has surged by 50%. AI bots scraping its websites have been consuming terabytes of data, according to Ars Technica.

The Massive Effects of Unlicensed AI Scraping

There have long been concerns about AI companies going to specific platforms, websites, and their backends to gather data for training their models. OpenAI is among the most notorious, having faced massive lawsuits over AI scraping from plaintiffs including book authors, news publishers, content creators, tech companies, and more.

However, Sam Altman and OpenAI are not the only ones alleged to have scraped data from the web without license or permission, as the practice has been widespread since the era of generative AI began.

Copyright infringement is one of the most significant consequences of unauthorized AI scraping, but privacy is also at stake, especially with platforms like Meta, which harvests data from its social media users for its AI models.

Other companies have looked to profit from their massive data troves by partnering with AI firms to license that data, as Reddit did last year, with Google as one of its biggest clients.

ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.