Artificial intelligence (AI) chatbots like ChatGPT may soon stop getting smarter, as the pool of online data that fuels their training could run dry.

A recent study conducted by Epoch AI suggests that the available pool of publicly accessible training data for AI language models may be exhausted by the early 2030s, raising concerns about the sustainability of current AI advancements.

A photo taken on October 4, 2023 in Manta, near Turin, shows a smartphone and a laptop displaying the logo of OpenAI's ChatGPT chatbot. (Photo: MARCO BERTORELLO/AFP via Getty Images)

Will AI Run Out of Data Soon?

According to Epoch AI's findings, the rapid growth in AI capabilities in recent years has been driven primarily by scaling up models and expanding their training datasets.

However, there is only a finite amount of high-quality, human-generated text data available on the internet, and this data serves as the primary source for training AI language models like ChatGPT.

The study estimates that the effective stock of such data amounts to approximately 300 trillion tokens and projects that AI models could exhaust it sometime between 2026 and 2032.
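For a sense of scale, here is a rough back-of-the-envelope projection in Python. Only the 300-trillion-token stock comes from the study; the starting dataset size and the annual growth rate below are illustrative assumptions, not figures from the paper.

```python
# A rough, illustrative projection, not the study's actual model.
# The 300-trillion-token stock is the study's estimate; the starting
# dataset size (~15 trillion tokens in 2024) and the 2.5x annual growth
# factor are assumptions made here purely for illustration.

STOCK_TOKENS = 300e12       # estimated stock of public human-written text
dataset_tokens = 15e12      # assumed size of a 2024 frontier training set
GROWTH_PER_YEAR = 2.5       # assumed annual growth in dataset size

year = 2024
while dataset_tokens < STOCK_TOKENS:
    year += 1
    dataset_tokens *= GROWTH_PER_YEAR
    print(f"{year}: ~{dataset_tokens / 1e12:.0f} trillion tokens")

print(f"Under these assumptions, a single training run would need "
      f"the entire stock around {year}.")
```

With these assumed numbers the crossover lands in 2028, comfortably inside the 2026-2032 window the study describes; different starting points or growth rates shift the year but not the basic picture.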

The study highlights the critical role of training data in scaling AI models and emphasizes the need for a sustainable approach to AI development. It warns that the current trajectory of AI progress may be unsustainable if companies rely solely on the limited pool of publicly available text data for training.

Furthermore, the study discusses how different scaling strategies affect the timeline for data depletion. It notes that overtraining AI models, training them with fewer parameters but more data, could bring the depletion of available training data forward to as early as 2025.
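A simple, hedged calculation shows why overtraining draws down the data stock faster. The roughly 20-tokens-per-parameter figure echoes the widely cited Chinchilla scaling result; the overtrained ratio and the 70-billion-parameter model size are arbitrary values chosen for this sketch.

```python
# Illustration only: compare the data appetite of a compute-optimal
# training run with an aggressively overtrained one for the same model.

STOCK_TOKENS = 300e12            # the study's estimated public-text stock
PARAMS = 70e9                    # example model size: 70 billion parameters

ratios = {
    "compute-optimal (~20 tokens/param)": 20,
    "overtrained (assumed 100 tokens/param)": 100,
}

for label, ratio in ratios.items():
    tokens_needed = PARAMS * ratio
    share = tokens_needed / STOCK_TOKENS
    print(f"{label}: {tokens_needed / 1e12:.1f}T tokens "
          f"({share:.2%} of the estimated stock)")
```

In this toy example the overtrained run consumes five times as many tokens for the same parameter count, which is the mechanism behind the earlier depletion date.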

The study also acknowledges recent advancements in AI training methodology, such as the use of carefully filtered web data and the ability to reuse data over multiple training passes without significant degradation in model quality.

However, the team warns that even with these advancements, the long-term sustainability of AI development remains uncertain. As the demand for AI capabilities grows, companies may face challenges in sourcing high-quality training data to fuel further advancements in AI technology.


New Innovations Needed to Sustain AI Progress

In response to the looming challenge of data depletion, tech companies like OpenAI and Google are seeking alternative sources of training data, including partnerships with online platforms like Reddit and news media outlets.

However, these efforts may only offer short-term solutions, as the supply of new publicly available text data is finite and subject to depletion over time.

The study suggests that new innovations will be required to sustain AI progress once publicly available training data is depleted. These could include synthetic data generation, the use of alternative data modalities, and improvements in data efficiency.
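As a rough illustration of the synthetic-data idea, the sketch below uses a stand-in generator and a toy quality filter; `generate_text` and `passes_quality_filter` are hypothetical placeholders, not any company's actual pipeline.

```python
import random

def generate_text(prompt: str) -> str:
    """Stand-in for a call to an existing trained language model."""
    templates = [
        f"{prompt} Here is a detailed explanation with worked examples.",
        f"{prompt} ok",  # deliberately low quality; should be filtered out
    ]
    return random.choice(templates)

def passes_quality_filter(text: str) -> bool:
    """Toy filter: keep only reasonably long outputs."""
    return len(text.split()) >= 8

# Generate several candidate samples per prompt and keep the good ones,
# building a small synthetic corpus that could supplement scarce human text.
synthetic_corpus = []
prompts = ["Explain photosynthesis.", "Summarize the French Revolution."]

for prompt in prompts:
    for _ in range(3):
        sample = generate_text(prompt)
        if passes_quality_filter(sample):
            synthetic_corpus.append(sample)

print(f"Kept {len(synthetic_corpus)} synthetic training examples.")
```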

While the future of AI development remains uncertain, the study anticipates continued investment in research and development to address these challenges and drive further advances in AI technology. The research team's findings were published on arXiv.

