New York Times Bans AI Companies From Using Media Archives to Train Algorithms

The New York Times has taken a firm stance against AI companies utilizing its media archives to train algorithms, signaling a change in its terms of service (TOS) policy.

According to Gizmodo, effective August 3, the newspaper's updated TOS explicitly prohibits using its extensive media content for training "any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system."

The revised policy encompasses the entirety of the Times' content, comprising text, images, videos, and metadata. It explicitly prohibits companies from using web crawlers to access this data for the training of their proprietary products.

New York Times on Training AI Systems

How companies acquire the data used to train AI systems is a contentious matter in the field. Training requires large volumes of data and significant computational resources.

Companies such as OpenAI, which have accumulated data by scraping the internet (both open and restricted sources), have encountered legal complications. Notably, OpenAI is currently entangled in legal battles centered on allegations of data theft and monetization.

Despite these legal complications, several prominent AI vendors remain committed to web scraping. For instance, Google recently declared its intent to continue web scraping unless constrained by legal mandates.

The New York Times' decision to restrict access to its media archive for such purposes underscores that it recognizes the value of its data and is unwilling to provide it free of charge, a stance that could potentially lead to legal conflicts.

The policy change also reflects the evolving relationship between the news media and the rapidly expanding AI industry.

In fact, several news outlets and journalism organizations, including Agence France-Presse (AFP) and The Associated Press, have already published an open letter raising concerns about intellectual property rights and the potential spread of misinformation through AI.

Separately, Reporters Without Borders and several partners are set to draft a comprehensive charter regulating AI use in media to ensure it is applied ethically and responsibly.

AI Firms Collaborate With News Companies

AI companies have also pursued collaborations with newspapers and media organizations, intending to standardize the incorporation of AI tools into news curation and content creation.

Given the significant potential market for AI in digital media, these firms are working to establish a presence in the news sector by offering free services and engaging in cooperative initiatives.

Notably, Google approached reputable news entities, including the Times and the Washington Post, introducing an AI tool named "Genesis" to support journalists.

AI enterprises are also exploring alternative means of acquiring data due to the contentious nature of unrestricted web scraping. In their search for legally compliant methods, these companies are collaborating with news organizations, furnishing free automation services in exchange for access to the newspapers' extensive text archives.

The Associated Press recently entered into an agreement with OpenAI, granting the company access to the AP's text archives in return for OpenAI's technological expertise and products.

The agreement illustrates how AI companies are adapting to evolving data acquisition practices and working to build relationships with news media organizations.