The Financial Times and OpenAI strike deal over training data

by Gabin Jobs April 29, 2024

The FT is granting OpenAI access to its news archives as generative AI companies continue to secure private data sources.

Under the arrangement, ChatGPT will provide summaries and direct quotes from FT articles, with hyperlinks back to the original content on the FT website.

OpenAI has also committed to collaborating with the FT to develop new AI-driven products. The FT has experimented with AI before, incorporating Anthropic’s Claude into a generative search tool called “Ask FT.”

John Ridding, FT CEO, stated of the deal, “Apart from the benefits to the FT, there are broader implications for the industry. It’s right, of course, that AI platforms pay publishers for the use of their material. OpenAI understands the importance of transparency, attribution, and compensation – all essential for us.”

A reasonable sentiment – though some would disagree that OpenAI understands the importance of transparency and attribution.

Brad Lightcap, COO of OpenAI, also chimed in: “Our partnership and ongoing dialogue with the Financial Times is about finding creative and productive ways for AI to empower news organizations and journalists, and enrich the ChatGPT experience with real-time, world-class journalism for millions of people around the world.”

While access to the FT’s data is valuable for OpenAI, its current datasets consist of trillions of words of dubiously ‘public’ or ‘open source’ data. By comparison, the deals it has been keen to strike with the FT and other media companies like Axel Springer are a drop in the ocean.

While AI companies acknowledge they need to start paying for data to align with growing legal pressures, they’ve also realized that their models will quickly become outdated without striking deals with established media outlets.

The sustainability of current AI data practices is questionable. OpenAI’s CEO Sam Altman has noted the finite nature of the available data resources, which could run out by 2026.

This imminent scarcity has prompted discussions about synthetic data and other alternative data generation methods, which carry their own risks and challenges.

AI companies battle over data

The ethical stakes in AI data usage are immense. In their quest for comprehensive datasets, tech giants like OpenAI, Google, and Meta have been reported to engage in practices that push or outright cross legal and ethical boundaries.

For instance, a New York Times investigation revealed that OpenAI developed a tool named Whisper to transcribe YouTube videos – despite potential violations of YouTube’s policies against using its videos for independent applications.

Similarly, Google and Meta have explored or implemented strategies that skirt or reinterpret existing copyright and privacy laws to gather more data.

Among the shadier strategies is altering privacy policies to allow AI applications to use publicly available content from platforms like Google Docs.

So, while AI companies are now willing to pay for data, that doesn’t stop them from bending the rules elsewhere.
