Update 4 March 2026: The District Court of The Hague rejected all of Reddit's claims. The AP investigation into the legality of selling user data to AI trainers continues unimpeded.
On 4 March 2026, the District Court of The Hague (Rechtbank Den Haag) delivered its verdict in a summary proceedings case that Reddit had brought against the Dutch Data Protection Authority (Autoriteit Persoonsgegevens, AP). The result was unambiguous: all of Reddit's claims were rejected. The investigation continues. And for anyone working in the AI industry with large volumes of text data, that is a signal that demands serious attention.
How This Started
Over the years, millions of people have contributed to Reddit. They asked questions, shared experiences, wrote stories, and debated everything imaginable. All of that text collectively forms one of the richest corpora of human expression on the internet - and in 2023, Reddit decided to monetize it. The platform announced it would restrict and charge for API access, targeting companies that wanted to use the data to train large language models.
The Dutch DPA opened an investigation into this practice in 2025. The central question: can platforms like Reddit sell user-generated content to AI companies without the explicit consent of the users who created it?
Reddit's European headquarters is in the Netherlands, making the AP the competent supervisory authority.
The Refusal That Led to Court
An investigation runs smoothly as long as the investigated party cooperates. Reddit did not. The company refused to give the AP access to its internal systems - Jira for project management, Google Vault for archived communications, Ironclad for contract management, and so-called SWAT tables containing internal data analyses. Reddit's justification was attorney-client privilege: the information, it argued, was protected by the confidential lawyer-client relationship.
The AP fundamentally disagreed and imposed a "last onder dwangsom" - a compliance order backed by financial penalties. Reddit chose the courts instead.
What the Court Decided
The District Court of The Hague was clear. All of Reddit's claims were dismissed. The judge found no basis to prohibit the AP from continuing its investigation, nor to suspend the compliance order.
The judgment (ECLI:NL:RBDHA:2026:4248) is procedural in nature: it does not establish that Reddit violated GDPR, but it removes all formal obstacles for the AP to now make that determination itself. Reddit tried its legal blockade and lost.
The Legal Core: Is This Actually Allowed?
The question the AP investigation must ultimately answer touches the entire AI industry. When a platform sells user data to AI developers for training language models, is that lawful under GDPR?
Processing personal data requires a legal basis. Reddit will likely invoke legitimate interest (Article 6(1)(f) GDPR). But that basis demands a balancing test: the controller's interest must be weighed against the interests, rights, and freedoms of the data subjects. And that is where things get complicated.
Someone who posted a question in a cooking subreddit, or shared a personal story in a mental health community, could not reasonably have expected that contribution to be used years later as training material for commercially deployed AI systems. The user's reasonable expectation at the time of posting is directly relevant to any legitimate interest assessment.
Then there is the purpose limitation principle. Data collected to facilitate online discussion cannot simply be repurposed for fundamentally different ends without a fresh legal basis. Making that data available to AI companies for model training is a materially different purpose from hosting community conversations.
And finally, there are the rights of the data subjects themselves. Access, objection, erasure - these rights exist on paper but are practically impossible to exercise once data has been transferred to a third party for training. Once baked into a model, individual contributions cannot realistically be extracted.
AI Act Article 53: Training Data Transparency
Beyond GDPR, the EU AI Act is also relevant here, particularly for general-purpose AI (GPAI) models. Article 53 of the AI Act requires providers of GPAI models to provide transparency about the data used for training, including a publicly available summary of the training datasets.
This creates a compelling chain of accountability. If an AI company purchased data from Reddit, it must be able to account for that in its training data documentation. And Reddit as the data vendor must be able to demonstrate that the sale was lawful. If the AP finds that it was not, both the data seller and the AI developer have a problem.
That mechanism is precisely what makes this investigation so significant beyond Reddit itself. It applies to any platform monetizing its data assets through API deals with AI companies, and to any AI provider building models on that data.
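To make the accountability chain concrete: a GPAI provider that bought data from a platform needs per-source documentation it can roll up into the public training-data summary Article 53 requires. The sketch below is a hypothetical record structure of my own devising - the field names are illustrative and do not come from the AI Act or any official template - showing the minimum a provider would want on file per source.

```python
from dataclasses import dataclass

# Hypothetical provenance record for one training-data source.
# Field names are illustrative, not taken from the AI Act or any official template.
@dataclass
class TrainingDataSource:
    name: str                    # e.g. "social platform API corpus"
    vendor: str                  # who supplied the data
    acquired_via: str            # "api_licence", "scrape", "public_dataset", ...
    gdpr_legal_basis: str        # legal basis the vendor claims, if documented
    vendor_warranty: bool        # does the contract warrant lawful collection?

def summary_entry(src: TrainingDataSource) -> dict:
    """Flatten one source record into a row for a public training-data summary."""
    return {
        "source": src.name,
        "acquired_via": src.acquired_via,
        "legal_basis": src.gdpr_legal_basis or "undocumented",
        "warranted": src.vendor_warranty,
    }
```

The point of the `"undocumented"` fallback is exactly the scenario this case describes: if the vendor's legal basis turns out not to hold, that row is where both the seller's and the buyer's problem becomes visible.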
Implications for RAG Pipelines and API Data
Many organizations today are building Retrieval-Augmented Generation (RAG) systems on external data sources. They scrape publicly available content, purchase API access from social platforms, or use third-party datasets. This investigation raises a pointed question that is asked too rarely: is that data actually clean from a legal standpoint?
"Publicly available" is not the same as "free to use for any purpose." Reddit posts are publicly visible, but the people who wrote them are identifiable, have fundamental rights, and made their contributions in a specific context. Using that data for commercial AI training breaks that context.
Organizations using external data in AI applications would do well to seriously consider four questions. First: on what legal basis was that data originally collected, and is that basis documented? Second: is there a valid purpose limitation argument - does the new use fall within the reasonable expectations of the data subjects? Third: does the agreement with the data provider include warranties about the lawfulness of the data? And fourth: were data subjects adequately informed and given a realistic opportunity to object?
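The four questions above can be turned into a pre-ingestion gate for a data pipeline. This is a minimal sketch under my own assumptions - it encodes the checklist as data, not any legal test, and every name in it is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ProvenanceCheck:
    """One boolean per audit question; all field names are hypothetical."""
    legal_basis_documented: bool            # Q1: original collection basis on record?
    within_user_expectations: bool          # Q2: purpose limitation argument holds?
    contractual_warranties: bool            # Q3: provider warrants lawfulness?
    subjects_informed_and_can_object: bool  # Q4: transparency and an objection route?

    def gaps(self) -> list[str]:
        """Names of every failed check; empty list means the source passes."""
        return [name for name, ok in vars(self).items() if not ok]

# Usage: gate a source before it enters a RAG index or training set.
check = ProvenanceCheck(True, False, True, False)
if check.gaps():
    print("do not ingest; open items:", check.gaps())
```

The value of writing it down this way is less the code than the forcing function: each ingested source gets an explicit answer to all four questions, which is precisely the documentation a regulator would ask for.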
If the AP's investigation concludes that Reddit's data practices were unlawful, the AI models trained on that data become legally less solid. That has consequences for every organization deploying those models.
The Broader Signal
The Reddit vs AP case is not an isolated incident. It is a symptom of a structural tension that will occupy the AI industry for years to come. The need for large, diverse training datasets and the protection of the fundamental rights of European citizens are in direct conflict here.
The Dutch DPA has previously warned about unlawful use of personal data in AI contexts. Its earlier intervention in the LinkedIn case, and its warning about the security risks of AI agents, demonstrate that the regulator is willing to take enforcement action even against large tech companies with deep pockets and well-resourced legal teams.
The summary proceedings make one thing clear: the AP is not going to be deflected by attorney-client privilege arguments. The court sided with the regulator. And if the AP ultimately finds that Reddit violated GDPR, that is a precedent that fundamentally undermines the business model of "we sell our API data to AI companies."
Practical Takeaways
For organizations building AI systems or RAG pipelines on data sourced from social platforms and public web content, this case is a call to audit data provenance now rather than after enforcement. The questions to ask are straightforward even if the answers are not.
Where did your training data come from? Does the platform or provider from whom you obtained it have a credible legal basis for making that data available? Did the original users have any reasonable expectation that their content would be used in this way? Are there contractual warranties in your data supply agreements?
If you cannot answer these questions with confidence, that is a gap that regulators will eventually notice. The AP has shown it is willing to pursue these questions all the way through court proceedings. Other EU data protection authorities are watching.
Conclusion
The 4 March ruling is procedural, but the stakes are far larger. The AP is investigating whether selling user data to AI trainers is lawful under GDPR. Article 53 of the AI Act adds transparency obligations that tighten the chain of accountability further.
For organizations using external data in AI applications - whether through scraping, API deals, or data packages - this is the moment to take the provenance and legality of that data seriously. "Publicly available" is not a blanket authorization. The regulator has shown it is willing to enforce that position, and the court has confirmed it has the right to do so.