The Story of SocialConnect
When a platform decides to use 20 years of user data for AI, and the regulator calls
Fictional scenario, based on realistic situations
The Trigger
How it started
SocialConnect had an ambitious plan: improve their AI by training on all user data since the platform's founding. Posts, profiles, connections, everything would contribute to smarter recommendations. The default setting: everyone included, unless you explicitly objected.
Millions of EU users. Decades of professional data. A regulator expressing "serious concerns." And a legal question occupying the entire tech industry: can you use old data for new AI purposes based on "legitimate interest"?
"You want to use data from 2010 for AI? While users back then signed up for something completely different?" The regulator was not impressed.
The Questions
What did they need to find out?
Is "legitimate interest" sufficient legal basis for AI training?
The legal team invoked Article 6(1)(f) GDPR (legitimate interest). The argument: AI training is necessary for business operations and improves user experience.
The insight
The regulator didn't buy it. Legitimate interest requires a careful balancing of interests: does the business interest outweigh the privacy impact? With millions of users and decades of data, that proportionality test is hard to win.
Why this matters
Meta tried the same approach for Facebook and Instagram and encountered similar objections. The lesson: legitimate interest is not a free pass. The larger the scale and the more sensitive the data, the stronger your justification needs to be.
What about data collected years ago?
The team realized that users in 2010 signed up for "professional networking", not for AI training. The purpose for which the data was collected was fundamentally different from the new use.
The insight
This touched on the GDPR principle of purpose limitation. Data may only be used for the purpose for which it was collected, unless the new purpose is "compatible" with the original. AI training on all historical content? Hard to defend as "compatible" with networking.
Why this matters
The DPA put it sharply: users shared information at the time "without foreseeing that it would be used for AI training." Claiming retroactive consent for new purposes is legally shaky ground.
What if you need consent from millions of people?
If legitimate interest doesn't hold up, consent is the logical alternative legal basis. But how do you ask millions of people for explicit opt-in for AI training?
The insight
The practical implications were daunting. Opt-in conversion rates typically range from 10-30%, which would mean losing 70-90% of the training data. For the AI team this was a nightmare; for privacy experts, it was exactly how it should be.
Why this matters
This is the core tension between AI innovation and privacy: large AI models require large datasets, but obtaining real consent at scale is difficult. Companies that take this seriously invest in opt-in journeys that clearly communicate the value exchange.
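To make the idea of consent-gated training concrete, here is a minimal sketch in Python. It is an illustration, not SocialConnect's actual system: the ConsentRecord structure, field names, and eligible_for_training helper are hypothetical, and a real pipeline would also need audit logging and handling of consent withdrawal.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical record of a user's explicit AI-training consent.
@dataclass
class ConsentRecord:
    user_id: str
    ai_training_opt_in: bool               # True only after an explicit opt-in action
    consent_timestamp: Optional[datetime]  # when the opt-in was given, if ever

def eligible_for_training(author_id: str,
                          consent_index: Dict[str, ConsentRecord]) -> bool:
    """Include a post in the training set only if its author explicitly opted in."""
    record = consent_index.get(author_id)
    return record is not None and record.ai_training_opt_in

# Usage: filter a candidate corpus down to opted-in authors only.
consent_index = {
    "u1": ConsentRecord("u1", True, datetime(2025, 3, 1, tzinfo=timezone.utc)),
    "u2": ConsentRecord("u2", False, None),
}
corpus = [{"author": "u1", "text": "..."}, {"author": "u2", "text": "..."}]
training_set = [p for p in corpus if eligible_for_training(p["author"], consent_index)]
# training_set now contains only u1's post.
```

The point of the design is that absence of a record means exclusion: nobody is trained on by default, which is the opposite of the opt-out scheme SocialConnect started with.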
Can we reverse anything once data is in the model?
A user submitted a request: "Remove my data from your AI model." The technical team had to explain why that wasn't so simple.
The insight
AI models are not databases. You can't just "delete" a specific data point from a trained model. Machine unlearning exists but is experimental and resource-intensive. This irreversibility makes thinking carefully beforehand essential.
Why this matters
The DPA explicitly mentioned this as a concern: "Unlike traditional databases, it is practically impossible to remove specific information from trained models." This argument strengthens the case for consent beforehand rather than objection afterward.
The Journey
Step by step to compliance
The announcement
SocialConnect announced that AI training would begin. Default: everyone included unless they objected. Deadline: six weeks.
The warning
The national regulator publicly raised the alarm, citing "serious concerns" about the approach. Users were urged to object via their settings.
Legal reconsideration
The legal team had to re-evaluate the legal basis. Was legitimate interest sustainable? What alternatives were there?
Opt-in redesign
The UX department was tasked with designing an explicit opt-in flow that clearly communicated the value exchange.
Data scope limitation
To limit legal risk, the team decided to use only data collected after a new privacy policy update, not historical data (see the sketch after these steps).
Transparent communication
An extensive FAQ and in-app explanation were published: what is used, for what, and how you can control it.
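A minimal sketch of that scope rule, assuming a hypothetical policy effective date and an opt-in flag carried with each record; the actual cut-off date and field names would come from SocialConnect's own policy and data model:

```python
from datetime import datetime, timezone

# Hypothetical effective date of the updated privacy policy that names AI training.
AI_POLICY_EFFECTIVE = datetime(2025, 6, 1, tzinfo=timezone.utc)

def in_training_scope(collected_at: datetime, author_opted_in: bool) -> bool:
    """In scope only if collected under the new policy AND the author opted in."""
    return collected_at >= AI_POLICY_EFFECTIVE and author_opted_in

# Example: a 2010 post stays out of scope even if its author later opted in.
print(in_training_scope(datetime(2010, 5, 4, tzinfo=timezone.utc), True))   # False
print(in_training_scope(datetime(2025, 7, 1, tzinfo=timezone.utc), True))   # True
```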
The Obstacles
What went wrong?
Challenge
Legitimate interest proved legally unsustainable at this scale
Solution
Switch to explicit opt-in with clear value proposition
Challenge
Historical data could not simply be used for new purposes
Solution
Scope limited to data collected after new privacy policy with AI training clause
Challenge
Data cannot be removed from trained models
Solution
No training before consent, so removal is not needed
We thought we were being smart with opt-out. The regulator showed us that smart is not the same as compliant. Now we're building trust instead of grabbing data.
The Lessons
What can we learn from this?
Legitimate interest is not a free pass
At large scale and with sensitive data, the proportionality test is strict. Regulators look critically.
Purpose limitation applies to AI too
Data collected for purpose A cannot just be used for purpose B. Especially not 15 years later.
Opt-out is not opt-in
Implied consent is not real consent. Especially not for significant new applications.
AI training is irreversible
You cannot remove data from a trained model. So ask consent beforehand, not afterward.
Is your organization considering AI training on user data?
Discover how to approach this compliantly under GDPR and AI Act.