Is Your Company's Data Training Someone Else's AI Model?

Imagine a customer support agent pastes a client complaint into a consumer AI tool to help draft a response. The complaint includes the client's name, account details, and a description of a medical device malfunction. The draft response comes back quickly. The agent sends it. Case closed.

Now imagine that conversation, along with millions like it, ends up in a training dataset used to improve the AI model's future performance. Your client's personal information, including details of a medical device malfunction, now sits in the training data of a commercial AI system you do not control.

The details are illustrative, but the mechanism is not hypothetical. Using conversations for model training is the default setting in many consumer AI tools, and employees use those tools for work in organisations every day.

How model training on user data works

AI providers improve their models by exposing them to more data. For consumer products, conversations with users are a natural and valuable source of training data. The model learns from real-world inputs, including the kinds of questions people ask, the corrections they make, and the outputs they rate positively or negatively.

For the provider, this is a legitimate business interest. For your organisation, it is a data processing activity you may not have authorised, planned for, or even noticed.

Consumer AI tools typically make model training opt-out rather than opt-in. The default setting amounts to "yes, use my conversations to improve the model," and users who want to prevent this must find and change a setting themselves. In enterprise contexts, most providers offer stronger protections, but these depend on a specific agreement, correct configuration, and the right tier of service.

Where the opt-out system breaks down

Even well-configured enterprise opt-outs have limitations that DPOs and legal teams often underestimate.

The consumer/enterprise gap. Your enterprise agreement covers AI use through the enterprise account. It says nothing about what happens when an employee uses the consumer version of the same tool on a personal account: the same model, under different terms. In many organisations, a significant share of AI tool usage falls into this gap.

Opt-outs apply after transmission. When you opt out of model training, you are instructing the provider not to include your data in future training pipelines. You are not preventing the data from being transmitted, received, and temporarily processed by the provider's systems. The data has already arrived by the time the opt-out policy applies.

Safety monitoring is not covered. Major providers generally retain some data for trust and safety monitoring, even for enterprise customers who have opted out of training. This retention sits outside the scope of a training opt-out, so opting out does not remove it. If your organisation's prompts contain personal data, that data is still processed for this purpose, regardless of your training opt-out status.

Terms change. Provider policies are updated, sometimes significantly. An opt-out configuration that reflected policy in one quarter may have different implications after a terms update. Without active monitoring of provider terms, your opt-out may not mean what you think it means.
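One low-effort way to monitor terms is to keep a fingerprint of each provider's terms as last reviewed and flag any change for legal review. The Python sketch below assumes you already save a plain-text copy of each provider's terms (fetching and parsing the pages is out of scope), and the function names are hypothetical. It only tells you that something changed; deciding what the change means for your opt-out still needs a human reader.

```python
import hashlib
import re


def fingerprint(terms_text: str) -> str:
    """SHA-256 of the terms after collapsing whitespace, so reflowed text is not flagged."""
    normalised = re.sub(r"\s+", " ", terms_text).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_providers(reviewed: dict[str, str], current_terms: dict[str, str]) -> list[str]:
    """Return providers whose current terms differ from the last reviewed fingerprint.

    `reviewed` maps provider name -> fingerprint recorded at the last legal review.
    `current_terms` maps provider name -> freshly saved terms text.
    Providers with no recorded review are flagged as well.
    """
    return [
        provider
        for provider, text in current_terms.items()
        if reviewed.get(provider) != fingerprint(text)
    ]
```

Any provider returned by `changed_providers` goes to legal review; once the new terms have been assessed, you record the new fingerprint as the reviewed baseline.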

The GDPR dimension

If data subject to GDPR enters an AI tool's training pipeline without a lawful basis for that specific processing activity, you have a compliance problem regardless of whether the training opt-out was technically available.

GDPR's accountability principle (Art. 5(2)) requires you to demonstrate compliance, not just assert it. If a regulator or data subject asks whether their data was used to train an AI model, "we had an opt-out configured" is a weaker answer than "the data was never transmitted to the provider in the first place."

For special category data under Art. 9, the stakes are higher still. Processing health data, genetic data, biometric data used to identify a person, and the other special categories is prohibited unless one of the specific conditions in Art. 9(2) applies, in addition to a lawful basis under Art. 6. If special category data enters an AI training pipeline, the question of legal basis becomes immediately acute.

Rethinking the control model

The current approach in most organisations places opt-outs at the centre of the data protection strategy for AI use. This is understandable: opt-outs are easy to configure, visible to regulators, and something you can point to during an audit.

But they are a downstream control. They act on data that has already moved. The stronger control is upstream: preventing sensitive data from being transmitted to the provider in the first place.

If an AI prompt containing a patient record is blocked before it reaches the API, no opt-out is needed for that data. There is no training risk, no retention question, and no need to trust that the provider's opt-out system is correctly configured and operational. The data never left your environment.

How Acta supports this approach

Acta is designed to sit between your users and AI providers as a policy enforcement proxy. Before a prompt reaches the provider, Acta scans it for sensitive content, including personal data, special category data, and confidential patterns your organisation defines. Prompts that contain blocked content are stopped before transmission.

Data that is never sent cannot be retained, processed, or trained on by the provider. This is the most reliable protection against the model training risk, because it does not depend on the provider's opt-out system operating correctly.
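A minimal Python sketch of pre-transmission scanning follows. It is not Acta's implementation: the rule names, the regular expressions, the keyword list, and the `PROJ-1234` codename format are all hypothetical, and `send` stands in for whatever client actually calls the provider. Real detection of personal and special category data needs far more than a few regular expressions, but the control flow is the point: the provider call only happens after the scan passes.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical rules for illustration only. Real detection of personal and
# special category data needs far more than a handful of regular expressions.
RULES: dict[str, re.Pattern[str]] = {
    # Personal data: email addresses
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Personal data: IBAN-like account numbers written without spaces
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    # Special category hint (Art. 9 GDPR): a few health-related keywords
    "health_term": re.compile(
        r"\b(?:diagnosis|patient|prescription|medical device)\b", re.IGNORECASE
    ),
    # Organisation-defined confidential pattern: a made-up internal codename format
    "internal_codename": re.compile(r"\bPROJ-\d{4}\b"),
}


@dataclass
class ScanResult:
    allowed: bool
    matched_rules: list[str]


def scan_prompt(prompt: str) -> ScanResult:
    """Check a prompt against every rule and report which ones matched."""
    matched = [name for name, pattern in RULES.items() if pattern.search(prompt)]
    return ScanResult(allowed=not matched, matched_rules=matched)


def forward_if_allowed(prompt: str, send: Callable[[str], str]) -> str:
    """Call `send` (a stand-in for the provider client) only if the scan passes."""
    result = scan_prompt(prompt)
    if not result.allowed:
        # Blocked before transmission: `send` is never called with this text.
        return "blocked: " + ", ".join(result.matched_rules)
    return send(prompt)
```

For example, a prompt such as "Reply to jane.doe@example.com about her medical device" trips both the `email` and `health_term` rules and is returned as blocked without `send` ever being called. Keyword rules like these will both over-block and under-block, so in practice they would be tuned against your own data.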

For data that does flow through, Acta does not store prompt content on its own servers. Audit logs go to your own infrastructure, so your organisation retains full control of its data trail.
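To illustrate an audit trail that records decisions without retaining prompt text, here is a hypothetical log format written as JSON lines to a sink you control. The field names and functions are assumptions for illustration, not Acta's log schema. Storing a SHA-256 digest lets you later check whether a specific prompt was seen without keeping its content. Digests of short or predictable prompts can still be matched by guessing, and the user identifier is itself personal data, so the log still needs access controls.

```python
import hashlib
import io
import json
from datetime import datetime, timezone
from typing import TextIO


def audit_record(user_id: str, provider: str, prompt: str,
                 allowed: bool, matched_rules: list[str]) -> dict:
    """Describe one proxied request without storing the prompt text itself."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "provider": provider,
        "decision": "forwarded" if allowed else "blocked",
        "matched_rules": matched_rules,
        # Digest of the prompt: lets you confirm later whether a given text was seen
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }


def write_audit(entry: dict, sink: TextIO) -> None:
    """Append the entry as one JSON line to a sink on your own infrastructure."""
    sink.write(json.dumps(entry, sort_keys=True) + "\n")


if __name__ == "__main__":
    # In-memory sink for demonstration; in practice a file or log pipeline you control
    sink = io.StringIO()
    write_audit(audit_record("u-123", "example-provider", "Draft a reply", True, []), sink)
    print(sink.getvalue(), end="")
```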

What to do this week

Check which AI tools employees are using, including consumer versions accessed via personal accounts.
Review your enterprise agreements for each AI provider to confirm the scope of model training opt-outs.
Configure opt-outs on all enterprise accounts and record the configuration as part of your processing activity records.
Implement prompt scanning so sensitive data is identified and blocked before it reaches any provider, making opt-outs a secondary layer rather than the primary one.
Update your Record of Processing Activities (RoPA) to reflect AI tool usage as a data processing activity, with accurate details of providers, purposes, and safeguards.
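As a sketch of what an AI tool entry in the RoPA might capture, the Python dataclass below loosely follows a subset of the items Art. 30(1) GDPR lists (purposes, categories of data subjects and personal data, third-country transfers, retention) and adds the provider, account tier, opt-out status, and safeguards discussed above. The field names and the `gaps` helper are hypothetical; your own RoPA template and your DPO's judgement take precedence.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class AIProcessingRecord:
    """Hypothetical RoPA entry for one AI tool; field names are illustrative."""
    activity: str
    provider: str
    account_tier: str                   # e.g. "enterprise" or "consumer"
    purposes: list[str]
    data_subject_categories: list[str]
    personal_data_categories: list[str]
    third_country_transfers: str        # destination countries and transfer safeguards
    retention: str                      # provider and internal retention periods
    training_opt_out_confirmed: bool
    safeguards: list[str] = field(default_factory=list)  # e.g. prompt scanning


def gaps(record: AIProcessingRecord) -> list[str]:
    """List fields that are still empty, plus the opt-out flag if it is unconfirmed."""
    missing = [name for name, value in asdict(record).items() if value in ("", [])]
    if not record.training_opt_out_confirmed:
        missing.append("training_opt_out_confirmed")
    return missing
```

Running `gaps` before an entry is signed off gives a simple checklist of what still has to be confirmed with the provider or the business owner.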

Disclaimer: This article is for informational purposes and does not constitute legal advice. Provider terms and opt-out mechanisms are subject to change. Consult qualified legal counsel and your DPO for guidance specific to your organisation and jurisdiction.