What AI Providers Actually Do With Your Data: Retention, Training, and Opt-Outs

When an employee pastes a customer contract into an AI tool, that data does not disappear when the conversation ends. Every major AI provider retains some portion of that data, for some period of time, for some set of purposes. The details vary, change over time, and are often buried in terms of service that nobody reads before use.

This article breaks down what the major providers actually retain, what opt-outs exist, and why opt-outs are a weaker protection than most data protection officers (DPOs) assume.

OpenAI: different rules for API vs. consumer

OpenAI operates two distinct data regimes, and the distinction matters enormously.

For API customers (organisations calling GPT-4 or other models via the API), OpenAI's default policy is not to use inputs and outputs to train models. API data is retained for up to 30 days for abuse and safety monitoring, then deleted. Enterprise agreements can reduce or eliminate this retention.

For consumer products (ChatGPT free, ChatGPT Plus), the defaults are the opposite. Conversations may be used to improve models unless the user explicitly opts out in account settings. Retention periods are longer. Critically, employees using personal ChatGPT accounts through a browser are on the consumer terms, not the API terms. Your enterprise OpenAI agreement does not cover what happens when staff use a personal ChatGPT account on their lunch break.

Anthropic: similar API/consumer split

Anthropic's Claude follows a comparable pattern. Via the API (used by enterprise customers and developers), Anthropic retains conversations for a limited period for trust and safety purposes, with the ability to negotiate shorter retention in enterprise agreements.

On Claude.ai, the consumer product, users can opt out of model training via account settings. However, Anthropic reserves the right to retain data for safety monitoring even when training opt-out is selected. The opt-out removes your data from model training pipelines. It does not remove it from retention entirely.

Google: Gemini and Workspace

Google's picture is more complex because Gemini spans both consumer (gemini.google.com) and enterprise (Gemini for Google Workspace) contexts.

For Workspace Enterprise customers with Gemini enabled, Google's data processing terms apply. Customer data is generally not used to train Google's foundational models when Workspace data protection terms are in place. Data retention follows the Workspace configuration the administrator sets.

For the consumer Gemini app, reviewers may read conversations to improve the product. Conversations are stored for a default period that users can adjust. As with OpenAI, the risk is employees using personal Google accounts rather than corporate Workspace accounts.

Microsoft Copilot: enterprise protection, consumer exposure

Microsoft's approach is one of the more enterprise-friendly in the market. Microsoft 365 Copilot, used through a licensed Microsoft 365 environment, processes data within the customer's Microsoft 365 boundary. Microsoft commits not to use this data to train foundational models. Retention follows whatever policies the customer has set in their Microsoft environment.

The problem is copilot.microsoft.com and the Bing-integrated versions, which operate on consumer terms. Employees accessing Copilot outside of a licensed Microsoft environment may be subject to Microsoft's consumer data practices rather than the enterprise commitment.

The opt-out complexity problem

Across all providers, opt-outs share a set of common limitations:

They require action to activate. Defaults are often set in the provider's favour. Opting out requires someone to find the setting, understand it, and toggle it before any data has been sent.
They cover training, not retention. Opting out of model training rarely means zero retention. Abuse and safety monitoring creates a floor below which retention cannot go, regardless of your settings.
They apply per account, not per organisation. An opt-out on your enterprise account does nothing for the employee using a personal account.
They change. Providers update their terms. An opt-out configuration that worked in 2024 may have a different scope in 2026. Tracking these changes across multiple providers is operationally difficult.

Why opt-outs are a last line of defence, not a first

The fundamental issue with relying on provider opt-outs is that data has already left your environment by the time the opt-out applies. The data reached the provider's servers. It was processed. It was potentially retained for some period. The opt-out prevents it from entering a training dataset. That is a narrow and downstream protection.

For sensitive data, including personal data under GDPR, special category data under Art. 9, or confidential business information, preventing transmission in the first place is a substantially stronger control than relying on provider retention settings.

Where Acta fits

Acta is designed to work as a proxy between your users and AI providers. When sensitive data is detected in a prompt, it can be blocked before the request is sent. If data never reaches the provider, none of the retention policy variations above apply. There is nothing to opt out of because nothing was transmitted.
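To make the idea of pre-transmission blocking concrete, here is a minimal sketch of a prompt scanner. It is not Acta's implementation. The pattern set, the category names, and the functions `scan_prompt` and `should_block` are assumptions chosen for illustration, and three regular expressions are nowhere near sufficient coverage for production use.

```python
import re

# Illustrative patterns only. Real detection needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_prompt(prompt: str) -> list[str]:
    """Return the sorted names of the sensitive-data categories detected."""
    found = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(prompt):
            if name == "card_number":
                digits = re.sub(r"\D", "", match.group())
                if not luhn_valid(digits):
                    continue
            found.add(name)
    return sorted(found)


def should_block(prompt: str) -> bool:
    """Block the request if any category was detected."""
    return bool(scan_prompt(prompt))
```

The point of the design is the ordering: the check runs before any network call, so a blocked prompt is never subject to a provider's retention terms.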

For requests that do go through, Acta does not retain your prompt content on its servers. Audit logs are written to your own infrastructure, giving you full control over your organisation's data trail without relying on any provider's retention settings to behave as expected.
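As an illustration of an audit trail that records what happened without storing the prompt itself, the sketch below builds a metadata-only record and appends it to a local JSON Lines file. This is an assumed format, not Acta's actual log schema. The field names and both functions are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(user_id: str, provider: str, prompt: str,
                       detected: list[str], blocked: bool) -> dict:
    """Build a record that describes the request but omits the prompt text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "provider": provider,
        # A fingerprint for correlating records, not the content itself.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_length": len(prompt),
        "detected_categories": sorted(detected),
        "blocked": blocked,
    }


def append_audit_record(path: str, record: dict) -> None:
    """Append one record as a JSON line to a file on your own infrastructure."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Note that a hash of a short or guessable prompt can be recovered by brute force, so some teams may prefer to drop the `prompt_sha256` field entirely.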

Opt-outs are worth having. Configuring them correctly across all your enterprise accounts is worth doing. But they work best as a second layer behind technical controls that stop sensitive data from reaching the provider in the first place.

Practical steps for DPOs and CISOs

Audit which AI tools your organisation actually uses, including consumer-tier tools accessed through personal accounts.
Review provider terms for each tool, paying specific attention to the consumer vs. enterprise distinction.
Configure opt-outs on all enterprise accounts and document that configuration as part of your DPIA records.
Implement technical controls that scan prompts for sensitive data before transmission, rather than relying solely on post-transmission opt-outs.
Set a calendar reminder to re-review provider terms annually, since policies change and your current configuration may not reflect current defaults.
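The audit, opt-out, and annual review steps above can be tracked in a simple register. The sketch below is one possible shape, assuming a 365-day review interval taken from the annual reminder. The `AIToolRecord` fields and the `needs_attention` function are hypothetical names, not part of any product or standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # assumed from the annual re-review step


@dataclass
class AIToolRecord:
    name: str
    tier: str  # "enterprise" or "consumer"
    training_opt_out_configured: bool
    terms_last_reviewed: date


def needs_attention(record: AIToolRecord, today: date) -> list[str]:
    """Return the reasons this tool should be looked at, or an empty list."""
    reasons = []
    if record.tier == "consumer":
        reasons.append("consumer tier: enterprise agreement does not apply")
    if not record.training_opt_out_configured:
        reasons.append("training opt-out not configured")
    if today - record.terms_last_reviewed > REVIEW_INTERVAL:
        reasons.append("terms review overdue")
    return reasons
```

Exporting the register alongside your DPIA records gives you a dated account of what was configured and when the terms were last checked.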

Disclaimer: This article is for informational purposes and does not constitute legal advice. Provider data retention policies are subject to change. Verify current terms directly with each provider and consult qualified legal counsel for guidance specific to your organisation.