Agentic AI Is Here. Your Compliance Framework Probably Isn't Ready.

Most AI governance frameworks were designed with a simple model in mind: a human asks a question, the AI responds. One request, one response, one point of data flow to monitor. That model is increasingly obsolete.

Agentic AI systems, including Claude Code, OpenAI's Codex CLI, LangChain-based pipelines, and custom agent frameworks, operate differently. They receive a high-level goal, break it into sub-tasks, call external tools, make dozens or hundreds of API requests, and produce outcomes that no human explicitly reviewed at each step. The gap between what was instructed and what was executed can be enormous.

This is a compliance problem. And most organisations are not ready for it.

What makes agents different from simple AI use

Consider the difference between a user asking ChatGPT to summarise a document and a LangChain agent told to "research our top three competitors and draft a market positioning report."

The second instruction triggers the agent to:

Search the web and retrieve content from external sites
Potentially access internal tools (CRM, databases, file systems) via function calls
Generate intermediate reasoning and sub-queries across multiple model calls
Consolidate and synthesise across all retrieved content
Produce an output that reflects all of the above

At no point in that chain did a human review what data was pulled, what was sent to the model, or what intermediate outputs were generated. The only human touchpoint was the initial instruction and the final output.

From a data protection and compliance perspective, every step in that chain is a potential point of failure. Sensitive data could enter at any step. Personal data could be processed without a legal basis. Confidential information could be retrieved and embedded in a response destined for an external system.

Why the EU AI Act cares about autonomy

The EU AI Act's risk classification framework pays particular attention to autonomous decision-making. The higher the degree of autonomy in an AI system, the more scrutiny it receives. Systems that make decisions affecting individuals without meaningful human review are, in many cases, pushed toward higher risk categories.

Agentic systems sit squarely in the zone the Act is most concerned about. Under Art. 26, deployers are required to implement appropriate technical and organisational measures, monitor AI system operation, and maintain logs sufficient to reconstruct the system's operation. For a single-turn chatbot, this is manageable. For an agent executing hundreds of steps autonomously, it is a fundamentally different challenge.

Art. 26(5) specifically addresses human oversight. If your AI system can take consequential actions without a human reviewing each step, you need to demonstrate that meaningful oversight is still in place at the architecture level. Spot-checking final outputs is not the same as monitoring operation.

What traditional monitoring misses

Most current AI monitoring approaches are built around conversation-level visibility. They capture what the user sent and what the model returned. This is the minimum viable logging for simple chat use cases.

For agents, this approach misses:

Intermediate model calls generated by the agent itself, not the user
Tool calls and their results, including data retrieved from databases, APIs, and file systems
Chain-of-thought reasoning that the model uses to plan actions
External data ingested mid-chain, which may include personal or sensitive data
Decisions made at each step that collectively produce the final outcome

A compliance log that only captures the initial user message and the final agent response is, for audit purposes, nearly useless. You cannot reconstruct what happened, demonstrate policy compliance, or identify where a data incident originated.

The tool-use problem

Modern AI agents are built with tool use as a core capability. An agent connected to tools can read from your CRM, write to your database, send emails, query APIs, and execute code. Each tool call is a data flow event. Each data flow event is a potential compliance event.

If an agent reads a customer record to answer a user's question, that is personal data processing under GDPR. If it then includes details from that record in a response that goes to a third-party system, that may be a data transfer. If the model received the record as context and retained it in its context window across subsequent calls, the retention question becomes complicated.

None of this is visible to a monitoring system that only captures user-to-model exchanges. Tool calls happen inside the agent loop, between model calls, and they are invisible to an observer who is only watching the API endpoint.

Framework-agnostic monitoring at the proxy layer

The most robust approach to agentic compliance monitoring is interception at the API layer rather than within any specific framework. Because essentially all LLM agents, regardless of the framework they use, communicate with AI providers via HTTP API calls, a proxy positioned between your infrastructure and the AI provider captures every request in the chain.

This is how Acta is designed to work. Every API call from an agent passes through the Acta proxy. This means:

Full chain logging, including intermediate model calls generated by the agent, not just the initial user request
Policy enforcement on every request in the chain, so sensitive data is blocked at any step, not just the first one
Tool call visibility where the agent sends tool results back to the model as context, those results are captured
Framework independence, because the proxy works at the network level, it captures LangChain agents, Claude Code, Codex CLI, custom Python pipelines, and any other framework that calls an LLM API

The result is a complete audit trail that can reconstruct the agent's operation step by step, which is what Art. 26 compliance genuinely requires.

What good agentic logging looks like

For each agent run, a compliant log should be able to answer:

What was the initial instruction or trigger?
What model calls were made, in what sequence, with what content?
What external data was retrieved and included as context?
Were any policy violations detected, and what action was taken?
What was the final output, and where did it go?
Which user or system account initiated the run?

If your current logging cannot answer these questions for every agent execution, your compliance posture for agentic AI has significant gaps.

Starting points for teams deploying agents today

Inventory your agent deployments. Include internal tools, developer environments like Claude Code or Codex CLI, and any third-party integrations that use LLM APIs autonomously.
Map the data flows. For each agent, identify what data sources it can access and what external systems it can write to.
Implement proxy-level logging so that every API call in the chain is captured, not just user-initiated requests.
Apply policy enforcement at every step, not just at the entry point. An agent can introduce sensitive data mid-chain that was not present in the original instruction.
Review your DPIA to address agentic AI specifically. If it only covers chatbot use cases, it does not cover your current risk surface.

Disclaimer: This article is for informational purposes and does not constitute legal advice. EU AI Act risk classification and obligations depend on specific system characteristics and use context. Consult qualified legal counsel for guidance specific to your organisation.