What Is the EU AI Act?

The European Union's 2024 regulation classifying AI systems by risk tier and imposing governance, transparency, and oversight obligations on providers and deployers.

The EU AI Act is the European Union’s risk-tiered regulation for AI systems, adopted in 2024 with obligations phasing in through 2026 and 2027. It classifies systems into four tiers: prohibited (social scoring, real-time biometric ID in public spaces, manipulative or exploitative systems), high-risk (hiring, credit, education, critical infrastructure, medical devices, law enforcement, migration), limited-risk (chatbots and deepfakes, which carry transparency duties only), and minimal-risk (most consumer applications). General-purpose AI (GPAI) models, including production LLMs from frontier labs and many fine-tuned derivatives, sit in a separate track with transparency, copyright, and systemic-risk duties. Penalties for prohibited practices reach 7% of global annual turnover or €35M, whichever is higher; other violations carry lower caps. Engineering teams are responsible for the technical controls; legal owns the classification.
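
The "whichever is higher" rule is plain arithmetic. A minimal sketch using the 7% and €35M figures quoted above, assuming turnover is given in whole euros; the function name is hypothetical, and the result is only the ceiling, not the fine a regulator would actually set.

```python
FIXED_CAP_EUR = 35_000_000   # fixed ceiling quoted in the text
TURNOVER_PERCENT = 7         # turnover-based ceiling quoted in the text


def max_penalty_eur(global_annual_turnover_eur: int) -> int:
    """Return the higher of 7% of global annual turnover and EUR 35M."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    turnover_based = global_annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


if __name__ == "__main__":
    # Hypothetical turnovers: EUR 100M (fixed cap dominates) and EUR 2B (7% dominates).
    print(max_penalty_eur(100_000_000))
    print(max_penalty_eur(2_000_000_000))
```

For smaller companies the fixed amount is the binding figure; the percentage only takes over once turnover passes €500M.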

The 2026 status: prohibited-practice rules and the AI literacy requirement went live in February 2025. GPAI obligations and the AI Office governance structure went live in August 2025. The full high-risk regime applies from August 2026 onward, with conformity assessments, technical files, and post-market monitoring becoming auditable. As of May 2026, the practical question for engineering teams shipping into Europe is no longer “when does this apply?” but “do our traces, evaluator outputs, and audit logs survive a regulator’s request?” The technical answer depends on whether your team has wired datasets, regression evals, bias detection, and traceAI tracing into one queryable surface before launch.

Why the EU AI Act matters in production LLM and agent systems

If your LLM application makes or materially supports a decision in a high-risk domain (hiring, credit, healthcare, education, law enforcement, critical infrastructure, migration), you are inside the high-risk tier and you owe a documented set of controls before launch in the EU. The Act is specific. You need a risk-management process, a data-governance program covering training-data representativeness and bias, technical documentation, logging that supports post-market monitoring, transparency to deployers, human-oversight mechanisms, and demonstrable accuracy, robustness, and cybersecurity.

The pain shows up at audit. A bank’s hiring assistant cannot ship in Frankfurt because there is no documented bias evaluation across protected classes. A healthcare triage agent fails post-market monitoring because the team kept no per-decision audit log. A B2B chatbot deployed across the EU falls into the limited-risk transparency tier and is missing the “you are interacting with an AI” disclosure. A foundation-model fine-tune used in credit scoring inherits GPAI provider duties the deployer never planned for.

In 2026, the GPAI obligations are the part most engineering teams underestimate. If you fine-tune or substantially modify a foundation model such as GPT-5.1, Claude Opus 4.7, or Llama 4, you may inherit provider duties: model documentation, training-data summaries, copyright policy, and, for systemic-risk models above a 10^25 FLOP compute threshold, adversarial testing, incident reporting, cybersecurity baselines, and energy-efficiency disclosures. Multi-step agent systems amplify the surface area; every tool boundary, every MCP server call, every A2A handoff is a place a regulator can ask “what controls fired here, and where is the log?”
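
A rough way to screen a model against the 10^25 FLOP threshold is sketched below. It assumes the widely used rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per training token; that heuristic is an assumption of this sketch, not the Act's measurement method, and the function names and model sizes are hypothetical.

```python
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # compute threshold cited in the text


def estimate_training_flops(n_parameters: float, n_training_tokens: float) -> float:
    """Heuristic estimate: roughly 6 FLOPs per parameter per token (assumption)."""
    return 6.0 * n_parameters * n_training_tokens


def exceeds_systemic_risk_threshold(training_flops: float) -> bool:
    """True when the estimated training compute is above the threshold."""
    return training_flops > SYSTEMIC_RISK_THRESHOLD_FLOPS


if __name__ == "__main__":
    # Hypothetical models: 70B and 400B parameters, each trained on 15T tokens.
    for params in (70e9, 400e9):
        flops = estimate_training_flops(params, 15e12)
        print(f"{params:.0e} params -> {flops:.2e} FLOPs, "
              f"above threshold: {exceeds_systemic_risk_threshold(flops)}")
```

A screen like this only flags models worth escalating; whether a fine-tune counts as a systemic-risk model remains a legal determination.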

Risk tier mapping in 2026

The classification is the first decision and it is legal, not engineering. The Act draws the lines on function and context, not user count or revenue.

| Tier | Examples | Core obligations | Common engineering gap |
| --- | --- | --- | --- |
| Prohibited | Social scoring, real-time biometric ID in public spaces, emotion recognition at work / school | Ban; cannot ship | Building features that drift into a prohibited use |
| High-risk | Hiring, credit, education grading, medical devices, migration, critical infrastructure, law enforcement | Full risk-management, data governance, transparency, human oversight, accuracy, robustness, cybersecurity, post-market monitoring | No audit log, no bias evaluator, no human-oversight workflow |
| Limited-risk | Chatbots, generated content, deepfakes, emotion recognition outside high-risk contexts | Disclosure (“you are talking to AI”), watermarking generated media | Missing AI-disclosure UX |
| Minimal-risk | Spam filters, recommendation systems, most consumer chat | Voluntary codes of conduct | Slipping into limited-risk via feature creep |
| GPAI | Frontier LLMs (GPT-5.1, Claude Opus 4.7, Gemini 3, Llama 4) and significant fine-tunes | Model documentation, training data summaries, copyright policy, downstream transparency | Treating “we just use an API” as escape; fine-tunes inherit duties |
| GPAI with systemic risk | Models above 10^25 FLOPs training compute | Adversarial testing, incident reporting, cybersecurity, energy reporting | No red-team program for frontier-tuned variants |

Two cohorts of engineering teams routinely misclassify themselves. The first builds an “internal-only” hiring screener and assumes internal use exempts them. It does not; the function is what triggers high-risk. The second fine-tunes a foundation model for a niche domain and assumes the foundation provider’s compliance covers them. It does not; substantial modification inherits provider duties.
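
The tier table above can be sketched as a first-pass triage lookup that engineering hands to legal. This is an illustration only: the function labels are abbreviated from the table, the names are hypothetical, and the result is a prompt for legal review, not a classification.

```python
from enum import Enum


class Tier(Enum):
    PROHIBITED = "prohibited"
    HIGH_RISK = "high-risk"
    LIMITED_RISK = "limited-risk"
    MINIMAL_RISK = "minimal-risk"


# Labels abbreviated from the table above (illustrative, not exhaustive).
PROHIBITED_FUNCTIONS = {"social_scoring", "workplace_emotion_recognition"}
HIGH_RISK_FUNCTIONS = {
    "hiring", "credit", "education_grading", "medical_device",
    "migration", "critical_infrastructure", "law_enforcement",
}
LIMITED_RISK_FUNCTIONS = {"chatbot", "generated_content", "deepfake"}


def triage_tier(function: str, internal_only: bool = False) -> Tier:
    """First-pass triage by function.

    `internal_only` is accepted but deliberately ignored: the function,
    not the audience, is what triggers the tier.
    """
    if function in PROHIBITED_FUNCTIONS:
        return Tier.PROHIBITED
    if function in HIGH_RISK_FUNCTIONS:
        return Tier.HIGH_RISK
    if function in LIMITED_RISK_FUNCTIONS:
        return Tier.LIMITED_RISK
    return Tier.MINIMAL_RISK


if __name__ == "__main__":
    print(triage_tier("hiring", internal_only=True))
    print(triage_tier("spam_filter"))
```

Checking the strictest tier first means a feature that matches several labels is always triaged upward.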

How FutureAGI provides the technical controls the EU AI Act requires

FutureAGI provides the technical signals and controls the Act expects high-risk system providers to maintain. It does not provide legal compliance itself, which remains your program’s responsibility. Three primitives anchor the integration.

First, evaluators that map to specific obligations. Public benchmarks give defensible accuracy and safety baselines for the technical file: HaluEval (35K Q&A; GPT-4 ~16.4% hallucination rate), TruthfulQA (817 questions; frontier 60-80%), and FutureAGI’s own PHARE safety suite all produce datapoints regulators can interrogate against your production scores. BiasDetection plus NoAgeBias, NoGenderBias, and NoRacialBias cover the data-governance and non-discrimination duties for hiring, credit, and similar systems (Articles 10, 15). DataPrivacyCompliance and PII cover cross-cutting privacy alignment (overlap with GDPR Article 22 and AI Act Article 10). IsCompliant lets you encode a custom policy rubric (“does this output meet our medical-device labeling rules?”) as a judge-model check that runs both offline on a golden dataset and online on every production trace. Toxicity and PromptInjection cover safety duties for limited-risk transparency systems. HallucinationScore and Faithfulness cover the accuracy duties under Article 15.

Second, runtime guardrails through Agent Command Center. Pre- and post-guardrails enforce the policy at every model boundary; on Failed, the gateway blocks, redacts, or escalates. Each decision becomes an audit-log row with the request, the detector, the reason, the policy version, and the human-oversight handoff if triggered. These rows are the logging and post-market monitoring evidence the Act requires under Articles 17-19 and 72. ProtectFlash provides a fast, low-latency content filter for the limited-risk transparency tier, where the disclosure plus simple safety checks are the headline obligation.
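
The shape of such an audit-log row can be sketched with the standard library. The field names, the action vocabulary, and the `make_audit_row` helper below are hypothetical illustrations of the fields listed above; they are not the Agent Command Center schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_ACTIONS = {"allow", "block", "redact", "escalate"}  # assumed vocabulary


@dataclass(frozen=True)
class GuardrailAuditRow:
    request_id: str
    detector: str
    action: str
    reason: str
    policy_version: str
    reviewer_id: Optional[str]  # set only when a human-oversight handoff occurred
    timestamp: str


def make_audit_row(request_id: str, detector: str, action: str, reason: str,
                   policy_version: str, reviewer_id: Optional[str] = None) -> str:
    """Build one guardrail decision as a JSON line ready for an append-only log."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    row = GuardrailAuditRow(
        request_id=request_id,
        detector=detector,
        action=action,
        reason=reason,
        policy_version=policy_version,
        reviewer_id=reviewer_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(row), sort_keys=True)


if __name__ == "__main__":
    print(make_audit_row("req-001", "PromptInjection", "block",
                         "injection pattern in user input", "2026.05"))
```

Recording the policy version on every row is what lets a later audit tie a decision to the rules in force at the time.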

Third, traceAI tracing captures every span across the agent trajectory. When a regulator asks “show the decision path that produced this output,” the trace is the answer: model calls, tool calls, retrieval results, guardrail decisions, MCP tool invocations, all in one OpenTelemetry-compatible record. Trace attributes pin gen_ai.request.model, policy_version, gen_ai.tool.name, and reviewer status so the audit reconstruction is one query, not a forensic exercise.
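
What "one query" means can be sketched over plain span records. The dictionaries below stand in for exported spans: the attribute names gen_ai.request.model, gen_ai.tool.name, and policy_version come from the text, while `trace_id`, `start_ns`, `reviewer_status`, and the `decision_path` helper are assumptions of this sketch, not the traceAI API.

```python
from typing import Any


def decision_path(spans: list[dict[str, Any]], trace_id: str) -> list[dict[str, Any]]:
    """Return every span of one trace, ordered by start time."""
    selected = [s for s in spans if s["trace_id"] == trace_id]
    return sorted(selected, key=lambda s: s["start_ns"])


# Illustrative span records; values are made up for the example.
SPANS = [
    {"trace_id": "t1", "start_ns": 30, "name": "guardrail",
     "attributes": {"policy_version": "2026.05", "reviewer_status": "approved"}},
    {"trace_id": "t1", "start_ns": 10, "name": "llm_call",
     "attributes": {"gen_ai.request.model": "model-snapshot-a",
                    "policy_version": "2026.05"}},
    {"trace_id": "t2", "start_ns": 5, "name": "llm_call",
     "attributes": {"gen_ai.request.model": "model-snapshot-a",
                    "policy_version": "2026.04"}},
    {"trace_id": "t1", "start_ns": 20, "name": "tool_call",
     "attributes": {"gen_ai.tool.name": "resume_parser",
                    "policy_version": "2026.05"}},
]

if __name__ == "__main__":
    for span in decision_path(SPANS, "t1"):
        print(span["name"], span["attributes"])
```

The reconstruction works only if every span carries the trace identifier and the pinned attributes at write time; they cannot be recovered afterwards.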

We’ve found that teams that stand this stack up before classification, rather than after, ship in the EU on the original timeline. FutureAGI is the technical control plane; your legal team owns the conformity assessment, the technical file, and the deployer notifications. Compared with WhyLabs and Credo AI, which focus on the documentation and assessment side, our integration ties every regulatory obligation to a runtime evaluator and a trace row that a regulator can inspect. That coupling between policy and runtime is what the Article 9 risk-management requirement actually demands.

Real-world example: a hiring assistant in three EU markets

A concrete walkthrough. A talent platform launches an LLM-powered hiring assistant for B2B customers in Germany, France, and Spain. The function, supporting hiring decisions, places it squarely in the high-risk tier under Annex III. Before launch the team owes: a documented risk-management process, training-data governance with bias coverage across age, gender, and national origin, technical documentation pinning model snapshot and prompt version, runtime logging that survives a six-month-plus retention requirement, an AI disclosure to candidates, a human reviewer in the loop for every adverse decision, and demonstrable accuracy on a representative test set.

The FutureAGI wiring covers each. A versioned Dataset with cohort columns for protected classes drives the offline BiasDetection and NoGenderBias eval. The same evaluators run on production samples through traceAI, with policy_version pinned per release. Every adverse score routes through a pre-guardrail that requires human reviewer sign-off; engagement rate, SLA, and reviewer agreement become dashboard metrics. IsCompliant carries the German, French, and Spanish labor-law rubrics as three separate custom evaluations. DataPrivacyCompliance ensures no candidate PII leaks across tenants. The technical file references a frozen gen_ai.request.model and dataset_version, and the post-market monitoring report is a query against the production trace store, not a hand-written PDF. When the platform later expands to Italy, the team adds an Italian-language cohort, re-runs the bias suite, and updates the technical file; none of the runtime code changes.
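
The "human reviewer sign-off for every adverse decision" rule can be sketched as a small gate. The `Decision` type, its fields, and the status strings are hypothetical; the sketch shows the control logic only, not how the pre-guardrail is configured in the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    candidate_id: str
    adverse: bool                           # True when the outcome is negative for the candidate
    reviewer_id: Optional[str] = None       # who reviewed it, if anyone
    reviewer_approved: Optional[bool] = None


def release_status(decision: Decision) -> str:
    """Gate adverse decisions on an explicit human verdict.

    Returns "released", "pending_review", or "overturned".
    """
    if not decision.adverse:
        return "released"
    # An adverse decision never leaves the system without a named reviewer and a verdict.
    if decision.reviewer_id is None or decision.reviewer_approved is None:
        return "pending_review"
    return "released" if decision.reviewer_approved else "overturned"


if __name__ == "__main__":
    print(release_status(Decision("c-1", adverse=False)))
    print(release_status(Decision("c-2", adverse=True)))
    print(release_status(Decision("c-3", adverse=True, reviewer_id="r-7",
                                  reviewer_approved=False)))
```

Counting how often reviewers overturn the model gives the reviewer-agreement metric mentioned above.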

Mapping AI Act articles to FutureAGI surfaces

A concrete article-by-article mapping for high-risk systems:

| Article | Obligation | FutureAGI control |
| --- | --- | --- |
| Article 9: Risk management | Identify and mitigate foreseeable risks across lifecycle | BiasDetection, IsCompliant, HallucinationScore on a regression eval suite |
| Article 10: Data governance | Representative, bias-checked, error-controlled training and test data | Dataset with cohort metadata, BiasDetection by cohort, reviewer-tracked provenance |
| Article 11: Technical documentation | System architecture, training data, performance, limitations | Pinned dataset_version, gen_ai.request.model, evaluator scores in trace history |
| Article 12: Record-keeping | Automatic logging of events during system lifecycle | traceAI spans, retained per the Act’s minimum periods |
| Article 13: Transparency | Clear information to deployers and end users | Agent Command Center disclosure handlers, trace exports |
| Article 14: Human oversight | Effective oversight by natural persons | Human-in-the-loop escalation primitive, oversight-engagement metrics |
| Article 15: Accuracy, robustness, cybersecurity | Demonstrate accuracy claims, resilience to errors | AnswerRelevancy, TaskCompletion, PromptInjection, adversarial regression suite |
| Articles 17-19 and 72: Quality management, log retention, post-market monitoring | Continuous monitoring of system performance | Continuous evaluation on production samples, drift alerts, incident logs |
| Article 50: Transparency for limited-risk | AI disclosure, deepfake labeling | ProtectFlash plus disclosure UX templates |
| Article 55: GPAI systemic risk | Adversarial testing, incident reporting | Red-team eval suite plus incident-to-notification timer |
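
A mapping like the one above can be kept as data and checked against what a release actually has enabled. The sketch below copies a few rows of the table into a dictionary; the control names are used only as string labels, and `missing_controls` is a hypothetical helper, not part of any SDK.

```python
# Subset of the article-to-control mapping above, as plain string labels.
ARTICLE_CONTROLS: dict[str, set[str]] = {
    "Article 9": {"BiasDetection", "IsCompliant", "HallucinationScore"},
    "Article 10": {"BiasDetection"},
    "Article 15": {"AnswerRelevancy", "TaskCompletion", "PromptInjection"},
}


def missing_controls(enabled: set[str]) -> dict[str, list[str]]:
    """For each article, list the mapped controls that are not enabled."""
    gaps: dict[str, list[str]] = {}
    for article, required in ARTICLE_CONTROLS.items():
        absent = sorted(required - enabled)
        if absent:
            gaps[article] = absent
    return gaps


if __name__ == "__main__":
    # Hypothetical release that only enables two evaluators.
    for article, absent in missing_controls({"BiasDetection", "IsCompliant"}).items():
        print(article, "is missing", ", ".join(absent))
```

Running a check like this in CI turns the mapping from documentation into a release gate.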

How to detect EU AI Act gaps in your AI system

Compliance posture for an EU AI Act high-risk system is a set of operational metrics, not a single score. A useful 2026 detection stack covers seven layers:

  • Per-decision evaluator coverage: fraction of in-scope production decisions scored by BiasDetection, DataPrivacyCompliance, IsCompliant, and the relevant domain check. Target 100% for high-risk paths.
  • Bias parity gaps: NoGenderBias, NoRacialBias, NoAgeBias failure rate by cohort; the disparity itself is a regulatory signal, not just an internal quality metric. See the fairness entry for the statistical framing, plus the data drift entry for how cohort shift can re-introduce bias post-launch.
  • Audit-log retention: days of complete request/decision logs available; the Act expects at least six months for many high-risk classes, often longer.
  • Human-oversight engagement rate: fraction of escalated decisions reviewed within SLA by a qualified human. Article 14 wants oversight to be effective, not theatrical.
  • Incident-to-notification latency: time from a detected serious incident to authority notification. The Act sets explicit deadlines (15 days at most in the general case, shorter for the most severe incidents).
  • Documentation freshness: date of the last update to the technical file, the data-governance summary, and the conformity assessment. Stale documentation fails audit even if the runtime controls work.
  • GPAI-specific signals: for fine-tuners, evidence that the downstream system tested for the same systemic-risk categories the foundation provider tested for.
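
# Evaluator classes as imported in the original example.
from fi.evals import IsCompliant, BiasDetection, DataPrivacyCompliance

policy = IsCompliant()             # custom policy rubric check
bias = BiasDetection()             # bias evaluation, scored per cohort
privacy = DataPrivacyCompliance()  # privacy alignment check

# `production_sample` is assumed to be an iterable of sampled production traces,
# each exposing `.response`, `.cohort`, and `.attach()`; it is not defined here.
for trace in production_sample:
    # Attach every evaluator result to the trace so the scores travel with the audit record.
    trace.attach(policy.evaluate(output=trace.response))
    trace.attach(bias.evaluate(output=trace.response, cohort=trace.cohort))
    trace.attach(privacy.evaluate(output=trace.response))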
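
Several of the operational metrics listed above reduce to small calculations. The sketch below uses only the standard library; the function names, the SLA value, and the input shapes are assumptions, and the 15-day default mirrors the general deadline mentioned in the list.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def evaluator_coverage(scored_decisions: int, in_scope_decisions: int) -> float:
    """Fraction of in-scope decisions that received evaluator scores."""
    if in_scope_decisions == 0:
        return 1.0  # nothing in scope, so nothing is uncovered
    return scored_decisions / in_scope_decisions


def oversight_engagement_rate(review_hours: Sequence[Optional[float]],
                              sla_hours: float) -> float:
    """Fraction of escalations reviewed within the SLA; None means never reviewed."""
    if not review_hours:
        return 1.0
    on_time = sum(1 for h in review_hours if h is not None and h <= sla_hours)
    return on_time / len(review_hours)


def notified_in_time(detected_at: datetime, notified_at: datetime,
                     deadline_days: int = 15) -> bool:
    """True when the authority was notified within the deadline."""
    return notified_at - detected_at <= timedelta(days=deadline_days)


if __name__ == "__main__":
    print(evaluator_coverage(940, 1000))
    print(oversight_engagement_rate([2.0, None, 30.0, 4.0], sla_hours=24))
    print(notified_in_time(datetime(2026, 5, 1), datetime(2026, 5, 10)))
```

Treating an unreviewed escalation as a miss, rather than dropping it from the denominator, keeps the engagement rate honest.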

The dashboard that matters is not the global pass rate; it is the per-cohort, per-article matrix. A 99% policy pass rate that hides a 60% pass rate on one protected class is an audit-failing pattern. Treat the per-cohort scores as the regulatory artifact, not the aggregate. A useful artifact to maintain alongside this matrix is a release-readiness checklist that ties every shipped change to the article it touches: if a release modifies the retriever, the data-governance rationale under Article 10 must be refreshed; if it modifies the system prompt, the technical file under Article 11 needs an update.
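
The aggregate-versus-cohort effect is easy to reproduce. In the sketch below the cohort labels, the counts, and the 95% floor are made-up values chosen to match the 99% and 60% figures above; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def pass_rates_by_cohort(results: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Map each cohort to its pass rate, given (cohort, passed) pairs."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for cohort, passed in results:
        totals[cohort] += 1
        passes[cohort] += int(passed)
    return {cohort: passes[cohort] / totals[cohort] for cohort in totals}


def failing_cohorts(rates: dict[str, float], floor: float = 0.95) -> list[str]:
    """Cohorts whose pass rate falls below the floor (the floor is an assumption)."""
    return sorted(cohort for cohort, rate in rates.items() if rate < floor)


if __name__ == "__main__":
    # 975 passing decisions in cohort A; 15 passes and 10 failures in cohort B.
    results = [("A", True)] * 975 + [("B", True)] * 15 + [("B", False)] * 10
    aggregate = sum(passed for _, passed in results) / len(results)
    rates = pass_rates_by_cohort(results)
    print(f"aggregate pass rate: {aggregate:.2%}")
    print({cohort: f"{rate:.2%}" for cohort, rate in rates.items()})
    print("below floor:", failing_cohorts(rates))
```

Because cohort B is only 2.5% of the traffic, its failures barely move the aggregate, which is why the per-cohort view has to be the one that gates a release.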

Common mistakes

  • Treating GDPR compliance as EU AI Act compliance. They overlap on transparency, but the AI Act adds duties on training-data governance, bias, robustness, and post-market monitoring that GDPR does not cover. Both apply simultaneously to most production LLM systems handling personal data.
  • Skipping classification for “internal” tools. A staff-facing hiring screener is still high-risk; the Act looks at function, not user count or whether the system is public.
  • No logging of context inputs. If your audit log captures the model output but not the retrieved context, tool calls, or prompt version, you cannot reconstruct a decision after the fact. Articles 12 and 17 require enough detail to support post-market monitoring.
  • Assuming “we’re not the provider” is a clean escape. Deployers carry distinct duties including risk-management for the use context, human oversight, and the user disclosure for limited-risk systems. Substantial fine-tuning of a foundation model can re-classify the deployer as a provider.
  • Treating bias evaluation as a one-time pre-launch check. Post-market monitoring (Article 72) requires continuous bias and quality evaluation against production traffic. A 12-month-old bias report is not evidence of current compliance.
  • No human-oversight workflow. Article 14 demands effective oversight, not a button. Track engagement rate, SLA, and reviewer disagreement to prove oversight is real.
  • Treating GPAI obligations as “the foundation provider’s problem.” Substantial modifications and fine-tunes can inherit provider duties, and RAG-augmented systems and agent stacks built on a GPAI model still need the provider’s documentation to meet their own obligations. Pin the foundation-provider documentation in your technical file and add a delta describing your changes.
  • Ignoring the AI literacy requirement (Article 4). Providers and deployers of AI systems in the EU must take measures to ensure relevant staff have sufficient AI literacy. This has been an obligation since February 2025, is usually evidenced through documentation, and is the easiest article to fail without realizing.
  • Building a deepfake-adjacent feature without watermarking. Article 50 requires generated content to be marked in a machine-readable format and detectable as AI-generated. Most home-grown image and audio pipelines do not meet this by default.

Frequently Asked Questions

What is the EU AI Act?

The EU AI Act is a 2024 European Union regulation that classifies AI systems into prohibited, high-risk, limited-risk, and minimal-risk tiers and imposes obligations on providers and deployers, including risk management, transparency, and human oversight.

How is the EU AI Act different from GDPR?

GDPR governs personal data processing; the EU AI Act governs AI systems regardless of whether they process personal data. They overlap on transparency and automated decision-making, but the AI Act adds duties around training data quality, bias, and post-market monitoring.

How do you operationalize EU AI Act compliance for LLMs?

Map your system to a risk tier, then wire bias, privacy, and content evaluators into the production path with audit-grade logging. FutureAGI's IsCompliant, BiasDetection, and DataPrivacyCompliance evaluators plus Agent Command Center audit logs cover the technical control surface.