What Is Impact Assessment for AI?
A structured evaluation of the people, decisions, rights, and downstream effects an AI system touches, used to identify harms and required mitigations.
An AI impact assessment is a structured evaluation of the people, decisions, rights, and downstream effects an AI system touches across its lifecycle. It identifies affected populations (users, bystanders, operators, regulators), foreseeable harms (disparate outcomes, privacy loss, safety incidents, economic effects), mitigation controls (guardrails, human oversight, refusal flows, escalation), and ongoing monitoring requirements. Common frameworks include the EU AI Act’s Fundamental Rights Impact Assessment for high-risk systems, NIST AI RMF, ISO 42001, Canada’s Algorithmic Impact Assessment, and a growing set of sector-specific templates. The output is a document — and a set of runtime obligations.
Why It Matters in Production LLM and Agent Systems
A good impact assessment turns vague concerns about “AI risk” into specific obligations: which populations get monitored, which decisions trigger human review, which evaluators must run, and what the escalation path looks like when a metric breaches its threshold. A bad impact assessment is a PDF that nobody reads. The difference between the two is whether the obligations show up in the runtime stack.
The pain shows up unevenly. A compliance lead writes a fundamental-rights impact assessment claiming the model has “ongoing performance monitoring” and discovers, three months in, that the engineering team has no dashboard for the cohorts the assessment named. A product lead promises in a review that “users can contest decisions” without realizing the runtime has no record of why each decision was made. A legal team faces a regulator’s data request and cannot produce per-decision audit evidence because the trace logs were never tied to the decisions the assessment scoped.
In 2026 the regulatory cost of getting this wrong is no longer hypothetical. The EU AI Act’s high-risk obligations require demonstrable monitoring, not just policy claims. Useful symptoms in production: missing trace evidence for assessment-scoped decisions, drift on monitored cohorts that the assessment named, escalation queues that don’t tie to the assessment’s defined human-oversight role, and audit responses that take days to produce because the data is not queryable.
How FutureAGI Handles Impact Assessment for AI
FutureAGI does not author impact assessments — that’s a governance, legal, and product function. What FutureAGI provides is the runtime evidence layer that makes the assessment’s obligations enforceable. Every production trace ingested via traceAI integrations such as traceAI-langchain, traceAI-langgraph, or traceAI-openai-agents carries the full trajectory: input, retrieved sources, planner reasoning, tool calls, model used, and evaluator scores like TaskCompletion, PromptInjection, and Faithfulness. Reviewer actions — approve, deny, override, escalate — are recorded as span_event records tied to trace id and reviewer identity, exactly the audit evidence an impact assessment promises.
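For example, the human-oversight commitment becomes checkable when each reviewer action is emitted onto the same trace as the decision it governs. The sketch below is illustrative rather than FutureAGI's documented API: it assumes an OpenTelemetry-compatible tracer (the transport traceAI-style instrumentation typically rides on), and the record_reviewer_action helper and attribute names are made up for the example.
from opentelemetry import trace

tracer = trace.get_tracer("review-workflow")

def record_reviewer_action(decision_id, reviewer_id, action, rationale):
    # Attach the reviewer action to a span so it lands in the same trace store
    # as the decision's trajectory and evaluator scores.
    with tracer.start_as_current_span("human_review") as span:
        span.set_attribute("decision.id", decision_id)
        span.add_event(
            "reviewer_action",
            attributes={
                "reviewer.id": reviewer_id,
                "review.action": action,        # approve | deny | override | escalate
                "review.rationale": rationale,
            },
        )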
Concretely: a credit-decisioning AI’s impact assessment defines four affected cohorts (low-credit-history applicants, rural applicants, immigrant applicants, applicants under 25), commits to per-cohort fairness monitoring, and requires human review on adverse decisions. The engineering team configures FutureAGI to compute eval-fail-rate-by-cohort per assessment cohort, sets alerts on disparate-impact metrics, and routes adverse decisions through an annotation queue that records reviewer rationale. When an auditor asks for evidence that fairness monitoring is active for the rural cohort, the team queries the dashboard and exports the last 90 days of cohort-sliced scores plus reviewer decisions. The assessment is no longer a PDF — it is a queryable runtime contract.
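The per-cohort fairness commitment reduces to a comparison an alert can encode. This is a rough sketch of the arithmetic, not FutureAGI's alert configuration; the 0.5 score threshold, the 0.8 disparate-impact ratio, and the choice of reference cohort are assumptions for illustration.
FAIL_THRESHOLD = 0.5          # evaluator score below this counts as a failed decision
DISPARATE_IMPACT_RATIO = 0.8  # alert when a cohort's pass rate drops below 80% of the reference

def fail_rate(scores):
    return sum(s < FAIL_THRESHOLD for s in scores) / len(scores)

def disparate_impact_alerts(cohort_scores, reference="all_applicants"):
    # cohort_scores maps cohort id -> list of evaluator scores for that cohort's decisions
    reference_pass = 1 - fail_rate(cohort_scores[reference])
    alerts = {}
    for cohort_id, scores in cohort_scores.items():
        pass_rate = 1 - fail_rate(scores)
        if pass_rate < DISPARATE_IMPACT_RATIO * reference_pass:
            alerts[cohort_id] = pass_rate
    return alerts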
A governance-only NIST AI RMF implementation records intended controls; the open runtime question is whether each control is backed by trace evidence. FutureAGI’s approach is to pair policy language with evaluator scores, traces, and reviewer actions so each impact-assessment claim has runtime evidence behind it.
How to Measure or Detect It
Impact-assessment compliance is measured by evidence completeness and obligation coverage:
- TaskCompletion — pairs with assessment-scoped decisions to demonstrate ongoing performance monitoring per the assessment’s commitments.
- PromptInjection — covers safety obligations defined in the impact assessment for adversarial-input handling.
- Per-cohort eval-fail-rate (dashboard signal) — the canonical evidence for fairness and disparate-impact obligations defined in the assessment.
- Audit-log completeness — fraction of assessment-scoped decisions with full trace, reviewer identity, action, and rationale (a sketch for computing this follows the code example below).
- Mean-time-to-evidence (operational metric) — how fast the team can produce documentation for a regulator’s request; long times signal a governance/observability gap.
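The snippet below is a minimal sketch of cohort-sliced evaluation. It assumes cohorts maps each assessment-defined cohort id to a list of traced decisions carrying input and spans fields; that data shape is an assumption for the example.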
from fi.evals import TaskCompletion, PromptInjection

task = TaskCompletion()
inj = PromptInjection()

# Evaluate each assessment-scoped cohort and report mean scores.
for cohort_id, cohort in cohorts.items():
    task_scores = [task.evaluate(input=t.input, trajectory=t.spans).score for t in cohort]
    inj_scores = [inj.evaluate(input=t.input).score for t in cohort]
    print(cohort_id, sum(task_scores) / len(task_scores), sum(inj_scores) / len(inj_scores))
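Audit-log completeness is a simpler aggregate. A minimal sketch, assuming each assessment-scoped decision is exported as a dict; the field names are assumptions about the export shape.
REQUIRED_EVIDENCE = ("trace_id", "reviewer_id", "reviewer_action", "reviewer_rationale")

def audit_log_completeness(decisions):
    # Fraction of assessment-scoped decisions that carry the full evidence chain.
    if not decisions:
        return 0.0
    complete = sum(all(d.get(field) for field in REQUIRED_EVIDENCE) for d in decisions)
    return complete / len(decisions)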
Common mistakes
- Treating the assessment as a launch document. Without ongoing monitoring tied to the assessment’s claims, the document goes stale within a quarter.
- Naming cohorts without instrumenting them. A cohort the assessment commits to monitoring needs trace-level slicing in production.
- No reviewer rationale capture. A bare “approved” record cannot demonstrate the human-oversight commitment the assessment makes.
- Confusing impact assessment with risk assessment. Risk is a sub-component; the impact assessment also covers rights, mitigations, monitoring, and contestability.
- Skipping post-incident updates. An incident that doesn’t update the impact assessment leaves stale obligations and missing controls.
Frequently Asked Questions
What is impact assessment for AI?
An AI impact assessment is a structured evaluation of the people, decisions, rights, and downstream effects an AI system touches, used to identify foreseeable harms, mitigation controls, and ongoing monitoring requirements.
How is impact assessment different from risk assessment?
Risk assessment focuses on the probability and severity of negative outcomes. Impact assessment is broader — it covers who is affected, what rights are touched, what mitigations exist, and how the system will be monitored. Risk is often a section inside an impact assessment.
How does FutureAGI support impact assessments?
FutureAGI doesn't author the assessment, but it provides audit-grade traces, evaluator scores like TaskCompletion, and human-feedback fields that supply the runtime evidence backing each mitigation and monitoring claim.