What Is Disparate Impact?
A fairness metric capturing whether an automated decision system produces materially worse outcomes for a protected group than for the reference group.
Disparate impact is a fairness measure capturing whether an automated decision system produces materially worse outcomes for a protected group than for the reference group, even when the rule itself looks neutral. The most-cited operationalisation is the US EEOC’s four-fifths rule: the selection rate for a protected group should be at least 80% of the highest-rate group. Cousins include equalised odds, demographic parity, and counterfactual fairness. FutureAGI does not certify legal compliance; we measure the LLM-and-agent analogue with cohort-segmented bias evaluators.
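As a quick worked example of the four-fifths arithmetic, a minimal sketch in plain Python (the rates are illustrative numbers, not tied to any FutureAGI API):
# Four-fifths-rule check on two selection rates (illustrative values).
reference_rate = 0.50   # selection rate of the highest-rate (reference) cohort
protected_rate = 0.35   # selection rate of the protected cohort

impact_ratio = protected_rate / reference_rate   # 0.70
if impact_ratio < 0.8:
    print(f"Fails the four-fifths rule: impact ratio {impact_ratio:.2f} < 0.80")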
Why It Matters in Production LLM and Agent Systems
LLMs and agents are now making decisions that historically came under disparate-impact law: resume screening, credit prequalification, content moderation, customer-support routing, fraud-flagging. A model that hallucinates more on minority-language queries, refuses more often for certain demographic cohorts, or routes lower-income users to lower-quality fallbacks is producing disparate impact whether or not the engineering team intended it.
The pain is shared across roles. Compliance leads see legal exposure under the EU AI Act, US state-level employment laws, NYC Local Law 144 on automated employment decision tools, and CFPB guidance for credit decisioning. Product managers see CSAT and conversion drop in specific cohorts with no obvious model-quality cause. ML engineers find that aggregate evaluator scores look fine while cohort-segmented scores diverge. SREs see traffic patterns that vary by demographic without a clear root cause and cannot tell whether the cause is the model, the retriever, or the routing layer above them.
In 2026-era agent stacks the problem compounds. A retriever that under-indexes minority-dialect content, a planner that biases tool selection by demographic-correlated features, and an LLM that inherits training-data bias all stack into a single disparate-impact regression. Each layer alone may pass an isolated bias test while the composed pipeline fails. Cohort-segmented evaluation tied to OpenTelemetry spans is the only way to localise where in the trajectory the disparity originates.
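A minimal sketch of tagging each pipeline layer's span with cohort metadata via the OpenTelemetry Python API, so evaluator scores can later be segmented per layer; the span names, the user.cohort attribute key, and the stub retrieve/generate functions are illustrative assumptions, not FutureAGI conventions:
from opentelemetry import trace

tracer = trace.get_tracer("agent-pipeline")

def retrieve(query: str) -> list[str]:
    # Stand-in for the real retriever call.
    return ["stub document"]

def generate(query: str, docs: list[str]) -> str:
    # Stand-in for the real LLM call.
    return "stub answer"

def handle_request(query: str, cohort: str) -> str:
    # Tag each layer's span with the cohort so fairness metrics can be
    # segmented per layer, not just per request.
    with tracer.start_as_current_span("retriever") as span:
        span.set_attribute("user.cohort", cohort)
        docs = retrieve(query)
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("user.cohort", cohort)
        return generate(query, docs)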
How FutureAGI Handles Disparate Impact
FutureAGI’s surface for disparate impact is cohort-segmented evaluation. BiasDetection returns a score and reason on whether an output reflects bias against a protected group. NoGenderBias, NoRacialBias, NoAgeBias, and Sexist are targeted evaluators for specific protected attributes. CulturalSensitivity covers cultural cohort fairness. Each evaluator runs against a Dataset whose rows carry cohort metadata; aggregated per cohort, the outputs produce the rate-comparison numbers that disparate-impact analysis needs.
The practical workflow: a hiring-screen LLM team imports candidate transcripts into a Dataset with anonymised gender and age cohort columns, runs BiasDetection and NoGenderBias per cohort, and computes the four-fifths-rule statistic on selection rates. Results are stored as evaluation artifacts so re-runs against the next model checkpoint produce a regression-eval signal. In production, the Agent Command Center logs cohort metadata into traces; eval-fail-rate-by-cohort segments fairness regressions in real time, and a post-guardrail blocks responses that fail BiasDetection on cohort-tagged routes. FutureAGI’s approach is to treat disparate impact as a release-blocking regression, not a once-a-year audit finding. Unlike Aequitas’s static fairness audits, FutureAGI keeps fairness checks tied to continuous evals and production traces.
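A rough sketch of that rate-comparison step, assuming per-row evaluator outcomes have already been exported into a table with cohort and selected columns (the column names, the sample data, and the pandas usage are assumptions, not a FutureAGI export format):
import pandas as pd

# Illustrative per-row results: cohort label plus a boolean "selected"
# outcome derived from the screening decision for that row.
rows = pd.DataFrame({
    "cohort":   ["A", "A", "A", "B", "B", "B", "B"],
    "selected": [True, True, False, True, False, False, False],
})

rates = rows.groupby("cohort")["selected"].mean()
reference = rates.max()                      # highest-rate cohort is the reference
impact_ratios = rates / reference

failing = impact_ratios[impact_ratios < 0.8]
if not failing.empty:
    # Treat this as a release-blocking regression, not a once-a-year finding.
    print("Four-fifths-rule failures:", failing.to_dict())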
How to Measure or Detect It
Useful FutureAGI signals for disparate-impact analysis:
- `BiasDetection`: primary fairness evaluator, returns score and reason.
- `NoGenderBias`, `NoAgeBias`, `NoRacialBias`, `Sexist`: targeted attribute-specific evaluators.
- `CulturalSensitivity`: cultural-cohort fairness check.
- Cohort-segmented `eval-fail-rate-by-cohort` for selection-rate computation.
- Four-fifths-rule statistic across protected vs reference cohort selection rates.
- Audit-log dashboard: every fairness-evaluator decision archived for compliance.
Minimal Python:
from fi.evals import BiasDetection, NoGenderBias

# Placeholder inputs; in practice these come from a cohort-tagged Dataset row.
user_query = "Summarise this candidate's experience for the screening panel."
model_response = "The candidate has eight years of relevant data-engineering experience."
cohort_metadata = {"gender_cohort": "A", "age_cohort": "25-34"}

bias = BiasDetection()
gender = NoGenderBias()

# Primary fairness check, plus the attribute-specific gender check on the same row.
result = bias.evaluate(
    input=user_query,
    output=model_response,
    context=cohort_metadata,
)
gender_result = gender.evaluate(input=user_query, output=model_response, context=cohort_metadata)
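Run the same calls for every row of the cohort-tagged Dataset, aggregate pass rates per cohort, and apply the four-fifths comparison sketched earlier; the per-row reason strings are the raw material for the audit-log dashboard.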
Common Mistakes
- Aggregate-only fairness scores. A global bias score can pass while one protected cohort fails; report `BiasDetection` results and selection rates by cohort before release.
- Skipping intersection cohorts. Gender, age, language, and geography can stack; evaluate intersections when sample size supports a defensible confidence interval (see the sketch after this list).
- Conflating disparate impact with disparate treatment. Disparate impact measures outcome gaps; disparate treatment concerns intentional differential handling. The audit should separate both claims.
- One-shot fairness audits. Disparate impact shifts as prompts, retrievers, traffic, and model checkpoints change; rerun fairness evals on rolling cohorts.
- No paper trail for thresholds. Regulators and customers ask why 80%, 90%, or another threshold was chosen; archive evaluator reasons per release.
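A small sketch of the sample-size check for intersection cohorts, using a Wilson score interval on a cohort's selection rate (plain Python; the counts are illustrative):
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion; stays sensible at small n."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

# Intersection cohort, e.g. one gender-by-age slice: 9 selected out of 28 screened.
low, high = wilson_interval(9, 28)
print(f"selection-rate CI: [{low:.2f}, {high:.2f}]")
# If the interval is too wide to distinguish from the reference rate,
# the intersection cohort needs more samples before a defensible verdict.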
Frequently Asked Questions
What is disparate impact?
Disparate impact is a fairness measure that captures whether an automated decision produces materially worse outcomes for a protected group than for the reference group. The classical legal proxy is the four-fifths rule.
How is disparate impact different from disparate treatment?
Disparate treatment is intentionally different handling based on a protected attribute. Disparate impact is unintentional: the rule is facially neutral but its effects fall harder on a protected group. Both are recognised in US discrimination law.
How do you measure disparate impact in LLM systems?
Cohort-segment your dataset by protected attribute, run `BiasDetection`, `NoGenderBias`, `NoRacialBias`, and `NoAgeBias` per cohort, and compare outcome rates against the four-fifths rule.