What Is False Positive Parity?
False positive parity is a fairness criterion that requires a classifier’s false-positive rate to be equal, within tolerance, across protected groups. In LLM evaluation pipelines, it checks whether safety filters, toxicity detectors, or abuse classifiers over-block one cohort more than another. Even when global FPR looks acceptable, unequal cohort FPRs translate to unequal user-experienced harm. Teams enforce false positive parity through per-cohort threshold tuning, group-aware loss functions, or post-hoc calibration. FutureAGI treats it as a live cohort metric because parity definitions can conflict; the right one depends on the harm model.
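Formally, in one common formalization, FPR_g = FP_g / (FP_g + TN_g) for each group g, and parity holds when max_g FPR_g - min_g FPR_g stays within a chosen tolerance ε; the tolerance itself is a policy choice, not a standard.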
Why false positive parity matters in production LLM and agent systems
A safety stack with a 2% global FPR can hide a 6% FPR for one cohort and a 0.5% FPR for another (with the high-FPR cohort at roughly a quarter of traffic, the weighted average still lands near 2%). The cohort getting wrongly flagged twelve times as often experiences a different product. The friction is invisible in aggregate metrics and shows up as cohort-specific churn, complaints, or regulatory inquiry.
The pain falls on different roles. Product leads see drop-off in a specific user segment without knowing why. Compliance teams reading EU AI Act risk-management or NYC Local Law 144 obligations need cohort-level evidence that the model treats groups consistently. Customer-success teams field disproportionate over-blocking complaints from a cohort whose prompts trigger the safety stack more often. ML engineers find themselves debating which fairness criterion to honor because the literature reports they are mathematically incompatible.
In 2026-era LLM stacks, false positive parity matters most for moderation, prompt-injection detection, and PII redaction — the high-FPR-cost surfaces. A Toxicity evaluator with global 1.5% FPR can have a 4% FPR on prompts containing names common in one demographic. The fix is per-cohort threshold tuning plus exemption rules in pre/post guardrails. FutureAGI’s per-cohort FPR dashboard makes the divergence visible before the user complaints arrive — and the platform supports cohort-aware policies so the team can equalize the user-experienced rate.
How FutureAGI handles false positive parity
FutureAGI’s approach is to compute false-positive rate per cohort on every safety evaluator, surface divergence on the dashboard, and let engineers either tune thresholds per cohort or layer additional evaluators to equalize the user-experienced rate. Each evaluator output carries a score, a reason, and a metadata path for cohort segmentation. The platform stores both the prediction and the ground-truth label so confusion-matrix cells are computable per cohort.
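A minimal sketch of that segmentation, assuming stored records of the form `(cohort, predicted_positive, truly_positive)`; this record shape is an illustrative stand-in, not the FutureAGI storage schema:

```python
from collections import Counter

# Hypothetical stored records: (cohort, predicted_positive, truly_positive).
records = [
    ("cohort_a", True, False),   # false positive
    ("cohort_a", False, False),  # true negative
    ("cohort_b", False, False),  # true negative
]

cells = {}  # cohort -> Counter over the four confusion-matrix cells
for cohort, pred, truth in records:
    key = ("TP" if truth else "FP") if pred else ("FN" if truth else "TN")
    cells.setdefault(cohort, Counter())[key] += 1

for cohort, c in cells.items():
    if c["FP"] + c["TN"]:
        print(cohort, "FPR:", c["FP"] / (c["FP"] + c["TN"]))
```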
A concrete workflow: a moderation team running Toxicity reviews per-cohort FPR weekly. They find FPR for one cohort is 3.2% versus a global 1.4%. They open the failure cohort, sort by reason, and find the model is over-flagging vernacular phrases. The team has two options: tune the cohort threshold from 0.5 to 0.62 (raising recall risk on that cohort), or add a pre/post guardrails rule for the specific phrases. They run a second Dataset.add_evaluation on the canonical fi.datasets.Dataset to confirm the change does not damage recall, then ship. The Agent Command Center can mirror the affected slice with traffic mirroring before routing users through the cohort-tuned policy. Unlike Aequitas, which is an audit-time toolkit, FutureAGI runs the parity check continuously on production samples and alerts when cohort FPR drifts. We’ve found that fairness criteria need to be live signals — a one-time audit cannot survive model upgrades, prompt changes, or input distribution shifts.
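A minimal sketch of the first option, per-cohort thresholding; the dict, names, and default value here are illustrative, not FutureAGI API:

```python
# Illustrative per-cohort decision thresholds; the 0.62 value comes from
# the tuning pass described above, the names are hypothetical.
COHORT_THRESHOLDS = {"default": 0.50, "cohort_x": 0.62}

def is_flagged(toxicity_score: float, cohort: str) -> bool:
    """Apply the cohort-specific threshold to an evaluator score."""
    threshold = COHORT_THRESHOLDS.get(cohort, COHORT_THRESHOLDS["default"])
    return toxicity_score > threshold

print(is_flagged(0.55, "cohort_x"))  # False under the raised 0.62 threshold
print(is_flagged(0.55, "other"))     # True under the default 0.50
```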
How to measure or detect false positive parity
False positive parity is measured by computing FPR per cohort and comparing:
- Per-cohort FPR: `FP_cohort / (FP_cohort + TN_cohort)`; the headline parity metric.
- Parity ratio: `max(cohort_FPR) / min(cohort_FPR)`; thresholds vary, 0.8–1.25 is common.
- `BiasDetection`: returns a 0–1 bias score with a reason; group by cohort metadata for parity comparison.
- `NoGenderBias` / `NoRacialBias`: targeted boolean evaluators for specific protected attributes.
- Confusion matrix per cohort: surfaces all four cells (TN, FP, FN, TP) independently per group.
- `eval-fail-rate-by-cohort`: dashboard segmentation flags cohorts where FPR diverges.
- `Dataset.add_evaluation` deltas: per-cohort FPR drift between releases is a parity regression.
A minimal sketch of the per-cohort FPR loop, assuming `cohorts` maps cohort names to examples with ground-truth `label` and `text` fields:

```python
from fi.evals import Toxicity

detector = Toxicity()
fprs = {}
for cohort, examples in cohorts.items():
    # Only true negatives (falsy label) can become false positives.
    negatives = [ex for ex in examples if not ex.label]
    # Score each example once instead of calling the evaluator twice.
    scores = [detector.evaluate(output=ex.text).score for ex in negatives]
    if scores:
        fprs[cohort] = sum(s > 0.5 for s in scores) / len(scores)
        print(cohort, "FPR:", fprs[cohort])
```
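From those per-cohort rates, the parity ratio in the list above follows directly; the 0.8–1.25 band is a common convention, not a fixed standard:

```python
# Flag cohort divergence when the ratio leaves the chosen band.
ratio = max(fprs.values()) / min(fprs.values())
if not 0.8 <= ratio <= 1.25:
    print(f"parity violation: ratio {ratio:.2f} outside 0.8-1.25")
```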
Common mistakes
- Treating global FPR as sufficient. A balanced average can hide a 5x cohort-level disparity.
- Picking one fairness criterion without naming the harm. The criteria are mathematically incompatible; pick the one that matches the cost.
- Tuning thresholds on a stale dataset. Cohort distributions drift; re-tune on recent samples.
- Confusing FPR parity with demographic parity. Demographic parity constrains the raw positive-prediction rate and ignores ground truth; FPR parity conditions on true negatives, so they measure different things.
- Not logging the cohort metadata. Without it, parity cannot be computed at all; see the sketch after this list.
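A minimal sketch of attaching cohort metadata at log time; the record shape and field names are illustrative assumptions, not the FutureAGI schema:

```python
# Hypothetical evaluation log record. Without the cohort field,
# per-cohort FPR can never be reconstructed downstream.
record = {
    "output": "example user prompt",
    "prediction": True,                  # evaluator verdict
    "label": False,                      # ground truth, when available
    "metadata": {"cohort": "cohort_a"},  # required for parity math
}
print(record["metadata"]["cohort"])
```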
Frequently Asked Questions
What is false positive parity?
False positive parity is a fairness criterion that requires equal false-positive rates across protected groups. It is satisfied when a classifier's rate of wrongly flagging negatives is consistent across cohorts.
How does false positive parity differ from demographic parity?
Demographic parity requires equal positive-prediction rates across groups regardless of truth. False positive parity requires equal rates of wrongly flagged negatives. They can conflict — satisfying one can violate the other on the same model.
How do you enforce false positive parity in production?
FutureAGI computes per-cohort FPR via `BiasDetection` and confusion-matrix segmentation. Engineers either tune the decision threshold per cohort or add a rule in `pre/post guardrails` to equalize the user-facing rate.