What Is a False Negative?
A classifier prediction that says "no" when the correct answer is "yes" — a missed positive — sitting in the bottom-left of the confusion matrix.
A false negative is a classifier prediction that says “no” when the correct answer is “yes” — a missed positive. In a binary confusion matrix it sits in the bottom-left cell; it is the complement of recall (recall = 1 − false-negative rate). False negatives matter most when the cost of missing a positive is asymmetric and high: a missed fraud, a missed cancer, a missed prompt injection. In LLM evaluation, a false negative happens when an evaluator misses a real hallucination, jailbreak, or PII leak. Threshold tuning trades them against false positives, and the right ratio depends on cost.
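The arithmetic is worth seeing once. A minimal sketch with made-up counts, showing that FNR and recall are complements:

```python
# Hypothetical counts from a binary detector's confusion matrix.
TP = 90   # real positives the detector flagged
FN = 10   # real positives the detector missed (false negatives)

fnr = FN / (FN + TP)      # false-negative rate
recall = TP / (TP + FN)   # the complement of FNR

assert abs(recall + fnr - 1.0) < 1e-9   # recall = 1 - FNR
print(f"FNR={fnr:.2f}, recall={recall:.2f}")  # FNR=0.10, recall=0.90
```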
Why It Matters in Production LLM and Agent Systems
A false negative is silent. Unlike a false positive, which usually annoys someone immediately (“why did the safety filter block my legitimate prompt?”), a false negative escapes notice until the harm shows up downstream. A jailbreak that the prompt-injection detector missed becomes a leaked system prompt. A hallucinated medical claim that the faithfulness evaluator missed becomes a customer harm. A PII leak that the PII detector did not catch becomes a letter from the regulator.
The pain falls on different roles. SREs cannot alert on something the detector did not flag. Compliance teams discover the missed cases at audit, often months later. Product teams field complaints whose root cause traces back to a detector with weak recall. ML engineers tune thresholds to please precision-focused stakeholders, and the false-negative rate balloons quietly.
In 2026-era stacks the asymmetry sharpens. An agent making 10 tool calls per session amplifies a single missed bad call into a multi-step harm. A safety detector with 95% recall misses 5% of attacks; if attackers retry, effective coverage collapses. Multi-judge ensembles, threshold tuning per cohort, and RegressionEval against an adversarial golden dataset are the standard 2026 mitigations. FutureAGI tracks false-negative rate per evaluator and alerts when recall drifts below the threshold the team committed to in their risk register.
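The retry effect can be made concrete with one line of arithmetic: if each attempt is caught with probability 0.95 independently, the chance that at least one of k retried attacks slips through is 1 − 0.95^k.

```python
# Coverage collapse under retries: 95% per-attempt recall assumed independent.
recall = 0.95
for k in (1, 5, 10, 20):
    p_breach = 1 - recall ** k   # P(at least one attempt evades detection)
    print(f"{k:2d} attempts -> P(at least one miss) = {p_breach:.0%}")
```

At 10 retries the attacker's odds of getting through are already about 40%, which is why per-attempt recall understates exposure.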
How FutureAGI Handles False Negatives
FutureAGI’s approach is to track recall and false-negative rate as first-class metrics on every safety-relevant evaluator, not just global accuracy. The platform maintains adversarial golden datasets — labelled positives that should always be flagged — and runs RegressionEval on them every release. Each evaluator returns a per-row score plus a reason; the platform aggregates this into per-evaluator FNR with a threshold the team sets in their risk register.
A concrete workflow: a security team runs PromptInjection and ProtectFlash on user prompts. Their golden dataset has 500 known-bad prompts plus 5000 known-benign prompts. The eval pipeline reports FNR (missed bad prompts) and FPR (flagged benign prompts) separately. After a model upgrade, FNR jumps from 2% to 7% — a regression. The team rolls back, tunes the threshold, or stacks an additional HallucinationScore evaluator in the post-guardrail chain. The Agent Command Center supports model-fallback routes so traffic can be routed to the more conservative model variant on prompts that score near the decision boundary. Unlike a static F1 number, FutureAGI separates precision and recall so the team can choose where to trade — false positives annoy users, false negatives create incidents.
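The threshold-tuning step above can be sketched as a simple sweep: compute FNR and FPR at each candidate threshold and pick the highest one that keeps FNR under a cap. The scores and labels below are made up for illustration; a real run would use the evaluator's per-row scores over the golden set.

```python
# Hedged sketch: (detector score, is_attack) pairs -- fabricated example data.
scored = [(0.92, True), (0.40, True), (0.75, True), (0.10, False),
          (0.55, False), (0.88, True), (0.30, False), (0.62, True)]

def rates(threshold):
    """Return (FNR, FPR) when flagging scores >= threshold."""
    fn = sum(1 for s, pos in scored if pos and s < threshold)
    tp = sum(1 for s, pos in scored if pos and s >= threshold)
    fp = sum(1 for s, pos in scored if not pos and s >= threshold)
    tn = sum(1 for s, pos in scored if not pos and s < threshold)
    return fn / (fn + tp), fp / (fp + tn)

target_fnr = 0.25   # cap committed to in the risk register (assumed)
best = max(s for s, _ in scored if rates(s)[0] <= target_fnr)
fnr, fpr = rates(best)
print(f"threshold={best:.2f} FNR={fnr:.2f} FPR={fpr:.2f}")
```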
How to Measure or Detect It
False negatives are measured through a labelled-truth signal stack:
- False-negative rate (FNR): FN / (FN + TP); reported as the percentage of true positives missed.
- Recall: TP / (TP + FN); the complement of FNR.
- `HallucinationScore` over an adversarial golden set: missed hallucinations are FNs.
- `PromptInjection` over a labelled attack set: missed attacks are FNs.
- Confusion matrix per cohort: surfaces FNR concentrations on specific input slices.
- ROC and PR curves: visualise the FNR/FPR tradeoff at every threshold.
- `RegressionEval` deltas across releases: a recall drop is a red flag.
from fi.evals import PromptInjection

detector = PromptInjection()
# golden_set yields (prompt, is_attack) pairs; is_attack is the ground-truth label.
positives = [p for p, is_attack in golden_set if is_attack]
# A known attack scored below the 0.5 decision threshold is a false negative.
fn = sum(1 for p in positives if detector.evaluate(input=p).score < 0.5)
print(f"False negatives: {fn}/{len(positives)} (FNR={fn / len(positives):.1%})")
Common Mistakes
- Optimising for accuracy when recall matters. When only 1% of traffic is positive, a detector that flags nothing is 99% accurate and still misses every attack.
- Single-threshold reporting. A threshold that works on average can have catastrophic FNR on specific cohorts.
- No adversarial dataset. Without known positives in the eval set, FNR cannot be computed.
- Confusing FNR with the inverse of FPR. They are independent dimensions; both matter.
- Ignoring drift. A stable FNR at launch can degrade silently as inputs evolve.
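The per-cohort point deserves a sketch: an aggregate FNR can look healthy while one input slice is badly covered. The cohorts and labels below are invented for illustration.

```python
from collections import defaultdict

# (cohort, predicted_positive, actually_positive) -- fabricated rows.
rows = [
    ("english", True,  True), ("english", True,  True), ("english", False, True),
    ("code",    False, True), ("code",    False, True), ("code",    True,  True),
]

counts = defaultdict(lambda: [0, 0])   # cohort -> [FN, TP]
for cohort, pred, actual in rows:
    if actual:                          # only true positives contribute to FNR
        counts[cohort][0 if not pred else 1] += 1

for cohort, (fn, tp) in counts.items():
    print(f"{cohort}: FNR={fn / (fn + tp):.2f}")
```

Here the overall FNR is 0.50, but the `code` cohort misses two of three real positives: exactly the kind of concentration a single-threshold report hides.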
Frequently Asked Questions
What is a false negative?
A false negative is a classifier prediction of "no" when the truth is "yes". It is a missed positive and reduces recall. False negatives are typically the costlier error in safety-critical detection: a missed jailbreak, a missed tumour.
What is the difference between a false negative and a false positive?
A false negative misses a real positive (says no when it should say yes). A false positive flags a real negative (says yes when it should say no). Lowering one usually raises the other; tuning the threshold trades them.
How do you measure false negatives in LLM evaluators?
Run the evaluator on a labelled golden set with both positive and negative cases, then compute false-negative rate per evaluator. FutureAGI's `RegressionEval` over a `Dataset` produces FNR per evaluator across releases.