What Is Intersection over Union (IoU)?

A similarity metric for two regions, computed as area-of-intersection over area-of-union; the foundation of object-detection and segmentation evaluation.

Intersection over Union (IoU), also known as the Jaccard index, is a similarity metric for two sets: the area of their intersection divided by the area of their union. In computer vision, IoU scores how well a predicted bounding box or segmentation mask overlaps the ground truth. It is the building block of Average Precision (AP) and mean Average Precision (mAP), the standard metrics for object detection. FutureAGI doesn’t compute IoU on raw pixels; it consumes IoU values as side data on a Dataset and wraps them through CustomEvaluation, so vision-language model outputs and detection results can be regression-tested against ground truth in the FutureAGI eval pipeline.
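
For axis-aligned boxes in (x1, y1, x2, y2) form, the primitive itself is a few lines. A minimal sketch (the box format and the box_iou name are illustrative, not part of any FutureAGI API):

def box_iou(a, b):
    # IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
    # Intersection rectangle: clip each box against the other.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union counts each box once, so subtract the shared area.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = ~0.143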

Why It Matters in Production LLM and Agent Systems

Detection failures don’t look like classification failures. A model can confidently report “found a stop sign” but draw the box around the side of a truck; IoU surfaces this where accuracy doesn’t. In production vision pipelines (VLMs answering “where in the image is X?”, agents calling perception tools, robotics workflows), the spatial correctness of the answer matters as much as the label. Without IoU-based evaluation, you ship spatially wrong but semantically confident outputs.

The pain spans roles. ML engineers see model upgrades that improve label accuracy but degrade IoU on small objects. Operations leads see downstream automation fail when “found object” really meant “found object 30 pixels off.” Compliance is asked, in regulated vision use cases, to show that the model’s bounding boxes consistently align with ground truth, not just that it classifies correctly.

In 2026-era multi-modal agent stacks, IoU resurfaces. A vision-language model that answers “click the blue button” must produce coordinates that overlap the actual button; IoU between the predicted region and the ground-truth element is the right metric. An MCP-connected vision tool that returns bounding boxes for later agent steps fails downstream if IoU is low, even when the label is right. Where humans-in-the-loop once checked the output, agents now act on the box directly, raising the stakes on spatial accuracy.

How FutureAGI Handles Intersection over Union

FutureAGI is a model-agnostic eval and observability layer; we don’t run convolutional perception or compute pixel-level overlap. Where we plug in is after detection: you compute IoU per prediction in your vision pipeline (with torchvision, pycocotools, or a custom routine), store it as a column on the Dataset row, and wrap aggregations as CustomEvaluation. At the dataset level, every prediction row carries `iou`, `predicted_box`, and `gt_box`. At the evaluator level, a CustomEvaluation returns pass/fail given a per-class IoU threshold (often 0.5 for COCO-style mAP). At the dashboard level, eval-fail-rate-by-class slices IoU pass rate by object class, image cohort, or model variant. For VLMs, ImageInstructionAdherence evaluates whether the model followed an instruction that includes spatial reasoning; IoU is the side metric that tells you the spatial half is right.
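
A sketch of that per-class threshold evaluator, following the same CustomEvaluation shape as the snippet further down this page; the class_thresholds mapping and the `iou` / `class_name` row fields are assumptions about your Dataset schema:

from fi.evals import CustomEvaluation

# Illustrative thresholds: stricter where downstream automation
# acts on the box directly (tune these per use case).
class_thresholds = {"package_label": 0.75, "pallet": 0.5}

def iou_pass_per_class(row):
    # `iou` and `class_name` are assumed columns on the Dataset row.
    threshold = class_thresholds.get(row["class_name"], 0.5)
    return 1.0 if row["iou"] >= threshold else 0.0

iou_class_eval = CustomEvaluation(name="iou-per-class", scorer=iou_pass_per_class)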

Concretely: a logistics team uses a VLM to detect package labels in warehouse images. They run inference, compute IoU per detection in their pipeline, and write the per-row IoU into a FutureAGI Dataset. A CustomEvaluation returns 1.0 if IoU ≥ 0.5 and 0.0 otherwise. Dataset.add_evaluation runs it across 5,000 images, sliced by lighting condition. The dashboard shows mean IoU drops from 0.78 to 0.61 in low-light images, and the team prioritizes a low-light fine-tuning run. They write a regression eval that gates the next deploy on per-cohort mean IoU, not just label accuracy. IoU-driven regression evals make spatial quality a first-class production property.
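
Wired together, the flow looks roughly like this; the Dataset import path and constructor are assumptions about the SDK, and only add_evaluation and the `iou` column come from the workflow above:

from fi.evals import CustomEvaluation
from fi.datasets import Dataset  # assumed import path; adjust to your SDK version

def iou_pass(row, threshold=0.5):
    return 1.0 if row["iou"] >= threshold else 0.0

iou_eval = CustomEvaluation(name="iou-at-0.5", scorer=iou_pass)

# Hypothetical handle to the dataset of 5,000 warehouse images; the per-row
# `iou` was written by the detection pipeline before upload.
dataset = Dataset("warehouse-labels-v2")
dataset.add_evaluation(iou_eval)  # pass rates then slice by cohort in the dashboard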

How to Measure or Detect It

IoU itself is a math primitive; the signals that matter are how you wrap it:

  • Per-prediction IoU (your pipeline): area-of-intersection over area-of-union; the raw input to every higher-level metric.
  • CustomEvaluation for IoU threshold: returns pass/fail at a per-class threshold (commonly 0.5 or 0.75).
  • Mean IoU per class: averaged IoU across detections, sliced by class; surfaces small-object weakness (a plain-Python sketch follows this list).
  • mAP@[0.5:0.95]: aggregate AP across IoU thresholds; the COCO-standard headline metric.
  • ImageInstructionAdherence: pairs with IoU when the VLM is following spatial instructions.
  • Eval-fail-rate-by-image-cohort (dashboard signal): IoU pass rate sliced by lighting, scene type, or device.

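A plain-Python sketch of the per-class mean from the list above; rows are dicts with the assumed `iou` and `class_name` fields, and no FutureAGI API is involved:

from collections import defaultdict

def mean_iou_by_class(rows):
    # Average per-prediction IoU within each object class.
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["class_name"]] += row["iou"]
        counts[row["class_name"]] += 1
    return {cls: totals[cls] / counts[cls] for cls in totals}

rows = [
    {"class_name": "truck", "iou": 0.81},
    {"class_name": "truck", "iou": 0.77},
    {"class_name": "stop_sign", "iou": 0.34},
]
print(mean_iou_by_class(rows))  # roughly {'truck': 0.79, 'stop_sign': 0.34}
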
Minimal Python:

from fi.evals import CustomEvaluation

def iou_pass(row, threshold=0.5):
    # The `iou` column is computed upstream in your vision pipeline;
    # 0.5 is the forgiving COCO-style default; raise it per class as needed.
    return 1.0 if row["iou"] >= threshold else 0.0

iou_eval = CustomEvaluation(name="iou-at-0.5", scorer=iou_pass)
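
Returning 1.0/0.0 rather than the raw IoU keeps the evaluator a pass/fail gate, so the dashboard aggregation is a clean pass rate; keep the raw `iou` column alongside it when you also want cohort means.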

Common Mistakes

  • Reporting only mean IoU. A 0.74 mean IoU can hide a 0.30 IoU on small objects; slice by class size and image cohort.
  • Using a single IoU threshold. Threshold 0.5 is forgiving; a production system targeting downstream automation usually needs 0.75 or higher.
  • Confusing IoU with confidence. Confidence is “how sure is the model”; IoU is “how spatially right is the box” — they correlate weakly.
  • Letting label accuracy mask spatial drift. Always pair label-classification metrics with IoU-based metrics; one without the other lies.
  • Skipping IoU on VLM spatial outputs. When a VLM emits coordinates for downstream agent action, IoU is a required regression metric — not optional.

Frequently Asked Questions

What is Intersection over Union (IoU)?

IoU is a similarity metric between two regions, computed as the area of their intersection divided by the area of their union. In vision it scores predicted bounding boxes or masks against ground truth.

How is IoU different from accuracy?

Accuracy is a classification-level metric. IoU is region-level — it cares not just whether the right object was detected but how well the predicted box or mask spatially aligns with the ground truth.

How does IoU fit into FutureAGI evaluation?

IoU is computed in your vision pipeline; FutureAGI consumes the per-row IoU as side-data and wraps it as a `CustomEvaluation`, so VLM and detection outputs can be regressed against ground truth in `Dataset.add_evaluation`.