Evaluation

What Is Field Completeness?

An LLM-evaluation metric that checks whether structured output includes every required or expected field before downstream code consumes it.

Field completeness is an LLM-evaluation metric for structured outputs that checks whether every expected field is present, populated, and recoverable for downstream code. It shows up in eval pipelines, production traces, extraction jobs, and tool-call arguments when a model returns valid-looking JSON but omits customer_id, deadline, or another required key. FutureAGI implements this metric through FieldCompleteness and FieldCoverage, so teams can separate missing fields from type, schema, and factual errors.

Why Field Completeness Matters in Production LLM and Agent Systems

Field completeness failures are silent contract failures. A sales-enrichment agent returns a company profile but drops source_url. A claims assistant emits a valid JSON object but omits appeal_deadline. A planner calls the right shipping tool but leaves delivery_method empty, so the next service chooses a default the user never approved. The named failure mode is structured omission: the output parses, yet the record is not complete enough to trust.
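Structured omission is easy to reproduce: parsing succeeds, so nothing raises until a field check runs. A minimal sketch in plain Python (the field names are illustrative):

```python
import json

# A response that parses cleanly but silently drops a required field.
raw = '{"company": "Acme Corp", "industry": "Logistics"}'
record = json.loads(raw)  # parsing succeeds: the JSON is valid

required = ["company", "industry", "source_url"]
missing = [f for f in required if f not in record]
print(missing)  # the omission only surfaces on an explicit field check
```

The parse step gives no signal; only the presence check exposes the dropped source_url.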

Developers feel it as flaky downstream behavior rather than a clear model error. SREs see retries, 400 or 422 responses, dead-letter records, and p99 latency spikes after repair loops. Compliance teams see audit rows without required consent flags, policy citations, or reason codes. Product teams see users re-entering information the agent already asked for, because the final structured state lost it.

The risk is higher in 2026-era agent pipelines because fields often move across several boundaries: retrieval, extraction, tool call, workflow state, final answer, and storage. Each step can look healthy in isolation while the final object is missing one field that controls billing, eligibility, routing, or safety review. Unlike Ragas-style faithfulness checks, which ask whether an answer is supported by context, field completeness asks whether the expected output shape is materially filled.

How FutureAGI Handles Field Completeness

FutureAGI’s approach is to treat field completeness as a structured-output eval, not a formatting preference. The FieldCompleteness metric measures required field presence, optional field presence, and nested field coverage in structured output. The FieldCoverage metric compares the response’s fields against an expected output object and reports which expected field paths were covered or missed.

Consider a customer-support agent that must return case_id, refund_eligible, deadline, evidence_required, and next_action. The team instruments the LangChain workflow with traceAI-langchain, stores the generated JSON on the final span, and keeps the expected object in a FutureAGI dataset. In CI, Dataset.add_evaluation runs FieldCompleteness against the schema and FieldCoverage against golden examples. In production, agent.trajectory.step traces show whether the retrieval tool found the deadline before the final JSON dropped it.

In our 2026 evals, the highest-signal view is field failure by cohort, not only the average score. If missing deadline rises after a prompt update, the engineer inspects failed traces, adds an explicit synthesis instruction, and blocks release until the regression eval passes. If the field is present but wrong, the owner routes the case to GroundTruthMatch, FunctionCallAccuracy, or domain-specific review instead of blaming field completeness for value quality.

How to Measure or Detect Field Completeness

Measure field completeness at the structured boundary where an LLM output becomes application state:

  • FieldCompleteness — returns a 0-1 score, reason, required-field counts, optional-field counts, and missing field paths from a schema-backed structured response.
  • FieldCoverage — returns a 0-1 score for expected field paths present in the response, plus covered, missing, and extra field lists.
  • Trace evidence — compare the final structured output with earlier agent.trajectory.step spans to find fields known by the agent but lost during synthesis.
  • Dashboard signals — track missing-required-field rate, eval-fail-rate-by-cohort, repair-loop count, and schema-driven retry count.
  • User-feedback proxy — repeated form submissions, escalations, and “you already asked me that” complaints often rise when state fields disappear.
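The trace and dashboard signals above ultimately reduce to a field-path comparison. The sketch below approximates that comparison in plain Python with hypothetical helpers (`field_paths`, `coverage`); it assumes nothing about the FutureAGI API:

```python
def field_paths(obj, prefix=""):
    """Flatten a nested dict/list into dotted paths, e.g. customer.address.postal_code."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            paths |= field_paths(item, f"{prefix}[{i}]")
    return paths

def coverage(response, expected):
    """Return (score, missing_paths) comparing response paths to an expected object."""
    want = field_paths(expected)
    have = field_paths(response)
    missing = sorted(want - have)
    score = 1 - len(missing) / len(want) if want else 1.0
    return score, missing

expected = {"customer": {"address": {"postal_code": ""}}, "line_items": [{"sku": ""}]}
response = {"customer": {"address": {}}, "line_items": [{"sku": "A-1"}]}
score, missing = coverage(response, expected)
print(score, missing)  # top-level coverage looks fine; the nested path is the failure
```

Flattening to paths is what lets a checker report customer.address.postal_code rather than a misleading "customer: present".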

Minimal Python:

import json
from fi.evals import FieldCompleteness, FieldCoverage

# Required fields come from the schema; expected fields come from a golden object.
schema = {
    "type": "object",
    "required": ["case_id", "deadline"],
    "properties": {
        "case_id": {"type": "string"},
        "deadline": {"type": "string"},
    },
}
response = json.dumps({"case_id": "C-123"})  # valid JSON, but deadline is missing

required = FieldCompleteness().evaluate([{"response": response, "format": "json", "schema": schema}])
coverage = FieldCoverage().evaluate([{"response": response, "expected": {"case_id": "", "deadline": ""}}])
print(required.eval_results[0].output, coverage.eval_results[0].output)

Common Mistakes

Most errors come from confusing a parseable object with a usable record.

  • Stopping at JSON validity. Valid JSON can still omit deadline, policy_id, consent_status, or another field the workflow requires.
  • Treating null as present. Decide whether null, empty strings, and empty arrays count as complete for each field.
  • Mixing completeness with correctness. A field can exist and still contain the wrong value; score that with ground-truth or function-call metrics.
  • Ignoring nested fields. Top-level coverage hides missing customer.address.postal_code or line_items[0].sku failures.
  • Using one threshold for every field. Missing middle_name and missing payment_authorization should not carry the same release risk.
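The null-handling and per-field-policy points above can be made concrete. A sketch in plain Python, where `EMPTY_OK` is a hypothetical allow-list of fields permitted to be empty:

```python
EMPTY_OK = {"middle_name"}  # fields where empty or null still counts as complete

def is_populated(value):
    # Treat None, "", [], and {} as missing unless the field opts out.
    return value is not None and value != "" and value != [] and value != {}

def incomplete_fields(record, required):
    return [
        f for f in required
        if f not in record or (f not in EMPTY_OK and not is_populated(record[f]))
    ]

record = {"case_id": "C-9", "middle_name": "", "payment_authorization": None}
print(incomplete_fields(record, ["case_id", "middle_name", "payment_authorization"]))
# payment_authorization is null, so it is incomplete; middle_name may be empty
```

Encoding the policy per field keeps missing middle_name and missing payment_authorization from carrying the same release risk.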

Frequently Asked Questions

What is field completeness?

Field completeness checks whether structured LLM output includes every required or expected field. FutureAGI measures it with `FieldCompleteness` and `FieldCoverage` for JSON, extraction, and tool-output workflows.

How is field completeness different from schema compliance?

Schema compliance checks the broader contract: syntax, fields, types, and constraints. Field completeness focuses on whether expected fields are present, even before judging value correctness or type compliance.
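The distinction can be shown in a few lines of plain Python (illustrative field names, no validation library assumed): a record can pass every type check on the fields it contains and still be incomplete.

```python
record = {"case_id": "C-42", "deadline": "2026-03-01"}

# Schema compliance: types and constraints of whatever fields are present.
schema_ok = isinstance(record.get("case_id"), str) and isinstance(record.get("deadline"), str)

# Field completeness: does every expected field exist at all, judged before types?
expected = ["case_id", "deadline", "next_action"]
missing = [f for f in expected if f not in record]
print(schema_ok, missing)  # type checks pass while next_action is still missing
```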

How do you measure field completeness?

Use FutureAGI's `fi.evals.FieldCompleteness` when you have a schema and `FieldCoverage` when you compare the response to an expected output object. Track missing required fields by dataset, prompt version, and trace cohort.