What Is Schema Compliance?
An LLM-evaluation metric that scores whether structured model output matches required fields, types, nested shape, and value constraints.
Schema compliance is an LLM-evaluation metric that checks whether a model or agent output follows a required structured-output contract. In production eval pipelines, it covers parseable JSON or YAML, required fields, data types, nested structure, and value constraints such as enums or ranges. It shows up anywhere an LLM response becomes code input: tool arguments, extraction jobs, routers, and workflow state. FutureAGI measures it with SchemaCompliance and pairs it with JSONValidation when the contract is JSON Schema.
Why Schema Compliance Matters in Production LLM and Agent Systems
Schema compliance failures are boundary failures between language and code. A support agent emits {"refund_amount":"29.00"} instead of a number, an intake flow drops the required customer_id, or a planner writes "ship" for an enum that only accepts "standard" or "express". The answer may look fine to a human. The next service rejects it, coerces it, or silently stores bad data.
Developers feel it as parser exceptions, Pydantic or Zod errors, downstream 400 and 422 responses, and dead-letter queues filling with traces that look valid at first glance. SREs see retry-rate spikes and longer p99 latency because the same malformed structure is re-generated. Product owners see task-completion regressions that do not look like hallucinations. Compliance teams lose auditability when required fields such as consent flags, policy reasons, or data-retention categories are absent.
It matters even more in multi-step agent pipelines. One bad structured object can poison every later step: a planner chooses the right tool but emits the wrong argument type; a retriever returns metadata in the wrong shape; a router stores invalid state and sends the next model call down the wrong path. Schema compliance turns that failure into a measurable eval signal instead of an application crash.
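The boundary failures above can be made concrete with a minimal stdlib sketch. This is a hand-rolled contract check, not FutureAGI's evaluator; the field names and constraints follow the examples in this section:

```python
# A minimal stdlib sketch of the contract checks described above; real
# pipelines would use a validator library or FutureAGI's SchemaCompliance.
import json

REQUIRED = {"customer_id": str, "refund_amount": (int, float)}
SHIPPING_ENUM = {"standard", "express"}

def check(raw: str) -> list[str]:
    """Return a list of contract violations for one model response."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"unparseable JSON: {e}"]
    errors = []
    for field, typ in REQUIRED.items():
        if field not in obj:
            errors.append(f"missing required field: {field}")
        elif not isinstance(obj[field], typ):
            errors.append(f"wrong type for {field}: {type(obj[field]).__name__}")
    if obj.get("shipping") not in SHIPPING_ENUM:
        errors.append(f"invalid enum for shipping: {obj.get('shipping')!r}")
    return errors

# Each failure from the text becomes a concrete, loggable violation:
# missing customer_id, string refund_amount, and an out-of-enum "ship".
print(check('{"refund_amount": "29.00", "shipping": "ship"}'))
```

Turning each violation into a string rather than raising makes the result usable as an eval signal instead of an application crash.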
How FutureAGI Handles Schema Compliance
FutureAGI’s approach is to treat schema compliance as a contract test that can run both offline and at runtime. The eval:SchemaCompliance surface maps to the SchemaCompliance class in fi.evals, which evaluates structured output compliance with a schema across syntax, field presence, type correctness, value constraints, and structural shape. The eval:JSONValidation surface maps to JSONValidation, which checks JSON syntax, JSON Schema compliance, and optional expected-value matching. Both are local metrics in the FutureAGI inventory.
Concrete workflow: an invoice-extraction team defines a JSON Schema for invoice_id, total, currency, due_date, and line_items. In CI, they run JSONValidation on a frozen golden dataset to catch unparseable JSON and strict JSON Schema failures. In regression evals, they run SchemaCompliance with format: "json" to get a partial 0-1 score when the model is close but misses a field or uses the wrong type. A drop below 0.98 blocks the prompt or model release.
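The invoice contract from this workflow might look like the following sketch. The field names come from the text; constraint details such as the currency enum and the minimum values are assumptions:

```python
# Sketch of a JSON Schema contract for the invoice-extraction example.
# Field names are from the workflow above; specific constraints are assumed.
invoice_schema = {
    "type": "object",
    "required": ["invoice_id", "total", "currency", "due_date", "line_items"],
    "additionalProperties": False,  # reject extra fields, as production would
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number", "minimum": 0},
        "currency": {"enum": ["USD", "EUR", "GBP"]},
        "due_date": {"type": "string", "format": "date"},
        "line_items": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "required": ["description", "amount"],
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}
```

The same schema object can back both the strict CI gate and the partial-credit regression score, so there is one contract rather than two drifting copies.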
In production, the same contract can sit behind an Agent Command Center post-guardrail: if a response fails validation, the gateway can retry with the validation error, send the trace to review, or route to a fallback model. Unlike a plain Pydantic or Zod exception, the eval result becomes a metric: schema-pass-rate by field, model, prompt version, and customer cohort.
How to Measure or Detect Schema Compliance
Use schema compliance as a scored gate, not only a parser check:
- `fi.evals.SchemaCompliance` — returns a 0-1 structured-output score; full compliance returns 1.0, partial compliance includes syntax, field, type, and constraint breakdowns.
- `fi.evals.JSONValidation` — JSON-specific evaluator for syntax, JSON Schema conformance, and optional expected-value comparison.
- Schema-pass-rate by field — dashboard the percent of traces where each required field is present and correctly typed.
- Retry and fallback counts — schema-driven retry spikes usually mean a prompt, model, or schema version changed.
- Downstream rejection rate — group `400`, `422`, and validator exceptions by schema version to catch contract drift.
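The schema-pass-rate-by-field metric above can be computed from raw traces with a short stdlib sketch (the trace format and field list here are assumed):

```python
# Sketch: schema-pass-rate by field over a batch of trace responses.
# Unparseable responses count as a failure for every field.
import json
from collections import Counter

REQUIRED_TYPES = {"invoice_id": str, "total": (int, float)}

def field_pass_rates(raw_responses: list[str]) -> dict[str, float]:
    """Fraction of traces where each field is present and correctly typed."""
    passes = Counter()
    total = len(raw_responses)
    for raw in raw_responses:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for field, typ in REQUIRED_TYPES.items():
            if isinstance(obj.get(field), typ):
                passes[field] += 1
    return {f: passes[f] / total for f in REQUIRED_TYPES}

traces = [
    '{"invoice_id": "A1", "total": 10.0}',
    '{"invoice_id": "A2", "total": "10.0"}',  # wrong type for total
    'not json',                               # unparseable
]
print(field_pass_rates(traces))
```

Keeping the rates per field is what lets a single enum or type regression show up even when the global pass rate still looks healthy.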
Minimal Python (with `order_schema` defined as a small JSON Schema contract for illustration):

```python
from fi.evals import SchemaCompliance

# JSON Schema contract for the order payload under test.
order_schema = {
    "type": "object",
    "required": ["id", "total"],
    "properties": {
        "id": {"type": "string"},
        "total": {"type": "number"},
    },
}

metric = SchemaCompliance()
result = metric.evaluate([{
    "response": '{"id": "123", "total": 29.5}',
    "format": "json",
    "schema": order_schema,
}])
print(result.eval_results[0].output)
```
Common Mistakes
Most schema bugs come from treating the schema as documentation instead of an executable contract.
- Treating parseability as compliance. Valid JSON can still fail required fields, types, enum constraints, or `additionalProperties: false`.
- Using permissive schemas during eval. If production rejects extra fields, the eval schema must reject them too.
- Averaging all fields into one pass rate. Track field-level failures; one enum regression can hide under a healthy global score.
- Retrying without the validation error. The model needs the specific missing field or type mismatch to repair output.
- Skipping checks for native function calling. Provider-enforced structure still misses business constraints, formats, and downstream type rules.
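The last point can be illustrated with a sketch: arguments that satisfy a provider-enforced tool schema (here, two strings) can still break business rules the provider never sees. The names and rules below are illustrative:

```python
# Sketch of the gap left by native function calling: structurally valid
# tool arguments that fail business constraints. Names are illustrative.
from datetime import date

ALLOWED_CARRIERS = {"ups", "fedex"}

def check_business_rules(args: dict) -> list[str]:
    """Constraints a provider's JSON-mode/function-calling cannot enforce."""
    errors = []
    if args.get("carrier") not in ALLOWED_CARRIERS:
        errors.append(f"unknown carrier: {args.get('carrier')!r}")
    try:
        if date.fromisoformat(args["ship_date"]) < date.today():
            errors.append("ship_date is in the past")
    except (KeyError, ValueError):
        errors.append("ship_date is not an ISO date")
    return errors

# Well-typed per the tool schema (two strings), still invalid downstream.
print(check_business_rules({"carrier": "dhl", "ship_date": "tomorrow"}))
```

This is why a schema-compliance eval layered on top of provider-enforced structure still earns its keep: the provider guarantees shape, not meaning.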
Frequently Asked Questions
What is schema compliance?
Schema compliance checks whether LLM structured output follows a required contract: valid syntax, required fields, correct types, nested structure, and value constraints. FutureAGI measures it with `SchemaCompliance` and pairs it with `JSONValidation` for JSON Schema workflows.
How is schema compliance different from JSON validation?
JSON validation is JSON-specific: it checks parseability and JSON Schema conformance. Schema compliance is broader structured-output scoring, including JSON or YAML formats and partial compliance across syntax, fields, types, and constraints.
How do you measure schema compliance?
Use FutureAGI's `fi.evals.SchemaCompliance` for structured-output scoring and `fi.evals.JSONValidation` when the contract is JSON Schema. Track evaluator output, schema-pass rate, and field-level failures.