What Is a Canonical Schema?
A single shared data structure that producers and consumers in a system map to and from, used as the contract for structured output.
A canonical schema is a single, versioned data structure that every producer and consumer in an LLM or agent system maps to. It is the contract between a model’s structured output, tools, and downstream storage, specifying field names, types, optionality, enum values, and validation rules in JSON Schema, Pydantic, or Protobuf. In FutureAGI, canonical schemas become eval and trace targets: model responses can be checked automatically, schema drift can be detected, and breaking changes can be versioned before they reach production.
Why Canonical Schemas Matter in Production LLM and Agent Systems
LLM outputs are non-deterministic. The same prompt can return {"amount": 42} on one call and {"amount": "$42.00", "note": "..."} on the next. Without a canonical schema and an enforced validation step, downstream consumers either parse defensively (hiding errors) or crash. Either way, you lose signal.
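To make the failure mode concrete, here is a hand-rolled sketch of an enforced validation step for that exact {"amount": ...} contract. This is plain Python for illustration, not a real schema validator; in practice you would use JSON Schema tooling or the evaluators described below.

```python
def violations(payload: dict) -> list[str]:
    """Return a list of contract violations for a {"amount": <number>} payload.

    Downstream consumers get an explicit violation list instead of either
    parsing defensively (hiding errors) or crashing on a surprise type.
    """
    errs = []
    if "amount" not in payload:
        errs.append("missing required field: amount")
    # bool is a subclass of int in Python, so exclude it explicitly
    elif not isinstance(payload["amount"], (int, float)) or isinstance(payload["amount"], bool):
        errs.append(f"amount: expected number, got {type(payload['amount']).__name__}")
    for key in payload:
        if key != "amount":
            errs.append(f"unexpected field: {key}")
    return errs

print(violations({"amount": 42}))                        # conforming call
print(violations({"amount": "$42.00", "note": "..."}))   # the next call drifts
```

The first call returns an empty list; the second returns two violations (a type mismatch and an unexpected field), which is exactly the signal a defensive parser would have swallowed.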
The pain shows up across roles. A backend engineer writes code that expects order_id: string and the model returns it as a number 4% of the time — production errors, but only on a long-tail cohort. A product manager sees an agent that “works in demos” but fails in integration tests because two teams have slightly different ideas of what the schema should be. A compliance lead is asked which fields contain PII and cannot answer because each tool has its own ad-hoc shape.
In 2026 multi-agent stacks, the problem multiplies. A planner agent emits a sub-task spec; a tool agent consumes it; a critic agent reads both. If those three agents share a canonical schema, the system can be tested end-to-end. If they don’t, every change to one agent’s prompt is a potential break in the others. Function calling and MCP cut both ways here: they help when you treat the tool spec as the canonical contract, and hurt when you treat it as a suggestion.
How FutureAGI Handles Canonical Schemas
FutureAGI does not generate canonical schemas — that is a design decision your team makes — but it validates outputs against them and surfaces violations as eval signals.
FutureAGI’s approach is to treat schema validation as a trace-level reliability signal, not a late parser exception.
Concretely: a team defines an OrderEvent JSON Schema as the canonical contract between an LLM agent and the order-management service. They configure fi.evals.JSONValidation with that schema as the validator and run it on every production trace where llm.output_messages is structured output. The evaluator returns a boolean per call plus a list of violations (missing required field, type mismatch, regex failure). Those violations land as span_event entries on the trace, so a single trace view shows the model’s raw output, the parsed structure, the schema diff, and the downstream tool’s response.
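For illustration, such an OrderEvent contract might look like the following. The field names, patterns, and enum values here are hypothetical, not taken from any real order-management service; the point is that constraints (pattern, enum, additionalProperties) live in the schema, not in the prompt.

```python
# Hypothetical canonical OrderEvent contract, expressed as a JSON Schema dict.
ORDER_EVENT_SCHEMA = {
    "$id": "order_event.v1",
    "type": "object",
    "required": ["order_id", "amount", "currency", "status"],
    "properties": {
        "order_id": {"type": "string", "pattern": "^[A-Z]-\\d+$"},
        "amount": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        "status": {"type": "string", "enum": ["created", "paid", "refunded"]},
    },
    # Reject extra fields the model invents, instead of silently passing them on.
    "additionalProperties": False,
}

# Cheap sanity check worth running in CI: every required field is declared.
missing = [f for f in ORDER_EVENT_SCHEMA["required"]
           if f not in ORDER_EVENT_SCHEMA["properties"]]
assert not missing, f"required fields not declared in properties: {missing}"
```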
For finer-grained signals, SchemaCompliance returns 0–1 for structural correctness, TypeCompliance isolates type-level issues, and FieldCompleteness returns the percentage of expected fields actually present. The team builds a dashboard showing schema-violation-rate per agent version, per intent, per model. When the rate spikes after a system-prompt change, regression-eval against a versioned Dataset confirms whether the prompt or the model is to blame. Compared with Pydantic-only validation inside application code, this shows which field broke, how often, and in which production trace.
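The fi.evals scorers are the managed path; as a hand-rolled illustration of what a field-completeness metric computes (the function name and signature here are my own, not the library's):

```python
def field_completeness(expected: set[str], output: dict) -> float:
    """Fraction of expected fields actually present in the output (0.0-1.0).

    A score below 1.0 on required fields is a completeness failure even when
    every field that *is* present has the correct type.
    """
    if not expected:
        return 1.0
    return len(expected & output.keys()) / len(expected)

# Two of three expected fields present -> score of 2/3.
score = field_completeness({"order_id", "amount", "currency"},
                           {"order_id": "A-1", "amount": 42.0})
print(score)
```

Tracking this score per agent version alongside the boolean JSONValidation gate is what lets a dashboard distinguish "the model dropped a field" from "the model mistyped a field".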
How to Measure or Detect Canonical Schema Failures
Schema health combines correctness, completeness, and trend:
- fi.evals.JSONValidation: returns a boolean plus a violation list against a JSON Schema; the headline gate.
- fi.evals.SchemaCompliance: returns a 0–1 structural compliance score with partial credit; useful for ranking variants.
- fi.evals.TypeCompliance: returns type-level conformance, ignoring extra fields and constraints.
- fi.evals.FieldCompleteness: returns the percentage of expected fields present in the output.
- Schema-violation-rate-by-cohort: dashboard signal sliced by model, prompt version, or user segment; the regression alarm.
- Downstream parse-error rate: in your application logs; should approach zero if schema gates are enforced upstream.
```python
from fi.evals import JSONValidation

schema = {
    "type": "object",
    "required": ["order_id", "amount"],
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
    },
}

val = JSONValidation(schema=schema)
result = val.evaluate(output='{"order_id": "A-1", "amount": 42.0}')
print(result.score, result.reason)
```
Common mistakes
- Letting each consumer define its own shape. Two consumers and two schemas equals one production incident waiting for the next prompt change.
- Skipping enum and regex constraints. Loose schemas pass JSONValidation but admit garbage; tighten them.
- Versioning the schema in code only. A canonical schema needs a registry: even a Git-tracked JSON file with a version field beats living inside one tool’s source.
- Ignoring schema violations as “long-tail noise”. A 0.5% violation rate that is concentrated in one model variant is a regression, not noise.
- Relying on the prompt to enforce the schema. Use a structured-output API or a validation gate; prompts alone drift.
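On the registry point above: even the minimal "Git-tracked JSON file with a version field" approach can be loaded through one helper so no tool hardcodes its own copy. A sketch, with a hypothetical layout of one file per schema version (e.g. schemas/order_event.v1.json):

```python
import json
from pathlib import Path


def load_schema(name: str, version: str, registry: Path = Path("schemas")) -> dict:
    """Load a versioned canonical schema from a Git-tracked registry directory.

    Expects files named <name>.<version>.json whose contents carry a matching
    "version" field, so filename and payload cannot silently disagree.
    """
    path = registry / f"{name}.{version}.json"
    schema = json.loads(path.read_text())
    if schema.get("version") != version:
        raise ValueError(f"{path} declares version {schema.get('version')!r}, "
                         f"expected {version!r}")
    return schema
```

Every producer and consumer then pins an explicit version (load_schema("order_event", "v1")), and a breaking change ships as a new file rather than a silent mutation.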
Frequently Asked Questions
What is a canonical schema?
A canonical schema is a single agreed-upon data structure that all components in a system — including LLM structured outputs, tools, and storage — map to and from, defining field names, types, optionality, and validation rules.
How is a canonical schema different from a JSON Schema?
JSON Schema is the language used to describe the contract. A canonical schema is the chosen, versioned instance — one specific JSON Schema (or Pydantic, Protobuf, etc.) that everyone in the system agrees to use as the source of truth.
How does FutureAGI validate against a canonical schema?
FutureAGI's JSONValidation evaluator validates LLM output against your JSON Schema, SchemaCompliance scores structural compliance, and TypeCompliance focuses on type-level correctness — all wired to traces.