What Is Type Compliance?
An LLM-evaluation metric that checks whether structured output values use the expected data types for each field.
Type compliance is an LLM-evaluation metric that checks whether structured model or agent output uses the expected data type for each field. In an eval pipeline, it catches cases such as `"total": "42.00"` when downstream code expects a number, or `"approved": "true"` when a boolean is required. FutureAGI uses `TypeCompliance` to separate type mismatches from broader schema failures, so engineers can debug structured-output contracts before traces reach tools, databases, or workflow state.
Why Type Compliance Matters in Production LLM and Agent Systems
Type errors are small contract breaks that create large production failures. A claims agent may choose the correct refund tool but pass `"amount": "18.50"` as a string. A triage workflow may emit `"urgent": "false"` instead of `false`, which a rules engine treats as truthy. A RAG extractor may return citations as a single object when the API expects an array. None of those outputs is a hallucination; they are typed contracts breaking under natural-language generation.
Developers feel the pain first through Pydantic, Zod, JSON Schema, or database errors. SREs see retries, dead-letter queues, 400 and 422 responses, and p99 latency spikes when the app asks the model to repair the same payload. Product teams see lower task-completion rates because the agent picked the right next step but could not hand off valid state. Compliance teams lose confidence in audit trails when fields such as consent flags, policy IDs, or risk scores arrive as loose text.
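The break is easy to reproduce at the validation boundary. A minimal sketch, assuming Pydantic v2 with strict mode enabled; in the default lax mode, Pydantic would silently coerce `"18.50"` to a float and hide the drift:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class RefundCall(BaseModel):
    # strict=True disables lax coercion, so string-encoded values fail fast
    model_config = ConfigDict(strict=True)
    amount: float
    approved: bool

try:
    # The payload a model might emit: both values arrive as strings
    RefundCall.model_validate({"amount": "18.50", "approved": "true"})
except ValidationError as exc:
    print(exc)  # reports a type error for both fields
```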
The risk grows in 2026-era agent pipelines because typed output is no longer a final answer. It is often the input to tool calls, routing policies, memory writes, and post-processing jobs. One type mismatch can contaminate a multi-step trace while every step still looks plausible in a chat transcript.
How FutureAGI Handles Type Compliance
FutureAGI’s approach is to evaluate type compliance as a narrow contract signal, not as a catch-all structured-output score. The `eval:TypeCompliance` surface maps to the `TypeCompliance` class in `fi.evals`. That evaluator checks whether output values match expected types while ignoring extra fields and broader constraints. If the contract says `order_total` is a number, `line_items` is an array, and `requires_review` is a boolean, `TypeCompliance` answers only the type question.
A practical workflow starts with an extraction or tool-calling dataset. An engineer stores expected output shapes beside golden examples, then runs `TypeCompliance` before a prompt, model, or parser change ships. If the type pass rate falls below 0.99 for `order_total`, the release is blocked even if the model’s natural-language explanation still looks accurate. When the same team also cares about missing fields or enum rules, they add `SchemaCompliance` or `StructuredOutputScore` as separate gates instead of hiding every failure under one aggregate.
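A minimal gating sketch over per-field outcomes; the `(field, passed)` record shape and the `gate_release` helper are illustrative assumptions, not FutureAGI API:

```python
from collections import defaultdict

def gate_release(field_results, threshold=0.99):
    """Block a release if any field's type pass rate falls below threshold.

    field_results: iterable of (field_name, passed) pairs collected from
    TypeCompliance runs over a golden dataset (assumed shape).
    """
    counts = defaultdict(lambda: [0, 0])  # field -> [passed, total]
    for field, passed in field_results:
        counts[field][0] += int(passed)
        counts[field][1] += 1
    failing = {
        field: passed / total
        for field, (passed, total) in counts.items()
        if passed / total < threshold
    }
    return not failing, failing

ok, failing = gate_release([("order_total", True), ("order_total", False)])
print(ok, failing)  # False {'order_total': 0.5}
```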
In production, FutureAGI can attach the eval result to traces from a LangChain, OpenAI, or MCP workflow and group failures by field, prompt version, model, and route. Unlike a plain Pydantic or Zod exception, the metric says what changed: a field drifted from number to string, an array collapsed to an object, or nullable state became free-form text. The next action is concrete: alert the owner, retry with the validator error, route through Agent Command Center model fallback, or open a regression eval for that schema version.
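A minimal grouping sketch over exported failure records; the record fields are assumptions about what a trace export carries, not the FutureAGI trace schema:

```python
from collections import Counter

# Each record is one type-compliance failure attached to a trace (assumed shape).
failures = [
    {"field": "amount", "prompt_version": "v12", "route": "refund"},
    {"field": "amount", "prompt_version": "v13", "route": "refund"},
    {"field": "citations", "prompt_version": "v13", "route": "search"},
]

# Group by (field, prompt_version) to see whether a prompt change drove the drift.
for (field, version), count in Counter(
    (f["field"], f["prompt_version"]) for f in failures
).most_common():
    print(f"{field} @ {version}: {count} failure(s)")
```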
How to Measure or Detect Type Compliance
Measure type compliance as a field-level rate, then roll it up only after the failing fields are visible (a local sketch of the core check follows the list):
- `fi.evals.TypeCompliance`: checks whether response fields match expected JSON-like types while ignoring extra fields and value constraints.
- `fi.evals.SchemaCompliance`: use this when the same gate must also check required fields, nested structure, and constraints.
- Type-fail-rate by field: dashboard failures for fields such as `amount`, `citations`, `tool_args`, and `approval_required`.
- Trace cohort breakdowns: group failures by model, prompt version, route, customer cohort, and release window.
- Downstream rejection rate: compare evaluator failures with parser exceptions, `400` responses, and queue retries.
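Outside any SDK, the core type-only check is small enough to sketch directly. A minimal version, assuming the schema declares JSON type names per field; the `type_failures` helper is illustrative, not the `fi.evals` implementation:

```python
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}

def type_failures(response, schema):
    """Return fields whose value type does not match the declared JSON type.

    Extra fields in the response are ignored, mirroring a type-only check.
    """
    failures = {}
    for field, expected in schema.items():
        if field not in response:
            continue  # missing fields are a completeness issue, not a type issue
        value = response[field]
        # bool is a subclass of int in Python, so guard "number" explicitly
        if expected == "number" and isinstance(value, bool):
            failures[field] = "boolean given where number expected"
        elif not isinstance(value, JSON_TYPES[expected]):
            failures[field] = f"{type(value).__name__} given where {expected} expected"
    return failures

print(type_failures(
    {"amount": "42.00", "approved": "true", "note": "extra field ignored"},
    {"amount": "number", "approved": "boolean"},
))
# {'amount': 'str given where number expected', 'approved': 'str given where boolean expected'}
```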
Minimal Python:

```python
from fi.evals import TypeCompliance

metric = TypeCompliance()

# One test case: both values arrive as strings, but the contract
# declares a number and a boolean.
result = metric.evaluate([{
    "response": {"amount": "42.00", "approved": "true"},
    "schema": {"amount": "number", "approved": "boolean"},
}])

# Verdict for the first (and only) test case.
print(result.eval_results[0].output)
```
Common Mistakes
Most type-compliance failures come from treating JSON shape as proof that the values are usable.
- Counting parseable JSON as typed output. Valid JSON can still encode numbers, booleans, and arrays as strings.
- Mixing type failures with missing-field failures. Keep `TypeCompliance` separate from `FieldCompleteness` so owners know what changed.
- Ignoring provider-native tool calling. Function calling lowers syntax risk, but argument types can still drift after prompt or model changes.
- Repairing without field-specific errors. A retry prompt that says “fix JSON” is weaker than naming the exact field and expected type (see the sketch after this list).
- Averaging away high-risk fields. One mistyped `approved` or `risk_score` field can matter more than ten harmless metadata fields.
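A minimal repair-prompt sketch that names each failing field; it reuses the illustrative `type_failures` output from earlier and is not a FutureAGI retry API:

```python
def build_repair_prompt(failures):
    """Turn field-level type mismatches into a targeted retry instruction.

    failures: dict of field name -> mismatch description, e.g. the output
    of the type_failures sketch above.
    """
    lines = ["Your last response had type errors. Re-emit the complete JSON object."]
    for field, problem in failures.items():
        lines.append(f'- "{field}": {problem}. Emit the correct JSON type, not a string.')
    return "\n".join(lines)

print(build_repair_prompt({
    "amount": "str given where number expected",
    "approved": "str given where boolean expected",
}))
```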
Frequently Asked Questions
What is type compliance?
Type compliance checks whether structured LLM or agent output uses the expected data types for each field. FutureAGI measures it with `fi.evals.TypeCompliance`, which isolates type errors from broader schema issues.
How is type compliance different from schema compliance?
Type compliance checks only data types such as string, number, boolean, array, object, or null. Schema compliance also checks syntax, required fields, nested shape, extra fields, and value constraints.
How do you measure type compliance?
Use FutureAGI's `fi.evals.TypeCompliance` on structured outputs and track type-fail-rate by field, model, prompt version, and route. Pair it with `SchemaCompliance` when the release gate also needs required-field and constraint checks.