What Is a Predicate Function?
A function that returns a boolean true or false for its input, used as a condition, filter, or deterministic evaluator.
A predicate function is a function that returns a boolean — true or false — for its input. It expresses a condition: is_even(n), contains_pii(text), is_valid_json(s). In programming and formal logic, predicates are the building blocks of filters, conditions, and rule systems. In LLM evaluation, they are the simplest possible evaluators — wrap a deterministic check, return pass or fail, no model call required. The output is a single bit, but that bit is reproducible, fast, and free of judge-model variance.
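The examples named above reduce to a few lines of ordinary Python (a minimal sketch, using only the standard library):

```python
import json

def is_even(n: int) -> bool:
    # Predicate: true exactly when n is divisible by 2.
    return n % 2 == 0

def is_valid_json(s: str) -> bool:
    # Predicate: true when s parses as JSON.
    # Deterministic, runs in microseconds, no model call required.
    try:
        json.loads(s)
        return True
    except (ValueError, TypeError):
        return False
```

Each call returns the same single bit for the same input, every time, which is what makes predicates reproducible baselines.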
Why It Matters in Production LLM and Agent Systems
Predicate evaluators are the foundation of any cost-efficient LLM eval suite. Every project starts out wanting to grade everything with a fancy judge model, then realises that judge models cost money, vary per call, and get wrong the structural questions a regex would nail. The right answer is a predicate-first hierarchy: deterministic checks at the front, judge models only for the subjective leftovers.
The pain shows up across roles. A platform engineer pays $14k/month grading JSON validity with a GPT-class judge model when IsJson would do it for free. A developer ships a tool-calling agent without a function-call-accuracy predicate; the eval suite never catches the agent inventing a tool name. A product lead reviews “evaluation results” that are 80% judge-model variance and 20% real signal — predicates would have given a stable baseline. A compliance auditor asks for a deterministic record of pass/fail decisions and gets a judge-model log full of “score: 0.73” rationales.
For 2026 agent stacks, predicates are the cheapest layer of trajectory evaluation: did the planner output well-formed JSON, did the tool name match the schema, did the response contain a required citation, did the structured-output field meet the type spec. These checks run on every span and feed the next layer of (more expensive) evaluators.
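A per-span check of that kind can be sketched as follows (the span shape and the tool list are assumptions for illustration, not any particular agent framework's schema):

```python
import json

# Hypothetical tool schema: the set of tool names the agent may call.
ALLOWED_TOOLS = {"lookup_invoice", "send_email"}

def check_span(span: dict) -> dict:
    """Run cheap predicate checks on one agent span.

    Returns a dict of {check_name: passed}. These bits are free to
    compute on every span and gate the more expensive evaluators.
    """
    results = {}
    try:
        payload = json.loads(span["planner_output"])
        results["well_formed_json"] = True
        # Did the planner pick a tool name that actually exists?
        results["tool_in_schema"] = payload.get("tool") in ALLOWED_TOOLS
    except ValueError:
        results["well_formed_json"] = False
        results["tool_in_schema"] = False
    return results
```

A span that fails `well_formed_json` never needs a judge-model call to be marked as a failure.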
How FutureAGI Ships Predicate Evaluators
FutureAGI’s approach is to expose predicate evaluators as a first-class layer, alongside judge-model and embedding-based evaluators. The library includes Contains, EndsWith, Equals, IsEmail, IsJson, LengthBetween, LengthGreaterThan, LengthLessThan, JSONSyntaxOnly, JSONValidation, JsonSchema, ContainsValidLink, ContainsCode. Each returns a boolean (or 0/1 score) and a reason. They are deterministic, free, and run in microseconds.
For domain-specific predicates, the CustomEvaluation decorator wraps any boolean-returning Python function as an evaluator that integrates with Dataset.add_evaluation() and traceAI span events. A team can define is_valid_invoice_id(s) and use it in the same pipeline as Groundedness.
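The pattern behind such a decorator can be sketched like this — an illustrative stand-in, not FutureAGI's actual CustomEvaluation API, and the invoice-ID rule is an invented example:

```python
from functools import wraps

def custom_evaluation(fn):
    """Illustrative decorator: wrap a boolean-returning predicate so it
    emits a score and a reason, matching the shape of built-in evaluators."""
    @wraps(fn)
    def evaluator(*args, **kwargs):
        passed = bool(fn(*args, **kwargs))
        return {"score": 1 if passed else 0,
                "reason": f"{fn.__name__} returned {passed}"}
    return evaluator

@custom_evaluation
def is_valid_invoice_id(s: str) -> bool:
    # Hypothetical domain rule: "INV-" prefix followed by six digits.
    return s.startswith("INV-") and s[4:].isdigit() and len(s) == 10
```

The domain rule stays a plain boolean function; the decorator handles the evaluator plumbing.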
Concretely: a billing-support agent runs a five-evaluator suite per response. JSONValidation (predicate) checks the structured output — fails the 0.4% of responses with malformed JSON, no judge model needed. Contains (predicate) verifies the required disclosure string is present. IsCompliant (judge) handles policy-adherence semantics. HallucinationScore (model-graded) handles factuality. The cost per eval drops 60% by running the predicates first and short-circuiting on failure — judge models only run for responses that passed the cheap checks.
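The short-circuit ordering described above can be sketched generically (the predicate and judge callables here are placeholders, not real evaluators):

```python
def run_suite(response: str, predicates, judges) -> dict:
    """Run free predicates first; spend judge-model budget only if all pass.

    predicates: list of (name, fn) where fn(response) -> bool
    judges:     list of (name, fn) where fn(response) -> float (paid calls)
    """
    for name, pred in predicates:
        if not pred(response):
            # Cheap check failed: stop here, zero judge-model calls.
            return {"passed": False, "failed_check": name, "judge_calls": 0}
    # All structural checks passed: now run the expensive evaluators.
    scores = {name: judge(response) for name, judge in judges}
    return {"passed": True, "judge_scores": scores, "judge_calls": len(judges)}
```

Responses that fail a structural check never reach the paid layer, which is where the cost reduction comes from.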
How to Measure or Detect It
Predicate evaluators produce binary signals plus a reason:
- Contains: returns true if the response text contains the expected substring.
- Equals: returns true if the response exactly matches the expected text (whitespace-normalised).
- IsJson: returns true if the response is parseable JSON.
- JSONValidation: returns true if the response validates against a provided JSON schema.
- Pass-rate dashboard: percentage of evaluated traces passing the predicate; the canonical alarm.
- CustomEvaluation-decorated function: wrap any boolean-returning Python function for domain-specific checks.
from fi.evals import IsJson, JSONValidation

# invoice_schema is your JSON Schema dict; response_text is the model output.
is_json = IsJson()
schema_check = JSONValidation(schema=invoice_schema)

r1 = is_json.evaluate(output=response_text)       # syntactic validity only
r2 = schema_check.evaluate(output=response_text)  # validity against the schema
print(r1.score, r2.score, r2.reason)
Common Mistakes
- Using judge models where a predicate would do. Format, length, regex, schema — these are predicate jobs.
- Over-strict predicates. A predicate that fires on every minor whitespace change produces noise; normalise inputs first.
- No reason in the predicate output. A boolean without an explanation is hard to debug; FutureAGI predicates return both.
- Skipping the cheap layer. Running judge models on responses that fail the predicate is wasted spend; short-circuit.
- Mixing predicate and judge scores into one number. Aggregating a 0/1 with a 0–1 confidence loses information; report each separately.
- Forgetting timezone or locale in regex predicates. Date-format checks that ignore locale silently fail on non-US users; encode the rule explicitly.
- Skipping the predicate audit log. Predicate fires are deterministic and cheap to log; persist the decision per trace so audits can replay a release without re-running judges.
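The "normalise inputs first" fix for over-strict predicates is small in practice. A sketch (the normalisation rules here — collapse whitespace, lowercase — are assumptions; pick the ones your predicate actually needs):

```python
import re

def normalise(s: str) -> str:
    # Collapse runs of whitespace and lowercase, so the predicate does not
    # fire on cosmetic changes like trailing newlines or double spaces.
    return re.sub(r"\s+", " ", s).strip().lower()

def equals_normalised(output: str, expected: str) -> bool:
    # Whitespace-normalised equality predicate.
    return normalise(output) == normalise(expected)
```

The predicate stays deterministic; it just stops counting formatting noise as failure.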
Frequently Asked Questions
What is a predicate function?
A predicate function is a function that returns a boolean — true or false — for its input. It expresses a condition and is a primary building block of filters, evaluators, and rule-based logic.
How are predicate functions used in LLM evaluation?
Predicates wrap deterministic checks — substring presence, regex match, schema validity, exact equality — into pass/fail evaluators. They are the cheapest, fastest layer of an LLM eval suite.
Are predicates better than judge-model evaluators?
They are different tools. Predicates handle deterministic structural checks (JSON validity, format, contains-keyword) cheaply and reproducibly; judge models handle subjective quality (tone, helpfulness). Use both.