What Is a Trust Score?
A composite evaluation metric that aggregates hallucination, safety, fairness, and faithfulness signals into a single bounded number per response or system.
A trust score is a composite metric that aggregates multiple evaluator outputs into a single bounded number — usually 0 to 1 — representing the trustworthiness of an AI response or system. It typically rolls up faithfulness, safety, fairness, and explainability signals with explicit weights so each component is auditable. In FutureAGI’s evaluation surface, trust scores are produced by AggregatedMetric over a panel of fi.evals evaluators and stored against a Dataset row or a production trace. The score’s value is downstream of the panel’s design: a trust score is only as honest as the evaluators it aggregates.
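The arithmetic behind the roll-up is simple. As a minimal sketch (assuming a plain weighted mean; the platform's actual aggregation may differ in detail), the composite is the weight-normalized sum of component scores:

```python
# Minimal sketch of the weighted roll-up behind a trust score.
# Assumes a plain weighted mean; the production aggregation may differ.
def trust_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight

components = {"faithfulness": 0.92, "safety": 0.98, "fairness": 0.74}
weights = {"faithfulness": 0.5, "safety": 0.3, "fairness": 0.2}
print(trust_score(components, weights))  # ≈ 0.902, bounded in [0, 1] if inputs are
```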
Why It Matters in Production LLM and Agent Systems
Engineering and product teams need a single number to gate releases. “Did trust go up or down with this change?” cannot be answered by reading 12 evaluator dashboards in parallel during a deploy review. A trust score is the gating signal — but only if it is constructed transparently, with visible components and per-cohort breakdowns. A black-box trust score is worse than no score at all because it imports false confidence.
The pain shows up in three places. First, the deploy meeting: without a composite, teams default to “accuracy looks fine” and ship regressions in safety. Second, the post-incident review: the team wants to know which dimension dropped, and a single number with no decomposition is useless. Third, audit: a regulator asks “how did you decide this version was safe to release?” and the team needs to point at a documented panel and weights.
For 2026-era agent stacks the use-case sharpens. A trust score per trajectory aggregates step-level evals (tool-selection-accuracy, action-safety, goal-progress) so platform teams can compare agent versions on a single comparable axis while preserving the per-step decomposition for debugging. Without that, multi-step regressions go undetected for weeks. FutureAGI’s perspective: a trust score is a useful summary if and only if you can drill from the score to the failing component in one click.
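A hedged sketch of that trajectory roll-up follows; the step-eval names mirror the ones above, but the aggregation helper is illustrative, not a documented API:

```python
# Hypothetical trajectory-level roll-up over step evals; illustrative only.
STEP_WEIGHTS = {"tool_selection_accuracy": 0.4, "action_safety": 0.4, "goal_progress": 0.2}

def trajectory_trust(steps: list[dict[str, float]]) -> float:
    """Mean of per-step weighted scores; keep the per-step list for debugging."""
    step_scores = [
        sum(step[name] * w for name, w in STEP_WEIGHTS.items())
        for step in steps
    ]
    return sum(step_scores) / len(step_scores)

steps = [
    {"tool_selection_accuracy": 1.0, "action_safety": 1.0, "goal_progress": 0.5},
    {"tool_selection_accuracy": 0.0, "action_safety": 1.0, "goal_progress": 0.2},  # bad tool call
]
print(trajectory_trust(steps))  # ≈ 0.67; the per-step scores show step 2 dragged it down
```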
How FutureAGI Handles Trust Scores
FutureAGI’s approach is to compute trust scores via AggregatedMetric, a first-class evaluator that combines other evaluators with weights you control. The composite stays linked to its components so a low score always points to the failing evaluator.
Concretely: a fintech chatbot team configures AggregatedMetric with weights Groundedness=0.3, ContextRelevance=0.2, ContentSafety=0.2, BiasDetection=0.15, IsHarmfulAdvice=0.15. They run it over a 5,000-row Dataset via Dataset.add_evaluation(). Each row gets a trust score plus the five component scores. The aggregated report shows the global mean trust score, per-cohort breakdown, and the worst-performing component per cohort. The deploy gate is a threshold on the aggregate; the deploy diagnostics are the components.
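A sketch of that configuration is below. The evaluator names and Dataset.add_evaluation() come from the description above; the exact constructor signatures and the Dataset import path are assumptions:

```python
from fi.evals import (
    AggregatedMetric, Groundedness, ContextRelevance,
    ContentSafety, BiasDetection, IsHarmfulAdvice,
)
from fi.datasets import Dataset  # assumed import path

# The fintech panel: weights sum to 1 so the composite stays in [0, 1].
trust = AggregatedMetric(
    metrics=[Groundedness(), ContextRelevance(), ContentSafety(),
             BiasDetection(), IsHarmfulAdvice()],
    weights=[0.3, 0.2, 0.2, 0.15, 0.15],
)

dataset = Dataset(name="fintech-chatbot-golden-set")  # assumed constructor
dataset.add_evaluation(trust)  # each row gets the composite plus five component scores
```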
In production, the same AggregatedMetric runs on sampled traces via traceAI; the eval-fail-rate-by-cohort dashboard tracks aggregate trust over time. Compared to a vendor-supplied black-box “trust score” — common in some compliance products — FutureAGI’s score is auditable: every weight is configurable, every component returns a reason, and the underlying fi.evals panel is documented. That is the difference between a metric that survives an audit and one that does not.
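In trace form that might look like the following sketch; the sampling helper and trace accessors are hypothetical, and only the AggregatedMetric evaluate() call shape comes from the example later in this section:

```python
import random

SAMPLE_RATE = 0.05  # evaluate ~5% of production traces to bound cost

def maybe_score_trace(trace, trust_metric):
    """Hypothetical hook: score a sampled trace and tag it for the dashboard."""
    if random.random() > SAMPLE_RATE:
        return None
    result = trust_metric.evaluate(
        input=trace["input"], output=trace["output"], context=trace.get("context", "")
    )
    # Attach cohort metadata so eval-fail-rate-by-cohort can slice the aggregate.
    return {"trace_id": trace["id"], "trust": result.score, "cohort": trace["cohort"]}
```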
How to Measure or Detect It
Trust scores are computed from a configured panel:
- AggregatedMetric: the canonical aggregator; takes a list of evaluators plus weights and returns a 0–1 composite.
- Component scores: every member evaluator (Groundedness, Faithfulness, ContentSafety, BiasDetection, IsHarmfulAdvice) returns its own score and reason.
- Per-cohort trust delta: trust score by user segment, language, route, or version (see the cohort sketch after this list).
- Trust-score time series: track the aggregate over weekly cohorts to detect drift.
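A minimal cohort-delta sketch, assuming per-row trust scores have already been computed and exported (the column names and values are illustrative):

```python
import pandas as pd

# Illustrative export: one row per response with its trust score and cohort tags.
rows = pd.DataFrame({
    "trust":   [0.91, 0.88, 0.58, 0.95, 0.80, 0.90],
    "cohort":  ["en", "en", "es", "en", "es", "en"],
    "version": ["v2", "v2", "v2", "v1", "v1", "v1"],
})

# Per-cohort mean trust for each version, then the delta between versions.
by_cohort = rows.pivot_table(index="cohort", columns="version", values="trust")
by_cohort["delta"] = by_cohort["v2"] - by_cohort["v1"]
print(by_cohort)  # the es-cohort regression is visible; a global mean would blur it
```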
Minimal Python:
```python
from fi.evals import AggregatedMetric, Groundedness, ContentSafety, BiasDetection

# Example inputs; in practice these come from your dataset or trace.
q = "Can I withdraw my pension early?"
r = "Yes, but early withdrawal usually triggers a tax penalty."
ctx = "Pension withdrawals before age 59.5 generally incur a 10% penalty."

trust = AggregatedMetric(
    metrics=[Groundedness(), ContentSafety(), BiasDetection()],
    weights=[0.5, 0.3, 0.2],
)

result = trust.evaluate(input=q, output=r, context=ctx)
print(result.score, result.components)  # composite plus per-component scores
```
If the aggregate drops, the components tell you which evaluator caused it.
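For example, sorting the components surfaces the offender. This assumes result.components maps evaluator names to scores; the exact shape may differ:

```python
# Assumes result.components is a {name: score} mapping; shape may differ.
worst_name, worst_score = min(result.components.items(), key=lambda kv: kv[1])
if result.score < 0.8:  # illustrative deploy-gate threshold
    print(f"Trust gate failed; weakest component: {worst_name}={worst_score:.2f}")
```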
Common Mistakes
- Aggregating evaluators that measure conflicting things without weighting. Equal weighting hides genuine trade-offs; pick weights deliberately.
- Using a trust score as the only gate. A 0.85 aggregate can hide a catastrophic 0.3 on one component — always inspect components on regressions.
- Building a trust score from judge-model evaluators only. Judge models share biases; mix in deterministic evaluators (JSONValidation, EmbeddingSimilarity) to anchor the composite.
- Not versioning the panel. Changing weights mid-flight makes time-series comparisons meaningless; version the panel definition (see the sketch after this list).
- Reporting trust without per-cohort splits. A high global average can mask a low score on a high-stakes user segment.
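One lightweight way to version a panel, as a sketch (the record format is an assumption, not a FutureAGI feature):

```python
from dataclasses import dataclass

# Hypothetical versioned panel record; store it alongside your eval results
# so every trust score can be traced back to the exact weights that produced it.
@dataclass(frozen=True)
class PanelVersion:
    version: str
    evaluators: tuple[str, ...]
    weights: tuple[float, ...]

PANEL_V3 = PanelVersion(
    version="trust-panel-v3",
    evaluators=("Groundedness", "ContextRelevance", "ContentSafety",
                "BiasDetection", "IsHarmfulAdvice"),
    weights=(0.3, 0.2, 0.2, 0.15, 0.15),
)
# Tag every score with PANEL_V3.version; compare time series only within a version.
```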
Frequently Asked Questions
What is a trust score?
A trust score is a composite metric aggregating multiple evaluator outputs — hallucination, safety, fairness, faithfulness — into a single bounded number representing the trustworthiness of an AI response.
How is a trust score different from accuracy?
Accuracy is a single dimension. A trust score is a weighted aggregation across reliability, safety, and fairness dimensions, so it captures qualitative properties that accuracy alone cannot.
How do you compute a trust score in FutureAGI?
Use `fi.evals.AggregatedMetric` to combine evaluators like `Groundedness`, `ContentSafety`, and `BiasDetection` with chosen weights; the platform produces a per-row trust score plus the underlying components.