Every agent failure,
classified and fixable

Agent Compass automatically clusters production failures, classifies them against a taxonomy of 30+ error types, scores every trace on four axes, and recommends confidence-scored fixes - all with zero configuration. Send traces, get answers.

Feed


Track, capture, and resolve errors from one place

All projects · Last 7 days

| Error name | Category | Last seen | Age | Events | Users |
| --- | --- | --- | --- | --- | --- |
| Verbalization of System Process | Conversation Flow | 13s ago | 3 days | 7 | 3 |
| Incomplete Answer | Response Quality | 13s ago | 4 days | 12 | 4 |
| Ignored Instruction | Instruction Adherence | 13s ago | 4 days | 15 | 5 |
| Excessive Monologuing | Conversation Flow | 14s ago | 7 days | 1.1K | 24 |
| Premature Closure | Task Completion | 25s ago | 4 days | 75 | 8 |
| Ignored Interruption | Interruption Handling | 28s ago | 2 mo | 51K | 86 |
| Transcriber Bottleneck | Latency & Responsiveness | 37s ago | 1 mo | 44 | 6 |
| Response Delay | Latency & Responsiveness | 1 min ago | 2 mo | 3.3K | 41 |
| Failure to Acknowledge | Conversation Flow | 1 min ago | 5 days | 2K | 18 |

Showing 1-9 of 15 errors.

Expanded trace detail:

Scores: Factual Grounding 2/5 · Privacy & Safety 1/5 · Instruction Adherence 2/5 · Optimal Plan 2/5
Detected errors: Unsafe Advice · Off-Topic Response · Ignored Instruction · Failure to Acknowledge · Awkward Silence · +3 more

Recommendation: Expand the agent's conversational capabilities during reassurance states. Instead of repeating one phrase, the agent should rotate through 3-5 alternative empathetic statements.

Immediate Fix: Add alternative reassurance phrases to the content pool for the 'waiting for emergency services' state.
Core Features

Not just error tracking -
error intelligence

Error Feed - Clusters (live)

Error clusters - last 7 days:
- Hallucinated refund policy · ×47 · first seen 3d ago, last seen 2m ago · Agent: support-bot-v3 · Model: gpt-4o
- Ungrounded pricing claim · ×31 · first seen 5d ago, last seen 14m ago
- Wrong API endpoint referenced · ×23 · first seen 12d ago, last seen 1h ago
- Fabricated customer ID · ×18 · first seen 1d ago, last seen 45m ago
- Dropped context mid-session · ×12 · first seen 8d ago, last seen 3h ago
- PII leak in response · ×9 · first seen 2d ago, last seen 28m ago

6 clusters · 140 total events · auto-grouped by semantic similarity
Error Taxonomy (30+ types)

Classification tree:
- Factual Grounding (12): Hallucinated Content, Ungrounded Summary, Wrong Chunk Retrieved
- Safety & Security (9): PII Leak, Token Exposure, Biased Output
- Workflow Gaps (11): Goal Drift, Step Disorder, Dropped Context

Evidence snippet: "Our refund policy allows returns within 90 days..." - ground truth: 30-day policy

3 categories · 9 sub-types shown · 30+ total classifications
Scoring - 4 Axes (per-trace)

Multi-axis quality scores - trace #a8f3:
- Factual Grounding: 2.1/5
- Privacy & Safety: 4.8/5
- Instruction Adherence: 4.6/5
- Optimal Plan: 3.4/5

Composite score: 3.7/5 - weighted average across 4 axes, per-trace granularity
Recommendations (analyzing)

Fix recommendations - cluster: Hallucinated refund policy (47 events · 3 affected agents · trending ↑)

Flagged claim: "Our policy allows full refunds within 90 days of purchase" → ground truth: 30 days

- Immediate Fix (confidence 87%): Pin the ground-truth refund policy doc to the retrieval context. Add assertion: refund_period == 30 days. [episodic memory · semantic memory]
- Long-term Fix (confidence 72%): Implement a structured policy KB with versioned facts. Replace free-text retrieval with typed schema lookups. Reduces the hallucination surface by constraining the generation context. [semantic memory]

2 recommendations · immediate + architectural · memory-aware

Traces with the same failure signature are grouped into clusters automatically - "Hallucinated refund policy ×47" instead of 47 individual alerts. Each cluster shows event count, first/last occurrence, and a trend graph so you see whether a problem is growing or shrinking. Click any cluster to drill into individual traces.
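As a rough illustration of the grouping idea, here is a minimal sketch of clustering traces by failure signature. The `failure_signature` normalization and all field names are hypothetical stand-ins; the product itself groups by semantic similarity, not keyword overlap.

```python
from collections import defaultdict

def failure_signature(error_type: str, claim: str) -> str:
    # Hypothetical normalization: collapse near-identical flagged claims
    # so repeated failures land in the same bucket.
    topic = " ".join(sorted(set(claim.lower().split()))[:5])
    return f"{error_type}:{topic}"

def cluster_traces(traces: list[dict]) -> dict[str, list[dict]]:
    # Group traces sharing a signature into one cluster, so 47 matching
    # traces surface as a single "×47" cluster instead of 47 alerts.
    clusters = defaultdict(list)
    for t in traces:
        clusters[failure_signature(t["error_type"], t["claim"])].append(t)
    return dict(clusters)

traces = [
    {"error_type": "hallucination", "claim": "refunds within 90 days"},
    {"error_type": "hallucination", "claim": "refunds within 90 days"},
    {"error_type": "pii_leak", "claim": "SSN exposed in response"},
]
clusters = cluster_traces(traces)
# Two clusters: one with 2 hallucination events, one with 1 PII event.
```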

See the feed view

Every error is classified against a comprehensive taxonomy - hallucinated content, ungrounded summary, wrong tool chosen, invalid tool params, PII leak, biased output, token exposure, goal drift, dropped context, missing CoT, and 20 more. Each classification includes evidence snippets from the LLM response, root causes, and affected spans.

Explore the taxonomy

Every trace is scored (0–5) on four axes: Factual Grounding (hallucination risk), Privacy & Safety (PII, credential leaks, unsafe advice), Instruction Adherence (format, tone, constraints), and Optimal Plan Execution (tool sequencing, workflow logic). Scores are clickable - drill into the taxonomy metrics that drove each score.
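To make the composite concrete, here is a minimal sketch of a weighted average over the four axes, using the trace #a8f3 numbers shown on this page. The equal weights are an assumption for illustration, not the product's actual weighting.

```python
# Assumed equal weights across the four axes (illustrative only).
WEIGHTS = {
    "factual_grounding": 0.25,
    "privacy_safety": 0.25,
    "instruction_adherence": 0.25,
    "optimal_plan": 0.25,
}

def composite_score(axis_scores: dict[str, float]) -> float:
    # Weighted average of per-axis scores, rounded to one decimal.
    return round(sum(WEIGHTS[axis] * s for axis, s in axis_scores.items()), 1)

score = composite_score({
    "factual_grounding": 2.1,
    "privacy_safety": 4.8,
    "instruction_adherence": 4.6,
    "optimal_plan": 3.4,
})
# → 3.7, matching the composite shown for trace #a8f3
```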

See scoring in action

Agent Compass doesn't just find errors - it recommends fixes. Every error includes an immediate fix (minimal patch to stop the bleeding) and a long-term recommendation (architectural change for a robust solution), both with confidence scores. Uses episodic memory from past runs and semantic memory from error patterns.

See how recommendations work
Use Cases

Every failure classified,
every fix recommended

Agent output: "The order was placed on Jan 5 and shipped via express mail. Delivery is guaranteed within 24h. Contact support for tracking."
Evidence: Policy doc: "3-5 business days delivery" (chunk #12, policy.pdf:§4)
Flagged claim → evidence mismatch detected

Catch hallucinations with evidence

Every hallucinated claim is flagged with the exact words that triggered it, the retrieval chunks that were available, and whether the agent fabricated content or used the wrong chunk. Clustered by topic so you fix the root cause, not individual symptoms.

Hallucination RAG Evidence
Tool call: search_orders(user_id: "u-482", query: "refund") - wrong tool
Suggested: get_refund_status() · expected: get_refund_status(order_id: "ord-291") · match: 0%
Wrong tool selected → correct suggestion surfaced

Debug tool selection and parameter errors

See when your agent picks the wrong tool, passes invalid parameters, misinterprets tool output, or fails to call a tool it should have used. Each error shows the affected span, the tool call payload, and what the correct action would have been.
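A minimal sketch of what such a check could look like: compare the tool the agent actually called against the expected call and surface the correction. The function and field names are illustrative, not the product's API.

```python
def check_tool_call(actual: dict, expected: dict) -> dict:
    # Flag a wrong tool choice, or missing parameters when the tool matches.
    issues = []
    if actual["tool"] != expected["tool"]:
        issues.append(f"wrong tool: called {actual['tool']}, "
                      f"expected {expected['tool']}")
    else:
        missing = set(expected["params"]) - set(actual["params"])
        if missing:
            issues.append(f"missing params: {sorted(missing)}")
    return {"ok": not issues, "issues": issues, "suggested": expected}

# The example from the card above: the agent searched orders instead of
# checking refund status.
report = check_tool_call(
    actual={"tool": "search_orders",
            "params": {"user_id": "u-482", "query": "refund"}},
    expected={"tool": "get_refund_status",
              "params": {"order_id": "ord-291"}},
)
# report["issues"] flags the wrong tool and carries the suggested call.
```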

Tool Misuse Invalid Params
Agent response: "Your account details: Name: John Smith, SSN: 482-91-3847 [REDACTED], Email: j.smith@email.com"
PII detected: 2 fields (SSN, Email)
Sensitive data detected → auto-flagged for review

Surface PII leaks and security failures

Detect PII exposure, token leaks, credential exposure, insecure API usage, and biased output - classified under the Safety & Security taxonomy. Each incident includes evidence snippets and the exact span where the leak occurred.
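For intuition, here is a minimal sketch of a pattern-based PII scan like the one in the card above, assuming simple regex detectors for SSNs and emails. Real detection involves many more patterns plus context; these two just match the example response.

```python
import re

# Illustrative detectors only - a real scanner covers far more patterns.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_pii(text: str) -> list[str]:
    # Return the names of PII fields detected in an agent response.
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

response = ("Your account details: Name: John Smith "
            "SSN: 482-91-3847 Email: j.smith@email.com")
flags = scan_pii(response)
# → ["SSN", "Email"], the 2 fields flagged in the card above
```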

PII Token Exposure Bias
Step 1: Verify user → Step 2: Check policy (SKIPPED) → Step 3: Issue refund
Expected: 1 → 2 → 3 (sequential)
Step skipped → workflow integrity violation

Identify workflow and planning failures

Catch goal drift, step disorder, redundant steps, dropped context, and missing chain-of-thought - the subtle failures that don't throw errors but produce wrong answers. Agent Compass detects these through its Workflow & Task Gaps and Reflection Gaps taxonomy categories.
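A minimal sketch of a step-order check in this spirit: compare the steps an agent actually executed against the expected sequence, flagging skips and reordering. Step names and the function itself are illustrative assumptions.

```python
def check_workflow(executed: list[str], expected: list[str]) -> list[str]:
    # Flag expected steps that never ran, then check that the steps
    # which did run preserved the expected order.
    violations = []
    done = set(executed)
    for step in expected:
        if step not in done:
            violations.append(f"skipped: {step}")
    ran = [s for s in executed if s in set(expected)]
    if ran != [s for s in expected if s in done]:
        violations.append("steps out of order")
    return violations

# The card's example: "Check policy" (step 2) was skipped.
violations = check_workflow(
    executed=["Verify user", "Issue refund"],
    expected=["Verify user", "Check policy", "Issue refund"],
)
# → ["skipped: Check policy"]
```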

Goal Drift Context Loss
Error clusters ranked by trend and severity:
- Hallucination: pricing - critical
- Tool: wrong function - high
- PII exposure: email - medium
- Workflow: step skip - low
Cluster errors → rank by trend and severity

Prioritize by trend and severity

Each cluster shows event count, trend direction, and first/last occurrence. Errors are scored on four axes (grounding, safety, instruction adherence, plan execution), so you fix the highest-impact problems first - not the noisiest ones.
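One way such a ranking could work is a priority score blending severity, trend direction, and event volume. This is a sketch under assumed weights, not the product's actual formula.

```python
def priority(cluster: dict) -> float:
    # Assumed blend: severity dominates, a rising trend boosts it,
    # and event volume contributes only modestly.
    trend_boost = {"up": 2.0, "flat": 1.0, "down": 0.5}[cluster["trend"]]
    return cluster["severity"] * trend_boost * (1 + cluster["events"] / 100)

clusters = [
    {"name": "Hallucination: pricing", "severity": 9, "events": 47, "trend": "up"},
    {"name": "Tool: wrong function", "severity": 7, "events": 31, "trend": "flat"},
    {"name": "Workflow: step skip", "severity": 3, "events": 120, "trend": "down"},
]
ranked = sorted(clusters, key=priority, reverse=True)
# "Hallucination: pricing" ranks first despite having fewer events than
# "Workflow: step skip" - severity and trend outweigh raw volume.
```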

Trends Severity Alerts
Error cluster (hallucination ×12) → generated dataset:
- Scenario 1: pricing claim
- Scenario 2: date accuracy
- Scenario 3: policy ref
- +9 more scenarios...
Ready to run as a test suite - error patterns become a regression test dataset

Turn errors into test cases

Feed production error patterns back into simulation scenarios and evaluation datasets. Agent Compass learns from past runs using episodic and semantic memory - so the same failure pattern gets caught faster next time.
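To sketch the shape of that feedback loop, here is a hypothetical conversion from an error cluster to regression scenarios: each clustered failure becomes a test case carrying the triggering input and the ground truth it must respect. All field names are illustrative.

```python
def cluster_to_dataset(cluster: dict) -> list[dict]:
    # Turn each event in a failure cluster into a replayable scenario.
    scenarios = []
    for i, event in enumerate(cluster["events"], start=1):
        scenarios.append({
            "id": f"{cluster['name']}-{i}",
            "input": event["user_input"],
            "must_not_contain": event["hallucinated_claim"],
            "ground_truth": event["ground_truth"],
        })
    return scenarios

# The refund-policy example from earlier on this page.
dataset = cluster_to_dataset({
    "name": "hallucinated-refund-policy",
    "events": [
        {"user_input": "What is your refund window?",
         "hallucinated_claim": "90 days",
         "ground_truth": "30 days"},
    ],
})
# One scenario, ready to replay as a regression test.
```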

Datasets Simulate Memory
How It Works

From trace to fix
with zero configuration

01

Send traces - zero config required

Agent Compass runs automatically on Observe projects. Send traces via OpenTelemetry or any supported SDK (Google ADK, OpenAI, LangChain, LlamaIndex). Set your sampling rate (1–100%) and Compass starts analyzing immediately - no eval config, no metric setup.

Integration Setup
Connect your agent: OpenTelemetry · Google ADK · OpenAI · LangChain · LlamaIndex → Agent Compass trace ingestion
Sampling rate: 100% · Zero config
Status: Connected · 1,247 traces ingested

$ pip install fi-client && fi init --project my-agent
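For intuition on the sampling-rate setting, here is a sketch of deterministic, trace-ID-based sampling: the same trace is always kept or dropped no matter which process sees it. Tracing SDKs such as OpenTelemetry implement this for you; this stdlib-only version just illustrates the mechanism.

```python
import hashlib

def should_sample(trace_id: str, rate_percent: int) -> bool:
    # Hash the trace ID into a stable bucket in [0, 100) and keep the
    # trace if its bucket falls under the configured rate.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < rate_percent

# At 100% every trace is ingested; at lower rates a stable subset is.
assert should_sample("trace-a91f", 100)
```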
02

Errors cluster and score automatically

Traces are classified against 30+ error types, grouped into clusters by failure signature, and scored on four axes. Each cluster shows event count, trend graph, evidence snippets, root causes, and affected spans. New traces join existing clusters or create new ones in real time.

Analysis Pipeline
Processing
Classify · Cluster · Score

Incoming traces (trace-a91f, trace-c34b, trace-f72e, trace-d08a) flow through three stages: Classify (taxonomy of 30+ error types) → Cluster → Score (severity, frequency, impact).

Clustered output:
- Hallucinated URLs - 42 events, severity 8
- Context Drift - 28 events, severity 6
- Tool Call Failures - 17 events, severity 5

87 errors across 3 clusters from 1,247 traces · pipeline latency < 200ms per trace
03

Apply fixes with confidence scores

Every error includes an immediate fix and a long-term recommendation, both with confidence scores. Drill into any trace to see the full execution - input, retrieval, generation, tool calls - and pinpoint the exact span where things went wrong. Feed patterns back into simulations to verify the fix.

Fix Workflow
Error cluster detail: Hallucinated URLs - 42 events, severity 8

Trace span waterfall: input 12ms → retrieval 45ms → generation 320ms (hallucination detected here) → tool_call 28ms

Recommended fixes:
- Immediate Fix (confidence 85%): Add a URL validation guardrail to the generation output.
- Long-term Fix (confidence 72%): Fine-tune retrieval embeddings with a verified URL corpus.

Feed back to Simulate →

Powering teams from
prototype to production

From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.