What we're building
Our roadmap is public and community-driven. Vote on features, suggest ideas, and shape the future of AI evaluation.
CrewAI & AutoGen instrumentors
Auto-instrumentation for multi-agent frameworks: trace delegation, handoffs, and per-agent metrics.
Prompt optimization engine
Automated prompt improvement suggestions based on evaluation results and production traces.
Real-time guardrail gateway
Sub-100ms inline guardrails for PII detection, topic enforcement, and hallucination blocking.
Self-hosted deployment (Docker & K8s)
One-command self-hosted deployment with Docker Compose and Helm charts for air-gapped environments.
WebRTC voice agent tracing
End-to-end tracing for browser-based voice agents: ICE metrics, audio pipeline, and STT/TTS latency.
Dataset management & versioning
Version-controlled evaluation datasets with diff views, annotations, and CI/CD integration.
MCP (Model Context Protocol) tracing
Trace tool calls and context flows in MCP-enabled agents.
TypeScript / Node.js SDK
Native TypeScript SDK with the same auto-instrumentation capabilities as the Python SDK.
A/B testing for prompts
Built-in prompt experimentation with traffic splitting, statistical significance testing, and auto-rollback.
Slack & PagerDuty alert integrations
Push hallucination and latency alerts directly to Slack channels and as PagerDuty incidents.
Custom evaluation metrics SDK
Define and register your own evaluation metrics with a simple Python decorator API.
Cost analytics dashboard
Per-agent, per-model token cost tracking with budget alerts and optimization recommendations.
OpenTelemetry-native tracing
Full distributed tracing for LLM calls, tool use, and agent handoffs using OpenTelemetry.
Voice AI Simulator
Automated agent-to-agent voice testing: simulate thousands of call scenarios without human testers.
LangChain & LlamaIndex auto-instrumentation
Zero-code tracing for LangChain chains and LlamaIndex queries with full context propagation.
Hallucination detection metrics
Built-in factuality, groundedness, and faithfulness scoring for RAG and agent outputs.
Vapi & Retell integration
One-line instrumentation for Vapi and Retell voice agent platforms.
Don't see what you need?
Open a discussion on GitHub. Every feature request gets reviewed by the team.