Public Roadmap

What we're building

Our roadmap is public and community-driven. Vote on features, suggest ideas, and shape the future of AI evaluation.

In Progress (3)

CrewAI & AutoGen instrumentors

Auto-instrumentation for multi-agent frameworks: trace delegation, handoffs, and per-agent metrics.

integrations, frameworks

Prompt optimization engine

Automated prompt improvement suggestions based on evaluation results and production traces.

optimize, core

Real-time guardrail gateway

Sub-100ms inline guardrails for PII detection, topic enforcement, and hallucination blocking.

guard, core
Up Next (4)

Self-hosted deployment (Docker & K8s)

One-command self-hosted deployment with Docker Compose and Helm charts for air-gapped environments.

infra, enterprise

WebRTC voice agent tracing

End-to-end tracing for browser-based voice agents: ICE metrics, audio pipeline, and STT/TTS latency.

voice, observability

Dataset management & versioning

Version-controlled evaluation datasets with diff views, annotations, and CI/CD integration.

evaluate, core

MCP (Model Context Protocol) tracing

Trace tool calls and context flows in MCP-enabled agents.

integrations, observability
Under Consideration (5)

TypeScript / Node.js SDK

Native TypeScript SDK with the same auto-instrumentation capabilities as the Python SDK.

sdk, core

A/B testing for prompts

Built-in prompt experimentation with traffic splitting, statistical significance, and auto-rollback.

optimize, experiment

Slack & PagerDuty alert integrations

Push hallucination and latency alerts directly to Slack channels and PagerDuty incidents.

monitor, integrations

Custom evaluation metrics SDK

Define and register your own evaluation metrics with a simple Python decorator API.

evaluate, sdk

Cost analytics dashboard

Per-agent, per-model token cost tracking with budget alerts and optimization recommendations.

monitor, analytics
Shipped (5)
Feb 2026

OpenTelemetry-native tracing

Full distributed tracing for LLM calls, tool use, and agent handoffs using OpenTelemetry.

observability, core
Jan 2026

Voice AI Simulator

Automated agent-to-agent voice testing: simulate thousands of call scenarios without human testers.

voice, simulate
Jan 2026

LangChain & LlamaIndex auto-instrumentation

Zero-code tracing for LangChain chains and LlamaIndex queries with full context propagation.

integrations, frameworks
Dec 2025

Hallucination detection metrics

Built-in factuality, groundedness, and faithfulness scoring for RAG and agent outputs.

evaluate, core
Dec 2025

Vapi & Retell integration

One-line instrumentation for Vapi and Retell voice agent platforms.

voice, integrations

Don't see what you need?

Open a discussion on GitHub. Every feature request gets reviewed by the team.