Public Roadmap

What we're building

Our roadmap is public and community-driven. Vote on features, suggest ideas, and shape the future of AI evaluation.

In Progress

CrewAI & AutoGen instrumentors

Auto-instrumentation for multi-agent frameworks: trace delegation, handoffs, and per-agent metrics.

integrations, frameworks

Prompt optimization engine

Automated prompt improvement suggestions based on evaluation results and production traces.

optimize, core

Real-time guardrail gateway

Sub-100ms inline guardrails for PII detection, topic enforcement, and hallucination blocking.

guard, core

Up Next

Self-hosted deployment (Docker & K8s)

One-command self-hosted deployment with Docker Compose and Helm charts for air-gapped environments.

infra, enterprise

WebRTC voice agent tracing

End-to-end tracing for browser-based voice agents, covering ICE metrics, the audio pipeline, and STT/TTS latency.

voice, observability

Dataset management & versioning

Version-controlled evaluation datasets with diff views, annotations, and CI/CD integration.

evaluate, core

MCP (Model Context Protocol) tracing

Trace tool calls and context flows in MCP-enabled agents.

integrations, observability

Under Consideration

TypeScript / Node.js SDK

Native TypeScript SDK with the same auto-instrumentation capabilities as the Python SDK.

sdk, core

A/B testing for prompts

Built-in prompt experimentation with traffic splitting, statistical significance, and auto-rollback.

optimize, experiment

Slack & PagerDuty alert integrations

Push hallucination and latency alerts directly to Slack channels and PagerDuty incidents.

monitor, integrations

Custom evaluation metrics SDK

Define and register your own evaluation metrics with a simple Python decorator API.

evaluate, sdk

Cost analytics dashboard

Per-agent, per-model token cost tracking with budget alerts and optimization recommendations.

monitor, analytics

Shipped

Feb 2026

OpenTelemetry-native tracing

Full distributed tracing for LLM calls, tool use, and agent handoffs using OpenTelemetry.

observability, core

Jan 2026

Voice AI Simulator

Automated agent-to-agent voice testing: simulate thousands of call scenarios without human testers.

voice, simulate

Jan 2026

LangChain & LlamaIndex auto-instrumentation

Zero-code tracing for LangChain chains and LlamaIndex queries with full context propagation.

integrations, frameworks

Dec 2025

Hallucination detection metrics

Built-in factuality, groundedness, and faithfulness scoring for RAG and agent outputs.

evaluate, core

Dec 2025

Vapi & Retell integration

One-line instrumentation for Vapi and Retell voice agent platforms.

voice, integrations

Don't see what you need?

Open a discussion on GitHub. Every feature request gets reviewed by the team.
