Your agents.
Your metrics. Your boards.
Build custom dashboards with drag-and-drop widgets, 9 chart types, and a visual query builder. Pull from traces, datasets, and simulations. Slice by model, user, project, or any custom attribute. Powered by ClickHouse for sub-second queries on millions of rows.
Dashboards built for
AI agent teams
Build dashboards with line, stacked line, column, stacked column, bar, stacked bar, pie, table, and single-metric KPI cards. Each widget sits on a 12-column responsive grid - resize from quarter-width to full-width, drag to reorder, duplicate with one click.
See chart types
Every widget can pull data from traces (spans, latency, tokens, cost), datasets (row counts, cell errors, eval scores), or simulations (call metrics, persona breakdowns, success rates). Cross-source eval metrics let you join evaluation scores back to any source. One unified query endpoint powered by ClickHouse.
Explore data sources
Break down any metric by project, model, status, provider, service name, span kind, session, user, prompt name, version, label, tags, or custom attributes. Dataset metrics slice by dataset name, column, or annotation template. Simulation metrics slice by scenario, persona gender, age group, location, accent, language, and communication style.
View all dimensions
Aggregate with sum, average, median, min, max, count, count distinct, and percentiles (P25, P50, P75, P90, P95, P99). Dataset columns support pass rate, fail rate, pass count, fail count, and true rate for boolean evaluations. Every aggregation runs server-side in ClickHouse for sub-second response on millions of rows.
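To make the aggregation names concrete, here is a minimal Python sketch of what a percentile and a pass rate compute. It is an illustration only: the product runs these server-side in ClickHouse, the sample values are made up, and the nearest-rank percentile definition used here is an assumption, so a database engine may return slightly different percentile values.

```python
import math
from statistics import mean, median


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def pass_rate(results):
    """Fraction of boolean evaluation results that are True."""
    return sum(1 for r in results if r) / len(results)


# Made-up sample data: ten span latencies and four boolean eval results.
latencies_ms = [120, 95, 310, 150, 101, 2050, 180, 140, 99, 133]
eval_results = [True, True, False, True]

summary = {
    "count": len(latencies_ms),
    "avg": mean(latencies_ms),
    "median": median(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "pass_rate": pass_rate(eval_results),
}
print(summary)
```

The single 2050 ms outlier pulls the average well above the median, which is why a P95 card often sits next to an average on a latency board.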
See aggregation options
One board for every
question your team asks
Agent performance overview
Build a board with latency P95, error rate, token cost, and eval scores across all your agents. One screen, real-time, shareable.
Cost breakdown by model and team
Track spend per model, per project, per user. Break down token cost with stacked columns. Spot which agents burn budget and which are efficient.
Eval score trends over time
Plot faithfulness, context adherence, and relevance scores as time-series lines. Spot a regression the moment a prompt change or model swap degrades quality.
Simulation performance by persona
Visualize simulation results broken down by persona attributes - gender, age group, accent, communication style. Find where your agent fails specific demographics.
Dataset eval pass rates
Track pass/fail rates across dataset evaluations. Break down by dataset, column, or annotation template. Catch regressions in golden-set performance.
Model comparison board
Compare latency, cost, and quality side by side across GPT-4o, Claude, Gemini, and custom models. Use a table widget with model as the breakdown dimension.
From zero to
dashboard in minutes
Create a dashboard and add widgets
Name your dashboard, then add widgets from 9 chart types. Each widget has a visual query builder - pick a metric, set aggregation, add filters, choose breakdowns. Live preview updates as you configure.
Query traces, datasets, and simulations
Every widget can pull from any data source - trace spans, dataset evaluations, or simulation runs. Apply filters (contains, greater than, equals) and break down by model, user, project, or any custom attribute.
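A widget query of this shape (one metric, one aggregation, some filters, an optional breakdown) can be sketched in a few lines of Python. Everything named here is hypothetical: `WidgetQuery`, `run_query`, the operator keys, and the row fields are illustrative stand-ins, not the product's API, and the rows are made-up sample data evaluated in memory rather than in ClickHouse.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical operator names for the three filter types mentioned above.
OPERATORS = {
    "contains": lambda value, target: target in str(value),
    "greater_than": lambda value, target: value > target,
    "equals": lambda value, target: value == target,
}

AGGREGATIONS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "count": len,
}


@dataclass
class WidgetQuery:
    metric: str
    aggregation: str  # one of the AGGREGATIONS keys
    filters: list = field(default_factory=list)  # (field, operator, target) tuples
    breakdown: Optional[str] = None  # field to group by, or None for a single total


def run_query(rows, query):
    """Keep rows that pass every filter, group by the breakdown field, aggregate the metric."""
    groups = defaultdict(list)
    for row in rows:
        if all(OPERATORS[op](row[name], target) for name, op, target in query.filters):
            key = row[query.breakdown] if query.breakdown else "all"
            groups[key].append(row[query.metric])
    aggregate = AGGREGATIONS[query.aggregation]
    return {key: aggregate(values) for key, values in groups.items()}


# Made-up trace rows.
rows = [
    {"model": "gpt-4o", "project": "support-bot", "latency_ms": 820, "cost_usd": 0.012},
    {"model": "gpt-4o", "project": "support-bot", "latency_ms": 410, "cost_usd": 0.007},
    {"model": "claude", "project": "support-bot", "latency_ms": 650, "cost_usd": 0.009},
    {"model": "claude", "project": "search-agent", "latency_ms": 900, "cost_usd": 0.020},
]

slow_support_cost = run_query(rows, WidgetQuery(
    metric="cost_usd",
    aggregation="sum",
    filters=[("project", "contains", "support"), ("latency_ms", "greater_than", 500)],
    breakdown="model",
))
print(slow_support_cost)
```

Dropping the `breakdown` argument collapses the result to a single total, which is the shape a single-metric KPI card would need.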
Arrange, share, and export
Drag widgets to reorder. Resize from quarter-width to full-width on the 12-column grid. Duplicate widgets to iterate fast. Export any chart as CSV for offline analysis.
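On a 12-column grid, a quarter-width widget spans 3 columns and a full-width widget spans all 12. The Python sketch below shows one way such a grid could place widgets left to right and wrap to a new row when the current one is full. The `layout` function, the "half" preset, and the widget names are assumptions for illustration, not the product's actual layout logic.

```python
GRID_COLUMNS = 12

# Quarter-width and full-width come from the text above; "half" is an assumed extra preset.
WIDTHS = {"quarter": 3, "half": 6, "full": 12}


def layout(widgets):
    """Place (name, width) widgets in order, wrapping to a new row when one does not fit.

    Returns a list of (name, row, start_column, column_span) tuples.
    """
    placed = []
    row, col = 0, 0
    for name, width in widgets:
        span = WIDTHS[width]
        if col + span > GRID_COLUMNS:
            row, col = row + 1, 0
        placed.append((name, row, col, span))
        col += span
    return placed


widgets = [
    ("latency_p95", "quarter"),
    ("error_rate", "quarter"),
    ("token_cost", "half"),
    ("eval_scores", "full"),
]
board = layout(widgets)
for placement in board:
    print(placement)

# Reordering is just a change in list order: move the full-width widget to the top.
reordered = layout([widgets[3]] + widgets[:3])
```

In this sketch the first three widgets fill row 0 exactly (3 + 3 + 6 = 12 columns), so the full-width widget wraps to row 1.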
Powering teams from
prototype to production
From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.