Dashboards

Your agents.
Your metrics. Your boards.

Build custom dashboards with drag-and-drop widgets, 9 chart types, and a visual query builder. Pull from traces, datasets, and simulations. Slice by model, user, project, or any custom attribute. Powered by ClickHouse for sub-second queries on millions of rows.


Latency P95 Line Chart

Track response time percentiles across all agents

Current 847ms

Eval Scores KPI Widget

Faithfulness, relevance, context adherence

Pass Rate 94.2%

Cost by Model Stacked Bar

Break down spend across providers

Total (7d) $2,847

Agent Performance · 4 widgets

Last 7D + Add Widget

Latency P95: 847ms (↓ 12% vs last period)

Total Cost: $2,847 (↑ 8% vs last period)

Eval Scores (chart; series: Faithfulness, Relevance; x-axis: Mon to Sun)

Tokens by Model (chart; series: gpt-4o, claude, gemini)

Top Traces by Cost

Trace           Model          Tokens  Cost   Latency  Eval
QA-Chatbot      gpt-4o         12,847  $0.38  2.1s     0.92
DocumentSearch  claude-sonnet   8,234  $0.24  1.8s     0.88
SQLQueryEngine  gpt-4o-mini     3,412  $0.05  0.9s     0.71

Core Features

Dashboards built for
AI agent teams

Add Widget

Chart Type

Line

Column

Pie

Stacked

Table

Metric

Width

1/4 · 1/3 · 1/2 · Full

9 chart types in a drag-and-drop grid

Build dashboards with line, stacked line, column, stacked column, bar, stacked bar, pie, table, and single-metric KPI cards. Each widget sits on a 12-column responsive grid - resize from quarter-width to full-width, drag to reorder, duplicate with one click.
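As a rough illustration of how the width presets relate to the 12-column grid, the sketch below maps each preset to a column span. The names `GRID_COLUMNS`, `WIDTH_PRESETS`, and `column_span` are hypothetical and only illustrate the arithmetic; they are not the product's API.

```python
from fractions import Fraction

GRID_COLUMNS = 12  # the page describes a 12-column grid

# Width presets shown in the Add Widget panel: 1/4, 1/3, 1/2, Full.
# The dictionary itself is a hypothetical illustration.
WIDTH_PRESETS = {
    "1/4": Fraction(1, 4),
    "1/3": Fraction(1, 3),
    "1/2": Fraction(1, 2),
    "Full": Fraction(1, 1),
}


def column_span(preset: str) -> int:
    """Return how many of the 12 grid columns a width preset covers."""
    span = WIDTH_PRESETS[preset] * GRID_COLUMNS
    if span.denominator != 1:
        raise ValueError(f"{preset} does not divide the grid evenly")
    return int(span)


spans = {name: column_span(name) for name in WIDTH_PRESETS}
print(spans)  # {'1/4': 3, '1/3': 4, '1/2': 6, 'Full': 12}
```

Twelve divides evenly by 2, 3, and 4, which is presumably why every preset lands on a whole number of columns.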

See chart types

Traces, datasets, simulations - one query engine

Every widget can pull data from traces (spans, latency, tokens, cost), datasets (row counts, cell errors, eval scores), or simulations (call metrics, persona breakdowns, success rates). Cross-source eval metrics let you join evaluation scores back to any source. One unified query endpoint powered by ClickHouse.

Explore data sources

Slice by any dimension - model, user, session, tag

Break down any metric by project, model, status, provider, service name, span kind, session, user, prompt name, version, label, tags, or custom attributes. Dataset metrics slice by dataset name, column, or annotation template. Simulation metrics slice by scenario, persona gender, age group, location, accent, language, and communication style.
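Conceptually, a breakdown is a group-by on one dimension. The sketch below shows the idea in plain Python, using the three sample traces from the dashboard preview. The `break_down` helper and the row layout are assumptions for illustration, not the product's query engine, which runs in ClickHouse.

```python
from collections import defaultdict

# Sample rows taken from the "Top Traces by Cost" preview on this page.
traces = [
    {"trace": "QA-Chatbot", "model": "gpt-4o", "tokens": 12847},
    {"trace": "DocumentSearch", "model": "claude-sonnet", "tokens": 8234},
    {"trace": "SQLQueryEngine", "model": "gpt-4o-mini", "tokens": 3412},
]


def break_down(rows, dimension, metric):
    """Sum `metric` per distinct value of `dimension` (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


tokens_by_model = break_down(traces, "model", "tokens")
print(tokens_by_model)
```

Swapping `"model"` for another key, such as a user or project field, would slice the same metric along a different dimension.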

View all dimensions

Sum, avg, percentiles, pass rate - 15+ aggregations

Aggregate with sum, average, median, min, max, count, count distinct, and percentiles (P25, P50, P75, P90, P95, P99). Dataset columns support pass rate, fail rate, pass count, fail count, and true rate for boolean evaluations. Every aggregation runs server-side in ClickHouse for sub-second response on millions of rows.
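To make two of these aggregations concrete, here is a minimal sketch of a percentile and a pass rate in plain Python. It assumes the nearest-rank percentile definition, which may differ from what the product computes (ClickHouse quantile functions can be approximate). The function names and sample values are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of
    the data at or below it. This definition is an assumption."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def pass_rate(results):
    """Share of boolean evaluation results that are True."""
    if not results:
        raise ValueError("pass rate of an empty list is undefined")
    return sum(results) / len(results)


# Illustrative latencies in milliseconds (not real data).
latencies_ms = [120, 340, 95, 870, 410, 230, 560, 150, 990, 300]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
rate = pass_rate([True, True, False, True])
print(p50, p95, rate)  # 300 990 0.75
```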

See aggregation options

Use Cases

One board for every
question your team asks

Agent performance overview

Build a board with latency P95, error rate, token cost, and eval scores across all your agents. One screen, real-time, shareable.

Tools: Line, Metric, Table

Cost breakdown by model and team

Track spend per model, per project, per user. Break down token cost with stacked columns. Spot which agents burn budget and which are efficient.

Tools: Stacked Column, Pie

Eval score trends over time

Plot faithfulness, context adherence, and relevance scores as time-series lines. Spot regressions the moment a prompt change or model swap degrades quality.

Tools: Line, Stacked Line

Simulation performance by persona

Visualize simulation results broken down by persona attributes - gender, age group, accent, communication style. Find where your agent fails specific demographics.

Tools: Bar, Table

Dataset eval pass rates

Track pass/fail rates across dataset evaluations. Break down by dataset, column, or annotation template. Catch regressions in golden-set performance.

Tools: Stacked Bar, Metric

Model comparison board

Compare latency, cost, and quality side-by-side across GPT-4o, Claude, Gemini, and custom models. Use a table widget with model as the breakdown dimension.

Tools: Table, Column

How It Works

From zero to
dashboard in minutes

New Widget Line

Source Traces

Metric latency

Aggregation P95

Breakdown model

Create a dashboard and add widgets

Name your dashboard, then add widgets from 9 chart types. Each widget has a visual query builder - pick a metric, set aggregation, add filters, choose breakdowns. Live preview updates as you configure.
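One way to picture what the query builder collects is as a small configuration record. The sketch below is hypothetical: `WidgetSpec` and its field names are not the product's schema, just a way to show the choices (chart type, source, metric, aggregation, breakdown) as data. The chart types and sources are the ones listed on this page.

```python
from dataclasses import dataclass, field
from typing import Optional

# The nine chart types and three data sources named on this page.
CHART_TYPES = {
    "line", "stacked line", "column", "stacked column",
    "bar", "stacked bar", "pie", "table", "metric",
}
SOURCES = {"traces", "datasets", "simulations"}


@dataclass
class WidgetSpec:
    """Hypothetical record of one widget's query-builder settings."""
    name: str
    chart_type: str
    source: str
    metric: str
    aggregation: str
    breakdown: Optional[str] = None
    filters: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values the page does not list.
        if self.chart_type not in CHART_TYPES:
            raise ValueError(f"unknown chart type: {self.chart_type}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")


# Mirrors the preview: Line chart, Source Traces, Metric latency,
# Aggregation P95, Breakdown model.
widget = WidgetSpec(
    name="New Widget",
    chart_type="line",
    source="traces",
    metric="latency",
    aggregation="p95",
    breakdown="model",
)
print(widget)
```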

Filters 3 active

model contains gpt-4o

status = OK

latency > 500ms

Query traces, datasets, and simulations

Every widget can pull from any data source - trace spans, dataset evaluations, or simulation runs. Apply filters (contains, greater than, equals) and break down by model, user, project, or any custom attribute.
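The three filter operators named above (contains, greater than, equals) can be sketched as simple predicates combined with AND. This is an illustration only: the `OPERATORS` table, the `matches` helper, the field names, and the sample rows are all assumptions, and whether the product combines filters with AND is not stated on this page.

```python
import operator

# Hypothetical operator table for the three filter types named above.
OPERATORS = {
    "contains": lambda value, target: target in value,
    "equals": operator.eq,
    "greater_than": operator.gt,
}


def matches(row, filters):
    """True when the row satisfies every (field, operator, target) filter."""
    return all(OPERATORS[op](row[name], target) for name, op, target in filters)


# The three active filters from the preview: model contains gpt-4o,
# status = OK, latency > 500ms.
filters = [
    ("model", "contains", "gpt-4o"),
    ("status", "equals", "OK"),
    ("latency_ms", "greater_than", 500),
]

# Illustrative rows (not real data).
spans = [
    {"model": "gpt-4o", "status": "OK", "latency_ms": 2100},
    {"model": "gpt-4o-mini", "status": "OK", "latency_ms": 900},
    {"model": "claude-sonnet", "status": "OK", "latency_ms": 1800},
    {"model": "gpt-4o", "status": "ERROR", "latency_ms": 700},
]

selected = [row for row in spans if matches(row, filters)]
print([row["model"] for row in selected])  # ['gpt-4o', 'gpt-4o-mini']
```

Note that "contains gpt-4o" also matches gpt-4o-mini; an equals filter would be the way to isolate a single model.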

Dashboard Live

Latency

Cost

Eval Scores (full-width)

CSV Export · Duplicate · Reorder

Arrange, share, and export

Drag widgets to reorder. Resize from quarter-width to full-width on the 12-column grid. Duplicate widgets to iterate fast. Export any chart as CSV for offline analysis.
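Once a chart is exported, the CSV can be handled with any standard tooling. As a sketch of what such a file might look like, the snippet below writes the sample trace rows from the preview with Python's built-in csv module. The column names and layout are assumptions; the actual export format is not documented on this page.

```python
import csv
import io


def to_csv(rows, columns):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample rows from the "Top Traces by Cost" preview on this page.
rows = [
    {"trace": "QA-Chatbot", "model": "gpt-4o", "tokens": 12847, "cost": 0.38},
    {"trace": "DocumentSearch", "model": "claude-sonnet", "tokens": 8234, "cost": 0.24},
    {"trace": "SQLQueryEngine", "model": "gpt-4o-mini", "tokens": 3412, "cost": 0.05},
]

csv_text = to_csv(rows, ["trace", "model", "tokens", "cost"])
print(csv_text)
```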

Powering teams from
prototype to production

From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.
