Know about problems
before your users do

Set alert rules on error rates, latency, token usage, LLM failures, and custom eval metrics. Choose static thresholds or percentage-change detection, with warning and critical levels, Slack and email notifications, and full resolution tracking.

[Dashboard preview: "Alert Monitors - 7 active" lists LLM Response Time (CRITICAL, p99 > 4200ms, 2m ago), Daily Token Usage (WARNING, 847k / 1M limit, 15m ago), Context Adherence Score (OK, score > 0.85), Error Rate - Function Calling (OK, rate < 3%), and Monthly Token Spend (MUTED, % change > 25%). The detail panel shows the LLM Response Time monitor for the QA-Chatbot project: a 7-day graph with a 4000ms critical and 3000ms warning threshold, an event log of resolved and unresolved alerts (e.g., 20 Mar 02:15, Critical, 4,247ms vs. 4,000ms threshold, Unresolved), and routing to #eng-alerts and ops@company.com with checks every 5 min.]
Core Features

Alerting built for
AI agent metrics

[Create Monitor preview: metric picker listing LLM Response Time (latency), Count of Errors (count), Error Rate - Function Calling (rate), Daily Token Usage (tokens), and Evaluation Metric (Custom) (eval); 11 built-in metrics plus any custom evaluation.]

Monitor error counts, error rates for function calling, LLM API failure rates, span response time, LLM response time, token usage (daily and monthly), error-free session rates, service provider error rates, and any custom evaluation metric you define. Each metric has its own query engine optimized for speed.

View all metrics

Set fixed thresholds (e.g., error rate > 5%) or let the system calculate baselines automatically. Percentage-change alerts compute the historical mean and standard deviation over a configurable window, then fire when the current value deviates beyond your configured percentage. Dual-level thresholds (warning and critical) let you escalate proportionally.
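As a rough sketch of the idea (illustrative only; the function and parameter names are not the actual Future AGI API), a percentage-change check compares the current reading against a historical baseline and escalates through the two levels:

```python
import statistics

def check_percent_change(window, current, warning_pct, critical_pct):
    """Illustrative percentage-change detector (not the real product API):
    compare the current value against the mean of a historical window and
    fire when the deviation exceeds the configured percentages."""
    baseline = statistics.mean(window)
    deviation_pct = abs(current - baseline) / baseline * 100
    if deviation_pct > critical_pct:
        return "critical"
    if deviation_pct > warning_pct:
        return "warning"
    return "ok"

# A 4,247ms reading against a ~3,000ms baseline deviates ~41%,
# which clears a 25% critical level.
severity = check_percent_change([3000.0] * 12, 4247.0, warning_pct=10, critical_pct=25)
```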

Configure thresholds

Route alerts to up to 5 email recipients and Slack channels simultaneously. Every alert trigger is logged with a timestamp, severity level, and the exact metric value that fired. Mark alerts as resolved individually or in bulk, with a full audit trail of who resolved what and when.
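A minimal sketch of how such routing might be assembled, assuming only the 5-recipient email limit stated above (names and structure are hypothetical, not the product's API):

```python
MAX_EMAIL_RECIPIENTS = 5  # limit described in the text above

def build_routing(emails, slack_channel=None):
    """Validate and assemble a hypothetical notification-routing config."""
    emails = list(emails)
    if len(emails) > MAX_EMAIL_RECIPIENTS:
        raise ValueError(
            f"at most {MAX_EMAIL_RECIPIENTS} email recipients are allowed"
        )
    return {"emails": emails, "slack": slack_channel}

routing = build_routing(["ops@company.com", "cto@company.com"], "#eng-alerts")
```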

Set up notifications

Preview alert graphs before creating the monitor to see how your thresholds would have performed over the last 7 days. Mute monitors during maintenance windows without deleting their configuration. Every monitor tracks its check frequency, last-checked time, and full event log.

Explore alert lifecycle
How It Works

From alert to
resolution in three steps

[New Alert Rule draft preview: metric LLM Response Time, type Static, critical > 4000ms, warning > 3000ms.]

Pick a metric and set thresholds

Choose from 11 built-in metrics or any custom evaluation. Set static thresholds or percentage-change detection with warning and critical levels. Preview the graph to see how your rule would have performed over the last 7 days.

[Notifications preview: #eng-alerts, ops@company.com, cto@company.com; check frequency every 5 min.]

Route to Slack and email

Add up to 5 email recipients and a Slack webhook. Attach custom notes for context. Set the check frequency; alerts evaluate every 5 minutes or more against your configured time window.

[Alert log preview (2 unresolved): 4,247ms > 4,000ms (2m ago); 3,189ms > 3,000ms (15m ago); one entry resolved by @nikhil 1h ago.]

Resolve with full context

Every alert log entry includes the metric value that triggered, the threshold it crossed, and the time window. Mark alerts resolved individually or in bulk; a full audit trail records who resolved what and when.
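The resolution flow described above could be sketched with a small data model (field names are illustrative assumptions, not the product's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertEvent:
    """Illustrative alert-log entry: the value that fired, the threshold
    it crossed, and who (if anyone) resolved it."""
    metric: str
    value: float
    threshold: float
    severity: str
    resolved_by: Optional[str] = None  # audit trail: None while unresolved

def resolve_in_bulk(events, user):
    """Bulk resolution: mark every still-open alert as resolved by `user`."""
    for event in events:
        if event.resolved_by is None:
            event.resolved_by = user

log = [
    AlertEvent("LLM Response Time", 4247, 4000, "critical"),
    AlertEvent("LLM Response Time", 3189, 3000, "warning", resolved_by="nikhil"),
]
resolve_in_bulk(log, "nikhil")
```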

Powering teams from
prototype to production

From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.