Custom Dashboards, MCP Server, 2FA with Passkeys, and Annotation Queues

Role-based access control

Evaluate New

Annotation Queues

Agent Command Center: rebuilt architecture

Integrations hub

Pricing for 2,373+ models

Falcon AI: skills UI and settings redesign

Chat history popover and feedback affordances

Evaluate Improved

Task description field for optimizers

Multi-breakdown analytics

Monitor Improved

Central evaluation metrics rollups

SDK Improved

Unified tracing across every traceAI language

Enterprise multi-org architecture

Simulate Improved

Outbound calls support

SDK Improved

futureagi-mcp-server tools

Custom Dashboards: Your Metrics, Your Layout

Every team tracks different things. Voice agent teams watch call completion rates and latency percentiles. RAG teams (where the model looks up context from a knowledge base before answering) watch retrieval precision and hallucination scores. Platform teams watch cost per query and provider uptime.

Until now, everyone saw the same default views and exported data to build custom dashboards elsewhere.

What’s new

Drag-and-drop widgets. Line charts, bar charts, scorecards, tables, distribution plots.
Connect to any data in the platform. Evaluation scores, system metrics, cost data, experiment results.
Filter by anything you’ve tagged. Time range, model, agent version, or any metadata dimension.
Workspace-scoped and shareable. Pin for your team, include in weekly reviews, or embed widgets in external docs.
Fast on high-volume data. Aggregations are pre-computed off a streaming pipeline so dashboards stay snappy even on heavy production traffic.
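
As an illustration of why pre-computed aggregations keep widgets fast, here is a minimal sketch of a streaming rollup, assuming per-minute buckets. The class and method names (MinuteRollup, ingest, series) are hypothetical and are not part of the Future AGI platform; the idea is only that each event is folded into a small bucket on arrival, so a chart reads a handful of buckets instead of scanning raw events.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bucket:
    """Running aggregate for one (metric, minute) pair."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class MinuteRollup:
    """Hypothetical rollup: folds each event into a per-minute bucket on arrival."""

    def __init__(self) -> None:
        self.buckets: dict[tuple[str, int], Bucket] = defaultdict(Bucket)

    def ingest(self, metric: str, ts_seconds: float, value: float) -> None:
        # Update the running aggregate; the raw event is not needed afterwards.
        bucket = self.buckets[(metric, int(ts_seconds // 60))]
        bucket.count += 1
        bucket.total += value

    def series(self, metric: str, start_minute: int, end_minute: int) -> list[tuple[int, float]]:
        # A widget query touches one bucket per minute in the range (inclusive).
        points = []
        for minute in range(start_minute, end_minute + 1):
            bucket = self.buckets.get((metric, minute))
            if bucket is not None:
                points.append((minute, bucket.mean))
        return points


rollup = MinuteRollup()
rollup.ingest("latency_ms", 0, 100.0)
rollup.ingest("latency_ms", 30, 300.0)
rollup.ingest("latency_ms", 61, 50.0)
points = rollup.series("latency_ms", 0, 1)
```

Query cost in this sketch grows with the number of buckets in the selected time range rather than with traffic volume, which is the property the bullet above describes.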

Why it matters

When something moves in the wrong direction, you see it on your dashboard before it becomes a customer complaint.

Who it’s for

Engineering and product teams defining their own agent quality bar, and MLOps teams owning production agent reliability who need visual trend monitoring at scale.

MCP Server: Future AGI in Your IDE

Model Context Protocol (MCP) is a standard that lets AI coding assistants call external tools. The MCP server brings the platform directly into your editor (Cursor, Claude Code, VS Code, Claude Desktop, or Windsurf) so your AI assistant can use Future AGI’s evaluation engine, datasets, experiments, traces, and prompts without leaving your code.

What’s new

Five IDE integrations at launch. Cursor, Claude Code, VS Code, Claude Desktop, Windsurf.
Tools across the platform. Evaluations, datasets, prompts, experiments, simulations, tracing, agents, annotations, optimization, usage, and more, all callable from the assistant.
Skills and stop-generation. MCP connectors support skills and let you cancel a response mid-generation.
Permission-scoped. Every MCP call runs with your permissions, not the model’s, and lands in the activity feed under your name, so AI-assisted edits stay auditable.
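
The permission-scoped behavior can be pictured with a small sketch, assuming a tool call is checked against the calling user's permissions and recorded under that user's name before it runs. Everything here (User, ActivityFeed, call_tool, and the "traces.read" permission string) is hypothetical and chosen for illustration; it is not the MCP server's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class User:
    name: str
    permissions: frozenset[str]


@dataclass
class ActivityFeed:
    entries: list[dict] = field(default_factory=list)

    def record(self, user: User, tool: str, allowed: bool) -> None:
        # Every attempt is attributed to the human user, not to the assistant.
        self.entries.append({
            "actor": user.name,
            "tool": tool,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def call_tool(user: User, tool: str, handler: Callable[..., Any],
              feed: ActivityFeed, **kwargs: Any) -> Any:
    """Run a tool handler only if the user holds the matching permission."""
    allowed = tool in user.permissions
    feed.record(user, tool, allowed)
    if not allowed:
        raise PermissionError(f"{user.name} may not call {tool}")
    return handler(**kwargs)


feed = ActivityFeed()
dana = User("dana", frozenset({"traces.read"}))
spans = call_tool(dana, "traces.read", lambda agent: [f"{agent}-span-1"], feed, agent="agent-x")
```

Recording the attempt before the permission check means denied calls also show up in the feed, which is one reasonable way to keep AI-assisted actions auditable.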

Practical examples

“Evaluate this function against our production dataset.” → Routes to the evaluate endpoint; scored results land in your editor.
“Generate synthetic test data for this new feature.” → Uses synthetic data generation to produce inputs matching your schema.
“Pull the latest traces for agent X.” → Fetches them and surfaces the relevant spans (individual steps inside a trace) alongside your code.

Who it’s for

Developers using AI coding assistants who want Future AGI’s evaluation and observability surface available from their editor without context-switching to the web UI.

Two-Factor Authentication + Passkeys

A major security release: full 2FA with passkeys, recovery codes, and a cross-org data-isolation fix that touched the data layer.

What’s new: 2FA and passkeys

TOTP (Time-based One-Time Passwords). Authenticator-app codes (Google Authenticator, 1Password, Authy, etc.) as a 2FA factor.
WebAuthn passkeys. Biometric or hardware-key sign-in as a 2FA factor. First-class support; you can register a passkey as your initial 2FA method without ever setting up TOTP.
Recovery codes. Generated after first 2FA setup. Shown once, usable once each. For passkey-only users, recovery codes can be regenerated with password authentication.
2FA-required middleware. Endpoints that require elevated auth enforce 2FA at the middleware level, not per-view.
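
For readers who want to see what the TOTP and recovery-code factors involve, here is a minimal sketch using only the Python standard library. The TOTP part follows the public RFC 6238 algorithm (HMAC-SHA1, 30-second steps); the recovery-code helpers, the code length, and the one-step verification window are assumptions made for illustration and do not describe Future AGI's implementation.

```python
import hashlib
import hmac
import secrets
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at=None, window: int = 1, step: int = 30) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    now = time.time() if at is None else at
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, counter + drift), submitted)
        for drift in range(-window, window + 1)
    )


def _digest(code: str) -> str:
    return hashlib.sha256(code.encode()).hexdigest()


def new_recovery_codes(count: int = 10):
    """Return (plaintext codes to show once, hashes to store server-side)."""
    codes = [secrets.token_hex(5) for _ in range(count)]
    return codes, {_digest(code) for code in codes}


def redeem(stored_hashes: set, code: str) -> bool:
    """Consume a recovery code; each code works at most once."""
    hashed = _digest(code)
    if hashed in stored_hashes:
        stored_hashes.discard(hashed)
        return True
    return False


rfc_secret = b"12345678901234567890"  # test secret from the RFC 6238 appendix
codes, stored = new_recovery_codes(3)
```

Storing only hashes of the recovery codes matches the "shown once" behavior described above: after setup, the server can check a code but cannot display it again.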

What’s new: access control and isolation

Role-based access control (RBAC) at both organization and workspace levels. Four roles: Owner (billing), Admin (workspaces and integrations), Member (create and modify), Viewer (read-only).
Workspace member management. List, update role, and remove endpoints with proper role-based filtering.
Enterprise multi-org architecture. Production-ready multi-org deployments with per-organization data boundaries and workspace-aware enforcement at every platform endpoint, plus end-to-end test coverage across the multi-org lifecycle.
Organization creation and selection views. Users can create new organizations and switch between them cleanly with proper workspace context management.
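
A minimal sketch of how the four roles could map to permission checks, assuming the roles are strictly hierarchical (each role includes everything the roles below it can do). That hierarchy and the action names are assumptions for illustration; the release notes only state what each role is responsible for.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1   # read-only
    MEMBER = 2   # create and modify
    ADMIN = 3    # workspaces and integrations
    OWNER = 4    # billing


# Hypothetical action names mapped to the lowest role allowed to perform them.
REQUIRED_ROLE = {
    "read": Role.VIEWER,
    "create": Role.MEMBER,
    "modify": Role.MEMBER,
    "manage_workspaces": Role.ADMIN,
    "manage_integrations": Role.ADMIN,
    "manage_billing": Role.OWNER,
}


def is_allowed(role: Role, action: str) -> bool:
    """Deny unknown actions; otherwise compare against the minimum role."""
    required = REQUIRED_ROLE.get(action)
    return required is not None and role >= required
```

Under these assumptions, is_allowed(Role.ADMIN, "manage_billing") is False while is_allowed(Role.OWNER, "manage_billing") is True, and an unrecognized action is denied for every role.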

Why it matters

Compliance reviews for regulated industries (healthcare, finance, legal) require 2FA, RBAC, audit logging, and strict cross-tenant isolation. All four are now at a level that clears those reviews.

Who it’s for

Enterprise teams with procurement and security requirements, compliance officers signing off on AI tooling for regulated workflows, and administrators of multi-team Future AGI deployments.

Annotation Queues: Structured Human Review

The annotation queue ships as a full workflow with per-user progress tracking, inline scoring, and unified scores across every data type.

What’s new

Queues for every surface. Create annotation queues over traces, sessions, datasets, or simulation outputs.
Per-user progress tracking. Each reviewer sees what they’ve completed and what’s pending.
Inline scoring. Score inside the queue item without navigating away.
Unified scores across surfaces. Annotations from different surfaces roll up into one score system.
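
The workflow above can be sketched as a small data model, assuming each item records one score per reviewer and the unified score is the mean across reviewers. The class names, the averaging rule, and the source labels are hypothetical; the actual score system is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QueueItem:
    item_id: str
    source: str  # e.g. "trace", "session", "dataset", "simulation"
    scores: dict[str, float] = field(default_factory=dict)  # reviewer -> score


@dataclass
class AnnotationQueue:
    items: dict[str, QueueItem] = field(default_factory=dict)
    assignments: dict[str, set[str]] = field(default_factory=dict)  # reviewer -> item ids

    def add(self, item: QueueItem, reviewers: list[str]) -> None:
        self.items[item.item_id] = item
        for reviewer in reviewers:
            self.assignments.setdefault(reviewer, set()).add(item.item_id)

    def score(self, reviewer: str, item_id: str, value: float) -> None:
        # Inline scoring: only reviewers assigned to the item may score it.
        if item_id not in self.assignments.get(reviewer, set()):
            raise KeyError(f"{item_id} is not assigned to {reviewer}")
        self.items[item_id].scores[reviewer] = value

    def progress(self, reviewer: str) -> tuple[int, int]:
        """Return (completed, pending) counts for one reviewer."""
        assigned = self.assignments.get(reviewer, set())
        done = sum(1 for item_id in assigned if reviewer in self.items[item_id].scores)
        return done, len(assigned) - done

    def unified_scores(self) -> dict[str, float]:
        """One score per item regardless of which surface it came from."""
        return {
            item_id: mean(item.scores.values())
            for item_id, item in self.items.items()
            if item.scores
        }


queue = AnnotationQueue()
queue.add(QueueItem("t1", "trace"), ["ana", "ben"])
queue.add(QueueItem("d1", "dataset"), ["ana"])
queue.score("ana", "t1", 1.0)
queue.score("ben", "t1", 0.0)
```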

Why it matters

Automated evaluation plus human review is the quality loop that matures an agent. The queue makes human review systematic: clear assignments, visible progress, consolidated scores.

Who it’s for

Quality assurance (QA) teams running human review at scale, compliance reviewers annotating regulated agent outputs, and domain experts whose judgment feeds back into automated evaluation.

Agent Command Center: Rebuilt Architecture

The gateway powering Agent Command Center got its biggest architecture pass in this update.

What’s new

Consistent state across replicas. Multi-replica gateway deployments now stay consistent under load.
Security and correctness pass. API key redaction, a health-status endpoint, and consistent /v1 endpoint behavior.
Provider model routing. Google-format aliasing and clean provider resolution for every multimodal handler.
Workspace isolation. Enforced explicitly across every gateway surface.

Integrations Hub

A single configuration page connects Future AGI to eight external platforms:

Traces → Langfuse, Datadog
Metrics → PostHog, Mixpanel
Alerts → PagerDuty
Exports → S3, Azure Blob Storage, Google Cloud Storage

Each integration can be configured in under a minute and runs continuously once enabled.