2026 W12

Custom Dashboards, MCP Server, 2FA with Passkeys, and Annotation Queues

A drag-and-drop dashboard builder, a Model Context Protocol (MCP) server that puts Future AGI inside your IDE, two-factor authentication with passkeys and recovery codes, full Annotation Queue workflows, and a rebuilt Agent Command Center.

Platform SDK Evaluate Monitor Simulate
5 IDE integrations (MCP)
2,373+ LLM models priced out of the box
8 external platform integrations

Custom Dashboards: Your Metrics, Your Layout

Every team tracks different things. Voice agent teams watch call completion rates and latency percentiles. RAG teams (where the model looks up context from a knowledge base before answering) watch retrieval precision and hallucination scores. Platform teams watch cost per query and provider uptime.

Until now, everyone saw the same default views and exported data to build custom dashboards elsewhere.

What’s new

  • Drag-and-drop widgets. Line charts, bar charts, scorecards, tables, distribution plots.
  • Connect to any data in the platform. Evaluation scores, system metrics, cost data, experiment results.
  • Filter by anything you’ve tagged. Time range, model, agent version, or any metadata dimension.
  • Workspace-scoped and shareable. Pin for your team, include in weekly reviews, or embed widgets in external docs.
  • Fast on high-volume data. Aggregations are pre-computed off a streaming pipeline so dashboards stay snappy even on heavy production traffic.
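
As a rough illustration of why pre-computed aggregations keep dashboards responsive, the sketch below folds events into fixed-width time buckets as they arrive, so a widget reads a few bucket totals instead of scanning raw events. The names (`RollingAggregator`, `Bucket`, `latency_ms`) are hypothetical and do not describe the platform's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bucket:
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class RollingAggregator:
    """Hypothetical pre-aggregator: folds each event into a fixed-width time bucket."""

    def __init__(self, bucket_seconds: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(Bucket)

    def ingest(self, timestamp: float, value: float) -> None:
        # Align the timestamp to the start of its bucket.
        key = int(timestamp // self.bucket_seconds) * self.bucket_seconds
        bucket = self.buckets[key]
        bucket.count += 1
        bucket.total += value

    def series(self) -> list:
        # A widget reads this small, already-aggregated series.
        return [(key, self.buckets[key].mean) for key in sorted(self.buckets)]


agg = RollingAggregator(bucket_seconds=60)
for ts, latency_ms in [(0, 100.0), (30, 300.0), (61, 50.0)]:
    agg.ingest(ts, latency_ms)
series = agg.series()  # [(0, 200.0), (60, 50.0)]
```

The cost of each event is paid once at ingest time, which is why read latency stays roughly constant as traffic grows.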

Why it matters

When something moves in the wrong direction, you see it on your dashboard before it becomes a customer complaint.

Who it’s for

Engineering and product teams defining their own agent quality bar, and MLOps teams owning production agent reliability who need visual trend monitoring at scale.


MCP Server: Future AGI in Your IDE

Model Context Protocol (MCP) is a standard that lets AI coding assistants call external tools. The MCP server brings the platform directly into your editor (Cursor, Claude Code, VS Code, Claude Desktop, or Windsurf) so your AI assistant can use Future AGI’s evaluation engine, datasets, experiments, traces, and prompts without leaving your code.

What’s new

  • Five IDE integrations at launch. Cursor, Claude Code, VS Code, Claude Desktop, Windsurf.
  • Tools across the platform. Evaluations, datasets, prompts, experiments, simulations, tracing, agents, annotations, optimization, usage, and more, all callable from the assistant.
  • Skills and stop-generation. MCP connectors support skills, and a response can be cancelled mid-generation.
  • Permission-scoped. Every MCP call runs with your permissions, not the model’s, and lands in the activity feed under your name, so AI-assisted edits stay auditable.
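
The permission-scoping idea can be sketched as follows: a tool call is checked against the calling user's permissions and written to an activity feed under that user's name. Everything here (`User`, `ActivityFeed`, `call_tool`, the permission strings) is hypothetical and only illustrates the pattern. Logging denied attempts as well is an assumption, not something the release notes state.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    permissions: set


@dataclass
class ActivityFeed:
    entries: list = field(default_factory=list)

    def record(self, user: str, tool: str, allowed: bool) -> None:
        self.entries.append({"user": user, "tool": tool, "allowed": allowed})


def call_tool(user: User, tool: str, required: str, feed: ActivityFeed) -> str:
    """Run a tool on behalf of `user`, checking the user's own permissions."""
    allowed = required in user.permissions
    # Assumption: every attempt is logged under the user's name, allowed or not.
    feed.record(user.name, tool, allowed)
    if not allowed:
        raise PermissionError(f"{user.name} lacks '{required}' for {tool}")
    return f"{tool} executed for {user.name}"


feed = ActivityFeed()
alice = User("alice", {"datasets:read"})
result = call_tool(alice, "list_datasets", "datasets:read", feed)

try:
    call_tool(alice, "run_experiment", "experiments:write", feed)
    denied = False
except PermissionError:
    denied = True
```

Because the check uses the user's permission set rather than anything the assistant supplies, the assistant cannot do more than the person driving it.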

Practical examples

  • “Evaluate this function against our production dataset.” → Routes to the evaluate endpoint; scored results land in your editor.
  • “Generate synthetic test data for this new feature.” → Uses synthetic data generation to produce inputs matching your schema.
  • “Pull the latest traces for agent X.” → Fetches them and surfaces the relevant spans (individual steps inside a trace) alongside your code.

Who it’s for

Developers using AI coding assistants who want Future AGI’s evaluation and observability surface available from their editor without context-switching to the web UI.

Two-Factor Authentication + Passkeys

A major security release: full 2FA with passkeys, recovery codes, and a cross-org data-isolation fix that touched the data layer.

What’s new: 2FA and passkeys

  • TOTP (Time-based One-Time Passwords). Authenticator-app codes (Google Authenticator, 1Password, Authy, etc.) as a 2FA factor.
  • WebAuthn passkeys. Biometric or hardware-key sign-in as a 2FA factor. First-class support; you can register a passkey as your initial 2FA method without ever setting up TOTP.
  • Recovery codes. Generated after first 2FA setup. Shown once, usable once each. For passkey-only users, recovery codes can be regenerated with password authentication.
  • 2FA-required middleware. Endpoints that require elevated auth enforce 2FA at the middleware level, not per-view.
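
For readers unfamiliar with how authenticator-app codes are derived, here is a minimal sketch of TOTP as specified in RFC 6238, using only the Python standard library. The 30-second step, 6-digit default, and one-step drift window are common authenticator defaults and are assumptions here, not confirmed platform settings.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code per RFC 6238 using HMAC-SHA1."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10**digits).zfill(digits)


def verify(secret: bytes, code: str, unix_time: int, step: int = 30, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    return any(
        hmac.compare_digest(totp(secret, max(unix_time + drift * step, 0), step), code)
        for drift in range(-window, window + 1)
    )


# RFC 6238 test secret (ASCII "12345678901234567890") at T = 59 seconds.
code = totp(b"12345678901234567890", 59, digits=8)  # "94287082"
```

`hmac.compare_digest` is used for the comparison so that verification time does not leak how many leading digits matched.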

What’s new: access control and isolation

  • Role-based access control (RBAC) at both organization and workspace levels. Four roles: Owner (billing), Admin (workspaces and integrations), Member (create and modify), Viewer (read-only).
  • Workspace member management. List, update role, and remove endpoints with proper role-based filtering.
  • Enterprise multi-org architecture. Production-ready multi-org deployments with per-organization data boundaries and workspace-aware enforcement at every platform endpoint, plus end-to-end test coverage across the multi-org lifecycle.
  • Organization creation and selection views. Users can create new organizations and switch between them cleanly with proper workspace context management.
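
One simple way to model the four roles is as an ordered scale where each role includes the privileges of the roles below it. That inheritance, along with the action names and the `MIN_ROLE` table, is an assumption made for this sketch and is not a description of the platform's actual permission model.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1  # read-only
    MEMBER = 2  # create and modify
    ADMIN = 3   # workspaces and integrations
    OWNER = 4   # billing


# Hypothetical mapping from an action to the minimum role allowed to perform it.
MIN_ROLE = {
    "read": Role.VIEWER,
    "modify": Role.MEMBER,
    "manage_integrations": Role.ADMIN,
    "manage_billing": Role.OWNER,
}


def can(role: Role, action: str) -> bool:
    """Return True if `role` meets the minimum role required for `action`."""
    return role >= MIN_ROLE[action]


admin_can_modify = can(Role.ADMIN, "modify")            # True
member_can_bill = can(Role.MEMBER, "manage_billing")    # False
```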

Why it matters

Compliance reviews for regulated industries (healthcare, finance, legal) require 2FA, RBAC, audit logging, and strict cross-tenant isolation. All four are now at a level that clears those reviews.

Who it’s for

Enterprise teams with procurement and security requirements, compliance officers signing off on AI tooling for regulated workflows, and administrators of multi-team Future AGI deployments.


Annotation Queues: Structured Human Review

The annotation queue ships as a full workflow with per-user progress tracking, inline scoring, and unified scores across every data type.

What’s new

  • Queues for every surface. Create annotation queues over traces, sessions, datasets, or simulation outputs.
  • Per-user progress tracking. Each reviewer sees what they’ve completed and what’s pending.
  • Inline scoring. Score inside the queue item without navigating away.
  • Unified scores across surfaces. Annotations from different surfaces roll up into one score system.
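
A minimal sketch of the queue model described above: items from any surface share one record type, scores are stored per reviewer, and progress is derived from which items a reviewer has scored. The class names and the assumption that every reviewer works through every item are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QueueItem:
    source: str   # e.g. "trace", "session", "dataset", "simulation"
    item_id: str
    scores: dict = field(default_factory=dict)  # reviewer name -> score


@dataclass
class AnnotationQueue:
    items: list

    def score(self, item_id: str, reviewer: str, value: float) -> None:
        # Inline scoring: attach the reviewer's score directly to the item.
        for item in self.items:
            if item.item_id == item_id:
                item.scores[reviewer] = value
                return
        raise KeyError(item_id)

    def progress(self, reviewer: str) -> tuple:
        # Returns (completed, pending) for one reviewer.
        done = sum(1 for item in self.items if reviewer in item.scores)
        return done, len(self.items) - done


queue = AnnotationQueue([
    QueueItem("trace", "t-1"),
    QueueItem("dataset", "d-1"),
    QueueItem("simulation", "s-1"),
])
queue.score("t-1", "dana", 0.9)
dana_progress = queue.progress("dana")  # (1, 2)
```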

Why it matters

Automated evaluation plus human review is the quality loop that matures an agent. The queue makes human review systematic: clear assignments, visible progress, consolidated scores.

Who it’s for

Quality assurance (QA) teams running human review at scale, compliance reviewers annotating regulated agent outputs, and domain experts whose judgment feeds back into automated evaluation.


Agent Command Center: Rebuilt Architecture

The gateway powering Agent Command Center gets its largest architecture overhaul to date in this update.

What’s new

  • Consistent state across replicas. Multi-replica gateway deployments now stay consistent under load.
  • Security and correctness pass. API key redaction, a health-status endpoint, and consistent /v1 endpoint behavior.
  • Provider model routing. Google-format aliasing and clean provider resolution for every multimodal handler.
  • Explicit workspace isolation across every gateway surface.


Integrations Hub

A single configuration page connects Future AGI to eight external platforms:

  • Traces → Langfuse, Datadog
  • Metrics → PostHog, Mixpanel
  • Alerts → PagerDuty
  • Exports → S3, Azure Blob Storage, Google Cloud Storage

Each integration configures in under a minute and runs continuously once enabled.


Platform and Pricing

Pricing for 2,373+ models. Built-in pricing coverage grows from roughly 200 models to more than 2,373. When calculating cost, user-provided cost attributes are checked first; if none are present, the platform falls back to token-based calculation.
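
The lookup order can be sketched like this: a user-supplied cost wins, and only when it is absent is cost derived from token counts and a price table. The attribute key `cost`, the `PRICES` table, the model name, and the rates are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Price:
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens


# Hypothetical price table; the model name and rates are illustrative only.
PRICES = {"example-model": Price(input_per_1k=0.5, output_per_1k=1.5)}


def span_cost(attributes: dict, model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    # 1. A cost supplied by the user takes precedence.
    if "cost" in attributes:
        return float(attributes["cost"])
    # 2. Otherwise fall back to token counts and the price table.
    price = PRICES.get(model)
    if price is None:
        return None  # unknown model and no user-provided cost
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


explicit = span_cost({"cost": 0.02}, "example-model", 1000, 1000)  # 0.02
computed = span_cost({}, "example-model", 2000, 1000)               # 2.5
```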

Unified tracing across every traceAI language. traceAI now emits one standardized span shape and attribute set in all four supported languages (Python, TypeScript, Java, C#), so traces from any part of your stack are consistent end to end. A bundled migration utility upgrades historical traces in place.

Central evaluation metrics rollups. Evaluation metrics now feed into one analytics rollup pipeline shared across surfaces, so a metric’s value matches everywhere it appears (dashboards, the eval explorer, API). Aggregations on large eval datasets stay fast because the rollup is precomputed.

Multi-breakdown analytics. Slice the same widget by multiple dimensions at once, for example by model, agent version, and time period together.
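
Conceptually, a multi-breakdown is a group-by over a tuple of dimensions. The sketch below averages a metric for each distinct combination of model, agent version, and week. The row fields and the `breakdown` helper are hypothetical and stand in for whatever the widget does internally.

```python
from collections import defaultdict

# Illustrative rows; field names and values are made up for this example.
rows = [
    {"model": "model-a", "agent_version": "v1", "week": "W11", "score": 0.5},
    {"model": "model-a", "agent_version": "v1", "week": "W11", "score": 1.0},
    {"model": "model-a", "agent_version": "v2", "week": "W12", "score": 0.25},
    {"model": "model-b", "agent_version": "v1", "week": "W12", "score": 0.75},
]


def breakdown(rows: list, dimensions: tuple, metric: str) -> dict:
    """Average `metric` for each distinct combination of `dimensions`."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        sums[key][0] += row[metric]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


result = breakdown(rows, ("model", "agent_version", "week"), "score")
# {("model-a", "v1", "W11"): 0.75, ("model-a", "v2", "W12"): 0.25, ("model-b", "v1", "W12"): 0.75}
```

Passing a shorter tuple, such as `("model",)`, collapses the same data to a single-dimension view.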

Falcon AI: skills UI and settings redesign. Falcon AI gets a redesigned skills and settings surface, plus a working stop-generation control for cancelling long responses cleanly.

Chat history popover with feedback. Thumbs up/down feedback and tool-call persistence across sessions.

Task description field for optimizers. Every optimizer type now accepts a task description (optimization objective) input.

Outbound calls backend support. The backend now tracks more provider-side call states and surfaces more failure modes cleanly for outbound voice simulation, building on the outbound calling flow already in Simulate.