
Future AGI MCP Server in 2026: Evaluate LLMs and Manage Datasets from Claude, Cursor, or Any MCP Client

Run Future AGI evaluations, datasets, guardrails, and synthetic data from Claude Desktop or Cursor via MCP. Setup, code, and gotchas for 2026.

Conversational AI Meets Evaluation Power: Introducing the Future AGI MCP Server

What MCP is and why this matters

Model Context Protocol (MCP) is an open standard from Anthropic for exposing tools and data to LLM clients (spec, GitHub). It is to AI tooling what the Language Server Protocol was to IDE integrations: one wire format, many clients, many servers. In 2026 many major AI tools and IDE clients support MCP, including Claude Desktop, Cursor, Claude Code, Zed, and Windsurf.
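
To make "one wire format, many clients" concrete, here is a minimal toy server built with the official Python MCP SDK (pip install "mcp[cli]"). The tool is illustrative only; the Future AGI server described below exposes real evaluation and dataset tools the same way.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio; any MCP client can now call word_count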

Future AGI’s MCP server makes the full evaluation, observability, guardrails, and synthetic data surface callable from any of those clients in natural language. Connect once, and you can ask Claude Desktop to run a faithfulness eval on your last 1000 traces, or ask Cursor to add traceAI instrumentation to a CrewAI agent, without leaving the IDE.

TL;DR

| Capability | What you can do from chat |
| --- | --- |
| Evaluations | Run faithfulness, toxicity, context relevance, and custom LLM judges on single inputs or stored datasets |
| Datasets | Upload, evaluate, fetch results, download |
| Guardrails | Apply toxicity, prompt injection, and policy guardrails to apps |
| Synthetic data | Generate domain-specific datasets by describing them in plain English |
| Observability | Have your IDE add traceAI instrumentation to existing agent code |

If you use Claude Desktop, Cursor, or Claude Code: install the Future AGI MCP server once, then everything Future AGI does is callable from your existing chat or IDE workflow.

Why use Future AGI over MCP

The win is workflow fit. Three concrete examples:

  1. You have a CSV of agent outputs and want to know which are unfaithful. Instead of opening the web app, importing the CSV, and clicking through evaluator settings, you ask Cursor: “Upload rag_outputs.csv to Future AGI, run faithfulness and context relevance, then download the results.” Cursor calls the right MCP tools in order.
  2. You are debugging an agent that hallucinates tool calls. Ask Claude Desktop: “Add traceAI instrumentation to my CrewAI agent at agents/research_agent.py using traceai-crewai.” It edits the file, then you can rerun and see traces in the Agent Command Center at /platform/monitor/command-center.
  3. You need a 500 row synthetic dataset of multilingual customer support queries. Ask: “Generate a Future AGI synthetic dataset of 500 multilingual customer support queries focused on refund and shipping issues, balanced across English, Spanish, and Hindi.” The server kicks off generation; you download when ready.

How to set up the Future AGI MCP server

Get your API keys

Sign in at app.futureagi.com and copy the FI_API_KEY and FI_SECRET_KEY from your dashboard. Both are required.

Option A: run from source (good for hacking)

git clone https://github.com/future-agi/futureagi-mcp-server.git
cd futureagi-mcp-server

# Install uv if you do not have it (Homebrew on macOS; see docs.astral.sh/uv
# for the install script on other platforms)
brew install uv

uv sync

export FI_API_KEY="your_api_key"
export FI_SECRET_KEY="your_secret_key"

# Run inside the environment uv created
uv run python main.py

Option B: configure your MCP client to run it via uvx

{
  "mcpServers": {
    "FutureAGI-MCP": {
      "command": "uvx",
      "args": ["futureagi-mcp-server"],
      "env": {
        "FI_API_KEY": "your_api_key",
        "FI_SECRET_KEY": "your_secret_key"
      }
    }
  }
}

Drop the JSON into the relevant config:

  • Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
  • Cursor: Settings → MCP servers
  • Claude Code: .mcp.json in the project root (or claude mcp add for user scope)
  • Zed: ~/.config/zed/settings.json (context_servers)
  • Windsurf: Settings → AI → MCP servers

Restart the client and the Future AGI tools appear in the tool picker.

Option C: docs lookup only via Mintlify

For documentation search only, the Future AGI docs also ship their own MCP server via Mintlify:

npx @mintlify/mcp@latest add futureagi

The Mintlify tool prompts you to pick which clients to wire up. This is read only and complementary to the main MCP server.

Worked examples

Evaluating a dataset from chat

User prompt to Cursor:

Find rag_chat.csv in this repo, upload it to Future AGI, suggest three evaluations
that match the columns, apply them to the dataset, and report a summary.

The MCP server exposes dataset upload, evaluator listing, and evaluator attachment as tools; the client agent (Cursor in this example) decides which to call and in what order. Here it discovers the file, uploads the dataset, lists the available evaluators, picks faithfulness, context relevance, and groundedness based on the column names, attaches them to the dataset, polls for completion, and returns row-level scores.
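
The same orchestration is available outside chat. Here is a minimal programmatic sketch using the official Python MCP SDK; the tool name and arguments are illustrative placeholders, so list the server's tools first to get the real schema.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the Future AGI MCP server over stdio, exactly as an IDE client would
server = StdioServerParameters(
    command="uvx",
    args=["futureagi-mcp-server"],
    env={"FI_API_KEY": "your_api_key", "FI_SECRET_KEY": "your_secret_key"},
)

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the real tool names and schemas before calling anything
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # "upload_dataset" and its arguments are illustrative placeholders
            result = await session.call_tool(
                "upload_dataset", arguments={"path": "rag_chat.csv"}
            )
            print(result.content)

asyncio.run(main())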

Adding traceAI instrumentation from your IDE

User prompt to Cursor:

Search the Future AGI docs for CrewAI instrumentation and edit
agents/research_agent.py to add a trace provider using traceai-crewai.

With the Future AGI docs MCP from Option C installed (or any docs search the client has access to), Cursor reads the relevant page and proposes a diff that adds the standard register call, the FITracer boilerplate, and the framework-specific instrumentor.

from fi_instrumentation import register, FITracer
from traceai_crewai import CrewAIInstrumentor

# Register a trace provider for this project on the Future AGI platform
trace_provider = register(project_name="research-agent-prod")

# Instrument CrewAI so agent, task, and tool calls emit spans
CrewAIInstrumentor().instrument(tracer_provider=trace_provider)

# Optional: an FITracer for custom spans around your own code
tracer = FITracer(trace_provider.get_tracer(__name__))

Same pattern works for traceai-langchain (LangChainInstrumentor), traceai-openai-agents, traceai-llama-index, and traceai-mcp. All are Apache 2.0 (traceAI repo).
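
Switching frameworks is then a two-line change. A sketch for LangChain, following the same registration pattern as above (the project name is illustrative):

from fi_instrumentation import register
from traceai_langchain import LangChainInstrumentor

# Same registration call, different instrumentor
trace_provider = register(project_name="rag-pipeline-prod")
LangChainInstrumentor().instrument(tracer_provider=trace_provider)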

Generating synthetic data

User prompt to Claude Desktop:

Generate a Future AGI synthetic dataset of 200 customer support queries
in English and Spanish, half about returns and half about subscription billing,
with one column for the query text, one for the intent label.

The MCP server confirms the columns and row count, calls the generation tool, and returns a dataset ID. Once the job finishes you can download the dataset through the MCP server or from the web app.
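
Because generation runs asynchronously, programmatic callers should poll rather than block on the first response. A sketch reusing a ClientSession like the one in the dataset example above; the tool names and status format here are illustrative, not the server's actual schema.

import asyncio

async def generate_and_wait(session):
    # Kick off generation; the server returns a dataset id immediately
    result = await session.call_tool(
        "generate_synthetic_dataset",  # illustrative tool name
        arguments={"description": "200 EN/ES support queries", "rows": 200},
    )
    dataset_id = result.content[0].text  # assumption: the id comes back as text

    # Poll until the async job finishes instead of blocking on the first call
    while True:
        status = await session.call_tool(
            "get_dataset_status", arguments={"dataset_id": dataset_id}
        )
        if "completed" in status.content[0].text:
            return dataset_id
        await asyncio.sleep(5)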

Applying guardrails

User prompt to Claude Code:

Add Future AGI guardrails for toxicity and prompt injection to my chatbot,
and gate the response if either fires.

The server returns the relevant guardrail tooling and configuration; the client agent (Claude Code in this example) drafts the code change in your editor. Future AGI’s guardrails are documented at docs.futureagi.com/docs/protect.
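
The gate itself is simple. A minimal sketch of the pattern, assuming a hypothetical "apply_guardrails" tool and verdict format; the real tool names and response schema come from the server and the protect docs.

async def guarded_reply(session, model_reply: str) -> str:
    # Hypothetical tool name, checks, and verdict format
    verdict = await session.call_tool(
        "apply_guardrails",
        arguments={"text": model_reply, "checks": ["toxicity", "prompt_injection"]},
    )
    if "fail" in verdict.content[0].text.lower():
        return "Sorry, I can't help with that request."
    return model_reply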

Future AGI MCP vs other observability MCPs

| Server | What it covers | Where it fits |
| --- | --- | --- |
| Future AGI MCP | Evals, datasets, guardrails, synthetic data, traceAI instrumentation | Default for end-to-end LLM and agent workflows in 2026 |
| LangSmith MCP | Tracing, datasets, run inspection | Good if your stack is heavy on LangChain or LangGraph |
| Langfuse MCP | Tracing, datasets, scoring | Open source pick for tracing-first teams |
| Phoenix MCP | Tracing, evaluator runs | Strong if you already run Arize Phoenix |
| Sentry MCP | Error traces and issues | Adjacent, not an LLM eval surface |

Future AGI has the widest surface in the eval and guardrails categories. The others are narrower and reasonable if you already pay for them.

Security and key handling

  • Never put your real FI_SECRET_KEY in a config file you commit to git. Use environment variables or the client’s secret manager.
  • For team setups, run the MCP server behind a proxy so each user authenticates with their own token.
  • Rotate keys at least every 90 days and after any contractor offboarding.
  • Treat MCP tool output as untrusted input to the LLM client, just like any other web call. Apply guardrails on responses that get fed back into the model.

Common gotchas

| Symptom | Cause | Fix |
| --- | --- | --- |
| Client does not see the tools | Restart needed after config change | Fully quit and reopen Claude Desktop, Cursor, etc. |
| "Authentication failed" | Wrong key or swapped key fields | Confirm FI_API_KEY and FI_SECRET_KEY (do not swap them) |
| Long generation timeout | Synthetic data jobs run async | Ask the agent to poll status; do not block on the first response |
| Tool list is empty | Server crashed silently | Run uv run python main.py in a terminal to see logs |
| Wrong evaluator suggested | Limited column metadata | Paste a sample row into the chat and ask the agent to re-plan |

What is next

Future AGI’s MCP capabilities continue to expand. Roadmap items as of 2026:

  • Prompt management endpoints (create, version, and A/B test prompts from chat)
  • Knowledge base backed synthetic data (anchor generation to your own docs)
  • Streaming evaluator results for long datasets
  • Tighter integration with the Agent Command Center for live trace inspection

See the Future AGI docs for the current tool list and the MCP server repo for source.

Frequently asked questions

What is the Future AGI MCP server?
It is a Model Context Protocol server that exposes Future AGI evaluations, datasets, guardrails, observability instrumentation, and synthetic data generation as tools any MCP compatible client can call. In 2026 the common clients are Claude Desktop, Cursor, Claude Code, Zed, and Windsurf. The server uses the user's FI_API_KEY and FI_SECRET_KEY to authenticate against the Future AGI platform.
Which MCP clients work with Future AGI?
Any MCP compatible client. The most common in 2026 are Claude Desktop, Cursor, Claude Code, Zed, and Windsurf. Once configured with the server command and the FI_API_KEY plus FI_SECRET_KEY env vars, the client sees evaluator, dataset, guardrail, and synthetic data tools and can call them from any conversation.
How do I install and run the Future AGI MCP server?
Two common paths. Either clone the repository at github.com/future-agi/futureagi-mcp-server, run uv sync, set FI_API_KEY and FI_SECRET_KEY, and start it; or point your MCP client config at uvx with the futureagi-mcp-server package. Detailed steps are in the Setup section above.
What can I do with Future AGI MCP that I cannot do in the web app?
Move from a UI to a conversation. Ask Claude to load a dataset, pick three appropriate evaluators (faithfulness, groundedness, context relevance), apply them, and summarize results. Ask Cursor to instrument your CrewAI or LangGraph agent with traceAI by editing your code in place. Generate synthetic datasets for a specific tone or domain by describing what you want. All without leaving the IDE.
How does Future AGI MCP compare to LangSmith or Langfuse over MCP?
LangSmith and Langfuse each ship MCP servers focused on tracing and dataset operations. Future AGI MCP adds the full eval surface (turing_flash and turing_large evaluator models, CustomLLMJudge, guardrails, simulation) and links to the Agent Command Center for production observability. Future AGI is the broader surface; the others are narrower but reasonable picks if you already use them.
Is the Future AGI MCP server open source?
Yes. The Future AGI MCP server is published on GitHub. The closely related traceAI instrumentation library and ai-evaluation evaluator library are both Apache 2.0 ([traceAI](https://github.com/future-agi/traceAI), [ai-evaluation](https://github.com/future-agi/ai-evaluation)). You can fork the MCP server, add custom tools, and run it locally or in your own infra.
How do I keep keys safe when using MCP?
Never put your real FI_SECRET_KEY in shared client configs. Use the client's secret manager where available (Cursor reads from environment, Claude Desktop reads from a local JSON config). For team setups, run the MCP server behind a proxy or gateway with per user tokens. Rotate keys at least every 90 days.
Does MCP work with agents in production, or just in IDE chat?
Both. MCP started in IDE and chat clients, but a 2026 production agent can also speak MCP and call Future AGI tools the same way Claude Desktop does. Treat MCP tools as another integration in your tool list, with the same authentication, rate limit, and retry handling you would apply to any external API.