Research

What is an MCP Server? Architecture, Transports, and Primitives in 2026

An MCP server exposes tools, resources, and prompts to LLM clients via the Model Context Protocol. Architecture, transports (stdio, SSE, streamable HTTP), and lifecycle in 2026.

March 6, 2025

11 min read

mcp-server model-context-protocol mcp ai-agents tool-use agent-tools anthropic 2026

A team’s agent reads tickets from Linear, runs queries against an internal Postgres, looks up runbooks in Confluence, and posts updates to Slack. The first build wires each integration as a custom Python tool with hand-coded auth, retry, and schema. Six months later the team adds a second agent (an oncall summarizer) that needs the same Linear, Postgres, Confluence, and Slack access. They are about to copy the four custom tools again when someone points out the obvious: every one of those integrations could be an MCP server, called by both agents (and by the developer’s IDE, and by the team’s Claude Desktop) without per-tool glue. They rewrite the four integrations as MCP servers; the second agent uses them in a day.

This piece walks through what an MCP server is, the three primitives, the transport options, the lifecycle, and how to build one in production in 2026.

TL;DR: What an MCP server is

An MCP server is a process that exposes tools, resources, and prompts to LLM clients via the Model Context Protocol, an open standard Anthropic shipped in November 2024. The protocol uses JSON-RPC 2.0 over two standard transports: stdio (subprocess for local clients) and Streamable HTTP, the modern HTTP transport introduced in the 2025-03-26 spec revision; the original HTTP+SSE transport from 2024-11-05 is legacy / backward-compatible. Servers declare server-side capabilities (tools, resources, prompts, logging) at handshake time; clients declare client-side capabilities (roots, sampling, elicitation) that servers may request. The model then calls server primitives by name. Reference SDKs ship in Python and TypeScript; community SDKs cover Java, C#, Rust, Go, Ruby, and more. The protocol is supported across Claude Desktop, Cursor, Windsurf, ChatGPT, Cline, the OpenAI Agents SDK, Google ADK, Microsoft Agent Framework, and many other clients.

Why MCP exists

Three forces converged.

First, the N-by-M problem. Every agent or LLM client needed bespoke integrations to every tool, every database, every API. Every tool needed bespoke client code in every agent that used it. The matrix of integrations grew faster than any single team could keep up.

Second, the conventions for tool description fragmented. OpenAI’s function-calling JSON shape, Anthropic’s tool-use shape, LangChain’s BaseTool, LlamaIndex’s BaseTool, CrewAI’s tool decorator, all close enough to interoperate but different enough that a tool written for one had to be rewritten for the others.

Third, security and governance got serious. An LLM that can read files, run code, hit APIs, and send emails is a powerful surface; controlling what it can and cannot do is a real production concern. A standard protocol with explicit capabilities, scoped roots, and audit-ready logs is the backbone of the governance story.

MCP is what Anthropic shipped in November 2024 to solve all three. The N-by-M shrinks to N+M (every client speaks MCP, every tool exposes MCP). The tool description shape is standardized at the wire level. The capability exchange and the explicit transport layer make security and audit tractable.

The lifecycle of an MCP exchange

Every MCP session follows the same shape.

Initialize. Client sends initialize with its protocol version and supported capabilities; server responds with the version it supports and its capabilities. If the versions are incompatible, the session aborts.
Capabilities exchange. The server declares server-side capabilities (tools, resources, prompts, logging). The client declares client-side capabilities (roots, sampling, elicitation) that the server may request from the client.
List operations. The client calls tools/list, resources/list, prompts/list to discover what is available. Some servers also support live updates via notifications/tools/list_changed.
Operations. When the model emits a tool call, the client calls tools/call. When the application needs a resource, the client calls resources/read. When the user invokes a prompt, the client calls prompts/get.
Notifications. Bidirectional event stream for log messages, progress updates, list-change announcements, and (for sampling) the server asking the client to call its model.
Shutdown. Client closes the transport. For stdio, the subprocess typically exits with the client; for HTTP / Streamable HTTP, the server is an independent long-lived process that simply ends the session while continuing to serve other clients.

The whole thing is JSON-RPC 2.0 over the chosen transport.

The three core primitives

Tools

Tools are functions the model can call. Each tool has a name, a description, and a JSON-schema for its arguments. The server returns a structured result; the result is appended to the conversation as a tool-result message.

from mcp.server.fastmcp import FastMCP

server = FastMCP("filesystem")

@server.tool()
def read_file(path: str) -> str:
    """Read the contents of a file at the given path."""
    with open(path) as f:
        return f.read()

Tools are model-driven: the model decides when to call them based on the description and the conversation. A good description matters; the model uses it to choose the tool.

Resources

Resources are data the client (or user) can read. Each resource has a URI, a name, optional MIME type, and content. Resources are application-driven: the client surfaces a list of available resources to the user (often as attachable items), and the user picks which to include in the conversation.

@server.resource("file://docs/{path}")
def read_doc(path: str) -> str:
    return load_doc(path)

The distinction matters in clients with rich UI. In Claude Desktop, resources show up as attachable items in the chat composer. In Cursor, they show up as @ mentions. In headless agents, the agent loop iterates over resources programmatically.

Prompts

Prompts are reusable prompt templates the server exposes. Each prompt has a name, a description, optional argument schema, and a template that the server fills with the arguments. They appear as slash commands in the client UI.

@server.prompt()
def code_review(language: str, file_path: str) -> str:
    return f"Review the following {language} file: {file_path}..."

Prompts are user-driven; the user invokes the prompt by name (or by slash-command in the UI), the server returns the rendered messages, the client sends them to the model.

Sampling, roots, and elicitation (client capabilities)

Three client-side capabilities the server can call into when the client supports them.

Sampling. The server requests the client to run an LLM call on its behalf. Useful when the server wants to use the client’s model (and credentials) for a sub-task. The client always retains control; sampling requests are subject to user approval.

Roots. A scoping mechanism for what file system or URL space the client allows the server to access. The client declares root paths at session start; the server is supposed to stay within them.

Elicitation. A newer capability that lets the server prompt the user via the client for additional input mid-call.

Transports

Two standard transports are defined in the current spec, plus a legacy HTTP+SSE transport retained for backward compatibility.

Stdio

Server runs as a subprocess of the client. Communication is JSON-RPC over stdin and stdout. This is the canonical local transport: a few lines of config in Claude Desktop, Cursor, or Cline launches the server and routes messages.

Pros: trivial to set up, no network exposure, the OS handles the lifecycle.

Cons: only works when the client and server are on the same machine.

Streamable HTTP

The modern HTTP transport, introduced in the 2025-03-26 spec revision. The server exposes one MCP endpoint that supports both POST (every client JSON-RPC message is a new POST; the server may reply with JSON or open an SSE stream on the same response) and GET (the client opens a server-to-client SSE stream for notifications). Sessions are tracked via a session id header.

Pros: works across machines, simple to deploy behind a reverse proxy or CDN, supports long-running connections and notifications, integrates with standard HTTP auth.

Cons: more setup than stdio, requires careful session and Origin management.

HTTP+SSE (legacy)

The original HTTP transport from the 2024-11-05 revision. Server runs as a long-lived HTTP service; client opens an SSE stream for server-to-client messages and POSTs to a separate endpoint for client-to-server. Replaced by Streamable HTTP starting 2025-03-26 and now legacy / backward-compatible only; many existing servers still ship the SSE path.

Pros: works across machines.

Cons: dual-endpoint design is more complex than Streamable HTTP. Migration path is well-documented.

Building an MCP server

Reference SDKs ship in Python and TypeScript. The pattern is consistent.

Python (FastMCP-style)

from mcp.server.fastmcp import FastMCP

server = FastMCP("orders")

@server.tool()
def get_order(order_id: str) -> dict:
    """Look up an order by id."""
    return db.fetch_order(order_id)

@server.resource("order://{order_id}")
def order_resource(order_id: str) -> dict:
    return db.fetch_order(order_id)

if __name__ == "__main__":
    server.run()  # stdio by default; pass transport="streamable-http" for HTTP

TypeScript

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "orders", version: "1.0.0" });

server.tool(
  "get_order",
  "Look up an order by id.",
  { order_id: z.string() },
  async ({ order_id }) => ({ content: [{ type: "text", text: JSON.stringify(await db.fetchOrder(order_id)) }] })
);

await server.connect(new StdioServerTransport());

Both SDKs handle the JSON-RPC plumbing, capability exchange, and primitive-list responses automatically.

Hosting an MCP server

Three patterns.

Local subprocess

Distribute the server as a binary, npm package, or Python package. The client launches it via stdio. The simplest deploy story; no network setup required.

Hosted HTTP

Deploy the server as a long-running HTTP service behind a reverse proxy. Authenticate clients with bearer tokens, OAuth, or whatever your infra uses. Streamable HTTP is the recommended transport. Any host that supports the required Streamable HTTP semantics (POST plus GET for SSE, persistent sessions, sufficient request duration for streaming, auth, and Origin validation) can host MCP; Cloudflare Workers, Vercel, Render, and most container hosts work, but verify SSE and request-timeout limits before relying on streaming.

Embedded in another service

Bundle the MCP server inside an existing service (web app, API gateway, internal tool). The MCP transport is one route; the rest of the service is unchanged. Useful for incremental rollout: existing services start exposing MCP without rearchitecting.

How MCP servers compare with other tool-call mechanisms

A practical map.

OpenAI function calling. Model-side primitive. The LLM emits a function-call JSON; the client routes it. MCP is the wire protocol beneath this. The two compose.
LangChain BaseTool / LlamaIndex BaseTool. In-process Python objects. Lighter weight than MCP, no JSON-RPC overhead, but no cross-process or cross-language story.
CrewAI tool decorator / OpenAI Agents tool decorator. Same as LangChain BaseTool: in-process, framework-specific.
REST APIs / OpenAPI. Generic web APIs. The agent’s tool layer has to bridge the OpenAPI shape to the function-calling shape; MCP servers can wrap an OpenAPI tool and expose it as MCP primitives.
A2A (Agent-to-Agent). Google’s protocol for agent-to-agent communication. Targets agent dispatch, not tool calls. MCP and A2A address different layers; they coexist.

MCP’s distinct posture: cross-client, cross-language, cross-vendor protocol for tool federation. The N+M scaling property is what made it the convergence point.

Production patterns

Three that show up.

1. Internal-tool federation

Each internal API (orders, billing, inventory, CRM) exposes an MCP server. Every agent inside the company subscribes to the relevant servers. New agents get tool access without per-agent integration; new tools become available to all agents without per-agent changes.

2. Developer-IDE workflow

A developer uses Cursor, Cline, or Claude Code with MCP servers for the codebase, the issue tracker, the deploy system, and the runbook. The IDE is the single client; the four servers expose the four tool surfaces. The developer’s questions (“what’s broken on staging?”, “show me the test failures from the last deploy”) route to the right server without manual orchestration.

3. Multi-cloud and partner integrations

A SaaS company exposes an MCP server for its API. Customers’ agents (Claude Desktop users, Cursor users, internal agents) connect to the server. The protocol is the integration; no per-customer SDK is required.

Common mistakes

Conflating MCP servers with HTTP APIs. They are different layers. An MCP server may wrap an HTTP API, but it is not the API itself.
Skipping the capability exchange in custom implementations. Hand-rolling MCP without the proper handshake breaks compatibility with newer clients.
Putting business logic in the tool description. The description is for the model; it should explain when to call the tool, not what the tool does internally.
No structured error handling. MCP errors are JSON-RPC error responses. Returning a non-JSON-RPC string on failure breaks clients.
Long-running tools without progress notifications. A tool that runs for 30 seconds without progress updates feels broken in the UI. Send notifications/progress for long calls.
Ignoring root scoping. A filesystem server that does not honor the client’s roots is a security risk.
Stdio transport behind a network. Stdio is local-only by design. Network-exposed servers must use Streamable HTTP.
No observability on the client or server. Each MCP exchange should be a span. Without spans, debugging cross-server agent failures is grep over logs.

How FutureAGI implements MCP server observability and evaluation

FutureAGI is the production-grade observability and evaluation platform for MCP-connected agents built around the closed reliability loop that other MCP stacks stitch together by hand. The full stack runs on one Apache 2.0 self-hostable plane:

MCP exchange tracing, traceAI (Apache 2.0) instruments the MCP client across Python, TypeScript, Java, and C#; spans carry tool name, arguments, return value, latency, status, and roots; for self-built MCP servers, emit OTel spans from inside the server so the trace spans both sides of the exchange.
Tool-correctness evals, 50+ first-party metrics including Tool Correctness, Argument Correctness, Step Efficiency, Plan Adherence, and Faithfulness attach as span attributes on every MCP call; BYOK lets any LLM serve as the judge at zero platform fee, and turing_flash runs the same rubrics at 50 to 70 ms p95.
Simulation, persona-driven scenarios exercise MCP tool surfaces in pre-prod with the same scorer contract that judges production traces; root scoping and capability handshake regressions catch before live traffic.
Gateway and guardrails, the Agent Command Center fronts 100+ providers with BYOK routing for the model that drives the MCP client; 18+ runtime guardrails enforce policy on the same plane.

Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise from $2,000 per month with SOC 2 Type II.

Most teams running MCP-connected agents in production end up running three or four tools alongside MCP: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because tracing, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching.

Sources

Series cross-link

Frequently asked questions

What is an MCP server in plain terms?

An MCP server is a process that exposes tools, resources, and prompts to LLM clients via the Model Context Protocol. The protocol was open-sourced by Anthropic in November 2024 and adopted as a standard across Claude Desktop, Cursor, Windsurf, ChatGPT, Cline, the OpenAI Agents SDK, the Google ADK, the Microsoft Agent Framework, and many more clients in 2025-2026. An MCP server is what your file system, database, code repo, ticketing system, or internal API becomes when you wrap it in the protocol; clients then discover and call its capabilities without bespoke integration code.

How is an MCP server different from a tool API?

A tool API is whatever shape your service speaks; the agent's function-calling layer has to be configured per-tool. An MCP server is a standard wrapper: tools, resources, and prompts are described under JSON-RPC 2.0 with predictable schemas, the server advertises capabilities at handshake time, and any MCP client can consume them without per-server glue. The shift is from N-by-M (every client integrates every tool) to N+M (every client speaks MCP, every tool exposes MCP).

What are the three MCP primitives?

Tools, resources, and prompts. Tools are functions the model can call (read_file, query_db, send_email); they are model-driven. Resources are data the client can read or list (file contents, database rows, API responses); they are application-driven and often shown to the user. Prompts are reusable prompt templates the server provides; they are user-driven (typically surfaced as slash commands in the client UI). Capabilities split by side: servers expose tools, resources, prompts, and logging; clients expose roots, sampling, and elicitation, which servers may request from supporting clients.

What transports does MCP support?

The current spec defines two standard transports. (1) Stdio: server runs as a subprocess of the client, JSON-RPC over stdin/stdout. Most desktop integrations use stdio (Claude Desktop, Cursor, Cline). (2) Streamable HTTP: the modern HTTP transport that uses one MCP endpoint supporting POST (for client requests, returning JSON or an SSE stream) and GET (to open a server-to-client SSE stream). HTTP+SSE from the 2024-11-05 revision is now legacy/backward-compatible after Streamable HTTP was introduced in the 2025-03-26 revision. Choice of transport depends on hosting model: subprocess for local, HTTP for remote.

How does MCP relate to OpenAI function calling and Anthropic tool use?

Function calling and tool use are model-side primitives: the LLM emits a function-call JSON, the client routes it to the right code. MCP is the wire protocol between the client and the tool implementations: it standardizes how tools are discovered, described, and called. The two compose: an agent receives a tool-call from the model, the client routes it to an MCP server, the server runs the actual code. MCP does not replace function calling; it standardizes the runtime layer beneath it.

How do I build an MCP server?

Pick an SDK and start. The reference SDKs are Python (modelcontextprotocol/python-sdk) and TypeScript (modelcontextprotocol/typescript-sdk). Both come with FastMCP-style high-level helpers. Define tools as functions with type hints, resources as URI handlers, prompts as templates. Run with stdio for local development; switch to streamable HTTP for hosted. Test against MCP Inspector or any client. The full lifecycle (initialize, capabilities exchange, tool discovery, tool call, shutdown) is handled by the SDK.

What MCP servers exist as references?

The reference servers repo (modelcontextprotocol/servers) currently ships Everything, Fetch, Filesystem, Git, Memory, Sequential Thinking, and Time as actively maintained references. Older entries such as GitHub, GitLab, SQLite, Postgres, Slack, and Brave Search are now in the archived section or maintained by their respective vendors / community. Cloud providers also ship MCP servers for their products (Cloudflare, Google, Microsoft, Atlassian, and many more), and community catalogs (mcpservers.org, mcp.so) list thousands of servers.

Where does MCP fit in production observability?

Each MCP call (tool invocation, resource read, prompt fetch) is a span in the agent's trace tree. Use OpenInference or traceAI to instrument the MCP client side; the server side can also emit spans via the language SDKs. Treating MCP exchanges as first-class spans is what lets you debug latency, errors, and tool-call correctness across federated tool surfaces. FutureAGI's Agent Command Center renders MCP exchanges as labeled nodes in the trace tree.

View all

Research

What is an Agent Skill? The SKILL.md Primitive Explained for 2026

An agent skill is a folder of instructions, scripts, and resources packaged as a SKILL.md unit. What it is, how skills compose, and how teams use them in 2026.

NVJK Kartik · Nov 27, 2025

10 min

Research

What is the Claude Agent SDK? Anthropic's Agent Loop in 2026

Claude Agent SDK is Anthropic's programmable agent harness for Claude. Python repo MIT-licensed, SDK use governed by Anthropic Commercial Terms; tools, MCP, sessions, observability.

NVJK Kartik · May 26, 2025

9 min

Research

Best MCP Gateways in 2026: 6 Production Layers + Companion Instrumentation

Cloudflare MCP, Bifrost (Maxim), Composio, Smithery, MCP Inspector CLI, and Agent Command Center compared on registration, observability, auth, and OTel.

NVJK Kartik · Feb 22, 2025

12 min