Research

What is the Claude Agent SDK? Anthropic's Agent Loop in 2026

Claude Agent SDK is Anthropic's programmable agent harness for Claude. Python repo MIT-licensed, SDK use governed by Anthropic Commercial Terms; tools, MCP, sessions, observability.

·
9 min read
claude-agent-sdk anthropic agent-framework claude agent-loop mcp python open-source 2026
Editorial cover image on a pure black starfield background with faint white grid. Bold all-caps white headline WHAT IS CLAUDE AGENT SDK fills the left half. The right half shows a wireframe agent loop with three connected boxes LLM TOOLS RESULT in a closed cycle, with the TOOLS box surrounded by a soft white halo, drawn in pure white outlines.
Table of Contents

A team is building an internal data-analysis agent that reads CSV files from a shared drive, runs Python analysis, queries a Postgres warehouse, and produces a summary. The first version is hand-rolled around the Anthropic Messages API: a loop that parses tool_use blocks, dispatches Python execution, and threads results back into the next turn. It works, but the loop, the streaming logic, and the cache management are tedious to maintain. The team rewrites it on the Claude Agent SDK: one Agent with three tools (file_read, python_run, postgres_query) and an MCP server for the warehouse. The SDK runs the loop, handles the streaming, manages prompt caching, and emits traces.

This is the niche the Claude Agent SDK fills. Where every Anthropic-first team used to roll their own agent loop, the SDK collapsed that work into a small library. This guide covers what the Claude Agent SDK is, the primitives, how it compares to alternatives, and when to pick it.

TL;DR: What the Claude Agent SDK is

The Claude Agent SDK is Anthropic’s programmable agent harness, packaged as a Python SDK and a TypeScript SDK. The Python SDK at github.com/anthropics/claude-agent-sdk-python is MIT-licensed; the TypeScript SDK at github.com/anthropics/claude-agent-sdk-typescript is publicly available but its use is governed by Anthropic Commercial Terms rather than MIT. Across both, SDK use is governed by Anthropic Commercial Terms except where a component dependency has its own license. The Python repo has approximately 6,400 GitHub stars as of April 2026. The framework runs the Claude Code CLI as a child process and exposes agent-loop primitives on top: tools, MCP servers, sessions, prompt caching, and a permissions model. Anthropic’s Computer Use lives outside the SDK as a Messages API beta tool that applications wire in through custom tools or MCP. The SDK is Anthropic’s recommended path for Claude-first agent applications.

Why the Claude Agent SDK matters in 2026

Three forces made it the default choice for Anthropic-first stacks.

First, MCP became the standard for tool integration. The Model Context Protocol spec stabilized in 2025 and the ecosystem of MCP servers (file system, GitHub, Slack, Postgres, Notion, browser, etc.) grew through 2026. The Claude Agent SDK’s first-class MCP support means you can connect a Claude agent to dozens of external systems without writing custom Python wrappers.

Second, Computer Use kept iterating through the Claude 4.x line. Anthropic still ships Computer Use as a Messages API beta capability behind a beta header on supported Claude 4.x models. It is not a built-in Agent SDK tool; teams that need GUI control wire Computer Use into an agent through custom tooling or MCP, supplying their own sandbox and execution loop, and run it in isolated pilots with human approval and domain-specific reliability tests.

Third, prompt caching made multi-turn agents cheaper. The Messages API cache primitive lets you reuse the system prompt and tool definitions at a discounted rate (currently 0.1x of the input price for cache reads). For long-running agent sessions with stable system prompts, the savings are substantial. The SDK handles cache management automatically.

The anatomy of a Claude Agent SDK application

The framework’s primitives are small and Anthropic-native.

Agent loop. The core runner. The SDK launches the Claude Code CLI as a child process; the CLI calls the Anthropic Messages API, parses tool_use blocks, dispatches tool calls in parallel where possible, threads results into the next turn, and continues until the model emits an end_turn stop reason without further tool calls. From the application’s perspective you write to and read from the SDK; the loop and telemetry live in the harness.

Tool. A function or capability the agent can invoke. Three flavors: built-in Claude Code tools (including Read, Write, Edit, Bash, Glob, Grep, WebSearch, and WebFetch), MCP server tools (auto-discovered from a connected MCP server), and custom Python tools (a function plus a JSON schema). Computer Use is a separate Messages API beta tool that you can wire into the SDK only via the custom tool or MCP paths if your workflow needs desktop control; the application supplies the sandbox and execution loop.

MCP server. An external server speaking the Model Context Protocol. The SDK auto-discovers the server’s tools and exposes them to the agent. MCP servers can be local processes (started by the SDK), remote HTTP endpoints, or stdio-based.

Computer Use. Anthropic’s screen-and-input control tool, exposed as a Messages API beta capability rather than a built-in Agent SDK tool. The agent can take screenshots, click pixels, drag, type, press keys, and scroll, but Agent SDK applications integrate Computer Use through custom tooling or MCP, and the application supplies the sandbox and execution loop.

Prompt caching. A primitive on the Messages API. The SDK marks system prompt and tool definitions as cache_control breakpoints; subsequent turns within the cache TTL pay the cache-read rate instead of the full input rate.

Session. A persistent conversation object. Sessions retain message history and tool call history across SDK calls so multi-turn conversations work without manually threading history. Each query() call returns its own total_cost_usd; if you want a session-level cost rollup, accumulate the per-call totals in your application code.

Permissions. An approval model for tool calls. You can configure the SDK to require explicit approval before any tool call, before specific tool calls, or never. Approvals can be auto-granted, denied, or routed to a human via a callback.

Claude Agent SDK in 30 lines

import json
import anyio
from claude_agent_sdk import ClaudeAgentOptions, query, tool, create_sdk_mcp_server

@tool("get_invoice", "Return invoice details by id.", {"invoice_id": str})
async def get_invoice(args: dict) -> dict:
    invoice = {"id": args["invoice_id"], "amount": 142.30, "status": "paid"}
    # Custom tools must return a tool result with content blocks.
    return {"content": [{"type": "text", "text": json.dumps(invoice)}]}

invoice_server = create_sdk_mcp_server(name="invoices", version="1.0.0", tools=[get_invoice])

async def main():
    options = ClaudeAgentOptions(
        mcp_servers={"invoices": invoice_server},
        allowed_tools=["mcp__invoices__get_invoice"],
        permission_mode="acceptEdits",
    )
    async for message in query(
        prompt="What was invoice INV-921's amount?",
        options=options,
    ):
        print(message)

anyio.run(main)

The SDK starts the in-process MCP server, connects to it, exposes get_invoice as a tool, runs the agent loop, prints each message as it streams.

How the Claude Agent SDK compares to alternatives

FrameworkLead withBest forLicense
Claude Agent SDKAnthropic-native loop with MCP and Computer UseClaude-first stacks, MCP-heavy integrations, GUI automationPython SDK MIT; SDK use governed by Anthropic Commercial Terms
OpenAI Agents SDKOpenAI-led, provider-agnostic agent loop with handoffs, guardrails, sessions, tracing, HITLOpenAI-first stacks; single- or multi-agent workflowsMIT
CrewAIRole + task + crewRole-decomposable pipelinesMIT
LangGraphStateful graphArbitrary state machines, persistenceMIT
Pydantic AIType-safe agentsValidated outputs, multi-provider stacksMIT

The Claude Agent SDK and OpenAI Agents SDK occupy the same conceptual tier: programmable agent-loop frameworks from a model lab. The Claude Agent SDK is Claude-first by design; the OpenAI Agents SDK is OpenAI-led but provider-agnostic and supports multi-agent workflows out of the box. The choice is which provider you primarily target. The third-party frameworks (CrewAI, LangGraph, Pydantic AI) are provider-neutral and stronger for role-based or complex-state workflows.

Production patterns with the Claude Agent SDK

Three patterns recur.

Pattern 1: MCP-server-as-tool-gateway. A single agent connects to several MCP servers (a database server, a file system server, a GitHub server). The agent’s tool surface comes entirely from MCP. This is the Claude-native way to integrate enterprise systems without writing per-tool Python.

Pattern 2: Computer Use agent in a sandboxed VM. An agent integrates the Computer Use Messages API beta capability through custom tooling or MCP and runs inside a Docker container with X11 and a virtual display. The agent automates browser tasks, fills forms, scrapes UIs, or runs desktop applications. Your application supplies the screenshot/action loop and the container provides the sandbox; the Agent SDK orchestrates surrounding tools, sessions, and tracing.

Pattern 3: Long-running session with prompt caching. An agent with a 4,000-token system prompt and 30 tool definitions. The first turn pays full input cost; subsequent turns within the 5-minute cache TTL pay the cache-read rate. For high-frequency multi-turn workflows the savings compound. The SDK handles cache breakpoint placement automatically.

Common mistakes when adopting the Claude Agent SDK

  • Skipping the permissions model. A agent that can call arbitrary tools without approval is a real safety hazard. Configure the permission_mode and a callback for any irreversible tools (file writes, money movements, infrastructure changes).
  • Using Computer Use without a sandbox. Computer Use is full input-and-screen control. Run it inside a Docker container or VM, not on the host. The SDK’s documentation has reference Dockerfiles.
  • Ignoring prompt caching. Multi-turn agents with stable system prompts leak money without cache configuration. Mark cache_control breakpoints on the system prompt and tool definitions for any conversation that exceeds two turns.
  • Treating MCP as exotic. MCP is the Anthropic-recommended path for external tools. Custom Python tools work but are the path with more friction. Reach for an MCP server first and write a custom tool only when no MCP server fits.
  • Hand-rolling sessions. The SDK’s Session abstraction is the supported way to retain history. Stitching together raw Messages API calls reintroduces the loop the SDK was supposed to remove.
  • Using the SDK as the only path to Anthropic. For simple single-turn requests with no tools, the raw Messages API is fine. The SDK earns its weight when the loop, the tools, MCP, sessions, or Computer Use are involved.
  • Skipping streaming. The query function streams messages by default; consuming the async iterator gives the user incremental output. Buffering until completion hurts UX without saving cost.

How to trace the Claude Agent SDK with FutureAGI

The Claude Agent SDK emits standard Anthropic Messages API telemetry plus its own loop traces. To ship to FutureAGI’s observability platform or any other OTel backend, install one of the OTel-instrumentation packages and register it. With traceAI:

pip install traceai-claude-agent-sdk
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_claude_agent_sdk import ClaudeAgentInstrumentor

trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="claude-agent",
)
ClaudeAgentInstrumentor().instrument(tracer_provider=trace_provider)

# Your Claude Agent SDK code now emits OTel-native trace trees.

The resulting trace tree shows the agent loop at the root, every Messages API call as a child span, every tool call with arguments and return value, and every MCP server invocation as a deeper span.

How FutureAGI implements Claude Agent SDK observability and evaluation

FutureAGI is the production-grade observability and evaluation platform for the Claude Agent SDK built around the closed reliability loop that other Claude Agent stacks stitch together by hand. The full stack runs on one Apache 2.0 self-hostable plane:

  • Claude Agent SDK tracing, traceAI (Apache 2.0) auto-wraps the agent loop, Messages API calls, tool execution, subagent dispatch, MCP server invocations, and conversation turns across Python, TypeScript, Java, and C#.
  • Agent evals, 50+ first-party metrics (Tool Correctness, Argument Correctness, Task Completion, Plan Adherence, Faithfulness, Hallucination, Conversation Relevancy) attach as span attributes; BYOK lets any LLM serve as the judge at zero platform fee, and turing_flash runs the same rubrics at 50 to 70 ms p95.
  • Simulation, persona-driven text and voice scenarios exercise Claude agents in pre-prod with the same scorer contract that judges production traces.
  • Gateway and guardrails, the Agent Command Center fronts 100+ providers with BYOK routing, and 18+ runtime guardrails (PII, prompt injection, jailbreak, tool-call enforcement) enforce policy on the same plane.

Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise from $2,000 per month with SOC 2 Type II.

Most teams running the Claude Agent SDK in production end up running three or four tools alongside it: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because tracing, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching. For more on the tracing model, read What is LLM Tracing?.

Sources

Related: What is the OpenAI Agents SDK?, What is CrewAI?, What is LangGraph?, What is LLM Tracing?

Frequently asked questions

What is the Claude Agent SDK in plain terms?
The Claude Agent SDK is Anthropic's programmable harness for building agent applications powered by Claude. The SDK runs the Claude Code CLI as a child process and exposes the agent-loop primitives most teams need: built-in Claude Code tools (including Read, Write, Edit, Bash, Glob, Grep, WebSearch, and WebFetch), MCP server integration, custom Python tools, persistent sessions, and a runner that executes the loop until the agent produces a final response. (Computer Use, the desktop-control capability, is a separate Messages API beta tool rather than a built-in Claude Code tool; Agent SDK apps integrate it only through custom tooling or MCP, and the developer supplies the sandbox and execution loop.) The SDK is the production-friendly way to build Claude agents without rolling your own loop.
Who maintains the Claude Agent SDK and what license is it under?
The Claude Agent SDK is maintained by Anthropic. The Python codebase at github.com/anthropics/claude-agent-sdk-python is MIT-licensed; SDK use is governed by Anthropic's Commercial Terms except where a component has its own license. The Python repo has approximately 6,400 GitHub stars as of April 2026. A separate TypeScript SDK at github.com/anthropics/claude-agent-sdk-typescript ships a similar surface, but its repo is published under Anthropic Commercial Terms rather than MIT. The SDK grew out of the Claude Code agent harness and is Anthropic's recommended path for Claude-first agent applications.
How is the Claude Agent SDK different from Claude Code?
Claude Code is Anthropic's terminal CLI agent product for software engineering tasks. The Claude Agent SDK exposes the same Claude Code agent loop, built-in tools, and harness as a Python/TypeScript library so you can build custom applications on top of it. Internally, the SDK runs the Claude Code CLI as a child process and gives your application a programmable interface to that loop. You can use the SDK to build entirely different agents (research assistants, document processors, customer support flows) without reimplementing the loop, the tool dispatch, or the prompt-caching wiring.
How is the Claude Agent SDK different from the OpenAI Agents SDK?
Both are programmable agent frameworks with tools, sessions, and tracing; the OpenAI Agents SDK is OpenAI-led but provider-agnostic and supports multi-agent workflows out of the box, while the Claude Agent SDK runs the Claude Code CLI as a child process and is Claude-first. The Claude Agent SDK leans into Anthropic's strengths: the Claude Code-derived built-in tools (including Read, Write, Edit, Bash, Glob, Grep, WebSearch, and WebFetch, with additional documented tools such as Monitor and AskUserQuestion), deep MCP server integration, subagent dispatch, and prompt caching on the Messages API. The OpenAI Agents SDK leans into OpenAI's strengths: agent loops with handoffs, guardrails, the Responses API, hosted tools (FileSearchTool, WebSearchTool, CodeInterpreterTool, ImageGenerationTool), runtime tools like ComputerTool, and built-in human-in-the-loop primitives. Pick by which provider you primarily target. (Anthropic's Computer Use is a separate beta capability on the API; integrate it through the SDK's tool or MCP paths if you need it.)
What is MCP and how does the Claude Agent SDK use it?
MCP is the Model Context Protocol, an open protocol for connecting LLMs to external tools and data sources. The Claude Agent SDK has first-class support for MCP servers: you point the SDK at an MCP server (local or remote), and the server's tools are auto-discovered and exposed to the agent. This is the recommended way to expose external systems (databases, file systems, ticket trackers, internal APIs) to Claude agents without writing custom Python tool wrappers.
Does the Claude Agent SDK support Computer Use?
Computer Use is a separate Messages API beta tool, not a built-in Agent SDK tool. Anthropic still classifies it as beta and requires the appropriate beta header on supported Claude 4.x models (check the current docs for the exact eligible model list). Agent SDK applications integrate Computer Use through custom tooling or MCP: you supply the sandbox (typically a Docker container or a VM with X11) and the screenshot/action loop, while the SDK orchestrates the surrounding tools, sessions, and tracing. Computer Use is the primitive behind agents that automate browser tasks, desktop apps, or any GUI workflow.
How do you trace a Claude Agent SDK run?
The SDK ships beta OTel telemetry that you opt into through configuration; once enabled, the runtime emits spans for the agent loop, assistant turns, tool execution, subagents, and MCP servers. OpenInference and traceAI also ship dedicated instrumentation packages (openinference-instrumentation-claude-agent-sdk and traceai-claude-agent-sdk) for additional convention coverage. The trace tree shows the agent loop at the root, every Messages API call as a child span, every tool call with arguments, and any MCP server invocation as a deeper span. Specific eval scoring at the span level requires a backend like FutureAGI that supports span-attached evaluation.
When should I not use the Claude Agent SDK?
Skip it when your stack is OpenAI-first and switching providers for the agent layer is friction; the OpenAI Agents SDK is the better fit. Skip it for multi-agent role-based workflows; CrewAI is more concise. Skip it for explicit-graph workflows with persistence and human-in-the-loop; LangGraph is more mature there. The Claude Agent SDK earns its weight when Claude is your main model, when MCP integration matters, when Computer Use is on the roadmap, or when prompt caching cost savings are important.
Related Articles
View all
Stay updated on AI observability

Get weekly insights on building reliable AI systems. No spam.