Research

What is Google ADK? The Agent Development Kit Explained for 2026

Google ADK is an open-source Python, TypeScript, Go, and Java framework for building, evaluating, and deploying agents on Vertex AI Agent Engine. What it is, primitives, and 2026 release status.

·
12 min read
google-adk agent-development-kit vertex-ai gemini ai-agents agent-framework agent-engine 2026
Editorial cover image on a pure black starfield background with faint white grid. Bold all-caps white headline WHAT IS GOOGLE ADK fills the left half. The right half shows a wireframe vertical Google-style stack of four labeled slabs (VERTEX AI INFRA, GEMINI MODEL, TOOLS LAYER, ADK AGENT LAYER) drawn in pure white outlines. The top ADK AGENT LAYER carries a soft white halo glow.
Table of Contents

Picture an engineering team building an internal customer-support agent. The agent reads a ticket, decides whether to look up the order, the shipping record, or the refund policy, calls the right tool, drafts a reply, and asks a sub-agent to grade the draft for tone before sending. The team writes the agent as a compact Python module, runs it against a small eval set on every commit, and pushes it to Vertex AI Agent Engine when the eval pass-rate clears the threshold. The framework that holds all of that together, from the loop to the sub-agents to the evalsets to the deploy command, is the Agent Development Kit. Google publicly states ADK is the framework powering its own Agentspace and Customer Engagement Suite agents; the same framework is open source under Apache 2.0.

This piece walks through what ADK is, the primitives, the agent loop, how it composes multi-agent systems, the eval framework, the managed runtime, and how it compares with the other major agent SDKs in 2026.

TL;DR: What Google ADK is

Google ADK (Agent Development Kit) is an open-source agent framework released by Google in April 2025. It ships in Python, TypeScript, Go, and Java under Apache 2.0 and is actively maintained. The framework is model-agnostic but optimized for Gemini, supports OpenAI, Anthropic, and other providers via LiteLLM, and is the SDK Google uses internally for products like Agentspace, the Vertex AI agent gallery, and the Customer Engagement Suite. ADK gives you an agent loop with typed tools, multi-agent composition (sequential, parallel, loop, hierarchical), a built-in evaluation framework with evalsets and LLM-as-judge scoring, a local dev UI, and a one-command path to deploy to Vertex AI Agent Engine, Cloud Run, or GKE. Like the OpenAI Agents SDK and AutoGen, ADK v1.x models agents as classes with workflow-agent composition rather than as graphs; the in-flight ADK 2.0 line is adding explicit graph-based workflows.

Why Google built ADK

Three things pushed Google to ship a first-party SDK rather than rely on third-party frameworks like LangChain or CrewAI.

First, the production agents Google runs internally have specific shape. They live behind Vertex AI, they need session continuity, they need memory across turns, they need observability that flows into Google Cloud Trace, and they need to scale to many concurrent users. A framework optimized for tinkering on a laptop is not the framework that runs Agentspace.

Second, the agent space had no first-party Google primitive. Anthropic shipped the Claude Agent SDK. OpenAI shipped the OpenAI Agents SDK. Microsoft shipped the Microsoft Agent Framework. For Gemini-first builders, the choices were LangChain, CrewAI, AutoGen, or roll-your-own. ADK closes the gap.

Third, ADK is the framework Google’s own teams already used. Releasing it externally was less an act of standing up a new project than an act of opening a project that already shipped Google products. The repo notes that ADK powers the agents inside Agentspace and the Customer Engagement Suite. The internal-and-external duality is why the SDK is opinionated: it solves problems Google teams hit, not generic problems a community wishlist might prioritize.

The ADK stack

ADK is one piece of a larger Google agent stack. Understanding where ADK sits clarifies what it does and does not own.

Editorial figure on a black starfield background showing the Google ADK stack as four vertical wireframe slabs from bottom to top: VERTEX AI INFRA, GEMINI FOUNDATION MODEL, TOOLS AND MEMORY LAYER, ADK AGENT LAYER. The top ADK slab carries a soft white halo glow. Tiny upward arrows connect each slab to the next.

  • Vertex AI infra is the cloud substrate: GPUs, model serving, storage, networking. ADK does not manage this layer; it consumes it.
  • Gemini foundation model is the default model. ADK can also call OpenAI, Anthropic, Cohere, Mistral, Bedrock, and any LiteLLM-compatible provider.
  • Tools and memory is the layer where ADK gives you typed functions in the supported SDK language (Python, TypeScript, Go, or Java) plus a session state primitive. ADK ships built-in tools (Google Search, Code Exec) and lets you compose your own.
  • ADK agent layer is the SDK itself: the agent loop, the sub-agent composition primitives, the evaluation framework, and the deployment command.

Core primitives

The framework reduces to a small set of primitives.

Agent

An agent is a class with a model, an instruction, optional tools, and optional sub-agents. The minimum example from the docs:

from google.adk.agents import Agent

root_agent = Agent(
    name="weather_agent",
    model="gemini-2.5-flash",
    instruction="You answer weather questions for cities.",
    tools=[get_weather],
)

The instruction is the system prompt. The tools are Python functions; ADK introspects type hints and docstrings to build the function-calling schema. A Runner invokes the agent, manages the conversation, and dispatches tool calls.

Tool

A tool is a typed function or method in the supported SDK language (Python examples below; TypeScript, Go, and Java follow the same shape). ADK supports four flavors:

  • FunctionTool. A regular function. ADK builds the schema from type hints and the docstring.
  • LongRunningFunctionTool. A wrapper for functions that start external long-running work, return an initial status or operation id, pause the run, and let the calling app resume with intermediate or final responses; useful for human-in-the-loop and background jobs.
  • AgentTool. Wraps another agent so a parent can call a child as if it were a tool.
  • Built-in tools. google_search, code_execution, vertex_ai_search, ready to use.

Sub-agents and workflow agents

Composition is a first-class concern. ADK defines three workflow agents that compose other agents:

  • SequentialAgent. Runs a list of agents in order; each receives the previous output.
  • ParallelAgent. Runs agents concurrently and collects all outputs.
  • LoopAgent. Repeats an agent until a stop condition holds, useful for refinement loops.

A multi-agent setup composes these. A planner agent dispatches to a sub-agent for retrieval, then to a sub-agent for drafting, then to a sub-agent for grading, all wired through SequentialAgent. The pattern is similar to AutoGen’s GroupChat or CrewAI’s crew composition; ADK’s twist is that workflow agents are themselves agents, so they nest.

For teams already on LangGraph, ADK exposes a LangGraph integration path so existing graphs can be wrapped and called from inside an ADK agent.

Session and memory

Sessions hold conversation state. Memory holds long-term state across sessions. ADK provides in-memory implementations for development and Vertex AI-backed implementations for production. State is keyed by user id and session id; reading and writing state happens through tools or via callbacks on the agent loop.

Callbacks

Callbacks fire at agent, model, and tool boundaries: before/after agent, before/after model call, and before/after tool. Use cases include input validation, output redaction, custom logging, and dynamic instruction modification. The OpenAI Agents SDK calls these “lifecycle hooks”; the shape is similar.

The agent loop

The loop is unsurprising if you have read any agent SDK.

  1. Runner.run(user_message) is called.
  2. ADK formats the conversation history plus instruction plus tool schemas and calls the model.
  3. The model returns either a final answer or one or more tool calls.
  4. ADK executes the tool calls (Python functions, sub-agents, or built-in tools), captures the results.
  5. The results append to the conversation; back to step 2.
  6. The loop ends when the model returns a final answer, when a callback signals stop, or when the step budget is exceeded.

What ADK adds on top of the bare loop:

  • Token streaming through the runner’s run_async interface.
  • Trace emission to OpenTelemetry exporters; spans carry gen_ai.* attributes that any OTel backend can ingest.
  • Step budgets that cap runaway tool loops.
  • Callback injection at the three lifecycle points above.

The evaluation framework

A first-class eval story is one of ADK’s bigger differentiators against minimalist agent SDKs.

An evalset is a JSON file containing test trajectories: input messages, expected tool calls, expected final responses. ADK runs the agent against each trajectory and scores on three axes:

  • Tool trajectory match. Did the agent call the expected tools with the expected arguments?
  • Response match. Does the final response match the reference, scored either by exact match, ROUGE, or LLM-as-judge?
  • Custom metrics. Plug in your own scorer.

The CLI is adk eval <agent_module> <evalset.json>. The output is a per-trajectory pass-or-fail with diagnostics. Pre-commit hooks or CI gates can block merges on regression.

For richer evaluation, the spans ADK emits are OTel-compatible, so any LLM observability platform that consumes OTel works. Future AGI’s evaluation library, for example, runs turing_flash for guardrail screening at 50-70 ms p95 and full eval templates at about 1-2 seconds per trace, both consuming the same span schema. The eval framework inside ADK and span-attached scoring outside ADK compose naturally.

Deployment paths

ADK’s deployment story is opinionated toward Google Cloud but not exclusive.

  • Vertex AI Agent Engine. The managed runtime. The ADK deploy command packages the agent, deploys it as a managed service, and exposes a REST endpoint with sessions, memory, autoscaling, and built-in tracing. This is the path Agentspace and the Customer Engagement Suite use.
  • Cloud Run. adk deploy cloud_run packages the agent as a container and deploys to Cloud Run, optionally with the --with_ui flag to include the dev UI. Less managed than Agent Engine but a fit for low-traffic services.
  • GKE. Helm chart and manifests for Kubernetes.
  • Self-host. It is just Python, TypeScript, Go, or Java; run it inside any FastAPI service, Express app, Go service, or JVM service.

The deploy command is convenience, not lock-in. The agent itself is portable.

Dev UI

ADK ships a local web UI invoked by adk web. It shows the agent’s conversation, tool calls, model parameters, and a span-style trace tree of each turn. For a developer iterating on instructions or tool definitions, the dev UI is a tight feedback loop.

The dev UI is primarily intended for local development. Cloud Run deployments can opt in to the UI via the --with_ui flag; Agent Engine deployments do not include the web UI in the served payload. Production observability lives in Google Cloud Trace, in the Vertex AI console, or in any OTel backend you point ADK at via the standard OTLP exporter.

How ADK compares with other agent SDKs

A practical map:

  • OpenAI Agents SDK. Closest in shape. Agent-as-class, handoffs, tool wrapping. Different model bias (GPT vs Gemini), different default deployment surface (none vs Vertex AI), similar primitives.
  • Claude Agent SDK. Lower-level. Anthropic’s SDK gives you a tool loop and lets you build the framework on top. ADK is one level up, with workflow agents and an eval framework built in.
  • Microsoft Agent Framework. Closer parity. Both ship from a hyperscaler, both target managed runtimes (Azure AI Foundry vs Vertex AI Agent Engine), both ship eval frameworks. Choose by cloud preference.
  • LangGraph. State-machine first. ADK v1.x is agent-class first with workflow composition; ADK 2.0 adds graph-based workflows. The two interoperate via LangGraphAgent.
  • CrewAI. Role-based. ADK is class-based. Different mental model for the same multi-agent problem.
  • AutoGen. Conversational multi-agent. Closer to ADK than CrewAI but with a chat-first abstraction. ADK is more workflow-first.

Production patterns with ADK

Three patterns that show up commonly in ADK deployments.

1. Plan-and-execute hierarchy

A SequentialAgent chains a planner (Gemini 2.5 Pro), a tool-calling worker (Gemini 2.5 Flash), and a grader (Gemini 2.5 Flash with an LLM-as-judge prompt). The planner emits a plan; the worker executes; the grader vetoes outputs that fail the rubric. The pattern survives in production because each stage has a clear role and a measurable failure mode.

2. Reflexion loop with LoopAgent

A LoopAgent wraps a draft agent and a critic agent. Each iteration: draft produces output, critic scores it, if score is below threshold the loop continues with the critic’s feedback appended. Stop condition is either threshold or max iterations. Used for code generation, long-form writing, and any workload where one-shot output quality is unreliable.

3. Long-running human-in-the-loop

A LongRunningFunctionTool initiates external long-running work and returns an initial status or operation id when the agent wants to take a destructive action (issue a refund, send an external email). The runner pauses the session, returns control to the calling app, and the app resumes the run with an intermediate or final function response once the human approves. The pattern lets ADK power workflows that span minutes or hours without holding a connection open. The same primitive is referenced as LongRunningFunctionTool in the API; older docs sometimes shorten it to LongRunningTool.

Common mistakes

  • Treating ADK as the LangChain replacement. It is not. ADK is a Google-stack-aware agent framework; LangChain is a model-and-tool integration kit. They solve different things.
  • Skipping the evalset. The eval framework is one of ADK’s better features. Skipping it means losing the easiest feedback loop.
  • Not pinning model versions. Gemini model ids and stable versions change over time. Pin a specific Google-listed stable version rather than the bare gemini-2.5-flash alias if you want determinism across deploys; consult Google’s model lifecycle page for the current versioned id.
  • Over-using ParallelAgent. Parallel runs save wall time but multiply token cost and make tracing harder. Use sequential by default.
  • Forgetting to instrument tools. A tool that calls an external API without a span is invisible in traces. Wrap external calls in OTel spans or rely on traceAI/OpenInference auto-instrumentation for common HTTP libraries.
  • Skipping callbacks. Input validation and PII redaction belong in the before_model callback, not scattered across tool implementations.
  • Hard-coding to Vertex AI Agent Engine. The deploy command is convenient but the agent should run anywhere. Test it in a generic Python (or TypeScript / Go / Java) environment so you preserve portability.

How FutureAGI implements Google ADK observability and evaluation

FutureAGI is the production-grade observability and evaluation platform for Google ADK built around the closed reliability loop that other ADK stacks stitch together by hand. The full stack runs on one Apache 2.0 self-hostable plane:

  • ADK tracing, traceAI (Apache 2.0) consumes ADK’s built-in OTel spans plus auto-instrumentation for sub-agent dispatch, tool execution, ParallelAgent fan-out, and callback hooks across Python, TypeScript, Java, and C#; spans land in ClickHouse-backed storage with per-step latency and full tool-call payloads.
  • Agent evals, 50+ first-party metrics (Tool Correctness, Argument Correctness, Plan Adherence, Task Completion, Faithfulness, Hallucination) attach as span attributes; BYOK lets any LLM serve as the judge at zero platform fee, and turing_flash runs the same rubrics at 50 to 70 ms p95.
  • Simulation, persona-driven text and voice scenarios exercise ADK agents in pre-prod with the same scorer contract that judges production traces, including evalset replay through the Agent Command Center.
  • Gateway and guardrails, the Agent Command Center fronts 100+ providers (Gemini, OpenAI, Anthropic, Bedrock) with BYOK routing; 18+ runtime guardrails compose with ADK’s before_model callbacks on the same plane.

Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise from $2,000 per month with SOC 2 Type II.

Most teams running Google ADK in production end up running three or four tools alongside it: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because tracing, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching. For workload-specific evaluation see the companion tutorial on evaluating Google ADK agents.

Sources

Related: Evaluating Google ADK Agents, What is the Claude Agent SDK?, What is the OpenAI Agents SDK?, What is LLM Tracing?

Frequently asked questions

What is Google ADK in plain terms?
Google Agent Development Kit (ADK) is an open-source framework Google released in April 2025 for building, evaluating, and deploying AI agents. It is model-agnostic but optimized for Gemini and Vertex AI, ships in Python, TypeScript, Go, and Java, and powers internal Google products like Agentspace and the Customer Engagement Suite. ADK gives you an agent loop, multi-agent hierarchies, a typed tool interface, an evaluation harness, and a one-command path to Vertex AI Agent Engine for managed deployment.
What is the relationship between Google ADK and Vertex AI Agent Engine?
ADK is the SDK; Vertex AI Agent Engine is the managed runtime. You build an agent locally with ADK, test it with the dev UI, run evals against a dataset, then deploy with the ADK deploy command or the Vertex AI SDK. Agent Engine handles autoscaling, sessions, memory, and observability; ADK is what produces the deployable artifact. They are designed together but ADK can also run on Cloud Run, GKE, or any Python, TypeScript, Go, or Java host.
Is Google ADK free and open source?
Yes. The Python, TypeScript, Go, and Java SDKs are Apache 2.0 licensed on GitHub under the google/adk-* repos. The framework is free; you pay for the underlying model calls (Gemini, OpenAI via LiteLLM, Anthropic, etc.) and for managed runtime if you use Vertex AI Agent Engine. Self-hosting on your own infrastructure has zero ADK license cost.
How does ADK compare with LangGraph, CrewAI, and AutoGen?
ADK v1.x is workflow-agent first rather than graph-first; the in-development ADK 2.0 line introduces graph-based workflows. It models agents as classes with a `run` loop, supports hierarchies via sub-agents and three workflow agents (SequentialAgent, ParallelAgent, LoopAgent), and integrates a built-in evaluation framework. CrewAI is role-based; LangGraph is state-machine based; ADK is agent-as-class with workflow agents as composition primitives, and offers a LangGraph integration path for teams already on LangGraph. The cleanest comparison: ADK = framework + dev UI + managed runtime backed by one cloud.
What models can ADK use?
Gemini natively (Gemini 2.5 Pro, Flash, and Flash-Lite are the supported tiers in 2026; check Google's model lifecycle page for current versioned ids). Other providers via the LiteLLM bridge: OpenAI, Anthropic, Cohere, Mistral, Together, Bedrock, and any model LiteLLM supports. Vertex Model Garden models also work directly. The model is configurable per-agent so you can use a frontier model for the planner and a cheap model for the worker.
What does the ADK agent loop look like?
An agent has a model, an instruction string, an optional list of tools, and an optional list of sub-agents. On `run`, ADK formats the conversation, calls the model, parses tool calls from the response, executes the tools, appends the results to the conversation, and loops until the model emits a final answer or hits a step budget. Tools are Python functions decorated with type hints and docstrings; ADK turns them into the function-calling schema the model expects.
How do you evaluate ADK agents?
ADK ships a built-in eval framework. You define evalsets (JSON files of test trajectories), run `adk eval` against them, and get pass-rates on tool-use accuracy, response quality, and trajectory match. The framework supports LLM-as-judge scoring out of the box. For production observability and richer eval, ADK emits OpenTelemetry traces that any OTel backend can ingest, including FutureAGI.
What is the latest ADK release?
ADK is actively maintained with frequent releases. Python is in the v1.x series with a 2.0 alpha line in flight; TypeScript, Java, and Go each release independently and continue to converge on parity. Recent additions include the LongRunningFunctionTool primitive for human-in-the-loop and async work, Google Cloud Trace integration via the agent runtime, and tighter Vertex AI Agent Engine deployment ergonomics. Verify exact versions on the google/adk-* GitHub repos before procurement.
Related Articles
View all
Stay updated on AI observability

Get weekly insights on building reliable AI systems. No spam.