What Is the Agent-to-Agent Protocol (A2A)?

An open specification for cross-vendor AI agents to discover peers, publish agent cards, exchange messages, and coordinate long-running tasks.

The Agent-to-Agent Protocol (A2A) is an open specification for cross-vendor AI agent communication. It defines how agents discover each other, publish agent cards, negotiate capabilities, exchange messages, and manage long-running tasks across different runtimes. In production, A2A shows up at agent handoff boundaries where one agent delegates work to another. FutureAGI treats those exchanges as traceable, evaluable agent trajectories, so teams can measure whether the right peer was called and whether the downstream task completed.
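
To make the discovery surface concrete, here is a minimal agent card sketched as a Python dict. The endpoint URL and the exact field names are illustrative assumptions in the spirit of the spec, not normative keys:

```python
# Illustrative agent card for the receiver side of an A2A handoff.
# Field names and the endpoint are assumptions, not normative spec keys.
agent_card = {
    "name": "workspace-research-agent",
    "description": "Researches accounts using Google Workspace data.",
    "url": "https://agents.example.invalid/research",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True, "push_notifications": False},
    "skills": [
        {"id": "account-research",
         "description": "Summarize an account's recent activity."},
    ],
}

# A sender fetches this card, checks capabilities, then delegates a task.
can_stream = agent_card["capabilities"]["streaming"]
```

The card is the contract the sender programs against, which is why later sections treat it as a versioned protocol artifact rather than a private API.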

Why the Agent-to-Agent Protocol (A2A) matters in production agent systems

The first wave of agent frameworks — LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Strands — solved the single-runtime multi-agent problem. They each ship rich primitives for coordinating agents inside one vendor stack. The cross-vendor problem is different: how do you let a Salesforce sales agent invoke a Google Workspace research agent that runs in a different cloud, owned by a different team? Without a protocol, the integration is bespoke per pair, and the surface area scales N². With A2A it scales N + 1.
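
The scaling claim is simple arithmetic, counting ordered sender-to-receiver adapter pairs:

```python
def bespoke_adapters(n_vendors: int) -> int:
    # Without a shared protocol, every ordered sender -> receiver pair
    # needs its own adapter: N * (N - 1), i.e. O(N^2).
    return n_vendors * (n_vendors - 1)

def protocol_adapters(n_vendors: int) -> int:
    # With A2A, each vendor implements the protocol once, plus the
    # shared protocol surface itself: N + 1.
    return n_vendors + 1

for n in (3, 6, 12):
    print(n, "vendors:", bespoke_adapters(n), "vs", protocol_adapters(n))
```

At six vendors that is 30 bespoke adapters versus 7 protocol implementations, and the gap widens quadratically from there.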

Different roles see different stakes. A platform engineer building an enterprise agent stack does not want to write five vendor-specific adapters for the same handoff pattern. An SRE wants one observability schema for all cross-agent calls, not one per vendor. A compliance officer wants one auth/audit story for any cross-tenant agent invocation. A product partner wants the agent ecosystem to grow — which only happens when the integration cost is low.

In 2026, A2A is still nascent. The agent-card discovery model is settling, but the long-running-task semantics, auth flows, and streaming-message conventions are still being hardened. Teams shipping now should treat A2A integrations like any other distributed protocol: instrument every call, evaluate every cross-agent trajectory, and assume the receiver may be running a different framework with different status semantics.

How FutureAGI Handles A2A

FutureAGI’s approach is to treat A2A messages as first-class spans. The a2a traceAI integration captures every cross-agent call as an OpenTelemetry span tagged with sender agent ID, receiver agent ID, the task ID, the step index (agent.trajectory.step), and the message envelope. That makes it possible to query “show me every A2A call from our Salesforce agent to the Google Workspace agent that failed in the last hour” without parsing logs.
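
A pure-Python stand-in for what such a span record carries is sketched below. Apart from agent.trajectory.step, which the integration names, the a2a.* attribute keys and IDs are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class A2ASpan:
    """Toy model of one cross-agent span as described above."""
    name: str
    attributes: dict = field(default_factory=dict)

span = A2ASpan(
    name="a2a.message.send",
    attributes={
        "a2a.sender.id": "salesforce-sales-agent",     # hypothetical keys
        "a2a.receiver.id": "workspace-research-agent",
        "a2a.task.id": "task-4711",
        "agent.trajectory.step": 3,                    # attribute named in the text
    },
)

# The "every failed call on this route in the last hour" query becomes
# an attribute filter on the span store, not log parsing:
route = (span.attributes["a2a.sender.id"], span.attributes["a2a.receiver.id"])
```
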

Evaluation runs across the cross-agent trajectory. ToolSelectionAccuracy scores whether the sender picked the right peer to call (an A2A invocation is functionally a tool call). TaskCompletion scores the receiver’s sub-trajectory after the call. TrajectoryScore aggregates the joint trajectory, so a poor receiver-side execution does not get hidden by a clean sender-side log.

Concrete example: an enterprise crew runs a Salesforce sales agent that calls a Google Workspace research agent over A2A. After deployment, FutureAGI shows TaskCompletion at 73% — but slicing by sender shows the Salesforce side is at 91% while the post-A2A sub-trajectory is at 64%. The cross-agent span view reveals the A2A message envelope omits a customer_segment field the research agent expects, so the research agent searches the wrong index. Adding that field to the agent card and re-issuing the call drops failures to 9%. The fix took an hour because the protocol surface was instrumented and evaluable.
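
The envelope fix in this example can be sketched as follows; every field name other than customer_segment is a hypothetical placeholder:

```python
# Before the fix: the envelope the Salesforce agent actually emitted.
envelope_before = {
    "task_id": "task-4711",  # hypothetical ID
    "goal": "Research the Acme account ahead of the renewal call.",
}

# After the fix: the receiver's agent card now lists customer_segment as
# required, and the sender populates it, so the research agent queries
# the correct index instead of falling back to the wrong one.
envelope_after = {
    **envelope_before,
    "customer_segment": "enterprise",
}
```
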

For comparison: MCP (the Model Context Protocol) is the closest sibling. MCP connects agents to tools and data sources; A2A connects agents to each other. FutureAGI ships traceAI integrations for both — mcp and a2a — and the same evaluators apply to both surfaces.

How to measure A2A interactions

A2A interactions need both per-call evaluators and population-level dashboard signals:

  • ToolSelectionAccuracy: scores whether the sender chose the right peer agent to invoke; A2A calls are tool calls in disguise.
  • TaskCompletion: scores the post-A2A sub-trajectory — flags receiver-side failures separately.
  • TrajectoryScore: aggregates the joint sender + receiver trajectory.
  • a2a-failure rate (dashboard signal): % of cross-agent calls where the receiver fails TaskCompletion within N steps.
  • a2a-payload-size p99 (dashboard signal): payload bloat is a common A2A regression; instrument the size.
  • agent.trajectory.step (OTel attribute): paired with sender and receiver IDs lets the dashboard slice by route.

A minimal sketch wiring the first two evaluators together; variable names like sender_state and receiver_spans stand in for your own trajectory data:

from fi.evals import ToolSelectionAccuracy, TaskCompletion

# Sender side: did the agent pick the right peer to invoke?
choice = ToolSelectionAccuracy().evaluate(
    input=sender_state,
    trajectory=sender_spans,
)

# Receiver side: did the post-A2A sub-trajectory complete the original goal?
post = TaskCompletion().evaluate(
    input=original_goal,
    trajectory=receiver_spans,
)

print(choice.score, post.score)
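
The two dashboard signals reduce to simple aggregations over exported spans. The record shape below is a toy stand-in for the real traceAI export:

```python
import statistics

# Toy span records standing in for the traceAI export.
spans = [
    {"receiver_task_completed": True,  "payload_bytes": 2_048},
    {"receiver_task_completed": False, "payload_bytes": 96_000},  # bloated call
    {"receiver_task_completed": True,  "payload_bytes": 1_800},
]

# a2a-failure rate: share of calls whose receiver failed TaskCompletion.
failure_rate = sum(not s["receiver_task_completed"] for s in spans) / len(spans)

# a2a-payload-size p99: tail payload size across the population.
payload_p99 = statistics.quantiles(
    (s["payload_bytes"] for s in spans), n=100
)[98]

print(f"failure rate {failure_rate:.0%}, payload p99 {payload_p99:,.0f} bytes")
```

In production these aggregations run over the span store sliced by sender/receiver route, so a single bloated or failing route is visible before it drags down the population averages.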

Common mistakes

  • Treating A2A like a private API. Public agent cards are protocol contracts; version them, document them, and deploy them with the agent runtime.
  • No shared trace context across agents. Propagate trace IDs across the A2A boundary, or sender-side and receiver-side failures become separate incidents.
  • Evaluating only the sender. A clean sender log with a wrong receiver result is still a failed trajectory; score both sides together.
  • Using A2A where MCP is the right answer. A2A is agent-to-agent; MCP is agent-to-tool. Mixing them creates brittle capability maps.
  • Skipping auth and audit. Cross-tenant A2A calls cross security boundaries; require structured auth, scoped delegation, and audit logs for every invocation.
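
For the shared-trace-context point above, here is a minimal sketch of injecting a W3C traceparent header into the outgoing A2A envelope. The helper name and envelope shape are assumptions; only the traceparent format (version-traceid-spanid-flags) follows the Trace Context convention:

```python
import uuid

def inject_trace_context(envelope: dict, trace_id: str) -> dict:
    # Attach the sender's trace ID so receiver-side spans join the same
    # trace instead of starting a new one. W3C Trace Context's
    # `traceparent` header is the conventional carrier.
    headers = dict(envelope.get("headers", {}))
    headers["traceparent"] = f"00-{trace_id}-{uuid.uuid4().hex[:16]}-01"
    return {**envelope, "headers": headers}

sender_trace_id = uuid.uuid4().hex  # 32 hex chars, as Trace Context requires
outgoing = inject_trace_context({"task_id": "task-4711"}, sender_trace_id)
```

The receiver extracts the same header before starting its spans, so sender-side and receiver-side failures land in one incident rather than two.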

Frequently Asked Questions

What is the Agent-to-Agent Protocol (A2A)?

A2A is an open specification for cross-vendor AI agent communication. It standardizes agent cards, capability negotiation, message envelopes, and long-running task semantics.

How is A2A different from MCP?

MCP (Model Context Protocol) connects an agent to tools and data sources. A2A connects agents to each other. MCP is the agent-to-tool protocol; A2A is the agent-to-agent protocol. They are complementary, not alternatives.

How do you observe A2A interactions?

FutureAGI's `a2a` traceAI integration captures every A2A message exchange as an OpenTelemetry span tagged with sender, receiver, task ID, and step index. Evaluators run on cross-agent trajectories the same way they run on single-agent ones.