What is AutoGen? Microsoft's Multi-Agent Framework in 2026
AutoGen is Microsoft's open-source framework for conversational multi-agent applications. Agents, GroupChat, AgentChat, AutoGen Studio, and the v0.4 split.
A team is building an internal market-research assistant. The natural shape is a debate: a bull-case agent argues one side, a bear-case agent argues the other, a synthesis agent reads both and produces a balanced summary. There is no obvious task pipeline; the agents need to talk back and forth until the synthesis agent calls termination. CrewAI’s role-and-task framing fits poorly. LangGraph would work but the graph is mostly one big conversational node. AutoGen’s GroupChat is the natural fit: three agents in a SelectorGroupChat with a TextMentionTermination on “FINAL ANSWER”.
This is the shape AutoGen is built for. Where CrewAI is opinionated about roles and tasks and LangGraph is opinionated about explicit graphs, AutoGen is opinionated about conversations between agents. The framework is in maintenance mode in 2026 (Microsoft moved active development to the Microsoft Agent Framework), so this guide is most useful for teams already on AutoGen or evaluating it for a conversation-shaped workflow they intend to keep small. This guide covers what AutoGen is, its layered architecture, how its primitives work, how it compares to alternatives, and when to pick it.
TL;DR: What AutoGen is
AutoGen is an open-source Python framework originating at Microsoft Research for building multi-agent applications as conversations. The codebase at github.com/microsoft/autogen is MIT-licensed; the docs are CC-BY 4.0. The repo has approximately 50,000 GitHub stars as of mid-2026. Microsoft moved AutoGen into maintenance mode in early 2026 and points new users to the Microsoft Agent Framework for greenfield builds; the AutoGen project continues with bug fixes and community contributions. The architecture is the layered post-v0.4 design (currently published as autogen-core, autogen-agentchat, and autogen-ext 0.7.x packages on PyPI). AutoGen Studio is a separate no-code UI for prototyping. The framework is most useful when the workflow naturally decomposes into a conversation between specialized agents and the team is committed to staying on AutoGen for the lifetime of the project.
Why AutoGen still matters in 2026
Three things kept AutoGen on the procurement radar even after Microsoft moved active development to the Microsoft Agent Framework.
First, the conversational-agent abstraction earned its keep. Open-ended dialog is a real production workflow shape (debate, brainstorming, multi-perspective review, structured red-teaming). Frameworks built around tasks-and-roles (CrewAI) or explicit state graphs (LangGraph) produce awkward code for those shapes. AutoGen’s GroupChat is the natural fit.
Second, the v0.4 rewrite cleaned up the architecture. The v0.2 codebase had grown organically and accumulated cross-cutting concerns. v0.4 layered the design, made the runtime asynchronous and observable, and added native OpenTelemetry tracing at the runtime level. The rewrite stabilized the API surface that current 0.7.x releases continue to ship.
Third, existing AutoGen production stacks have not gone anywhere. Teams with running AutoGen v0.4 / 0.7.x deployments keep operating them. New Microsoft/Azure builds should evaluate the Microsoft Agent Framework first; teams already on AutoGen still get a maintained release stream.
The anatomy of an AutoGen application (post-v0.4 layered architecture)
The post-v0.4 layered architecture maps to three packages (latest on PyPI in 2026 is the 0.7.x line).
autogen-core. The runtime layer. An actor model where agents are addressable entities that send and receive typed messages. The runtime handles message routing, agent lifecycle, and OpenTelemetry span emission. Most users do not touch this layer directly but it is what makes the higher layers observable.
autogen-agentchat. The high-level dialog API. AssistantAgent for LLM-backed agents, UserProxyAgent for human-in-the-loop, RoundRobinGroupChat for fixed-order conversation, SelectorGroupChat for model-driven turn-taking, MagenticOneGroupChat for the Magentic-One orchestrator. This is where most application code lives.
autogen-ext. The integrations layer. Model clients (OpenAIChatCompletionClient, AzureOpenAIChatCompletionClient, AnthropicChatCompletionClient, OllamaChatCompletionClient), code executors (DockerCommandLineCodeExecutor, JupyterCodeExecutor, LocalCommandLineCodeExecutor), and tool wrappers.
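Most application code never touches autogen-core directly, but a minimal sketch makes its actor model concrete. The following follows the shape of the autogen-core quickstart; `Question` and `EchoAgent` are illustrative names, not library types:

from dataclasses import dataclass

from autogen_core import (
    AgentId,
    MessageContext,
    RoutedAgent,
    SingleThreadedAgentRuntime,
    message_handler,
)

@dataclass
class Question:
    content: str

class EchoAgent(RoutedAgent):
    """Illustrative agent: echoes back any Question it receives."""

    def __init__(self) -> None:
        super().__init__("An agent that echoes questions back")

    @message_handler
    async def handle_question(self, message: Question, ctx: MessageContext) -> Question:
        return Question(content=f"echo: {message.content}")

async def core_demo():
    runtime = SingleThreadedAgentRuntime()
    # Register a factory under the agent type "echo".
    await EchoAgent.register(runtime, "echo", lambda: EchoAgent())
    runtime.start()
    reply = await runtime.send_message(Question("hello"), AgentId("echo", "default"))
    print(reply.content)
    await runtime.stop()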
The conversation flow: a Team is constructed with agents and a termination condition. The Team’s run method kicks off the conversation. Agents take turns under the coordination strategy. Each turn, the active agent receives the conversation history, calls its model, optionally calls tools, and produces a message. The termination condition is checked after each turn. When it fires, the Team returns a TaskResult with the final messages.
AutoGen in 30 lines
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    try:
        bull = AssistantAgent(
            name="bull_case",
            model_client=model_client,
            system_message="Argue the bull case for the topic.",
        )
        bear = AssistantAgent(
            name="bear_case",
            model_client=model_client,
            system_message="Argue the bear case for the topic.",
        )
        synthesis = AssistantAgent(
            name="synthesis",
            model_client=model_client,
            system_message="Read both sides. End with 'FINAL ANSWER:' and a synthesis.",
        )
        team = RoundRobinGroupChat(
            participants=[bull, bear, synthesis],
            termination_condition=TextMentionTermination("FINAL ANSWER"),
        )
        result = await team.run(task="Investing in vector database startups in 2026.")
        print(result.messages[-1].content)
    finally:
        # Async model clients hold connections; always close in production code.
        await model_client.close()

asyncio.run(main())
The team runs until the synthesis agent emits “FINAL ANSWER”, then returns the conversation history.
How AutoGen compares to alternatives
| Framework | Primitive | Best for | Maintainer |
|---|---|---|---|
| AutoGen (legacy / maintenance) | Conversational agent in a Team | Existing AutoGen stacks; conversation-shaped workflows | Microsoft (MIT code, in maintenance mode in 2026) |
| Microsoft Agent Framework | Microsoft-backed successor agent runtime | New Microsoft/Azure builds | Microsoft (MIT) |
| CrewAI | Role + task + crew | Role-decomposable pipelines | CrewAI Inc. (MIT) |
| LangGraph | Stateful graph | Arbitrary state machines, persistence | LangChain Inc. (MIT) |
| OpenAI Agents SDK | Agent loop with tools, handoffs, guardrails, HITL | Single- or multi-agent workflows on OpenAI | OpenAI (MIT) |
The conceptual choice is workflow shape. If your workflow is a conversation and you are already on AutoGen, AutoGen still works. For new conversation-shaped builds inside Microsoft/Azure ecosystems, the Microsoft Agent Framework is the actively developed successor. If your workflow decomposes into roles and tasks, CrewAI. If it is a state machine, LangGraph. If it is a single agent with tools, the OpenAI Agents SDK or Claude Agent SDK.
Production patterns with AutoGen
Three patterns recur.
Pattern 1: RoundRobin team with explicit termination. Three to five agents take turns in fixed order. A termination condition (TextMentionTermination, MaxMessageTermination, or a combined And/Or condition) ends the run. This is the simplest GroupChat pattern and the right shape for debate, review, and brainstorming workflows.
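Termination conditions compose with the `|` (or) and `&` (and) operators, so the sentinel check can carry a hard message cap as a safety net. A minimal sketch; the 20-message cap is an arbitrary choice:

from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination

# Stop on the sentinel string, or after 20 messages, whichever fires first.
termination = TextMentionTermination("FINAL ANSWER") | MaxMessageTermination(max_messages=20)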
Pattern 2: Selector team with model-driven coordination. A SelectorGroupChat uses a model to choose which agent speaks next. The selector reads the conversation and dispatches to the most relevant agent. Useful when the natural flow is non-deterministic (a customer support workflow that routes between billing, technical, and account-recovery specialists).
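A minimal sketch of the selector shape, assuming `billing`, `technical`, and `recovery` are AssistantAgents defined elsewhere and `model_client` is the client from the earlier example:

from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import TextMentionTermination

# billing, technical, recovery: hypothetical AssistantAgents defined elsewhere.
support_team = SelectorGroupChat(
    participants=[billing, technical, recovery],
    model_client=model_client,  # the selector model that picks the next speaker
    termination_condition=TextMentionTermination("RESOLVED"),
)
# Run inside an async function, as in the earlier example.
result = await support_team.run(task="My invoice charged me twice and now I'm locked out.")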
Pattern 3: Magentic-One orchestrator for open-ended tasks. MagenticOneGroupChat is a higher-level orchestrator built on the Magentic-One research from Microsoft. The orchestrator agent plans, delegates to specialist agents (web surfer, file surfer, coder), tracks progress, and replans when stuck. This is the AutoGen primitive for open-ended task completion that does not fit a fixed agent topology.
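A sketch of the orchestrator shape. MultimodalWebSurfer ships in autogen-ext behind the `web-surfer` extra and needs Playwright installed; check the docs for the exact setup on your version:

from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_ext.agents.web_surfer import MultimodalWebSurfer  # needs autogen-ext[web-surfer]

# model_client: the client from the earlier example; run inside an async function.
surfer = MultimodalWebSurfer("web_surfer", model_client=model_client)
team = MagenticOneGroupChat(participants=[surfer], model_client=model_client)
result = await team.run(task="Find the three most-cited papers on multi-agent LLM frameworks.")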
Common mistakes when adopting AutoGen
- Mixing v0.2 and v0.4 imports. The two APIs are not compatible. Pick one and stay consistent. New AutoGen projects should target the post-v0.4 API (the current 0.7.x packages).
- Skipping the termination condition. Without one, a group chat can keep running indefinitely. Always pass a `termination_condition` (or `max_turns` on the team) that maps to your workflow's natural stop signal.
- Using the LocalCommandLineCodeExecutor in production. It runs code on the host. Use the DockerCommandLineCodeExecutor for sandboxed execution (see the sketch after this list).
- Treating AutoGen Studio as a runtime. It is a prototyping UI. Production should run the autogen-agentchat Python API directly.
- Forgetting to close the model client. The async model clients hold connections; close them in a finally block or use them inside an async context manager.
- Building pipeline workflows in AutoGen. A clean four-step research pipeline is more naturally expressed in CrewAI. AutoGen’s strength is conversation, not sequence.
- Assuming the runtime traces every model call. AutoGen's runtime emits spans for runtime, agents, and tools; model-call spans require provider instrumentation (`opentelemetry-instrumentation-openai`, traceAI provider wrappers, etc.). Configure both layers when you want a complete trace tree.
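Returning to the code-execution mistake above, the sandboxed pattern wires a Docker-backed executor into a CodeExecutorAgent. A minimal sketch, assuming Docker is running on the host and the `docker` extra is installed:

from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor  # needs autogen-ext[docker]

# Run inside an async function. work_dir is an arbitrary choice.
executor = DockerCommandLineCodeExecutor(work_dir="coding")
await executor.start()  # starts the container
coder = CodeExecutorAgent("code_runner", code_executor=executor)
# ... add coder to a team and run the task ...
await executor.stop()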
How to trace AutoGen with FutureAGI
AutoGen’s post-v0.4 runtime emits OpenTelemetry spans for runtime, agent, and tool events through autogen-core. Model-call spans come from a provider instrumentation. To ship runtime + provider traces to FutureAGI’s observability platform or any other OTel backend, layer traceAI’s AutoGen package on top of a provider instrumentation:
pip install traceai-autogen
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_autogen import AutogenInstrumentor
trace_provider = register(
project_type=ProjectType.OBSERVE,
project_name="market-research-team",
)
AutogenInstrumentor().instrument(tracer_provider=trace_provider)
# Add a provider instrumentation for model-call spans, e.g.:
# from traceai_openai import OpenAIInstrumentor
# OpenAIInstrumentor().instrument(tracer_provider=trace_provider)
The resulting trace tree shows the team run at the root, every agent message exchange as a child span, every tool call with arguments and return value, the termination event, and model calls as deeper child spans (provided by the provider instrumentation).
How FutureAGI implements AutoGen observability and evaluation
FutureAGI is the production-grade observability and evaluation platform for AutoGen built around the closed reliability loop that other AutoGen stacks stitch together by hand. The full stack runs on one Apache 2.0 self-hostable plane:
- AutoGen tracing, traceAI (Apache 2.0) auto-wraps autogen-core runtime, agent message exchanges, GroupChat patterns, tool calls, and termination events; provider instrumentations (OpenAI, Anthropic, Bedrock) layer on for model-call spans across Python, TypeScript, Java, and C#.
- Conversation evals, 50+ first-party metrics (Tool Correctness, Conversation Relevancy, Role Adherence, Task Completion, Plan Adherence, Faithfulness) attach as span attributes on every agent message; BYOK lets any LLM serve as the judge at zero platform fee, and `turing_flash` runs the same rubrics at 50 to 70 ms p95.
- Simulation, persona-driven text and voice scenarios exercise teams in pre-prod with the same scorer contract that judges production traces.
- Gateway and guardrails, the Agent Command Center fronts 100+ providers with BYOK routing; 18+ runtime guardrails (PII, prompt injection, jailbreak, tool-call enforcement) enforce policy on the same plane.
Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise from $2,000 per month with SOC 2 Type II.
Most teams running AutoGen in production end up running three or four tools alongside it: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because tracing, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching. For more on the tracing model, read What is LLM Tracing?.
Sources
- AutoGen GitHub repo
- AutoGen documentation
- autogen-agentchat package docs
- AutoGen Studio docs
- Magentic-One blog post
- Microsoft Agent Framework
- CrewAI GitHub repo
- LangGraph GitHub repo
- traceAI repo
Series cross-link
Related: What is CrewAI?, What is LangGraph?, Best Multi-Agent Frameworks in 2026, What is LLM Tracing?
Frequently asked questions
What is AutoGen in plain terms?
Who maintains AutoGen and what license is it under?
What changed between AutoGen v0.2 and v0.4?
How is AutoGen different from CrewAI?
What is AutoGen Studio?
Does AutoGen support tools and code execution?
How do you trace an AutoGen run?
When should I not use AutoGen?