LangChain Callbacks in 2026: Events, Handlers, and Tracing Workflows
LangChain callbacks in 2026: every lifecycle event, sync vs async handlers, runnable config patterns, and how to wire callbacks into OpenTelemetry traces.
LangChain callbacks in 2026: handlers, events, tracing
LangChain callbacks are the lifecycle hooks every chain, LLM, tool, retriever, and agent already emits. You attach a handler, LangChain fires typed events with run_id and parent_run_id at each step, and you turn those events into logs, traces, streaming tokens, cost metrics, or evaluations. The system has not changed shape in 2026, but it has expanded: chat-model events, retriever events, agent action events, and async-first dispatch are now first-class, and OpenTelemetry-native instrumentors built on the same hooks let you ship LangChain traces into any backend.
TL;DR
| Question | Short answer |
|---|---|
| What are callbacks? | Observer-only handlers that LangChain invokes at chain, LLM, tool, retriever, and agent lifecycle points. |
| How do I attach one? | Pass callbacks=[handler] at construction or, preferably, via RunnableConfig at invoke time. |
| Which class do I subclass? | BaseCallbackHandler (sync) or AsyncCallbackHandler (async). Both live in langchain_core.callbacks. |
| What is the modern way to ship traces? | An OpenTelemetry instrumentor on top of the callback API: TraceAI’s LangChainInstrumentor (Apache 2.0) or OpenInference’s instrumentor. |
| When should I write a custom handler? | When you need bespoke logging, in-process metrics, or a thin bridge to a vendor SDK. For OTel/spans, use an instrumentor instead. |
| Where does LangSmith fit? | LangSmith’s tracer is itself a callback handler. You can stack it with custom or OTel handlers. |
What is a LangChain callback?
A callback in LangChain is a Python object that implements one or more lifecycle methods such as on_chain_start, on_llm_end, on_tool_error, on_retriever_end, or on_agent_action. When you register the handler on a Runnable, LangChain calls the matching method at the correct moment with a structured payload: the input or output, a unique run_id, the parent_run_id so nested steps form a tree, and any tags or metadata you attached.
Callbacks are intentionally observer-only. They do not gate execution and they do not replace the value flowing through the chain. They exist so monitoring, streaming, and tracing can live outside the business logic. The two base classes you actually subclass are BaseCallbackHandler (sync) and AsyncCallbackHandler (async), both exported from langchain_core.callbacks. The legacy langchain.callbacks import path still works for backwards compatibility, but new code should target langchain_core because that is where Runnable, LCEL, and the v0.3 surface live.
How LangChain callbacks work end to end
Internally, every Runnable.invoke opens a CallbackManager, fans the start event out to all registered handlers, runs the underlying step, then fans the matching end (or error) event out. Nested Runnables get a child CallbackManagerForChainRun keyed by parent_run_id, which is what lets a tracer reconstruct a tree from a stream of flat events.
The event surface you should know in 2026:
- Chain events: on_chain_start, on_chain_end, on_chain_error
- LLM events: on_llm_start, on_llm_new_token, on_llm_end, on_llm_error
- Chat-model events: on_chat_model_start (and shared on_llm_* for end/error)
- Tool events: on_tool_start, on_tool_end, on_tool_error
- Retriever events: on_retriever_start, on_retriever_end, on_retriever_error
- Agent events: on_agent_action, on_agent_finish
- Generic: on_text, on_custom_event (for user-emitted events; a sketch follows this list)
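on_custom_event is the hook for events you emit yourself from inside a Runnable. A minimal sketch, assuming a langchain_core version recent enough to export dispatch_custom_event (the event name and payload here are illustrative):

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.callbacks.manager import dispatch_custom_event
from langchain_core.runnables import RunnableLambda


class CustomEventLogger(BaseCallbackHandler):
    def on_custom_event(self, name, data, *, run_id, **kwargs):
        print(f"custom_event name={name} data={data} run_id={run_id}")


def rerank(docs):
    # Any handler registered on this run receives the app-specific event.
    dispatch_custom_event("rerank_stats", {"kept": len(docs), "dropped": 0})
    return docs


RunnableLambda(rerank).invoke(["doc-a", "doc-b"], config={"callbacks": [CustomEventLogger()]})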
Callback events generally include a run_id (UUID), an optional parent_run_id (None for the outermost run), and any tags and metadata you attached through RunnableConfig. That contract is what lets you build distributed tracing on top of callbacks without any other LangChain plumbing.
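That contract is easy to see with a handler that indents each start event by its depth in the run tree. A minimal sketch (the handler name is illustrative; it covers chain, LLM, and tool starts only):

from langchain_core.callbacks import BaseCallbackHandler


class RunTreePrinter(BaseCallbackHandler):
    """Print each start event indented by its depth in the run tree."""

    def __init__(self) -> None:
        self.depths = {}

    def _start(self, kind, run_id, parent_run_id):
        # Depth is the parent's depth plus one; the outermost run has no parent.
        depth = 0 if parent_run_id is None else self.depths.get(parent_run_id, 0) + 1
        self.depths[run_id] = depth
        print("  " * depth + f"{kind} run_id={run_id}")

    def on_chain_start(self, serialized, inputs, *, run_id, parent_run_id=None, **kwargs):
        self._start("chain", run_id, parent_run_id)

    def on_llm_start(self, serialized, prompts, *, run_id, parent_run_id=None, **kwargs):
        self._start("llm", run_id, parent_run_id)

    def on_tool_start(self, serialized, input_str, *, run_id, parent_run_id=None, **kwargs):
        self._start("tool", run_id, parent_run_id)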
Attaching a callback handler
There are two registration patterns. Use the one that matches your deployment shape.
Attach at construction (component-scoped)
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler
handler = StdOutCallbackHandler()
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])
response = llm.invoke("Explain MCP in one sentence.")
This attaches the handler for the lifetime of the llm object. It is fine for scripts and notebooks but is global state inside a web server.
Attach via RunnableConfig (request-scoped, recommended)
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler
llm = ChatOpenAI(model="gpt-4o")
handler = StdOutCallbackHandler()
response = llm.invoke(
    "Explain MCP in one sentence.",
    config={"callbacks": [handler], "tags": ["docs-demo"], "metadata": {"user_id": "u_42"}},
)
This is the idiomatic pattern in 2026: handlers, tags, and metadata travel with the request, so every span and log line is correctly scoped to one user, one trace, one tenant.
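Handlers see those tags and metadata too. A minimal sketch (handler name is illustrative) that reads them from the chat-model start event:

from langchain_core.callbacks import BaseCallbackHandler


class RequestContextLogger(BaseCallbackHandler):
    def on_chat_model_start(self, serialized, messages, *, run_id, tags=None, metadata=None, **kwargs):
        # tags and metadata from the RunnableConfig arrive on every start event
        user = (metadata or {}).get("user_id")
        print(f"chat_model_start run_id={run_id} tags={tags} user={user}")

Added next to the StdOutCallbackHandler in the config above, it would log the docs-demo tag and the u_42 user id for that request.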
Built-in callback handlers in LangChain Core
LangChain ships a small set of handlers in langchain_core.callbacks that cover the common cases:
- StdOutCallbackHandler writes events to stdout. Useful in notebooks, tests, and CI.
- StreamingStdOutCallbackHandler streams tokens to stdout as they arrive (on_llm_new_token).
- FileCallbackHandler mirrors stdout events to a file.
- BaseTracer (exported from langchain_core.tracers) is the abstract class the LangSmith tracer extends. You almost never subclass it directly.
For production traces, prefer an OpenTelemetry instrumentor over hand-rolled FileCallbackHandler pipelines.
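For a quick feel of the built-ins, the streaming handler only needs the model to stream. A minimal sketch:

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI

# streaming=True makes the model emit on_llm_new_token for every chunk.
llm = ChatOpenAI(model="gpt-4o", streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
llm.invoke("Explain MCP in one sentence.")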
Writing a custom callback handler
A custom handler is a Python class that subclasses BaseCallbackHandler (sync) or AsyncCallbackHandler (async) and overrides only the methods it cares about. LangChain provides default no-op implementations for everything else.
import logging
from typing import Any
from langchain_core.callbacks import BaseCallbackHandler
logger = logging.getLogger("langchain.callbacks.cost")
class CostLoggingHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized: dict, prompts: list, **kwargs: Any) -> None:
        try:
            logger.info("llm_start prompts=%d run_id=%s", len(prompts), kwargs.get("run_id"))
        except Exception:
            logger.exception("cost handler on_llm_start failed")

    def on_llm_end(self, response: Any, **kwargs: Any) -> None:
        try:
            # llm_output can be None (e.g. on some streaming paths), so guard before .get
            usage = (getattr(response, "llm_output", None) or {}).get("token_usage", {}) or {}
            logger.info(
                "llm_end input=%s output=%s total=%s run_id=%s",
                usage.get("prompt_tokens"),
                usage.get("completion_tokens"),
                usage.get("total_tokens"),
                kwargs.get("run_id"),
            )
        except Exception:
            logger.exception("cost handler on_llm_end failed")
Three things to notice:
- The handler defensively swallows its own exceptions. A misbehaving handler should never break the chain.
- It only overrides the events it needs. Token-usage logging does not require on_tool_start or on_retriever_end.
- It reads run_id from **kwargs. That is the join key for any external system.
Wire it into a chain via RunnableConfig:
# assumes prompt, llm, and parser from the earlier snippets, plus a question string q
chain = prompt | llm | parser
result = chain.invoke({"question": q}, config={"callbacks": [CostLoggingHandler()]})
Async callbacks for streaming and serving
If you serve LangChain behind FastAPI, LangServe, or any async runtime, register AsyncCallbackHandler subclasses so the event loop is never blocked.
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_openai import ChatOpenAI
class TokenForwarder(AsyncCallbackHandler):
    def __init__(self, queue: asyncio.Queue) -> None:
        self.queue = queue

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        await self.queue.put(token)

    async def on_llm_end(self, response, **kwargs) -> None:
        await self.queue.put(None)
This is the pattern that backs Server-Sent Events and WebSocket endpoints: every token chunk lands on an asyncio.Queue, and a consumer drains it into the wire format.
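A minimal consumer sketch with FastAPI, assuming the TokenForwarder above, a streaming-enabled ChatOpenAI, and an illustrative /chat route and payload shape:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o", streaming=True)


@app.post("/chat")
async def chat(payload: dict):
    queue: asyncio.Queue = asyncio.Queue()
    handler = TokenForwarder(queue)

    async def run_model():
        # The handler travels with this request via RunnableConfig.
        await llm.ainvoke(payload["question"], config={"callbacks": [handler]})

    async def sse():
        task = asyncio.create_task(run_model())
        while True:
            token = await queue.get()
            if token is None:  # sentinel pushed by on_llm_end
                break
            yield f"data: {token}\n\n"
        await task

    return StreamingResponse(sse(), media_type="text/event-stream")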
Callbacks for agents, tools, and retrievers
Agent and RAG pipelines are where callbacks pay for themselves. The same handler that logs LLM token usage also tells you which tool an agent picked, which documents a retriever fetched, and where a ReAct loop went off the rails.
from typing import Any
from langchain_core.callbacks import BaseCallbackHandler
class AgentAuditHandler(BaseCallbackHandler):
    def on_tool_start(self, serialized: dict, input_str: str, **kwargs: Any) -> None:
        print(f"tool_start name={serialized.get('name')} input={input_str[:120]}")

    def on_tool_end(self, output: str, **kwargs: Any) -> None:
        print(f"tool_end output={str(output)[:120]}")

    def on_retriever_end(self, documents: list, **kwargs: Any) -> None:
        print(f"retriever_end docs={len(documents)}")

    def on_agent_action(self, action, **kwargs: Any) -> None:
        print(f"agent_action tool={action.tool} reasoning={action.log[:200]}")

    def on_agent_finish(self, finish, **kwargs: Any) -> None:
        print(f"agent_finish output={str(finish.return_values)[:200]}")
These are the hooks that feed groundedness, tool-correctness, and trajectory evaluations downstream. For RAG, the on_retriever_end event is the load-bearing one: it is the only place where you cleanly see the documents the LLM saw before generating an answer.
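To hold on to those documents for a groundedness or faithfulness check, a small capture handler keyed by run_id is enough. A sketch, assuming a rag_chain Runnable and a question string q exist (the handler name is illustrative):

from typing import Any
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler


class RetrievalCapture(BaseCallbackHandler):
    """Collect retrieved document texts per run for downstream evaluation."""

    def __init__(self) -> None:
        self.contexts: dict[str, list[str]] = {}

    def on_retriever_end(self, documents: list, *, run_id: UUID, **kwargs: Any) -> None:
        self.contexts[str(run_id)] = [doc.page_content for doc in documents]


capture = RetrievalCapture()
answer = rag_chain.invoke({"question": q}, config={"callbacks": [capture]})
# capture.contexts now maps each retriever run_id to the documents the LLM saw.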
From callbacks to OpenTelemetry traces
Hand-rolled handlers are fine for ad-hoc logging. For real distributed tracing in 2026, install an OpenTelemetry instrumentor and let it subscribe to the callback bus. The two open-source choices that ship today:
- traceai-langchain (Apache 2.0, github.com/future-agi/traceAI) registers a LangChainInstrumentor against the global tracer provider and emits spans for chain, LLM, retriever, tool, and agent events.
- openinference-instrumentation-langchain (Arize OpenInference, Apache 2.0, github.com/Arize-ai/openinference) uses the same OpenInference semantic conventions and is the upstream basis for Phoenix and several commercial backends.
Minimal traceAI setup:
from fi_instrumentation import register
from traceai_langchain import LangChainInstrumentor
trace_provider = register(project_name="langchain-app")
LangChainInstrumentor().instrument(tracer_provider=trace_provider)
That call wires LangChain callbacks to OpenTelemetry without you writing a single handler method. Every on_chain_start becomes a span, every on_llm_end records token usage as a span attribute, every on_retriever_end attaches retrieved documents to the span. Your backend (Future AGI, Phoenix, Jaeger, Datadog, Grafana Tempo) does the rest.
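The OpenInference instrumentor follows the same pattern. A minimal sketch, assuming the arize-phoenix-otel register helper for the tracer provider:

from openinference.instrumentation.langchain import LangChainInstrumentor
from phoenix.otel import register

tracer_provider = register(project_name="langchain-app")
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)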
How LangSmith uses callbacks
LangSmith does not bypass the callback system. The LangChainTracer that ships with LangSmith is a subclass of BaseTracer (which is itself a BaseCallbackHandler), and it auto-registers when LANGCHAIN_TRACING_V2=true is set in the environment. That means you can run LangSmith alongside a custom handler and an OTel instrumentor at the same time. All three receive the same event stream because LangChain dispatches every callback to every registered handler.
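Stacking looks like any other multi-handler config. A sketch, assuming the chain and CostLoggingHandler from earlier and a LangSmith API key in the environment:

from langchain_core.tracers import LangChainTracer

tracer = LangChainTracer(project_name="langchain-app")  # explicit instead of the env-var toggle
result = chain.invoke(
    {"question": q},
    config={"callbacks": [tracer, CostLoggingHandler()]},  # both handlers receive every event
)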
For teams comparing LangChain-native tracing options to vendor stacks, see Future AGI vs LangSmith and the broader best AI agent debugging tools 2026 review.
Best practices for LangChain callbacks in 2026
- Prefer RunnableConfig. Attach handlers per request, not per component. Global state is the single biggest source of cross-tenant trace bleed.
- Subclass AsyncCallbackHandler in serving paths. Sync handlers in an async route end up on the default thread pool, which adds tens of milliseconds per token.
- Defensive exception handling. Every handler method should swallow and log its own errors. A broken handler should never take a chain down.
- Use tags and metadata. They are the cheapest way to filter traces by tenant, route, experiment, or model version downstream.
- Pick instrumentors over custom tracers. A single LangChainInstrumentor().instrument() line replaces hundreds of lines of bespoke span code and stays in sync with semantic conventions.
- Mind streaming. Override async def on_llm_new_token(...) only on routes that actually stream. The hook fires for every token; expensive work there blocks the response.
- Pair callbacks with evaluation. Tracing tells you what happened; an evaluation harness on the same run_id tells you whether it was correct. The ai-evaluation library (Apache 2.0) consumes the same spans for groundedness, faithfulness, and tool-correctness scoring.
Where Future AGI fits
Future AGI’s observability stack consumes the LangChain callback bus through the open-source traceai-langchain instrumentor and stores spans in a managed backend. Once spans are flowing, the same workspace runs fi.evals.evaluate("faithfulness", output=..., context=...) or fi.evals.Evaluator over the traces to score retrievals, tool outputs, and final answers, and the Agent Command Center at /platform/monitor/command-center exposes the routing and guardrail surface for production agents. There is nothing LangChain-specific you need to do beyond installing the instrumentor; the callback events you already emit are the input.
Summary
LangChain callbacks in 2026 are still the right primitive for monitoring, streaming, and tracing inside an LCEL Runnable. You attach a BaseCallbackHandler or AsyncCallbackHandler through RunnableConfig, you implement only the events you care about, and you let an OpenTelemetry instrumentor turn the same events into distributed traces. The combination of LangChain’s lifecycle events and an Apache 2.0 instrumentor like traceai-langchain is what production teams ship today.