LangChain Callbacks in 2026: Events, Handlers, and Tracing Workflows
LangChain callbacks in 2026: every lifecycle event, sync vs async handlers, runnable config patterns, and how to wire callbacks into OpenTelemetry traces.
LangChain callbacks in 2026: handlers, events, tracing
LangChain callbacks are the lifecycle hooks every chain, LLM, tool, retriever, and agent already emits. You attach a handler, LangChain fires typed events with run_id and parent_run_id at each step, and you turn those events into logs, traces, streaming tokens, cost metrics, or evaluations. The system has not changed shape in 2026, but it has expanded: chat-model events, retriever events, agent action events, and async-first dispatch are now first-class, and OpenTelemetry-native instrumentors built on the same hooks let you ship LangChain traces into any backend.
TL;DR
| Question | Short answer |
|---|---|
| What are callbacks? | Observer-only handlers that LangChain invokes at chain, LLM, tool, retriever, and agent lifecycle points. |
| How do I attach one? | Pass callbacks=[handler] at construction or, preferably, via RunnableConfig at invoke time. |
| Which class do I subclass? | BaseCallbackHandler (sync) or AsyncCallbackHandler (async). Both live in langchain_core.callbacks. |
| What is the modern way to ship traces? | An OpenTelemetry instrumentor on top of the callback API: TraceAI’s LangChainInstrumentor (Apache 2.0) or OpenInference’s instrumentor. |
| When should I write a custom handler? | When you need bespoke logging, in-process metrics, or a thin bridge to a vendor SDK. For OTel/spans, use an instrumentor instead. |
| Where does LangSmith fit? | LangSmith’s tracer is itself a callback handler. You can stack it with custom or OTel handlers. |
What is a LangChain callback?
A callback in LangChain is a Python object that implements one or more lifecycle methods such as on_chain_start, on_llm_end, on_tool_error, on_retriever_end, or on_agent_action. When you register the handler on a Runnable, LangChain calls the matching method at the correct moment with a structured payload: the input or output, a unique run_id, the parent_run_id so nested steps form a tree, and any tags or metadata you attached.
Callbacks are intentionally observer-only. They do not gate execution and they do not replace the value flowing through the chain. They exist so monitoring, streaming, and tracing can live outside the business logic. The two base classes you actually subclass are BaseCallbackHandler (sync) and AsyncCallbackHandler (async), both exported from langchain_core.callbacks. The legacy langchain.callbacks import path still works for backwards compatibility, but new code should target langchain_core because that is where Runnable, LCEL, and the v0.3 surface live.
How LangChain callbacks work end to end
Internally, every Runnable.invoke opens a CallbackManager, fans the start event out to all registered handlers, runs the underlying step, then fans the matching end (or error) event out. Nested Runnables get a child CallbackManagerForChainRun keyed by parent_run_id, which is what lets a tracer reconstruct a tree from a stream of flat events.
The event surface you should know in 2026:
- Chain events: on_chain_start, on_chain_end, on_chain_error
- LLM events: on_llm_start, on_llm_new_token, on_llm_end, on_llm_error
- Chat-model events: on_chat_model_start (and shared on_llm_* for end/error)
- Tool events: on_tool_start, on_tool_end, on_tool_error
- Retriever events: on_retriever_start, on_retriever_end, on_retriever_error
- Agent events: on_agent_action, on_agent_finish
- Generic: on_text, on_custom_event (for user-emitted events; a sketch follows this list)
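on_custom_event is the hook for events you emit yourself from inside a Runnable. A minimal sketch, assuming a langchain_core version recent enough to export dispatch_custom_event (the event name and payload here are illustrative):

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.callbacks.manager import dispatch_custom_event
from langchain_core.runnables import RunnableLambda


class CustomEventLogger(BaseCallbackHandler):
    def on_custom_event(self, name, data, *, run_id, **kwargs):
        print(f"custom_event name={name} data={data} run_id={run_id}")


def rerank(docs):
    # Any handler registered on this run receives the app-specific event.
    dispatch_custom_event("rerank_stats", {"kept": len(docs), "dropped": 0})
    return docs


RunnableLambda(rerank).invoke(["doc-a", "doc-b"], config={"callbacks": [CustomEventLogger()]})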
Callback events generally include a run_id (UUID), an optional parent_run_id (None for the outermost run), and any tags and metadata you attached through RunnableConfig. That contract is what lets you build distributed tracing on top of callbacks without any other LangChain plumbing.
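That contract is easy to see with a handler that indents each start event by its depth in the run tree. A minimal sketch (the handler name is illustrative; it covers chain, LLM, and tool starts only):

from langchain_core.callbacks import BaseCallbackHandler


class RunTreePrinter(BaseCallbackHandler):
    """Print each start event indented by its depth in the run tree."""

    def __init__(self) -> None:
        self.depths = {}

    def _start(self, kind, run_id, parent_run_id):
        # Depth is the parent's depth plus one; the outermost run has no parent.
        depth = 0 if parent_run_id is None else self.depths.get(parent_run_id, 0) + 1
        self.depths[run_id] = depth
        print("  " * depth + f"{kind} run_id={run_id}")

    def on_chain_start(self, serialized, inputs, *, run_id, parent_run_id=None, **kwargs):
        self._start("chain", run_id, parent_run_id)

    def on_llm_start(self, serialized, prompts, *, run_id, parent_run_id=None, **kwargs):
        self._start("llm", run_id, parent_run_id)

    def on_tool_start(self, serialized, input_str, *, run_id, parent_run_id=None, **kwargs):
        self._start("tool", run_id, parent_run_id)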
Attaching a callback handler
There are two registration patterns. Use the one that matches your deployment shape.
Attach at construction (component-scoped)
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler
handler = StdOutCallbackHandler()
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])
response = llm.invoke("Explain MCP in one sentence.")
This attaches the handler for the lifetime of the llm object. It is fine for scripts and notebooks but is global state inside a web server.
Attach via RunnableConfig (request-scoped, recommended)
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler
llm = ChatOpenAI(model="gpt-4o")
handler = StdOutCallbackHandler()
response = llm.invoke(
    "Explain MCP in one sentence.",
    config={"callbacks": [handler], "tags": ["docs-demo"], "metadata": {"user_id": "u_42"}},
)
This is the idiomatic pattern in 2026: handlers, tags, and metadata travel with the request, so every span and log line is correctly scoped to one user, one trace, one tenant.
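Handlers see those tags and metadata too. A minimal sketch (handler name is illustrative) that reads them from the chat-model start event:

from langchain_core.callbacks import BaseCallbackHandler


class RequestContextLogger(BaseCallbackHandler):
    def on_chat_model_start(self, serialized, messages, *, run_id, tags=None, metadata=None, **kwargs):
        # tags and metadata from the RunnableConfig arrive on every start event
        user = (metadata or {}).get("user_id")
        print(f"chat_model_start run_id={run_id} tags={tags} user={user}")

Added next to the StdOutCallbackHandler in the config above, it would log the docs-demo tag and the u_42 user id for that request.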
Built-in callback handlers in LangChain Core
LangChain ships a small set of handlers in langchain_core.callbacks that cover the common cases:
- StdOutCallbackHandler writes events to stdout. Useful in notebooks, tests, and CI.
- StreamingStdOutCallbackHandler streams tokens to stdout as they arrive (on_llm_new_token).
- FileCallbackHandler mirrors stdout events to a file.
- BaseTracer (exported from langchain_core.tracers) is the abstract class the LangSmith tracer extends. You almost never subclass it directly.
For production traces, prefer an OpenTelemetry instrumentor over hand-rolled FileCallbackHandler pipelines.
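For a quick feel of the built-ins, the streaming handler only needs the model to stream. A minimal sketch:

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI

# streaming=True makes the model emit on_llm_new_token for every chunk.
llm = ChatOpenAI(model="gpt-4o", streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
llm.invoke("Explain MCP in one sentence.")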
Writing a custom callback handler
A custom handler is a Python class that subclasses BaseCallbackHandler (sync) or AsyncCallbackHandler (async) and overrides only the methods it cares about. LangChain provides default no-op implementations for everything else.
import logging
from typing import Any
from langchain_core.callbacks import BaseCallbackHandler
logger = logging.getLogger("langchain.callbacks.cost")
class CostLoggingHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized: dict, prompts: list, **kwargs: Any) -> None:
        try:
            logger.info("llm_start prompts=%d run_id=%s", len(prompts), kwargs.get("run_id"))
        except Exception:
            logger.exception("cost handler on_llm_start failed")

    def on_llm_end(self, response: Any, **kwargs: Any) -> None:
        try:
            # llm_output can be None (e.g. on some streaming paths), so guard before .get
            usage = (getattr(response, "llm_output", None) or {}).get("token_usage", {}) or {}
            logger.info(
                "llm_end input=%s output=%s total=%s run_id=%s",
                usage.get("prompt_tokens"),
                usage.get("completion_tokens"),
                usage.get("total_tokens"),
                kwargs.get("run_id"),
            )
        except Exception:
            logger.exception("cost handler on_llm_end failed")
Three things to notice:
- The handler defensively swallows its own exceptions. A misbehaving handler should never break the chain.
- It only overrides the events it needs. Token-usage logging does not require on_tool_start or on_retriever_end.
- It reads run_id from **kwargs. That is the join key for any external system.
Wire it into a chain via RunnableConfig:
# assumes prompt, llm, and parser from the earlier snippets, plus a question string q
chain = prompt | llm | parser
result = chain.invoke({"question": q}, config={"callbacks": [CostLoggingHandler()]})
Async callbacks for streaming and serving
If you serve LangChain behind FastAPI, LangServe, or any async runtime, register AsyncCallbackHandler subclasses so the event loop is never blocked.
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_openai import ChatOpenAI
class TokenForwarder(AsyncCallbackHandler):
    def __init__(self, queue: asyncio.Queue) -> None:
        self.queue = queue

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        await self.queue.put(token)

    async def on_llm_end(self, response, **kwargs) -> None:
        await self.queue.put(None)
This is the pattern that backs Server-Sent Events and WebSocket endpoints: every token chunk lands on an asyncio.Queue, and a consumer drains it into the wire format.
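A minimal consumer sketch with FastAPI, assuming the TokenForwarder above, a streaming-enabled ChatOpenAI, and an illustrative /chat route and payload shape:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o", streaming=True)


@app.post("/chat")
async def chat(payload: dict):
    queue: asyncio.Queue = asyncio.Queue()
    handler = TokenForwarder(queue)

    async def run_model():
        # The handler travels with this request via RunnableConfig.
        await llm.ainvoke(payload["question"], config={"callbacks": [handler]})

    async def sse():
        task = asyncio.create_task(run_model())
        while True:
            token = await queue.get()
            if token is None:  # sentinel pushed by on_llm_end
                break
            yield f"data: {token}\n\n"
        await task

    return StreamingResponse(sse(), media_type="text/event-stream")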
Callbacks for agents, tools, and retrievers
Agent and RAG pipelines are where callbacks pay for themselves. The same handler that logs LLM token usage also tells you which tool an agent picked, which documents a retriever fetched, and where a ReAct loop went off the rails.
from typing import Any
from langchain_core.callbacks import BaseCallbackHandler
class AgentAuditHandler(BaseCallbackHandler):
    def on_tool_start(self, serialized: dict, input_str: str, **kwargs: Any) -> None:
        print(f"tool_start name={serialized.get('name')} input={input_str[:120]}")

    def on_tool_end(self, output: str, **kwargs: Any) -> None:
        print(f"tool_end output={str(output)[:120]}")

    def on_retriever_end(self, documents: list, **kwargs: Any) -> None:
        print(f"retriever_end docs={len(documents)}")

    def on_agent_action(self, action, **kwargs: Any) -> None:
        print(f"agent_action tool={action.tool} reasoning={action.log[:200]}")

    def on_agent_finish(self, finish, **kwargs: Any) -> None:
        print(f"agent_finish output={str(finish.return_values)[:200]}")
These are the hooks that feed groundedness, tool-correctness, and trajectory evaluations downstream. For RAG, the on_retriever_end event is the load-bearing one: it is the only place where you cleanly see the documents the LLM saw before generating an answer.
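To hold on to those documents for a groundedness or faithfulness check, a small capture handler keyed by run_id is enough. A sketch, assuming a rag_chain Runnable and a question string q exist (the handler name is illustrative):

from typing import Any
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler


class RetrievalCapture(BaseCallbackHandler):
    """Collect retrieved document texts per run for downstream evaluation."""

    def __init__(self) -> None:
        self.contexts: dict[str, list[str]] = {}

    def on_retriever_end(self, documents: list, *, run_id: UUID, **kwargs: Any) -> None:
        self.contexts[str(run_id)] = [doc.page_content for doc in documents]


capture = RetrievalCapture()
answer = rag_chain.invoke({"question": q}, config={"callbacks": [capture]})
# capture.contexts now maps each retriever run_id to the documents the LLM saw.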
From callbacks to OpenTelemetry traces
Hand-rolled handlers are fine for ad-hoc logging. For real distributed tracing in 2026, install an OpenTelemetry instrumentor and let it subscribe to the callback bus. The two open-source choices that ship today:
- traceai-langchain (Apache 2.0, github.com/future-agi/traceAI) registers a LangChainInstrumentor against the global tracer provider and emits spans for chain, LLM, retriever, tool, and agent events.
- openinference-instrumentation-langchain (Arize OpenInference, Apache 2.0, github.com/Arize-ai/openinference) uses the same OpenInference semantic conventions and is the upstream basis for Phoenix and several commercial backends.
Minimal traceAI setup:
from fi_instrumentation import register
from traceai_langchain import LangChainInstrumentor
trace_provider = register(project_name="langchain-app")
LangChainInstrumentor().instrument(tracer_provider=trace_provider)
That call wires LangChain callbacks to OpenTelemetry without you writing a single handler method. Every on_chain_start becomes a span, every on_llm_end records token usage as a span attribute, every on_retriever_end attaches retrieved documents to the span. Your backend (Future AGI, Phoenix, Jaeger, Datadog, Grafana Tempo) does the rest.
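The OpenInference instrumentor follows the same pattern. A minimal sketch, assuming the arize-phoenix-otel register helper for the tracer provider:

from openinference.instrumentation.langchain import LangChainInstrumentor
from phoenix.otel import register

tracer_provider = register(project_name="langchain-app")
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)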
How LangSmith uses callbacks
LangSmith does not bypass the callback system. The LangChainTracer that ships with LangSmith is a subclass of BaseTracer (which is itself a BaseCallbackHandler), and it auto-registers when LANGCHAIN_TRACING_V2=true is set in the environment. That means you can run LangSmith alongside a custom handler and an OTel instrumentor at the same time. All three receive the same event stream because LangChain dispatches every callback to every registered handler.
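Stacking looks like any other multi-handler config. A sketch, assuming the chain and CostLoggingHandler from earlier and a LangSmith API key in the environment:

from langchain_core.tracers import LangChainTracer

tracer = LangChainTracer(project_name="langchain-app")  # explicit instead of the env-var toggle
result = chain.invoke(
    {"question": q},
    config={"callbacks": [tracer, CostLoggingHandler()]},  # both handlers receive every event
)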
For teams comparing LangChain-native tracing options to vendor stacks, see Future AGI vs LangSmith and the broader best AI agent debugging tools 2026 review.
Best practices for LangChain callbacks in 2026
- Prefer RunnableConfig. Attach handlers per request, not per component. Global state is the single biggest source of cross-tenant trace bleed.
- Subclass AsyncCallbackHandler in serving paths. Sync handlers in an async route end up on the default thread pool, which adds tens of milliseconds per token.
- Defensive exception handling. Every handler method should swallow and log its own errors. A broken handler should never take a chain down.
- Use tags and metadata. They are the cheapest way to filter traces by tenant, route, experiment, or model version downstream.
- Pick instrumentors over custom tracers. A single LangChainInstrumentor().instrument() line replaces hundreds of lines of bespoke span code and stays in sync with semantic conventions.
- Mind streaming. Override async def on_llm_new_token(...) only on routes that actually stream. The hook fires for every token; expensive work there blocks the response.
- Pair callbacks with evaluation. Tracing tells you what happened; an evaluation harness on the same run_id tells you whether it was correct. The ai-evaluation library (Apache 2.0) consumes the same spans for groundedness, faithfulness, and tool-correctness scoring.
Where Future AGI fits
Future AGI’s observability stack consumes the LangChain callback bus through the open-source traceai-langchain instrumentor and stores spans in a managed backend. Once spans are flowing, the same workspace runs fi.evals.evaluate("faithfulness", output=..., context=...) or fi.evals.Evaluator over the traces to score retrievals, tool outputs, and final answers, and the Agent Command Center at /platform/monitor/command-center exposes the routing and guardrail surface for production agents. There is nothing LangChain-specific you need to do beyond installing the instrumentor; the callback events you already emit are the input.
Summary
LangChain callbacks in 2026 are still the right primitive for monitoring, streaming, and tracing inside an LCEL Runnable. You attach a BaseCallbackHandler or AsyncCallbackHandler through RunnableConfig, you implement only the events you care about, and you let an OpenTelemetry instrumentor turn the same events into distributed traces. The combination of LangChain’s lifecycle events and an Apache 2.0 instrumentor like traceai-langchain is what production teams ship today.