What is OpenInference? OpenTelemetry for LLM Apps in 2026
OpenInference is the OpenTelemetry-aligned semantic convention and instrumentation library for LLM applications, maintained by Arize. What it is and how it fits in 2026.
A team adds tracing to their LangChain RAG pipeline. Naive logging would be: print the prompt, print the model name, print the response, print the latency. Useful at hello-world scale. Useless at production scale. The team wants the full span tree: which retriever query ran, which embedding model returned what, which LLM call made which tool call, what the eval score was, what the cost was. They could write all of this themselves and re-do it for every framework upgrade, or they could install one package, register one instrumentor, and have every LangChain call automatically emit OTel-aligned spans with the right attribute names. The package is OpenInference.
This piece walks through what OpenInference is, the semantic conventions, the instrumentation surface, how it relates to OTel GenAI, and how it compares with traceAI, OpenLLMetry, and OpenLIT in 2026.
TL;DR: What OpenInference is
OpenInference is an OpenTelemetry-aligned semantic convention and instrumentation framework for LLM applications, maintained by Arize. It defines a set of attribute names (the openinference.* and llm.* namespaces) for LLM spans and ships drop-in auto-instrumentation packages across many LLM providers, agent frameworks, and RAG libraries. The instrumentations are Apache 2.0 licensed at Arize-ai/openinference and emit OTLP spans that any OTel backend can consume. Phoenix is Arize’s reference OSS backend for OpenInference traces; FutureAGI, Datadog, Grafana, Jaeger, and any other OTel-compatible system also work. Coverage is broadest in Python, with JavaScript / TypeScript and Java (LangChain4j, Spring AI, plus core) packages also shipped. OpenInference predates the OTel GenAI semantic conventions and remains complementary; some instrumentations emit attributes from both namespaces, but dual emission is per-package so verify before relying on it.
Why OpenInference exists
Three forces converged.
First, LLM applications produce a different shape of trace than HTTP services. An HTTP span carries http.method, http.status_code, http.url. An LLM span needs the prompt, the completion, the model name, the token counts, the temperature, the tool definitions, the retriever query, the embedding vectors. There was no standard for those attributes when LLM apps started shipping in 2023.
Second, every framework had its own instrumentation pattern. LangChain shipped callbacks. LlamaIndex shipped event handlers. CrewAI shipped a step listener. The patterns differed; the attribute names differed; the spans were not interoperable. A trace from LangChain could not be merged with a trace from LlamaIndex without a lot of glue.
Third, OpenTelemetry’s GenAI semantic conventions did not exist yet. The OTel project started the gen_ai.* namespace in 2024 and shipped initial versions in 2025; OpenInference filled the gap from 2023 with a parallel namespace and instrumentations. By 2026 the two namespaces are converging, with some OpenInference instrumentations emitting both sets of attributes for backward compatibility.
OpenInference helped standardize span attributes across LLM frameworks and OTel-compatible backends. The attribute namespace lets one tool query traces from many frameworks consistently; the instrumentations let teams adopt observability without writing per-framework boilerplate.
How OpenInference fits in
Three layers.

- Frameworks and providers are what your application calls: OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, DSPy, etc.
- OpenInference instrumentations monkey-patch the framework or provider client to emit spans. One pip install plus one Instrumentor().instrument() call.
- OTLP transport carries the spans to any OTel backend.
The instrumentations are decoupled from the backend. The backend can be Phoenix, FutureAGI, Datadog, Grafana, Tempo, Jaeger, or anything that speaks OTLP.
The semantic conventions
OpenInference defines attribute names under the openinference.* namespace plus a set of legacy LLM-specific names. The most-used attributes:
Span kind
Every OpenInference span carries openinference.span.kind. Well-known values:
- LLM: a call to a chat or completion endpoint
- CHAIN: a multi-step orchestration unit (LangChain chain, LlamaIndex query engine)
- RETRIEVER: a vector search or BM25 retriever
- EMBEDDING: an embedding generation call
- TOOL: a function or tool invocation
- AGENT: an agent run (top of an agent loop)
- RERANKER: a re-ranker over retrieved chunks
- GUARDRAIL: a pre- or post-call guardrail check
- EVALUATOR: an online eval score
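If you emit custom spans outside an auto-instrumented framework, the span kind is just an attribute you set with the plain OTel API. A minimal sketch (the tracer name and tool logic are illustrative, not part of the convention):
from opentelemetry import trace

tracer = trace.get_tracer("my-app")  # illustrative tracer name

def run_tool(city: str) -> str:
    # A hand-rolled TOOL span carrying the OpenInference attribute names.
    with tracer.start_as_current_span("weather-lookup") as span:
        span.set_attribute("openinference.span.kind", "TOOL")
        span.set_attribute("input.value", city)
        result = f"sunny in {city}"  # stand-in for the real tool call
        span.set_attribute("output.value", result)
        return result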
LLM-specific attributes
- llm.model_name, llm.provider
- llm.input_messages, llm.output_messages
- llm.token_count.prompt, llm.token_count.completion, llm.token_count.total
- llm.temperature, llm.top_p, llm.max_tokens
- llm.tools, llm.invocation_parameters
- llm.system_prompt
Retrieval-specific attributes
- retrieval.documents: list of retrieved documents with content and score
- retrieval.query: the search query
- embedding.embeddings, embedding.model_name
Generic input / output
- input.value, input.mime_type
- output.value, output.mime_type
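The same pattern extends to LLM calls you wrap by hand. A sketch that records the most common llm.* attributes on a manually created span; the usage object here is assumed to carry OpenAI-style token fields:
from opentelemetry import trace

tracer = trace.get_tracer("my-app")

def record_llm_call(model: str, prompt: str, completion: str, usage) -> None:
    # Attach the core OpenInference LLM attributes to a manual span.
    with tracer.start_as_current_span("llm-call") as span:
        span.set_attribute("openinference.span.kind", "LLM")
        span.set_attribute("llm.model_name", model)
        span.set_attribute("input.value", prompt)
        span.set_attribute("output.value", completion)
        span.set_attribute("llm.token_count.prompt", usage.prompt_tokens)
        span.set_attribute("llm.token_count.completion", usage.completion_tokens)
        span.set_attribute("llm.token_count.total", usage.total_tokens)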
The conventions overlap with the OTel GenAI gen_ai.* namespace; recent OpenInference instrumentations emit both.
The instrumentation surface
The Python repo includes packages for:
- LLM providers: OpenAI, Anthropic, Bedrock, Groq, MistralAI, VertexAI (Gemini), LiteLLM, OpenAI Agents
- Agent frameworks: LangChain, LlamaIndex (and Workflows), CrewAI, Agno, AutoGen (incl. AgentChat), PydanticAI, Smolagents, BeeAI, Google ADK
- RAG libraries: Haystack, LlamaIndex retrievers
- DSL frameworks: DSPy, MCP
The JavaScript / TypeScript surface covers OpenAI, Anthropic, Vertex, Bedrock, LangChain JS, LlamaIndex JS, plus core. The Java surface covers LangChain4j and Spring AI.
Every package follows the same shape:
from openinference.instrumentation.openai import OpenAIInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Standard OTel setup: one global provider, one OTLP exporter.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="https://your-backend/v1/traces")))
trace.set_tracer_provider(provider)

# Register the instrumentor once at startup.
OpenAIInstrumentor().instrument()
# Now every openai client call emits a span
For multi-framework apps you call multiple instrumentors. They compose; LangChain inside OpenAI inside CrewAI all emit nested spans without conflict.
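A sketch of that composition, assuming the corresponding openinference-instrumentation-* packages are installed (verify each import path against its package README):
from openinference.instrumentation.crewai import CrewAIInstrumentor
from openinference.instrumentation.langchain import LangChainInstrumentor
from openinference.instrumentation.openai import OpenAIInstrumentor

# Register each instrumentor once at startup; spans from all three
# frameworks then nest in a single trace tree.
CrewAIInstrumentor().instrument()
LangChainInstrumentor().instrument()
OpenAIInstrumentor().instrument()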
OpenInference vs OTel GenAI
The relationship is the most-asked question.
OpenInference (2023+) and OTel GenAI (2024+) are parallel attribute namespaces with significant overlap. Some OpenInference instrumentations emit or map to OTel GenAI attributes, so a span can carry both llm.model_name and gen_ai.request.model, llm.token_count.prompt and gen_ai.usage.input_tokens. Coverage is per-package; OTel GenAI itself is still under active development. Backends that query either namespace work where dual emission is enabled.
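Until dual emission is universal, query-side code can treat the two namespaces as fallbacks. A small illustrative helper (the attribute pairs follow the mapping above; extend as needed):
# Hypothetical helper: prefer the OTel GenAI name, fall back to OpenInference.
_FALLBACKS = {
    "gen_ai.request.model": "llm.model_name",
    "gen_ai.usage.input_tokens": "llm.token_count.prompt",
    "gen_ai.usage.output_tokens": "llm.token_count.completion",
}

def get_attr(attributes: dict, key: str):
    if key in attributes:
        return attributes[key]
    fallback = _FALLBACKS.get(key)
    return attributes.get(fallback) if fallback else None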
Long-term direction: OTel GenAI graduates from development status, the wider ecosystem standardizes on gen_ai.*, and OpenInference’s namespace becomes a superset that adds non-LLM-call attributes (retrieval, eval, agent kind) the OTel GenAI spec does not cover. OpenInference is not “deprecated by OTel GenAI” in any practical sense; it stays the broader instrumentation framework around the OTel GenAI core.
How OpenInference compares with other LLM instrumentation frameworks
A practical map.
- traceAI (FutureAGI). Comparable in goals: OTel-native instrumentation, 35+ frameworks across Python, TypeScript, Java, and C#, including LangChain4j and Spring AI on the Java side. Apache 2.0 on GitHub.
- OpenLLMetry (Traceloop). Pure OTel GenAI alignment. Python and TypeScript packages. A slightly different abstraction: auto-instrumentation plus decorator-based workflow annotations.
- OpenLIT. Broad-vendor with multi-signal coverage (LLM + GPU + DB). Single SDK with broad provider support, OTel-aligned.
- Lilypad (Mirascope). Prompt-versioning-as-traces angle; treats every prompt change as a traced version.
- Vendor-specific SDKs. Langfuse SDK, LangSmith SDK, Helicone SDK, Datadog LLM SDK. Tied to one backend; emit spans the vendor consumes natively.
The instrumentations are interchangeable for many production setups. Many teams run OpenInference plus one or two vendor-specific SDKs in parallel during a migration. The choice often comes down to: does the framework you use have a mature instrumentation in the project, and does your observability backend prefer one namespace over another?
Production patterns
Three that show up.
1. Multi-framework, single observability backend
App uses LangChain for the agent loop, LlamaIndex for the retriever, OpenAI for the LLM, and a custom tool layer. Install OpenInference instrumentations for all four. Spans nest correctly in one trace tree. Backend (Phoenix, FutureAGI, Datadog, etc.) consumes one OTLP stream.
2. Migration to standardized namespaces
Team has older custom instrumentation with proprietary attribute names. Add OpenInference instrumentations alongside; spans now carry both old and new attribute names; queries shift gradually to the OpenInference namespace; old instrumentation removed.
3. Cross-language correlation
Frontend in JavaScript calls a backend in Python that calls Java microservices. OpenInference JS, Python, and Java instrumentations all emit spans with the same openinference.span.kind and aligned attribute names. The trace tree merges across languages without manual stitching.
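The merge relies on standard W3C trace context propagation, not anything OpenInference-specific. A sketch of the Python side continuing a trace started by the instrumented JS frontend, assuming the incoming request carries a traceparent header:
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer("my-app")

def handle_request(headers: dict) -> None:
    # Extract the W3C trace context sent by the frontend, then parent
    # all backend spans (and instrumented LLM calls) under it.
    ctx = extract(headers)
    with tracer.start_as_current_span("backend-handler", context=ctx):
        ...  # instrumented LangChain / OpenAI calls nest here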
Common mistakes
- Installing the instrumentation but not registering an exporter. Spans are emitted but go nowhere. Always set up an OTLP exporter.
- Multiple TracerProviders. Two libraries register two providers; one wins; the other’s spans are dropped. Configure the global provider once at app start.
- Capturing prompt content in production without redaction. llm.input_messages and llm.output_messages carry PII. Add a span processor that redacts before export, or mask content at the instrumentor (see the sketch after this list).
- Sampling at 1% uniform head sampling. Loses the long-tail failures the trace was meant to catch. Use tail-based sampling with rules for errors, low eval scores, and high cost.
- Mixing instrumentor versions. OpenInference packages version independently. Pin all of them and bump together.
- Skipping the instrumentor on the client side of a multi-language app. A trace that starts in uninstrumented JS and continues in instrumented Python has a missing root span and confuses backends. Instrument the entry point.
- Treating OpenInference as a backend. It is not. OpenInference is the convention and the instrumentation. The backend (Phoenix, FutureAGI, Datadog) is separate.
- Ignoring the openinference.span.kind attribute when querying. Without it, the same backend sees retriever spans and LLM spans as undifferentiated. Filter by span kind.
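For the redaction bullet above, recent openinference-instrumentation releases expose a TraceConfig that masks content at the source. A sketch, assuming the hide_inputs / hide_outputs options in your installed version (verify against the package docs):
from openinference.instrumentation import TraceConfig
from openinference.instrumentation.openai import OpenAIInstrumentor

# Mask message content before it ever reaches the exporter.
config = TraceConfig(hide_inputs=True, hide_outputs=True)
OpenAIInstrumentor().instrument(config=config)
Recent releases also expose the same masks as environment variables (OPENINFERENCE_HIDE_INPUTS, OPENINFERENCE_HIDE_OUTPUTS), which avoids a code change.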
How FutureAGI implements OpenInference observability and evaluation
FutureAGI is the production-grade backend for OpenInference-instrumented stacks, built around the closed reliability loop that other OpenInference backends stitch together by hand. The full stack runs on one Apache 2.0, self-hostable plane:
- OpenInference ingest: the FutureAGI collector ingests the openinference.*, llm.*, and gen_ai.* namespaces natively over OTLP; traceAI (Apache 2.0) is the sibling instrumentation library that covers the broader Java surface (LangChain4j, Spring AI) and a C# core where OpenInference coverage is lighter, with auto-instrumentation for 35+ frameworks.
- Span-attached evals: 50+ first-party metrics attach as span attributes filtered by openinference.span.kind; BYOK lets any LLM serve as the judge at zero platform fee, and turing_flash runs the same rubrics at 50 to 70 ms p95.
- Simulation: persona-driven scenarios exercise OpenInference-instrumented frameworks in pre-prod with the same scorer contract that judges production traces.
- Gateway and guardrails: the Agent Command Center fronts 100+ providers with BYOK routing, and 18+ runtime guardrails enforce policy on the same plane.
Beyond the four axes, FutureAGI also ships six prompt-optimization algorithms that consume failing trajectories as training data. Pricing starts free with a 50 GB tracing tier; Boost is $250 per month, Scale is $750 per month with HIPAA, and Enterprise from $2,000 per month with SOC 2 Type II.
Most teams standardizing on OpenInference end up running three or four backend products in production: one for traces, one for evals, one for the gateway, one for guardrails. FutureAGI is the recommended pick because OpenInference ingest, evals, simulation, gateway, and guardrails all live on one self-hostable runtime; the loop closes without stitching.
Sources
- Arize-ai/openinference on GitHub
- OpenInference docs
- OpenInference semantic conventions
- Phoenix on GitHub
- Phoenix docs
- OpenTelemetry GenAI semantic conventions
- OpenLLMetry on GitHub
- OpenLIT on GitHub
- traceAI on GitHub (Apache 2.0)
- OpenInference releases
Series cross-link
Related: What is LLM Tracing?, What is LLM Observability?, What is LiteLLM?
Frequently asked questions
What is OpenInference in plain terms?
How does OpenInference relate to OpenTelemetry GenAI semantic conventions?
Is OpenInference free and open source?
What does OpenInference cover?
How does OpenInference compare with traceAI, OpenLLMetry, OpenLIT, and Lilypad?
What is the openinference.* attribute namespace?
How do I instrument my app with OpenInference?
How are OpenInference packages versioned?