How is a span attribute different from an OTel attribute?

A span attribute is metadata placed on one span. An OTel attribute is the broader OpenTelemetry key/value concept used across spans, metrics, logs, resources, and events.

How do you measure span attributes?

FutureAGI traceAI emits fields such as `fi.span.kind`, `gen_ai.request.model`, and `llm.token_count.prompt`; dashboards group traces by those fields to find latency, cost, and eval regressions.

What Is a Span Attribute? FutureAGI Guide (2026)

What Is a Span Attribute?

A span attribute is a key/value metadata field attached to one span in a distributed trace. In LLM and agent observability, it describes one production operation, such as a model call, retrieval, tool execution, guardrail check, or evaluator run. FutureAGI traceAI integrations emit span attributes such as fi.span.kind, gen_ai.request.model, and llm.token_count.prompt, letting engineers filter traces, group failures by cohort, and debug multi-step pipelines without depending on raw log search.

Why span attributes matter in production LLM and agent systems

Production AI incidents usually start as vague symptoms: p99 latency rises, token spend jumps, a support answer hallucinates, or an agent stops after the wrong tool call. A trace shows the request path. Span attributes explain which exact operation changed.

If you ignore span attributes, three failure modes become harder to isolate. First, cost drift hides inside aggregate usage because no one can split token growth by gen_ai.request.model, prompt version, tenant, or route. Second, tool failures get misdiagnosed because the trace says “tool call” but the span lacks gen_ai.tool.name, status, an argument summary, or error.type. Third, eval failures lose causal context: a Groundedness failure exists, but it is not tied to the LLM span, retriever span, or post-guardrail span that produced the bad output.
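The first failure mode, cost drift hidden in aggregate usage, can be sketched with plain dictionaries. This is a minimal illustration, assuming spans have already been exported as attribute dicts; the `window` label, the `prompt.version` key, and the model names are hypothetical and are not fields this guide confirms.

```python
from collections import defaultdict

# Hypothetical exported LLM spans. "window" and "prompt.version" are
# illustrative names chosen for this sketch.
SPANS = [
    {"window": "baseline", "gen_ai.request.model": "model-a", "prompt.version": "v1", "llm.token_count.prompt": 800},
    {"window": "baseline", "gen_ai.request.model": "model-a", "prompt.version": "v1", "llm.token_count.prompt": 820},
    {"window": "baseline", "gen_ai.request.model": "model-b", "prompt.version": "v1", "llm.token_count.prompt": 400},
    {"window": "current", "gen_ai.request.model": "model-a", "prompt.version": "v2", "llm.token_count.prompt": 1500},
    {"window": "current", "gen_ai.request.model": "model-a", "prompt.version": "v2", "llm.token_count.prompt": 1480},
    {"window": "current", "gen_ai.request.model": "model-b", "prompt.version": "v1", "llm.token_count.prompt": 410},
]


def prompt_token_growth(spans, keys):
    """Return current-minus-baseline prompt tokens for each attribute group."""
    totals = defaultdict(lambda: {"baseline": 0, "current": 0})
    for span in spans:
        group = tuple(span.get(key, "unknown") for key in keys)
        totals[group][span["window"]] += span.get("llm.token_count.prompt", 0)
    return {group: t["current"] - t["baseline"] for group, t in totals.items()}


by_model = prompt_token_growth(SPANS, ["gen_ai.request.model"])
by_model_and_prompt = prompt_token_growth(SPANS, ["gen_ai.request.model", "prompt.version"])
print(by_model)
print(by_model_and_prompt)
```

In this toy data the aggregate only shows that prompt tokens grew. The split shows that nearly all of the growth sits on one model after a prompt version change, which is the question the attributes exist to answer.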

The pain lands across teams. Developers read raw logs to reconstruct model and prompt state. SREs see latency spikes without fi.span.kind slices for LLM, retriever, tool, or guardrail work. Product teams cannot separate retrieval misses from answer-generation failures. Compliance teams cannot prove which path handled sensitive input.

This matters more for 2026-era agentic systems than for single-turn chat. One user request can cross a router, retriever, planner, tools, model fallback, and evaluator. Unlike LangSmith-style custom tags bound to one product surface, OpenTelemetry span attributes keep the query contract portable across FutureAGI, Phoenix, Datadog, Honeycomb, and Tempo.

How FutureAGI handles span attributes

FutureAGI’s approach is to treat span attributes as the join key between traces, evaluations, cost, and regression triage. In a RAG support agent instrumented with traceAI-langchain, FutureAGI receives one OpenTelemetry trace with child spans for the chain, retriever, LLM call, tool call, guardrail, and optional evaluator. Each span carries a compact set of queryable fields rather than a free-form log blob.

A typical LLM span records fi.span.kind="LLM", gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, llm.token_count.prompt, and llm.token_count.completion. A retriever span can add retrieval.index.name, retrieval.top_k, and retrieval.result.count. A tool span can add gen_ai.tool.name, tool.status, and error.type. Because traceAI-openai, traceAI-langchain, and traceAI-llamaindex normalize these fields across different SDK response shapes, the same dashboard query works across services.
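To make the normalization idea concrete, here is a minimal sketch of mapping two provider usage shapes onto one attribute name per concept. It is not traceAI's implementation; `normalize_usage` and the alias table are hypothetical, and the input keys assume an OpenAI-style `prompt_tokens`/`completion_tokens` payload and an Anthropic-style `input_tokens`/`output_tokens` payload.

```python
# Hypothetical alias table: canonical attribute -> provider keys to look for.
USAGE_ALIASES = {
    "llm.token_count.prompt": ("prompt_tokens", "input_tokens"),
    "llm.token_count.completion": ("completion_tokens", "output_tokens"),
}


def normalize_usage(provider_usage):
    """Map provider-specific usage keys onto one attribute name per concept."""
    attrs = {}
    for canonical, candidates in USAGE_ALIASES.items():
        for key in candidates:
            if key in provider_usage:
                attrs[canonical] = int(provider_usage[key])
                break  # first matching provider key wins
    return attrs


openai_style = {"prompt_tokens": 512, "completion_tokens": 64, "total_tokens": 576}
anthropic_style = {"input_tokens": 512, "output_tokens": 64}

print(normalize_usage(openai_style))
print(normalize_usage(anthropic_style))
```

Both calls produce the same attribute dict, which is why a dashboard query written against the canonical names does not need a per-provider branch.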

The engineer’s next action is concrete. If p99 latency rises after release checkout-agent@2026.05.07, group duration by fi.span.kind and provider. If spend rises, compare llm.token_count.prompt and gen_ai.request.model by prompt version. If ContextRelevance passes but Groundedness drops, inspect LLM spans for the failing cohort and compare them with retriever attributes from the same trace.
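The latency triage step can be sketched as a group-by over exported spans. This is an illustration under stated assumptions: spans are plain dicts, `duration_ms` is a hypothetical field name, and the percentile uses the nearest-rank method rather than whatever a given dashboard backend implements.

```python
import math
from collections import defaultdict

# Hypothetical exported spans; "duration_ms" is an illustrative field name.
SPANS = [
    {"fi.span.kind": "LLM", "duration_ms": 900},
    {"fi.span.kind": "LLM", "duration_ms": 950},
    {"fi.span.kind": "LLM", "duration_ms": 4200},
    {"fi.span.kind": "RETRIEVER", "duration_ms": 40},
    {"fi.span.kind": "RETRIEVER", "duration_ms": 55},
    {"fi.span.kind": "RETRIEVER", "duration_ms": 60},
    {"fi.span.kind": "TOOL", "duration_ms": 120},
    {"fi.span.kind": "TOOL", "duration_ms": 130},
]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def p99_by(spans, key):
    """Group span durations by one attribute and return the p99 per group."""
    groups = defaultdict(list)
    for span in spans:
        groups[span.get(key, "unknown")].append(span["duration_ms"])
    return {group: percentile(durations, 99) for group, durations in groups.items()}


p99_by_kind = p99_by(SPANS, "fi.span.kind")
slowest_kind = max(p99_by_kind, key=p99_by_kind.get)
print(p99_by_kind, slowest_kind)
```

The same `p99_by` call works for any other attribute key, such as a provider or route attribute, as long as the spans carry it.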

FutureAGI can then alert on eval-fail-rate-by-cohort, open a regression eval, or route a risky cohort through Agent Command Center model fallback. The attribute is not the verdict; it is the dimension that makes the verdict actionable.
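An eval-fail-rate-by-cohort check can be sketched as follows. This is not FutureAGI's alerting logic; the record shape, the `route` attribute, the model names, and the 0.3 threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical eval results, each joined to the attributes of the span it scored.
RESULTS = [
    {"eval": "Groundedness", "passed": True, "attributes": {"gen_ai.request.model": "model-a", "route": "checkout"}},
    {"eval": "Groundedness", "passed": False, "attributes": {"gen_ai.request.model": "model-a", "route": "checkout"}},
    {"eval": "Groundedness", "passed": False, "attributes": {"gen_ai.request.model": "model-a", "route": "checkout"}},
    {"eval": "Groundedness", "passed": True, "attributes": {"gen_ai.request.model": "model-a", "route": "checkout"}},
    {"eval": "Groundedness", "passed": True, "attributes": {"gen_ai.request.model": "model-b", "route": "checkout"}},
    {"eval": "Groundedness", "passed": True, "attributes": {"gen_ai.request.model": "model-b", "route": "checkout"}},
    {"eval": "Groundedness", "passed": True, "attributes": {"gen_ai.request.model": "model-b", "route": "checkout"}},
    {"eval": "Groundedness", "passed": False, "attributes": {"gen_ai.request.model": "model-b", "route": "checkout"}},
]


def fail_rate_by_cohort(results, cohort_keys):
    """Return the fraction of failed evals for each combination of attribute values."""
    counts = defaultdict(lambda: [0, 0])  # cohort -> [failures, total]
    for result in results:
        cohort = tuple(result["attributes"].get(key, "unknown") for key in cohort_keys)
        counts[cohort][1] += 1
        if not result["passed"]:
            counts[cohort][0] += 1
    return {cohort: failures / total for cohort, (failures, total) in counts.items()}


def cohorts_over(rates, threshold):
    """List cohorts whose fail rate exceeds the threshold, in sorted order."""
    return sorted(cohort for cohort, rate in rates.items() if rate > threshold)


rates = fail_rate_by_cohort(RESULTS, ["gen_ai.request.model", "route"])
risky = cohorts_over(rates, threshold=0.3)
print(rates, risky)
```

The attributes only define the cohort; the pass or fail verdict still comes from the evaluator.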

How to measure or detect span attributes

Span attributes are not a single score. Measure whether they are complete, correct, and useful during incidents:

Coverage: percentage of production LLM spans with non-null fi.span.kind, gen_ai.request.model, token counts, status, and trace ID; target 99% or higher.
Cardinality: unique values per indexed attribute. Model names, prompt versions, route names, tenant tiers, and tool names are useful; raw prompts and user messages should not be indexed.
Latency slices: p99 duration by fi.span.kind, provider, route, and model to separate model latency from retrieval or tool latency.
Cost slices: token-cost-per-trace grouped by gen_ai.request.model, with token counts taken from gen_ai.usage.input_tokens and llm.token_count.prompt.
Eval correlation: eval-fail-rate-by-cohort when Groundedness, ContextRelevance, or ToolSelectionAccuracy results are attached to the span that produced the output or action.
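The coverage and cardinality checks from the list above can be sketched over exported span dicts. This is a minimal sketch, assuming a reduced required-field set; the `user.message` key and the cardinality limit of 3 are hypothetical and exist only to show what an unindexable attribute looks like.

```python
# Reduced required set for the sketch; a real check would also cover status and trace ID.
REQUIRED = ("fi.span.kind", "gen_ai.request.model", "llm.token_count.prompt")

# Hypothetical exported LLM spans. "user.message" is illustrative raw content.
SPANS = [
    {"fi.span.kind": "LLM", "gen_ai.request.model": "model-a", "llm.token_count.prompt": 512, "user.message": "where is my refund"},
    {"fi.span.kind": "LLM", "gen_ai.request.model": "model-a", "llm.token_count.prompt": 640, "user.message": "cancel my order"},
    {"fi.span.kind": "LLM", "gen_ai.request.model": "model-b", "llm.token_count.prompt": 128, "user.message": "change address"},
    {"fi.span.kind": "LLM", "gen_ai.request.model": None, "llm.token_count.prompt": 256, "user.message": "talk to a human"},
]


def coverage(spans, required=REQUIRED):
    """Fraction of spans where every required attribute is present and non-null."""
    if not spans:
        return 0.0
    complete = sum(all(span.get(key) is not None for key in required) for span in spans)
    return complete / len(spans)


def cardinality(spans, key):
    """Number of distinct non-null values seen for one attribute."""
    return len({span[key] for span in spans if span.get(key) is not None})


span_coverage = coverage(SPANS)
model_cardinality = cardinality(SPANS, "gen_ai.request.model")
too_unique = [key for key in ("gen_ai.request.model", "user.message") if cardinality(SPANS, key) > 3]
print(span_coverage, model_cardinality, too_unique)
```

Here one span with a null model pulls coverage to 75%, well under the 99% target, and the raw message field is flagged because every span carries a different value.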

A minimal custom span attribute set:

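from opentelemetry import trace

# Uses the globally configured tracer provider; without an SDK provider this is a no-op.
tracer = trace.get_tracer(__name__)

# Attributes set inside the block attach to this span only, not to its parent.
with tracer.start_as_current_span("rerank_docs") as span:
    span.set_attribute("fi.span.kind", "RERANKER")  # operation type used for slicing
    span.set_attribute("retrieval.index.name", "refund-policy-v4")  # stable index name
    span.set_attribute("retrieval.result.count", 12)  # numeric values stay numeric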

Common mistakes

Putting raw prompts into indexed attributes. Store prompt content in redacted capture fields; keep attributes for stable dimensions such as model and prompt version.
Recording attributes only on the root span. Child spans need the facts: retriever index, tool name, model, status, token count, and error type.
Naming the same token field three ways. prompt_tokens, tokens_in, and llm.token_count.prompt fragment dashboards and alerts.
Treating SDK-reported fields as permanent. Recheck model IDs, token fields, and tool-call shapes after provider SDK upgrades.
Omitting failure attributes. A failed span without error.type, status, and tool name sends responders back to raw logs.
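Several of these mistakes can be caught with a small lint pass over span attributes before they reach a dashboard. The sketch below is illustrative only: `lint_span`, the 256-character limit, the `status` value `"error"`, and the `"TOOL"` span kind value are assumptions, not documented FutureAGI behavior.

```python
# Alternative token field names from the list above, mapped to one canonical name.
ALIASES = {
    "prompt_tokens": "llm.token_count.prompt",
    "tokens_in": "llm.token_count.prompt",
}
MAX_VALUE_CHARS = 256  # assumed cutoff for "this looks like raw content"


def lint_span(attrs):
    """Return a list of human-readable problems found in one span's attributes."""
    problems = []
    for key, value in attrs.items():
        if key in ALIASES:
            problems.append(f"rename {key} to {ALIASES[key]}")
        if isinstance(value, str) and len(value) > MAX_VALUE_CHARS:
            problems.append(f"{key} looks like raw content ({len(value)} chars)")
    if attrs.get("status") == "error":
        if "error.type" not in attrs:
            problems.append("failed span is missing error.type")
        if attrs.get("fi.span.kind") == "TOOL" and "gen_ai.tool.name" not in attrs:
            problems.append("failed tool span is missing gen_ai.tool.name")
    return problems


bad_span = {"fi.span.kind": "TOOL", "status": "error", "tokens_in": 10, "prompt": "x" * 1000}
good_span = {
    "fi.span.kind": "TOOL",
    "status": "error",
    "error.type": "TimeoutError",
    "gen_ai.tool.name": "lookup_order",
    "llm.token_count.prompt": 10,
}
print(lint_span(bad_span))
print(lint_span(good_span))
```

A check like this fits in a unit test or CI step for instrumentation code, so naming drift and missing failure fields surface before an incident does.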