What Is a Span (OpenTelemetry)?
A single timed operation within a trace, carrying timestamps, status, parent reference, and an attribute bag of metadata.
A span is a single timed operation inside a trace — one LLM call, one retrieval, one tool invocation, one guardrail check, one sub-agent dispatch. Every span has a start timestamp, an end timestamp, a status (OK / ERROR), a span_id, a parent_span_id, a trace_id, and an attribute bag of key/value metadata. In LLM observability the attribute bag carries the gen_ai.* fields — model id, token counts, prompt, completion, latency — plus fi.span.kind to label whether the operation was an LLM, RETRIEVER, TOOL, AGENT, CHAIN, EMBEDDING, or GUARDRAIL step. Spans nest into a tree to form the trace.
Why It Matters in Production LLM and Agent Systems
The span is the granularity at which you can answer engineering questions. “What model produced this output?” is a span attribute. “How long did the retriever take?” is span duration. “Did the guardrail block?” is span status. “Which tool did the agent pick?” is gen_ai.tool.name on the tool span. Without spans, every question collapses into “look at logs” — a slow, error-prone scan that loses causal structure.
The pain shows up most in two places. First, agent debugging: a five-step agent that fails on step three is invisible at the trace level (the trace is just “agent run, slow”) but obvious at the span level (the third TOOL span returned a 500). Second, regression triage: a prompt change that ships at 2pm and silently degrades quality at 3pm shows up as drift in span-attached eval scores filtered by gen_ai.prompt.template.version. Without per-span attributes, you cannot slice the regression to the prompt that caused it.
In 2026 agent stacks, the span is also the eval target. Span-attached evals — fi.evals.Groundedness written back as gen_ai.evaluation.score.value on the LLM span — turn quality into a per-step signal. You alert on “any LLM span where Groundedness < 0.7 in the last 5 minutes” the same way you alert on http.status_code >= 500.
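The alert predicate is just a filter over span records. A minimal sketch, assuming spans have already been exported as flat dicts with the attribute keys named above (the span ids and scores here are made up):

```python
# Hypothetical exported span records; keys follow the gen_ai.*/fi.* conventions
spans = [
    {"span_id": "a1", "fi.span.kind": "LLM", "gen_ai.evaluation.score.value": 0.91},
    {"span_id": "b2", "fi.span.kind": "LLM", "gen_ai.evaluation.score.value": 0.55},
    {"span_id": "c3", "fi.span.kind": "TOOL"},  # tool spans carry no eval score
]

def groundedness_violations(spans, threshold=0.7):
    """Return LLM spans whose Groundedness score fell below the threshold."""
    return [
        s for s in spans
        if s.get("fi.span.kind") == "LLM"
        and s.get("gen_ai.evaluation.score.value", 1.0) < threshold
    ]

print([s["span_id"] for s in groundedness_violations(spans)])  # ['b2']
```

In production the filter runs in your alerting backend rather than application code, but the shape of the condition is the same: span kind, score attribute, threshold, time window.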
How FutureAGI Handles Spans
FutureAGI emits spans via traceAI, its OpenTelemetry library. The schema follows OTel’s GenAI semantic conventions plus FutureAGI extensions. Every span carries:
- `fi.span.kind` — one of LLM / RETRIEVER / TOOL / AGENT / CHAIN / EMBEDDING / RERANKER / GUARDRAIL / EVALUATOR.
- `gen_ai.system`, `gen_ai.provider.name`, `gen_ai.request.model`, `gen_ai.response.model`.
- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.total_tokens`.
- `gen_ai.client.operation.duration`, `gen_ai.server.time_to_first_token`.
- `gen_ai.cost.total`, populated from gateway pricing.
- `gen_ai.input.messages` / `gen_ai.output.messages` (opt-in for content capture).
For a traceAI-langchain instrumented app, every chain step produces a span with these fields auto-populated. Custom spans use the OTel SDK — wrap a region with tracer.start_as_current_span("custom_step"), set attributes, and the span joins the parent trace automatically.
The differentiator is span events — point-in-time records nested inside a span. FutureAGI writes evaluator results as span events: when fi.evals.Groundedness runs against a span’s output, it writes gen_ai.evaluation.name, gen_ai.evaluation.score.value, and gen_ai.evaluation.explanation as a span event linked via gen_ai.evaluation.target_span_id. The eval verdict travels with the span, which means filtering “all LLM spans with grounded score < 0.7 from user cohort B in the last hour” is one query, not a join across two systems.
How to Measure or Detect It
Spans are the measurement primitive — what to track on each one:
- Identity: `trace_id`, `span_id`, `parent_span_id`, `fi.span.kind`.
- Model context: `gen_ai.request.model`, `gen_ai.system`, `gen_ai.request.temperature`.
- Token counts: `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` (legacy traceAI also emits `llm.token_count.prompt`, `llm.token_count.completion`).
- Latency: `gen_ai.client.operation.duration`, plus `gen_ai.server.time_to_first_token` for streaming.
- Status: `OK`/`ERROR`, plus `error.type` and `error.message` when failed.
- Eval scores: `gen_ai.evaluation.score.value` from `fi.evals.Groundedness` or similar.
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("custom_retrieval") as span:
    span.set_attribute("fi.span.kind", "RETRIEVER")
    span.set_attribute("gen_ai.retrieval.top_k", 10)
    docs = vectorstore.search(query, k=10)
```
Healthy span dashboards: p99 span duration per fi.span.kind, span error-rate per service, span count per trace.
Common Mistakes
- Conflating span and trace. Engineers often say “trace” when they mean “span.” A trace is the tree; a span is the node. Use the right word in dashboards and runbooks.
- Skipping `parent_span_id` on async work. Without OTel context propagation, async tool calls orphan from their parent and the trace tree breaks.
- Cardinality bombs in attributes. Putting raw user query strings into a label-style attribute (one used in queries) explodes index size. Put content in `gen_ai.input.messages` (opt-in, content-style) and identifiers in `user.id`/`session.id`.
- Ignoring span events. Span events carry evaluator scores, retries, and intermediate states. A trace UI that only renders attributes loses half the signal.
- No span limits. A span with 10,000 attributes is unrenderable. Cap attribute count and string length per span.
Frequently Asked Questions
What is a span?
A span is one timed operation inside a trace — an LLM call, a retrieval, a tool call, or a guardrail check — with a start time, end time, status, parent reference, and an attribute bag of key/value metadata.
What is the difference between a span and a trace?
A trace is the whole request lineage; a span is one operation inside it. Traces contain many spans; a span belongs to exactly one trace. Spans nest into a tree via parent_span_id to form the trace structure.
How do you add attributes to a span?
TraceAI auto-populates gen_ai.* attributes (model, tokens, latency) for every instrumented call. Custom attributes go on the active span via the OpenTelemetry SDK: span.set_attribute('user.id', user_id).