What Is Cost Attribution (LLM Apps)?
The practice of slicing LLM spend back to the unit that caused it — user, tenant, prompt version, route, feature, or agent node.
Cost attribution is the practice of slicing total LLM spend back to the unit that caused it. The unit can be a user, a tenant, a prompt version, a route, a feature flag, an agent node, or an entire trace. It runs on top of two inputs: per-span token counts (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) and a current provider price table. The output is gen_ai.cost.total per span, plus aggregations sliced by tag dimensions like user.id, session.id, gen_ai.prompt.template.version, and route name. In LLM observability it answers the unit-economics questions a finance lead and a product lead both need.
Why It Matters in Production LLM and Agent Systems
LLM bills are big and lumpy. A reasoning model burning 40K output tokens at $15 per 1M tokens costs $0.60 per turn. Multiply across retries, judge evals, and tool calls, and a single feature can cost more than a user’s monthly subscription. Without attribution, the bill arrives as one number and nobody can answer “which feature drove this?” or “is the new agent flow paying for itself?”
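The per-turn arithmetic from the example above, as a quick sanity check (the price is illustrative):

```python
# 40K output tokens at $15 per 1M output tokens
output_tokens = 40_000
usd_per_million_output = 15.00
cost_per_turn = output_tokens / 1_000_000 * usd_per_million_output
print(f"${cost_per_turn:.2f} per turn")  # -> $0.60 per turn
```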
The pain shows up across four roles. Product cannot calculate gross margin per feature without per-feature spend. Finance cannot reconcile the provider invoice without trace-level attribution. Engineering cannot tell whether a prompt change shipped at 2pm raised costs by 18%; they only see the aggregate spike. GTM cannot price tiers without knowing the marginal cost of a power user.
Three failure modes are common. The runaway agent: a planner that hits an infinite loop burns $50 of tokens before timing out — invisible without per-trace cost. The stale prompt: an old prompt version still in production carries 3K extra retrieved-context tokens per call — invisible without gen_ai.prompt.template.version on the span. The free-tier abuser: one customer running batch evals consumes 40% of monthly budget — invisible without user.id on every span.
In 2026 agent stacks, cost is the second most-watched operational signal after latency, and span-level attribution is what turns it from a finance problem into an engineering one.
How FutureAGI Handles Cost Attribution
FutureAGI’s approach is to compute cost per span at write time, attach it to the trace, and let the platform aggregate by any tag. The pipeline runs in three places.
Instrumentation: traceAI emits gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens on every LLM span. The Agent Command Center gateway adds gen_ai.cost.input, gen_ai.cost.output, and gen_ai.cost.total by multiplying token counts by the current price for the routed model; pricing is maintained centrally, so providers' frequent price drops do not require a redeploy.
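A minimal sketch of that write-time computation, assuming an illustrative price table keyed by model (the model names, per-1M-token prices, and the attach_cost helper are placeholders, not FutureAGI's maintained table or API):

```python
# Illustrative price table: USD per 1M tokens as (input, output). Placeholder
# values; the real table lives in the gateway and updates without a redeploy.
PRICE_TABLE = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
}

def attach_cost(span, model: str, input_tokens: int, output_tokens: int) -> None:
    """Compute and attach cost attributes at span write time."""
    in_price, out_price = PRICE_TABLE[model]
    cost_in = input_tokens / 1_000_000 * in_price
    cost_out = output_tokens / 1_000_000 * out_price
    span.set_attribute("gen_ai.cost.input", cost_in)
    span.set_attribute("gen_ai.cost.output", cost_out)
    span.set_attribute("gen_ai.cost.total", cost_in + cost_out)
```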
Tagging: spans carry user.id, session.id, and gen_ai.prompt.template.version set via OTel context attributes from the application code. The futureagi-sdk Dataset.log() and the traceAI register call both expose helper APIs to attach these consistently.
Aggregation: the platform exposes pre-built dashboards for cost-by-user, cost-by-prompt-version, cost-by-route, cost-by-feature, and cost-by-tenant. The same data feeds the Agent Command Center routing policy's cost-optimized decisions: when an underperforming model crosses a cost-per-task threshold, the router redirects traffic to a cheaper variant, and model fallback kicks in if the cheaper variant fails an fi.evals.TaskCompletion post-guardrail.
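In rough pseudocode, that decision looks like the sketch below; the threshold, model names, and the call_model and passes_guardrail callbacks are assumptions for illustration, not the Agent Command Center's actual API:

```python
COST_PER_TASK_THRESHOLD = 0.25  # USD; illustrative rolling cost-per-task trigger

def route_task(task, rolling_cost_per_task, call_model, passes_guardrail):
    """Cost-optimized routing with guardrail-gated fallback (illustrative)."""
    model = "primary-model"
    if rolling_cost_per_task > COST_PER_TASK_THRESHOLD:
        model = "cheaper-variant"  # redirect when the primary is too expensive
    result = call_model(model, task)
    # Fallback: if the cheaper variant fails the post-guardrail eval
    # (e.g. TaskCompletion), retry on the primary model.
    if model == "cheaper-variant" and not passes_guardrail(result):
        result = call_model("primary-model", task)
    return result
```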
The differentiator vs. provider dashboards (OpenAI usage, Anthropic console) is per-trace, per-prompt, and per-feature granularity. Provider dashboards stop at the API key level. FutureAGI joins gateway data and trace data in one query layer, so “what did this user, prompt, or feature cost over the last 7 days at p99 token usage” is a single query, not a CSV stitching exercise. Unlike Helicone, which sees only gateway-level costs, FutureAGI’s spans see inside the agent loop and surface which agent node burned the budget.
How to Measure or Detect It
Wire these dimensions on every span and aggregation:
- Cost attributes: gen_ai.cost.input, gen_ai.cost.output, gen_ai.cost.total, plus gen_ai.cost.cache_write for prompt caching.
- Identity tags: user.id, session.id, service.name, gen_ai.prompt.template.version.
- Token inputs: gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.cache_read_tokens (cache reads are usually cheaper).
- Aggregations: cost-per-user p99, cost-per-trace p95, cost-by-prompt-version delta, cost-by-route over rolling 7 days (a roll-up sketch follows the code sample below).
- Eval correlation: cost-per-TaskCompletion=pass vs. cost-per-TaskCompletion=fail, which quantifies wasted spend.
Set identity tags once at the request boundary; downstream LLM spans inherit them through OTel context:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("user_request") as span:
    # user, session, and query come from the incoming request context
    span.set_attribute("user.id", user.id)
    span.set_attribute("session.id", session.id)
    span.set_attribute("gen_ai.prompt.template.version", "v3.2")
    # downstream LLM spans inherit context; cost rolls up by trace
    response = run_agent(query)
```
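From there, each aggregation above is a group-by over span rows. A minimal roll-up by prompt version, assuming spans are exported as flat dicts keyed by attribute name (the export shape is an assumption):

```python
from collections import defaultdict

def cost_by_prompt_version(spans):
    """Sum gen_ai.cost.total per prompt version across exported span rows."""
    totals = defaultdict(float)
    for s in spans:
        version = s.get("gen_ai.prompt.template.version", "unknown")
        totals[version] += s.get("gen_ai.cost.total", 0.0)
    return dict(totals)

spans = [
    {"gen_ai.prompt.template.version": "v3.1", "gen_ai.cost.total": 0.42},
    {"gen_ai.prompt.template.version": "v3.2", "gen_ai.cost.total": 0.61},
    {"gen_ai.prompt.template.version": "v3.2", "gen_ai.cost.total": 0.58},
]
print(cost_by_prompt_version(spans))  # ~ {'v3.1': 0.42, 'v3.2': 1.19}
```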
Common Mistakes
- Aggregating only at the API-key level. Provider dashboards stop there. Cost attribution lives at user, prompt, route, and feature granularity, not the key.
- Forgetting prompt version tags. Without gen_ai.prompt.template.version, post-rollout cost regressions cannot be tied to the prompt that caused them.
- Treating cache reads as full-priced inputs. Cache reads are typically billed at 10–25% of the input price. Use gen_ai.usage.cache_read_tokens and a separate cache rate.
- Hardcoding price tables in code. Provider prices change weekly. Centralize the price table in the gateway or in a config service (see the sketch after this list).
- Splitting trace and gateway data across two systems. Cost attribution becomes a CSV-stitching problem. Pick a stack that unifies both.
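One way to avoid the hardcoding mistake is to fetch the price table from a config endpoint and cache it with a TTL. The URL and response shape below are hypothetical:

```python
import json
import time
import urllib.request

_PRICE_CACHE = {"table": None, "fetched_at": 0.0}

def get_price_table(url="https://config.example.internal/llm-prices.json",
                    ttl_seconds=3600):
    """Return {model: {"input": usd_per_1m, "output": usd_per_1m}}, cached."""
    stale = time.time() - _PRICE_CACHE["fetched_at"] > ttl_seconds
    if _PRICE_CACHE["table"] is None or stale:
        with urllib.request.urlopen(url) as resp:  # hypothetical endpoint
            _PRICE_CACHE["table"] = json.load(resp)
        _PRICE_CACHE["fetched_at"] = time.time()
    return _PRICE_CACHE["table"]
```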
Frequently Asked Questions
What is cost attribution?
Cost attribution slices total LLM spend back to the unit that caused it — a user, tenant, prompt version, route, feature, or agent node — by joining per-span token counts with provider price tables and aggregating by tag dimensions.
How is cost attribution different from token usage tracking?
Token usage tracking is the raw count of input and output tokens per span. Cost attribution multiplies those counts by current per-token prices, then aggregates by user, prompt version, route, or feature. Tokens are the input; dollar attribution is the output.
How do you implement cost attribution?
Tag every span with user.id, session.id, and prompt template version. traceAI emits gen_ai.usage.input_tokens and gen_ai.usage.output_tokens automatically. Multiply against the price table to populate gen_ai.cost.total, then aggregate by tag dimensions in your trace store.