What Is Data Granularity?
The level of detail at which data is captured, stored, and analyzed — from per-event traces to coarse per-tenant aggregates.
Data granularity is the level of detail at which data is captured, stored, and analyzed. High granularity means many fine-grained rows — per-event traces, per-span attributes, per-token counts, per-row evaluator scores. Low granularity means coarse aggregates — per-day, per-tenant, per-cohort. Granularity is a design choice that determines which questions a system can answer, how much storage and compute it costs, and which privacy obligations apply. In LLM observability, FutureAGI captures span-level granularity by default so engineers can drill into evaluator failures rather than reach for an aggregate that hides them.
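As a minimal, illustrative sketch (field names here are hypothetical, not FutureAGI's schema), the difference shows up in what each shape of record can answer:

# Illustrative records only; field names are hypothetical, not a real schema.
span_row = {
    "trace_id": "t-123", "span_id": "s-7", "tenant": "acme",
    "agent.trajectory.step": 3, "llm.token_count.prompt": 412,
    "evaluator": "groundedness", "score": 0.31,
}  # can answer: which step failed, for which tenant, at what cost
daily_aggregate = {"date": "2026-02-10", "eval_fail_rate": 0.03}
# can answer only: how bad was the whole day, on average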
Why Data Granularity Matters in Production LLM and Agent Systems
When something breaks, granularity determines what you can investigate. A daily aggregate showing a 3% eval-fail rate tells you nothing about whether one tenant is at 30% and the rest are at 0%. A token-cost dashboard with only weekly totals masks a 10x cost spike that happened on Tuesday afternoon. A trace recorded at workflow level instead of span level cannot tell you which retrieval call returned the wrong policy.
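A short sketch with hypothetical row-level eval results makes the arithmetic concrete: the same rows that produce a reassuring 3% overall also contain a 30% tenant-level failure.

# Hypothetical row-level eval results: (tenant, passed)
rows = [("acme", False)] * 3 + [("acme", True)] * 7 + [("other", True)] * 90

overall = sum(not ok for _, ok in rows) / len(rows)
print(f"overall fail rate: {overall:.0%}")  # 3% -- looks fine

by_tenant: dict[str, tuple[int, int]] = {}
for tenant, ok in rows:
    fails, total = by_tenant.get(tenant, (0, 0))
    by_tenant[tenant] = (fails + (not ok), total + 1)
for tenant, (fails, total) in by_tenant.items():
    print(f"{tenant}: {fails / total:.0%} fail rate")  # acme: 30%, other: 0%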
The pain is concrete. ML engineers cannot reproduce a regression because the aggregated logs lost the offending row. SREs see latency aggregates that smooth over a cohort outage. Product teams launch a feature, see flat metrics, and miss that the new flow is silently degrading the same 2% of users every day. Compliance teams need request-level evidence under audit; aggregate logs cannot satisfy “show me the policy version applied to this specific user.”
In 2026 agent stacks, granularity matters more because agent trajectories produce 5–20 spans per request. Aggregating to “request” level loses the step where the failure happened. The opposite extreme — capturing every byte of every embedding lookup — is wasteful and creates privacy exposure. Telltale symptoms of wrong granularity: regressions that surface only when you drill into traces, dashboards where cohort filters reveal patterns invisible in totals, and audit requests that the data store can’t answer.
How FutureAGI Handles Data Granularity
FutureAGI’s approach is “capture high, aggregate up.” Traces from traceAI-langchain, traceAI-openai-agents, and traceAI-mcp record span-level data with attributes including agent.trajectory.step, llm.token_count.prompt, llm.token_count.completion, span duration, evaluator name, and decision. Each Dataset row stores its source id, ingestion timestamp, reviewer, evaluator output, and bin labels.
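In plain OpenTelemetry terms (the traceAI-* packages instrument this automatically; the eval.* keys below are hypothetical stand-ins for the evaluator name and decision attributes), a span-level record looks roughly like this:

from opentelemetry import trace  # assumes the opentelemetry-api package

tracer = trace.get_tracer("granularity-example")

# The traceAI-* instrumentations set attributes like these automatically;
# writing them by hand here only shows the shape of a span-level record.
with tracer.start_as_current_span("agent.step") as span:
    span.set_attribute("agent.trajectory.step", 3)
    span.set_attribute("llm.token_count.prompt", 412)
    span.set_attribute("llm.token_count.completion", 96)
    span.set_attribute("eval.name", "groundedness")  # hypothetical key
    span.set_attribute("eval.decision", "fail")      # hypothetical key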
A practical workflow: an SRE sees an aggregate eval-fail-rate climb from 2% to 4% on a Groundedness metric. They drill from the dashboard’s daily aggregate into per-route, then per-prompt-version, then per-trace, then per-span. The granular trace shows the failing step had retrieved a particular vendor source. The fix is targeted: quarantine the source, add a regression eval, and re-run only the affected cohort. Without span-level granularity, the same investigation would have stopped at “the metric got worse.”
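The same drill-down can be reproduced on exported span rows. A sketch assuming the rows land in a pandas DataFrame (column names are illustrative):

import pandas as pd

# Hypothetical export of span-level rows; column names are illustrative.
spans = pd.DataFrame([
    {"route": "/chat", "prompt_version": "v12", "trace_id": "t1", "span_id": "s3",
     "passed": False, "retrieved_source": "vendor-docs-v3"},
    {"route": "/chat", "prompt_version": "v12", "trace_id": "t2", "span_id": "s3",
     "passed": True, "retrieved_source": "internal-kb"},
])

# Each level narrows the aggregate toward evidence.
print(spans.groupby("route")["passed"].mean())                      # per route
print(spans.groupby(["route", "prompt_version"])["passed"].mean())  # per prompt version
failing = spans[~spans["passed"]]                                   # per trace, per span
print(failing[["trace_id", "span_id", "retrieved_source"]])         # the offending step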
Agent Command Center routing policies use granular data too: cost-optimized routing decides per request, not per tenant, by reading the prompt-token bin, and model fallback triggers on per-trace evaluator scores, not on hourly averages. Unlike Prometheus recording rules configured only as daily rollups, FutureAGI’s design preserves span-level evidence with retention controls so privacy obligations are still met. The engineer’s next move is concrete: drill into the trace, run a regression eval against the affected slice, and tighten the route or rubric.
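Agent Command Center's actual policy engine isn't reproduced here; the sketch below only shows the decision shape the paragraph describes: route per request from the token bin, fall back from this trace's scores rather than an hourly average.

def route_model(prompt_tokens: int) -> str:
    """Hypothetical cost-optimized routing: decided per request, not per tenant."""
    return "small-model" if prompt_tokens < 1_000 else "large-model"

def should_fall_back(trace_eval_scores: list[float], threshold: float = 0.5) -> bool:
    """Hypothetical fallback: fires on this trace's scores, not an hourly average."""
    return any(score < threshold for score in trace_eval_scores)

print(route_model(412))                    # small-model
print(should_fall_back([0.9, 0.31, 0.8]))  # True: one failing step is enough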
How to Measure or Detect Data Granularity
Granularity itself is observable as a property of your data store:
- Span-attribute coverage — share of spans carrying agent.trajectory.step, llm.token_count.*, evaluator score, route, and prompt version.
- Drill-down depth — number of dashboard levels (route → prompt-version → trace → span) supported without re-aggregating.
- AggregatedMetric outputs — bins computed on top of granular rows; if an aggregator cannot reproduce a daily total from the raw rows, granularity is broken.
- Audit-readiness — time to answer a “show me request X” query end-to-end.
- Storage cost per million spans — granularity has a price; track it so the retention policy stays defensible.
from fi.evals import AggregatedMetric, GroundTruthMatch

# Bins are computed on top of row-level GroundTruthMatch scores;
# aggregate up from granular rows, and keep those rows queryable.
agg = AggregatedMetric(metrics=[GroundTruthMatch()])
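A sketch of the first metric above, span-attribute coverage, computed over hypothetical span dicts:

# Hypothetical span dicts; the required-attribute set is illustrative.
required = {"agent.trajectory.step", "llm.token_count.prompt", "eval.name"}
spans = [
    {"agent.trajectory.step": 1, "llm.token_count.prompt": 412, "eval.name": "groundedness"},
    {"llm.token_count.prompt": 97},  # half a record: no step, no evaluator
]
coverage = sum(required.issubset(row) for row in spans) / len(spans)
print(f"span-attribute coverage: {coverage:.0%}")  # 50%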
Common Mistakes
- Capturing only aggregates. Daily totals are cheap and useless when a regression hits.
- Capturing too much without retention. Span-level granularity needs a retention policy or the bill gets ugly fast; one common rollup pattern is sketched after this list.
- Skipping span attributes. A span without agent.trajectory.step or an evaluator name is half a record.
- Confusing granularity with privacy. Coarse data can still expose individuals via re-identification; granularity choice does not replace minimization.
- Aggregating at write time. Pre-aggregating destroys evidence that auditors and engineers later need.
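For the retention point above, one generic pattern (a sketch, not a FutureAGI feature) is to keep raw spans inside a window and reduce expired ones to an aggregate before deleting them:

from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # illustrative window

def roll_up(spans: list[dict], now: datetime) -> tuple[list[dict], dict]:
    """Keep recent spans raw; reduce expired ones to an aggregate before deletion."""
    fresh = [s for s in spans if now - s["ts"] < RETENTION]
    expired = [s for s in spans if now - s["ts"] >= RETENTION]
    fail_rate = (sum(not s["passed"] for s in expired) / len(expired)) if expired else None
    return fresh, {"expired_spans": len(expired), "fail_rate": fail_rate}

now = datetime.now(timezone.utc)
fresh, summary = roll_up([{"ts": now - timedelta(days=45), "passed": False}], now)
print(summary)  # {'expired_spans': 1, 'fail_rate': 1.0}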
Frequently Asked Questions
What is data granularity?
Data granularity is the level of detail at which data is captured, stored, and analyzed. High granularity means fine-grained rows like per-token or per-span; low granularity means coarse aggregates like per-day or per-tenant.
Why does granularity matter for LLM observability?
Without span-level granularity you cannot drill into a failure to find which step or which evaluator failed. Coarse aggregates hide cohort-specific regressions and make root-cause analysis nearly impossible.
How does FutureAGI handle granularity?
FutureAGI traces capture span-level granularity — agent.trajectory.step, llm.token_count.*, evaluator scores per row — and let engineers aggregate up. The default is high granularity with controlled retention.