What Is JS Distance?
A symmetric, bounded distance metric between two probability distributions, equal to the square root of Jensen-Shannon divergence and used for drift detection.
JS distance, or Jensen-Shannon distance, is the square root of Jensen-Shannon divergence: a symmetric, bounded model-monitoring metric for comparing two probability distributions on the same support. With base-2 logarithms it ranges from 0 for identical distributions to 1 for distributions with disjoint support, and it stays finite even when the distributions barely overlap. Production LLM teams use JS distance in traces, evaluation dashboards, and drift monitors to detect feature, embedding, prompt, and output-token distribution shifts; FutureAGI treats it as a cohort-level reliability signal.
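The bounds above can be checked numerically with scipy. A minimal sketch with two made-up four-bin distributions, one pair identical and one pair with fully disjoint support:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])  # no overlap with p at all

# scipy's jensenshannon returns the JS *distance* (the square root),
# and base=2 bounds it to [0, 1]
print(jensenshannon(p, p, base=2))  # identical -> 0.0
print(jensenshannon(p, q, base=2))  # disjoint  -> 1.0
```

This is the property that makes the metric safe to alert on: even worst-case separation maps to a known, finite ceiling.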
Why JS Distance Matters in Production LLM and Agent Systems
Most LLM-application failures in production look like quality regressions, but they often start as silent distribution shifts. Users send prompts that look slightly different from last month, embeddings concentrate in new regions of the latent space, retrieved chunks come from a different cohort of documents, and tool-call argument shapes drift. Each of these is a probability distribution that has moved. Without a stable distance metric, the team only sees the downstream symptom — eval-fail-rate creeping up, refusal rate spiking, or thumbs-down rate ticking on one route — long after the drift began.
JS distance gives platform engineers and ML engineers a single bounded number to alert on. Unlike KL divergence, which is undefined when the production distribution touches a region the baseline never covered, JS distance handles disjoint support gracefully. Unlike Wasserstein distance, it doesn’t need a ground metric on the support, so it works on token vocabularies and categorical features without extra geometry.
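The contrast with KL divergence is easy to demonstrate: when production mass lands in a bin the baseline never covered, KL blows up to infinity while JS distance stays bounded. A small sketch using scipy (the distributions are illustrative):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

baseline = np.array([0.5, 0.5, 0.0])    # baseline never saw bin 3
production = np.array([0.4, 0.4, 0.2])  # production now has mass there

print(entropy(production, baseline))                # inf: KL blows up on new support
print(jensenshannon(production, baseline, base=2))  # finite, bounded in [0, 1]
```

An alerting pipeline built on KL would have to special-case the infinity; with JS distance the threshold comparison just works.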
In 2026 agent stacks, the cost of missing drift compounds. A planner step that has drifted toward longer tool sequences, a retriever fed by an embedding model that subtly rotated, or a router that now sends 12% more traffic to a fallback model — each shows up as a JS distance change before it becomes a user-facing incident. Treat JS distance as the canary for “the input world has changed.”
How FutureAGI Handles JS Distance Drift
FutureAGI doesn’t ship a JSDistance evaluator class because JS distance is a distribution-level summary statistic, not a per-row score. Instead, the platform exposes JS-distance-style drift detection on top of dataset cohorts and traceAI spans. At dataset level, when an engineer compares a production cohort against a golden-dataset baseline, the platform computes per-feature distribution distances (PSI, KL, JS) and surfaces the JS-distance value alongside the feature name. At trace level, traceAI spans emitted from traceAI-openai, traceAI-langchain, or traceAI-llamaindex carry llm.input.messages, llm.output.text, llm.token_count.prompt, and tool-call argument fields — any of which can be windowed and compared period over period.
For embedding drift, FutureAGI’s fi.evals.EmbeddingSimilarity evaluator gives per-pair similarity scores; aggregating those across cohorts produces a JS-distance-equivalent signal on embedding clusters. A practical workflow: a RAG team running on traceAI-pinecone samples 5% of last week’s retrieval traces, computes JS distance between the prompt-token histogram now versus 30 days ago, and alerts when the value crosses 0.15. They then drill into the Dataset.add_evaluation regression report to confirm whether ContextRelevance degraded with the drift, and decide whether to retrain the retriever or update prompts. FutureAGI’s approach is to make distribution distance a first-class production signal, not a notebook one-off.
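The prompt-token workflow above can be sketched in a few lines. The token lists, the `token_histogram` helper, and the 0.15 threshold are illustrative assumptions, not FutureAGI APIs; the key detail is building both histograms over one shared vocabulary so they live on the same support:

```python
from collections import Counter

import numpy as np
from scipy.spatial.distance import jensenshannon

def token_histogram(token_lists, vocab):
    """Normalized token-frequency vector over a fixed, shared vocabulary."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    vec = np.array([counts[t] for t in vocab], dtype=float)
    return vec / vec.sum()

# Hypothetical sampled prompts from two time windows
baseline_prompts = [["refund", "order", "status"], ["order", "status"]]
current_prompts = [["cancel", "subscription"], ["refund", "subscription"]]

# Shared vocabulary so both histograms have identical support
vocab = sorted({t for p in baseline_prompts + current_prompts for t in p})

js = jensenshannon(token_histogram(baseline_prompts, vocab),
                   token_histogram(current_prompts, vocab), base=2)
if js > 0.15:  # the alert threshold from the workflow above
    print(f"prompt-token drift detected: JS distance = {js:.3f}")
```

In a real pipeline the token lists would come from sampled trace spans and the comparison would run on a schedule, but the distance computation itself is this small.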
How to Measure or Detect JS Distance
Pick a binning strategy that matches the feature, then compute the Jensen-Shannon distance. Production-grade signals include:
- Per-feature JS distance: histograms of numeric features (e.g. token count, latency) binned to 20–50 buckets, baseline vs. current.
- Embedding-cluster JS distance: cluster prompts/responses, compute JS over cluster-membership distributions; pair with fi.evals.EmbeddingSimilarity.
- Output-token JS distance: token-frequency drift on llm.output.text, which rises when refusal rate or template language shifts.
- Dashboard signal: alert when JS distance exceeds 0.1–0.2 over a 24-hour window vs. a 30-day baseline.
- Pair with eval-fail-rate: drift without quality regression is informational; drift plus regression is incident.
```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Probability distributions over the same four bins (each sums to 1)
baseline = np.array([0.10, 0.20, 0.40, 0.30])
current = np.array([0.05, 0.15, 0.45, 0.35])

# jensenshannon returns the JS *distance* (square root of the divergence);
# base=2 bounds the result to [0, 1]
js_dist = jensenshannon(baseline, current, base=2)
print(round(js_dist, 4))
```
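For raw numeric features such as per-request token counts, the histograms have to be built on shared bin edges before calling jensenshannon. A sketch with synthetic data (the distributions and bin count are made up; note that values outside the baseline range are silently dropped by np.histogram, so a production version should add overflow bins):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
baseline_tokens = rng.normal(500, 100, size=5000)  # synthetic token counts
current_tokens = rng.normal(560, 100, size=5000)   # mean shifted upward

# Pin the bin edges on the baseline so both windows share the same support
edges = np.histogram_bin_edges(baseline_tokens, bins=30)
p, _ = np.histogram(baseline_tokens, bins=edges)
q, _ = np.histogram(current_tokens, bins=edges)

# jensenshannon normalizes the raw counts to probabilities itself
print(round(jensenshannon(p, q, base=2), 4))
```

Freezing the edges on the baseline is what makes the metric comparable across windows; recomputing edges per window changes the value even when nothing drifted.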
Common mistakes
Use JS distance as a production signal only when the measurement policy is repeatable. The common failures are calibration errors, not math errors.
- Conflating JS distance and JS divergence. Distance is the square root of divergence; mixing them in alerts produces inconsistent thresholds and noisy incident review.
- Ignoring binning choice. Different bin counts produce different JS values; pin the binning policy across baseline and current windows before comparing cohorts.
- Alerting on JS distance alone. Drift is informational without paired eval-fail-rate, thumbs-down rate, or escalation-rate movement. Pair signals before paging.
- Using JS distance on tiny windows. Small samples produce unstable distance estimates; set a minimum window of 1–2k samples before alerting.
- Comparing against a stale baseline. A baseline from 6 months ago always shows drift; refresh it on a defined cadence and preserve version history.
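The small-window failure mode is easy to reproduce: draw two samples from the same distribution and the estimated JS distance sits well above zero at n = 100, an apparent drift that is pure sampling noise. A sketch with synthetic data (the sizes and bin policy are illustrative):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
edges = np.linspace(0, 1000, 31)  # fixed 30-bin policy

def js_same_distribution(n):
    """JS distance between two samples drawn from the SAME distribution."""
    a = rng.normal(500, 100, size=n)
    b = rng.normal(500, 100, size=n)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    return jensenshannon(p, q, base=2)

print(round(js_same_distribution(100), 3))     # noisy: looks like drift
print(round(js_same_distribution(20_000), 3))  # near zero: no drift
```

This is why a minimum window size belongs in the alert policy itself, not just in a runbook.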
Frequently Asked Questions
What is JS distance?
JS distance is the square root of Jensen-Shannon divergence — a symmetric, bounded (0 to 1) distance between two probability distributions, used to quantify how different a current distribution is from a baseline.
How is JS distance different from KL divergence?
KL divergence is asymmetric, unbounded, and undefined when the two distributions have non-overlapping support. JS distance is symmetric, bounded between 0 and 1, and finite even when supports differ — making it safer for production drift alarms.
How do you measure JS distance in an LLM pipeline?
Bin or histogram the feature, then compute Jensen-Shannon divergence with `scipy.spatial.distance.jensenshannon` and report the result as JS distance. FutureAGI surfaces drift signals on traceAI spans and `Dataset` cohorts.