
What Is a Contact Center Macro Metric?

A contact center macro metric is a high-level, aggregated KPI that summarizes contact-center health across a large set of contacts — for example, monthly first-call resolution, weekly CSAT, quarterly cost-per-contact, abandonment rate, or AI deflection rate. Macro metrics are reported to leadership and used for budget, staffing, and vendor decisions. They sit on top of micro metrics (per-call AHT, per-rubric pass rate, per-trace evaluator score). FutureAGI extends macro metrics for AI tiers via aggregated Dataset evaluation runs, ConversationResolution, IsCompliant, and trace-level evidence.

Why Macro Metrics Matter in Production LLM and Agent Systems

Macro metrics are how leadership sees the contact center. A weekly board deck with three lines on it — FCR, CSAT, cost-per-contact — drives more budget, hiring, and vendor decisions than any micro dashboard. Get the macro definition wrong and the entire org steers wrong.

The pain shifts when AI joins the floor. Legacy macro metrics were defined for a homogeneous human workforce on a single CCaaS platform. AI introduces categories the old definitions do not cover. A “first-call resolution” macro that includes the AI tier needs to define whether an AI-handled call is “first-call” or whether it counts as a deflection. A “cost-per-contact” macro that mixes AI and human costs needs to allocate model token cost, infrastructure, and human escalation cost into one number. A “CSAT” macro that surveys both AI-handled and human-handled calls needs to compare apples to apples, not vibes to ratings.
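
To see why the allocation is the hard part, here is a minimal sketch of one possible blended formula. Every input name is a hypothetical placeholder a finance team would supply, not a FutureAGI field:

# One possible cost-per-contact allocation for a mixed AI/human tier.
# All inputs are illustrative placeholders, not FutureAGI fields.
def blended_cost_per_contact(
    token_cost: float,       # model inference spend for the window
    infra_cost: float,       # hosting, orchestration, observability
    human_cost: float,       # loaded agent cost for human-handled contacts
    escalation_cost: float,  # extra human cost attributable to AI escalations
    total_contacts: int,
) -> float:
    return (token_cost + infra_cost + human_cost + escalation_cost) / total_contacts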

The roles seeing the pain. Finance leads cannot reconcile AI cost without a macro definition that includes inference. Operations leads see flat AHT but rising AI escalation reasons that the macro view buries. Compliance leads need a macro view of policy adherence, not just spot checks. Product leads need a macro showing AI deflection net of escalation, with confidence intervals.

How FutureAGI Handles Contact Center Macro Metrics

FutureAGI’s approach is to aggregate evaluator outputs into macro metrics the same way micro metrics roll up. The relevant surfaces are AggregatedMetric (combines multiple evaluator outputs into a composite score), Dataset versioning for time-windowed evaluation runs, ConversationResolution and IsCompliant for AI-tier resolution and compliance, and traceAI for the underlying spans. Composite macros export to CCaaS BI dashboards as additional series.

A concrete example: a financial-services contact center defines a “Resolution Rate” macro that includes both human and AI cohorts. The team builds an AggregatedMetric combining ConversationResolution, weighted by cohort size, with a separate “AI-Deflection Quality” macro that gates AI deflection on IsCompliant and Groundedness. Leadership sees one Resolution Rate line and one AI-Deflection Quality line, with confidence intervals tied to evaluator agreement. When a model swap drops AI-Deflection Quality by 4 points, finance can hold the release until the regression is fixed, with trace-level evidence backing the call.
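
A minimal by-hand sketch of those two macros, built with numpy rather than through the AggregatedMetric surface (whose exact constructor is not shown here). The 0/1 score convention and the gating rule are assumptions taken from the example above:

import numpy as np

def resolution_rate_macro(ai_scores, human_scores):
    # Resolution Rate: mean ConversationResolution score per cohort,
    # weighted by cohort size so neither tier dominates by accident.
    means = np.array([np.mean(ai_scores), np.mean(human_scores)])
    weights = np.array([len(ai_scores), len(human_scores)], dtype=float)
    return float(np.average(means, weights=weights))

def ai_deflection_quality_macro(deflected, compliant, grounded):
    # AI-Deflection Quality: a deflection only counts when the same
    # contact also passes IsCompliant and Groundedness (the gating above).
    d, c, g = (np.asarray(x, dtype=bool) for x in (deflected, compliant, grounded))
    return float((d & c & g).mean())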

Unlike a CCaaS macro report — which usually re-aggregates platform-native KPIs — FutureAGI’s macro metrics are evaluator-driven and reproducible across platforms.

How to Measure or Detect It

Macro metrics need to be defined deliberately, not aggregated by accident. Practical signals:

  • AggregatedMetric: combines multiple evaluator outputs into one composite macro.
  • ConversationResolution (cohorted): per-cohort resolution rate aggregated across a time window.
  • IsCompliant (cohorted): per-policy compliance rate at macro level.
  • AI deflection rate (dashboard signal): share of contacts fully handled by AI without human handoff.
  • Cost-per-resolved-contact: cost normalized by resolution, not by call count.
  • Macro confidence interval: variance band around the aggregated score; flags when sample size is too small.

A minimal aggregation sketch, assuming `week_calls` holds the window's conversation transcripts and evaluator scores are 0/1:

from fi.evals import ConversationResolution, IsCompliant
import numpy as np

# Per-contact evaluator scores for one cohort and one time window
resolution_scores = [ConversationResolution().evaluate(conversation=c).score for c in week_calls]
compliance_scores = [IsCompliant().evaluate(output=c, policy="recording-disclosure").score for c in week_calls]

# Macro metrics: mean evaluator score across the window
macro_resolution = np.mean(resolution_scores)
macro_compliance = np.mean(compliance_scores)

# Variance band around the macro; a wide interval flags a too-small sample
resolution_ci = 1.96 * np.std(resolution_scores, ddof=1) / np.sqrt(len(resolution_scores))

Common Mistakes

  • Aggregating without cohorting. A single “FCR” line that mixes AI and human contacts hides which cohort moved (see the cohort-split sketch after this list).
  • Using mean without confidence intervals. Macro metrics with small denominators on a sliced cohort can swing wildly week-over-week.
  • Inheriting platform-native definitions blindly. Each CCaaS defines FCR slightly differently; pick a definition and own it across platforms.
  • No threshold or alert on macro drops. A 6-point macro drop that does not page is a vanity metric.
  • Re-baselining macros silently. Changing the macro definition mid-quarter destroys quarter-over-quarter comparability.
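
To make the first mistake concrete, here is a hedged sketch of the cohort split (the `handled_by` and `resolution_score` fields are hypothetical per-call attributes, not FutureAGI names):

from collections import defaultdict
import numpy as np

week_calls = [  # illustrative stand-in records; real ones come from traces
    {"handled_by": "ai", "resolution_score": 1},
    {"handled_by": "human", "resolution_score": 0},
]

# One "FCR" line becomes one line per cohort, so a moving cohort is visible.
by_cohort = defaultdict(list)
for call in week_calls:
    by_cohort[call["handled_by"]].append(call["resolution_score"])

for cohort, scores in by_cohort.items():
    print(f"{cohort}: resolution={np.mean(scores):.2%} (n={len(scores)})")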

Frequently Asked Questions

What is a contact center macro metric?

A macro metric is a high-level aggregated KPI summarizing contact-center health across many contacts — FCR, CSAT, AHT, cost-per-contact, AI deflection rate. It sits on top of micro metrics (per-call signals).

How is a macro metric different from a micro metric?

Micro metrics describe a single contact (per-call AHT, per-rubric pass). Macro metrics aggregate across a window and cohort. Macros drive leadership decisions; micros drive engineering and coaching.

How does FutureAGI build contact center macro metrics?

FutureAGI aggregates evaluator outputs (ConversationResolution, IsCompliant, ASRAccuracy) across cohorts and time windows via `AggregatedMetric`, exporting to CCaaS dashboards alongside legacy macro metrics.