What is barge-in in voice AI?

Barge-in is the ability for a caller to interrupt a speaking voice agent and have the agent stop, listen, and respond to the new input without losing conversation context.

How is barge-in different from turn detection?

Turn detection identifies when speakers start, stop, or yield. Barge-in is what the voice agent does in production when a user starts speaking while the agent is still talking.

How do you measure barge-in?

FutureAGI measures barge-in with the CustomerAgentInterruptionHandling evaluator and LiveKitEngine simulations, using turn-event traces, interruption latency, resumed intent accuracy, and escalation rate by cohort.

What Is Barge-In? Definition & FutureAGI Guide (2026)

What Is Barge-In?

Barge-in is the voice-agent behavior where a caller interrupts spoken output and the system stops talking, listens, and handles the new input without losing context. It is a voice-AI reliability behavior that appears in real-time audio traces, turn-detection events, eval pipelines, and simulated calls. FutureAGI evaluates it with CustomerAgentInterruptionHandling, LiveKitEngine scenarios, and call-level signals such as interruption latency, resumed intent accuracy, repeat-request rate, and escalation rate.

Why Barge-In Matters in Production Voice Agents

Barge-in failures make a voice agent feel unresponsive even when its final answer is correct. If the caller says “actually, change that to Friday” while the agent is still reading Tuesday’s confirmation, a bad system keeps speaking, misses the correction, or treats the interruption as background noise. The named failure modes are missed interruption, late stop, context loss after interruption, and false resume, where the agent restarts from the wrong state.

The pain is split across teams. Developers see flaky calls that fail in voice but pass in text-only tests, because the LLM can handle the correction once it appears in the transcript. SREs see longer call duration, higher p99 time-to-first-audio after interruptions, and more retries in ASR or TTS stages. Product teams see callers repeat themselves, talk over the agent, or hang up. Compliance teams lose audit clarity when the stored transcript hides who spoke over whom and what the agent had already said.

Voice agents in 2026 are full pipelines: ASR, voice activity detection, endpointing, turn policy, LLM reasoning, tool calls, guardrails, and TTS. Barge-in sits across those boundaries. Unlike transcript-only QA in Vapi or a raw LiveKit event export, production evaluation has to preserve timing, audio, and agent state together. The useful logs include interruption timestamps, agent speech offset, stop latency, overlapping audio windows, resumed transcript, tool calls after interruption, and whether the user goal still completed.
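The log fields listed above can be kept together in one record per interruption so that timing metrics are derived rather than logged separately. The following is a minimal sketch, not a FutureAGI or LiveKit schema: the class name BargeInTrace, its field names, and the choice of milliseconds from call start are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BargeInTrace:
    """One interruption inside a call (hypothetical schema).

    All times are milliseconds measured from the start of the call.
    """
    agent_speech_start_ms: int
    caller_speech_start_ms: int          # the interruption timestamp
    agent_speech_stop_ms: Optional[int]  # None if the agent never stopped
    caller_speech_end_ms: int
    resumed_transcript: str = ""
    goal_completed: bool = False

    def agent_speech_offset_ms(self) -> int:
        """How far into its utterance the agent was when the caller cut in."""
        return self.caller_speech_start_ms - self.agent_speech_start_ms

    def stop_latency_ms(self) -> Optional[int]:
        """Caller speech start to agent speech stop; None for a missed interruption."""
        if self.agent_speech_stop_ms is None:
            return None
        return max(0, self.agent_speech_stop_ms - self.caller_speech_start_ms)

    def overlap_ms(self) -> int:
        """Duration for which both parties were speaking at once."""
        agent_end = self.agent_speech_stop_ms
        if agent_end is None:
            # The agent talked over the whole interruption.
            agent_end = self.caller_speech_end_ms
        end = min(agent_end, self.caller_speech_end_ms)
        return max(0, end - self.caller_speech_start_ms)


# Example values are invented for illustration.
handled = BargeInTrace(1000, 3200, 3650, 5000)
missed = BargeInTrace(1000, 3200, None, 5000)
```

Keeping the raw timestamps and deriving the metrics means a missed interruption stays distinguishable (stop latency is None) instead of being recorded as a very large latency.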

How FutureAGI Handles Barge-In

FutureAGI’s approach is to treat barge-in as an interruption-handling outcome attached to a full voice run, not as a single ASR event. The specific FutureAGI surface for this term is CustomerAgentInterruptionHandling, the evaluator class behind the eval:CustomerAgentInterruptionHandling anchor. In a voice workflow, that evaluator is attached to simulated or sampled calls where a caller interrupts a speaking customer agent.

A practical setup starts in the simulate-sdk. An engineer defines Persona records such as impatient buyer, elderly patient, or noisy warehouse operator, groups them into a Scenario, and runs the calls with LiveKitEngine. The run keeps the audio path, agent speech segments, caller speech segments, turn events, ASR transcript, final response, and task outcome. FutureAGI then evaluates the call with CustomerAgentInterruptionHandling; teams often pair it with ASRAccuracy, AudioQualityEvaluator, and TaskCompletion to separate bad listening from bad task execution.

Example: a pharmacy refill voice agent must stop reading policy language when the caller says, “No, I need insulin pens, not needles.” FutureAGI records the interruption window, checks whether the agent stopped promptly, and scores whether the resumed answer used the corrected intent. If interruption failures rise for mobile calls with packet loss, the engineer can alert on eval-fail-rate-by-cohort, adjust endpointing thresholds, route noisy calls to a slower confirmation flow, and rerun the regression suite before release.
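An alert on eval fail rate by cohort, as in the example above, could be computed along these lines. This is a sketch under assumptions: the cohort labels, the (cohort, passed) input shape, and the 0.5 alert threshold are hypothetical and not taken from the FutureAGI SDK.

```python
from collections import defaultdict


def fail_rate_by_cohort(results):
    """Map each cohort to its share of failed evals.

    results: iterable of (cohort, passed) pairs, one per evaluated call.
    """
    totals = defaultdict(int)
    fails = defaultdict(int)
    for cohort, passed in results:
        totals[cohort] += 1
        if not passed:
            fails[cohort] += 1
    return {cohort: fails[cohort] / totals[cohort] for cohort in totals}


def cohorts_to_alert(rates, threshold):
    """Return the cohorts whose fail rate exceeds the threshold, sorted by name."""
    return sorted(c for c, rate in rates.items() if rate > threshold)


# Invented sample results for illustration only.
sample = [
    ("mobile_packet_loss", False),
    ("mobile_packet_loss", False),
    ("mobile_packet_loss", True),
    ("landline", True),
    ("landline", True),
    ("landline", True),
    ("landline", False),
]
rates = fail_rate_by_cohort(sample)
alerts = cohorts_to_alert(rates, threshold=0.5)
```

In this invented sample the overall fail rate is 3 of 7 calls, which would not cross a 0.5 threshold, while the packet-loss cohort on its own does.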

How to Measure or Detect Barge-In

Measure barge-in with timing, transcript, and outcome signals together:

CustomerAgentInterruptionHandling: FutureAGI evaluator for whether a customer agent handles an interruption according to the configured rubric.
Stop latency: milliseconds from caller speech start to agent speech stop; track p50, p95, and p99 by channel.
Resumed intent accuracy: whether the post-interruption response reflects the new user intent, not the old agent script.
Turn-event trace fields: caller speech start, agent speech offset, overlap duration, endpointing decision, and next agent action.
User proxies: repeated correction rate, hang-up rate after overlap, human escalation after interruption, and thumbs-down rate by cohort.

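# Assumes call_transcript and turn_events already hold the transcript and
# turn-event trace of one recorded or simulated call.
from fi.evals import CustomerAgentInterruptionHandling

barge_in_eval = CustomerAgentInterruptionHandling()
result = barge_in_eval.evaluate(
    transcript=call_transcript,
    audio_events=turn_events,
    expected_behavior="stop speaking and answer the corrected request",
)
# Score for how well the agent handled the interruption under the rubric.
print(result.score)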

Do not use a global pass rate alone. A voice agent may handle polite interruptions well and fail callers who speak fast, use low-quality microphones, or interrupt during tool-confirmation steps.
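The stop-latency signal above can be broken down per channel instead of being reported as a single global number. The sketch below assumes stop latencies have already been extracted as (channel, milliseconds) pairs. It uses the nearest-rank percentile definition, which is one of several common conventions. The function names and sample values are illustrative.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def stop_latency_by_channel(samples):
    """Group (channel, stop_latency_ms) pairs and report p50, p95, p99 per channel."""
    by_channel = defaultdict(list)
    for channel, latency_ms in samples:
        by_channel[channel].append(latency_ms)
    return {
        channel: {
            "p50": percentile(latencies, 50),
            "p95": percentile(latencies, 95),
            "p99": percentile(latencies, 99),
        }
        for channel, latencies in by_channel.items()
    }


# Invented latencies for illustration only.
samples = [
    ("sip", 100), ("sip", 200), ("sip", 300), ("sip", 400),
    ("browser", 150), ("browser", 900),
]
stats = stop_latency_by_channel(samples)
```

With only a handful of samples per channel, p95 and p99 collapse to the maximum, so tail percentiles are only meaningful once each channel has enough calls.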

Common Mistakes

The most common barge-in errors come from treating interruption as a speech-runtime option instead of an evaluated agent behavior.

Scoring only the final transcript. You miss late stop, overlapping speech, clipped words, and the user’s audible frustration.
Using VAD as the barge-in policy. Voice activity detection detects speech; it does not decide how the agent should resume.
Ignoring tool state. An interruption during payment, booking, or cancellation may require rollback, confirmation, or a safer branch.
Setting one stop-latency threshold. Phone, browser, SIP, and mobile networks need separate baselines and alerts.
Treating interruptions as noise. Many interruptions are corrections, consent changes, or safety-critical updates to the task.
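The split between VAD and barge-in policy, and the point about tool state, can be sketched as a small decision function that sits after the speech detector. Everything here is hypothetical: the Action names, the AgentState fields, and the 200 ms minimum speech duration are assumptions chosen for illustration, not values from any particular runtime.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"                      # no barge-in; continue as before
    STOP_AND_LISTEN = "stop_and_listen"    # stop TTS and take the new input
    STOP_AND_CONFIRM = "stop_and_confirm"  # stop TTS and re-confirm before acting


@dataclass
class AgentState:
    is_speaking: bool
    # True while a payment, booking, or cancellation awaits confirmation.
    pending_tool_action: bool


def barge_in_policy(state, vad_speech_ms, min_speech_ms=200):
    """Decide what to do once VAD reports caller speech.

    VAD only supplies vad_speech_ms; the decision itself depends on agent state.
    """
    if not state.is_speaking:
        # Nothing to interrupt; ordinary turn-taking applies.
        return Action.IGNORE
    if vad_speech_ms < min_speech_ms:
        # Too short to act on yet; wait for more audio (assumed debounce).
        return Action.IGNORE
    if state.pending_tool_action:
        # Interrupted mid-transaction: take the safer branch.
        return Action.STOP_AND_CONFIRM
    return Action.STOP_AND_LISTEN
```

The debounce threshold is a trade-off: a higher value reduces false stops from coughs or line noise but increases stop latency, so it would need tuning per channel alongside the latency baselines.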