What is echo cancellation in voice AI?

Echo cancellation removes the voice agent's playback from the microphone path so ASR hears the caller instead of a delayed copy of the agent. FutureAGI treats it as an upstream audio condition that affects audio quality, transcription, turns, and call outcomes.

How is echo cancellation different from noise suppression?

Echo cancellation subtracts known playback audio from microphone input. Noise suppression reduces unrelated background sound such as fans, traffic, keyboard clicks, or room noise.

How do you measure echo cancellation?

Use captured audio, turn-event traces, AudioQualityEvaluator, and ASRAccuracy, together with pipeline and outcome signals such as self-transcription rate, false voice activity, repeated corrections, and escalation rate.

What Is Echo Cancellation? FutureAGI Guide (2026)

What Is Echo Cancellation?

Echo cancellation is the voice-AI audio process that removes the agent’s speaker output from the caller microphone path so downstream ASR hears the human, not a delayed copy of the agent itself. It is a voice reliability control that appears before transcription in production traces, LiveKit simulations, and eval datasets. FutureAGI treats it as an upstream audio condition and measures its impact through AudioQualityEvaluator, ASRAccuracy, turn-event traces, and call outcomes such as repeated corrections or failed barge-in.
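To make "removes the agent's speaker output from the caller microphone path" concrete, here is a minimal sketch of one textbook approach, a normalized least-mean-squares (NLMS) adaptive filter. This guide does not name a specific algorithm, so NLMS is an assumption chosen for illustration. The simulated echo path, tap count, and step size are made-up values, and real browser or device echo cancellers add double-talk detection, delay estimation, and nonlinear processing that this sketch omits.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic, sample by sample."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end (playback) samples, newest first
    residual = []
    for played, captured in zip(far_end, mic):
        history = [played] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = captured - echo_estimate          # what remains after cancellation
        energy = sum(h * h for h in history) + eps
        weights = [w + step * error * h / energy for w, h in zip(weights, history)]
        residual.append(error)
    return residual

def power(samples):
    return sum(s * s for s in samples) / len(samples)

# Simulated agent-only playback: the microphone hears a delayed, attenuated copy.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [0.6 * far_end[i - 3] if i >= 3 else 0.0 for i in range(len(far_end))]

residual = nlms_echo_cancel(far_end, mic)
print(power(mic[-1000:]), power(residual[-1000:]))
```

In this idealized case the residual power falls far below the microphone power once the filter has adapted. When the caller speaks over the agent, the same update rule can drift and suppress the caller, which is why double-talk handling matters.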

Why Echo Cancellation Matters in Production Voice Agents

Echo cancellation failures contaminate the first stage of a voice agent. If agent playback leaks back into the microphone, ASR may transcribe the agent’s own words as if the caller said them. That creates transcript contamination, self-interruption loops, false voice activity, and broken double-talk handling, where the caller and agent speak at the same time and the user gets clipped.

Developers usually misread the bug as a reasoning issue because the transcript looks plausible: the agent responded to a phrase that really does appear in the text. SREs see longer calls, higher p99 time-to-first-audio, more ASR retries, and turn detectors firing during agent playback. Product teams see callers repeat themselves, say “no, I did not say that,” or abandon a call. Compliance teams lose a clean audit trail when consent, payment, or healthcare instructions are mixed with playback artifacts.

The problem is sharper in 2026 voice-agent stacks because a call is not just ASR plus TTS. The same trace may include voice activity detection, endpointing, LLM reasoning, tool calls, policy checks, model fallback, and spoken confirmation. A small residual echo can make a scheduling agent cancel the wrong appointment or make a support agent repeat a disclosure in a loop. Useful logs include playback start and end, microphone speech windows, overlap duration, ASR transcript, TTS segment ID, device class, codec, and whether the final task still completed.
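As a rough illustration of how the logged playback and microphone speech windows combine into an overlap duration, the sketch below sums the time during which the microphone reported speech while agent playback was active. The function name, the (start, end) tuple format in seconds, and the sample values are all hypothetical, and the code assumes the windows within each list do not overlap one another.

```python
def overlap_seconds(playback_windows, speech_windows):
    """Total seconds of microphone speech that fall inside agent playback windows."""
    total = 0.0
    for play_start, play_end in playback_windows:
        for speech_start, speech_end in speech_windows:
            shared = min(play_end, speech_end) - max(play_start, speech_start)
            total += max(0.0, shared)  # negative means the windows do not intersect
    return total

# Illustrative values only: (start, end) in seconds from call start.
playback = [(0.0, 2.5), (6.0, 8.0)]
mic_speech = [(1.0, 1.4), (2.4, 5.0), (7.5, 9.0)]

overlap = overlap_seconds(playback, mic_speech)
print(round(overlap, 3))
```

Overlap alone does not prove echo, since a real barge-in produces the same timing pattern. It becomes useful evidence when joined with the ASR transcript and TTS segment ID for the same window.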

How FutureAGI Handles Echo Cancellation

FutureAGI’s approach is to treat echo cancellation as an upstream audio reliability condition, not as a standalone FutureAGI evaluator. No FutureAGI evaluator is dedicated to this glossary term; the nearest FutureAGI surfaces are the traceAI livekit integration, the simulate-sdk LiveKitEngine, AudioQualityEvaluator, and ASRAccuracy. Echo cancellation is a media-pipeline property with downstream eval impact, not a model metric, so FutureAGI measures it through those surfaces.

A practical workflow starts with LiveKitEngine simulations for speakerphone, browser tab, car Bluetooth, headset, and noisy-room scenarios. The run should keep the caller audio path, agent audio path, ASR transcript, TTS segment timestamps, voice activity events, turn overlap duration, echo_cancellation_enabled metadata, and scenario ID. FutureAGI can then score the captured audio with AudioQualityEvaluator, compare transcript quality with ASRAccuracy, and inspect whether echo artifacts correlate with failed task completion or interruption handling.

Example: a billing voice agent says, “Your balance is paid,” and that playback leaks into the caller channel. ASR transcribes “paid” during the next user turn, so the agent skips a refund question and closes the ticket. Unlike a raw WebRTC echoCancellation: true flag, which only says the browser was asked to apply acoustic echo cancellation, the FutureAGI workflow asks whether the call outcome was harmed. The engineer alerts on audio-quality fail rate by device, reviews traces where agent TTS appears in caller transcripts, adjusts media settings or provider routing, and reruns the same regression scenarios before release.
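The "audio-quality fail rate by device" alert in this example could be computed along the following lines. This is a sketch under stated assumptions: the record fields, the 0-to-1 score scale, the 0.7 pass threshold, and the 0.5 alert level are hypothetical and are not taken from FutureAGI documentation, and the call records are invented for illustration.

```python
from collections import defaultdict

def fail_rate_by_device(calls, threshold=0.7):
    """Fraction of calls per device whose audio score falls below the threshold."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for call in calls:
        totals[call["device"]] += 1
        if call["audio_score"] < threshold:
            fails[call["device"]] += 1
    return {device: fails[device] / totals[device] for device in totals}

# Invented records; in practice the scores would come from an audio-quality evaluator.
calls = [
    {"device": "headset", "audio_score": 0.92},
    {"device": "headset", "audio_score": 0.88},
    {"device": "speakerphone", "audio_score": 0.55},
    {"device": "speakerphone", "audio_score": 0.81},
    {"device": "car_bluetooth", "audio_score": 0.40},
]

rates = fail_rate_by_device(calls)
flagged = sorted(device for device, rate in rates.items() if rate >= 0.5)
print(rates, flagged)
```

Slicing by device first keeps a speakerphone regression from being averaged away by a large population of clean headset calls.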

How to Measure or Detect Echo Cancellation

Measure echo cancellation by joining audio, timing, transcript, and outcome signals:

AudioQualityEvaluator: FutureAGI evaluator for whether captured or generated speech is usable enough to trust downstream scoring.
ASRAccuracy: companion evaluator for whether residual echo caused speech-to-text substitutions, insertions, or deletions.
Self-transcription rate: percentage of caller transcript tokens that match recent agent TTS segments during agent-only playback.
False voice activity: VAD events fired while only the agent should be speaking, sliced by device, browser, codec, and room condition.
Dashboard signals: audio-quality fail rate, word error rate, turn-overlap duration, repeated correction rate, escalation rate, and p99 turn latency.

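from fi.evals import AudioQualityEvaluator, ASRAccuracy

# Placeholder inputs (hypothetical values): substitute your own recording and transcript.
call_audio = "call_recording.wav"
reference = "reference transcript of what the caller actually said"

audio_eval = AudioQualityEvaluator()
asr_eval = ASRAccuracy()
# Score the captured audio, then score transcription against the reference.
audio_score = audio_eval.evaluate(audio_path=call_audio).score
asr_score = asr_eval.evaluate(audio_path=call_audio, ground_truth=reference).score
print(audio_score, asr_score)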
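The self-transcription rate from the list above has no named evaluator in this guide, so it has to be derived from transcripts. Below is one minimal way to approximate it, assuming the caller transcript passed in was captured during agent-only playback. The helper names and the bag-of-words token match are illustrative choices, not a FutureAGI API; a production version would likely align on timestamps or token sequences, because common words match by chance.

```python
import re

def tokenize(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def self_transcription_rate(caller_transcript, recent_agent_segments):
    """Share of caller-channel tokens that also appear in recent agent TTS text."""
    caller_tokens = tokenize(caller_transcript)
    if not caller_tokens:
        return 0.0
    agent_vocab = set()
    for segment in recent_agent_segments:
        agent_vocab.update(tokenize(segment))
    matched = sum(1 for token in caller_tokens if token in agent_vocab)
    return matched / len(caller_tokens)

agent_segments = ["Your balance is paid."]
echoed = self_transcription_rate("balance is paid", agent_segments)
genuine = self_transcription_rate("I want a refund", agent_segments)
print(echoed, genuine)
```

A rate near 1.0 during a window when only the agent should be audible is a strong hint that playback leaked into the caller channel.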

Do not stop at a pass/fail media flag. The real question is whether residual echo changed the transcript, turn policy, tool action, or final caller outcome.

Common Mistakes

Most echo bugs come from assuming the media stack solved the issue before the agent ever saw the call.

Treating echoCancellation: true as proof. Device drivers, browser paths, SIP bridges, and speaker volume still change residual echo.
Testing only with headphones. Speakerphones, conference rooms, mobile loudspeaker mode, and car Bluetooth create the worst leakage.
Scoring cleaned transcripts only. You miss the timing evidence that proves the caller never said the echoed phrase.
Tuning noise suppression alone. Noise suppression reduces background sound; echo cancellation subtracts known playback from the microphone stream.
Ignoring double-talk. Aggressive cancellation can suppress the caller exactly when they interrupt a speaking agent.