What Is Noise Suppression?
Noise suppression removes unwanted background sound from speech audio so ASR, turn detection, and playback remain intelligible in voice agents.
Noise suppression in voice AI is the process of reducing background sound so speech remains intelligible for ASR, turn detection, and playback. It is a voice-reliability control that shows up in eval pipelines, LiveKit simulations, production traces, and release regression tests. FutureAGI treats it as part of audio quality, using AudioQualityEvaluator and ASRAccuracy to find when cafe noise, traffic, hold music, or microphone hiss corrupts transcripts and agent behavior.
Why Noise Suppression Matters in Production LLM and Agent Systems
Noisy input is one of the fastest ways to make a correct agent look wrong. A billing caller says “freeze my card,” background traffic masks the first word, ASR hears “my card,” and the agent moves into a generic account-help flow. The downstream LLM may follow instructions perfectly, but it is reasoning over damaged text. Noise suppression failure becomes a hidden data-quality problem, not just an audio problem.
Developers feel it as flaky intent classification and hard-to-reproduce transcript drift. SREs see higher retry turns, longer call duration, p99 latency spikes from repeated ASR passes, and region-specific failures tied to mobile networks or contact-center headsets. Product teams see “can you repeat that” loops, abandoned calls, and lower task completion. Compliance teams care when consent phrases, identity verification, payment amounts, or medical instructions are partially masked.
In 2026 voice-agent pipelines, noise also interacts with turn detection, barge-in, speaker diarization, and tool execution. Over-aggressive suppression can erase soft speech, backchannels, or named entities. Under-aggressive suppression lets keyboard clicks, music, or another speaker leak into the transcript. The symptoms to track are low transcription confidence, rising word error rate, audio-quality fail rate by cohort, longer silence gaps, barge-in misses, and downstream tool calls triggered by misheard intents.
How FutureAGI Handles Noise Suppression
FutureAGI’s approach is to evaluate noise suppression through the audio artifact and the transcript it produces. The cloud-template evaluator AudioQualityEvaluator scores the captured speech, and ASRAccuracy checks the resulting transcript. In a practical workflow, a team runs noisy customer-service scenarios through LiveKitEngine, captures caller and agent audio, and logs transcript, locale, codec, device type, noise profile, and scenario ID in the eval dataset or production trace.
The engineer then scores the same sample with AudioQualityEvaluator and ASRAccuracy. If audio quality passes but ASR accuracy drops, the failure may be model or vocabulary related. If both fail only for “warehouse-background” or “mobile-commute” cohorts, the suppression profile is damaging speech or leaving too much background signal. FutureAGI can gate a release on audio_quality >= 0.85 for 98% of noisy scenarios and route failing samples into a regression dataset.
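The release gate described above can be sketched as a simple threshold check. The sample records, field names, and `gate_release` helper below are illustrative, not FutureAGI API calls:

```python
# Sketch of the release gate: pass only if at least 98% of noisy-scenario
# samples score audio_quality >= 0.85. Records are hypothetical examples.
THRESHOLD = 0.85
REQUIRED_PASS_RATE = 0.98

def gate_release(samples):
    """samples: list of dicts with 'scenario_id' and 'audio_quality' keys."""
    failing = [s for s in samples if s["audio_quality"] < THRESHOLD]
    pass_rate = 1 - len(failing) / len(samples)
    # Failing samples feed the regression dataset for the next eval run.
    return pass_rate >= REQUIRED_PASS_RATE, failing

ok, regressions = gate_release([
    {"scenario_id": "warehouse-01", "audio_quality": 0.91},
    {"scenario_id": "mobile-commute-02", "audio_quality": 0.78},
])
# ok is False here: only 50% of samples cleared the 0.85 threshold.
```

Routing `regressions` into a stored dataset is what keeps the failing cohorts visible in the next pre-release run.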
Compared with transcript-only QA, this keeps the root cause visible. A readable transcript can hide caller strain, clipped consonants, or suppression artifacts that make a live call feel broken. The next action is specific: tune the suppressor profile, change capture constraints, roll back a codec update, switch ASR provider for the noisy cohort, or add the trace to a pre-release eval suite. That keeps audio fixes tied to user-visible failures, not isolated DSP knobs.
How to Measure or Detect Noise Suppression
Measure noise suppression by comparing raw audio, suppressed audio, transcript quality, and downstream outcome, not by checking the noise suppressor setting alone.
- AudioQualityEvaluator — scores whether captured or generated speech is clear enough to use; compare raw versus suppressed samples when available.
- ASRAccuracy — catches cases where suppression improves loudness but deletes words, names, or digits from the transcript.
- Trace fields — track time-to-first-audio, transcription confidence, codec, locale, device class, packet loss, jitter, and noise profile.
- Dashboard cuts — audio-quality fail rate by provider, cohort, scenario, and release; alert when the noisy-scenario fail rate crosses the release threshold.
- User proxies — repeat-request rate, escalation rate, abandoned calls, and post-call thumbs-down tied to noisy call segments.
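The dashboard cut above can be sketched as a small aggregation over trace records. The field names mirror the trace fields listed, but the records and the `fail_rate_by` helper are hypothetical:

```python
# Illustrative cohort cut: audio-quality fail rate grouped by any trace field.
from collections import defaultdict

def fail_rate_by(traces, key, threshold=0.85):
    totals, fails = defaultdict(int), defaultdict(int)
    for t in traces:
        totals[t[key]] += 1
        if t["audio_quality"] < threshold:
            fails[t[key]] += 1
    return {k: fails[k] / totals[k] for k in totals}

rates = fail_rate_by([
    {"noise_profile": "mobile-commute", "audio_quality": 0.72},
    {"noise_profile": "mobile-commute", "audio_quality": 0.90},
    {"noise_profile": "office", "audio_quality": 0.93},
], "noise_profile")
# Per-cohort rates (mobile-commute at 0.5, office at 0.0) surface failures
# that a single global average would hide.
```

The same helper works for cuts by codec, device class, or locale, which is how a cohort-specific suppression regression gets isolated.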
A minimal check of a single noisy sample with the cloud-template evaluator looks like this:

```python
from fi.evals import AudioQualityEvaluator

# Score one captured sample; metadata ties the score to its noise cohort.
evaluator = AudioQualityEvaluator()
result = evaluator.evaluate(
    audio_path="samples/noisy-call.wav",
    metadata={"noise_profile": "mobile-commute"},
)
print(result.score, result.reason)
```
Common Mistakes
- Treating the setting as the metric. Enabling `noiseSuppression` in WebRTC does not prove the call is intelligible after ASR and turn detection.
- Suppressing too aggressively. Soft-spoken callers, backchannels, names, and digits can disappear while background noise looks lower.
- Testing only office audio. Real calls include traffic, kitchens, hold music, shared rooms, and low-quality microphones.
- Ignoring cohort splits. A global average hides failures for one locale, headset model, mobile carrier, or support queue.
- Skipping transcript comparison. Audio may sound cleaner while word error rate rises because consonants or short words were removed.
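The last point can be made concrete with a minimal word error rate computation. This is a standard edit-distance sketch assuming whitespace tokenization, not a FutureAGI utility:

```python
# Minimal WER: edit distance over words (substitutions, insertions,
# deletions) divided by the reference length.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Suppression deleted the first word: one deletion over four reference words.
print(wer("freeze my card please", "my card please"))  # 0.25
```

A sample whose suppressed audio scores higher on loudness but whose WER rises against the raw-audio transcript is exactly the over-suppression case the list describes.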
Frequently Asked Questions
What is noise suppression in voice AI?
Noise suppression reduces unwanted background sound before speech reaches ASR, turn detection, or playback. FutureAGI evaluates it through `AudioQualityEvaluator` and companion `ASRAccuracy` checks on captured call audio.
How is noise suppression different from voice activity detection?
Noise suppression cleans or reduces background audio. Voice activity detection decides when speech starts, stops, or overlaps with silence, even if the audio is still noisy.
How do you measure noise suppression?
Use FutureAGI's `AudioQualityEvaluator` on captured audio from `LiveKitEngine` simulations or production traces, then compare with `ASRAccuracy`. Track `audio_quality` fail rate by noise profile, codec, device, locale, and release.