What Is Wake-Word Detection?

Voice-AI activation logic that detects a configured phrase before a device or agent starts listening or acting.

Wake-word detection is the voice-AI process that listens for a configured activation phrase before a device, app, or voice agent starts recording, transcribing, or taking action. It is a voice reliability control that appears at the start of audio runtimes, edge listeners, simulations, and production traces. FutureAGI teams test it with LiveKitEngine scenarios, review wake-event evidence, and connect false accepts or missed triggers to ASRAccuracy, AudioQualityEvaluator, and downstream task outcomes.

Why Wake-Word Detection Matters in Production Voice Agents

Wake-word mistakes create two opposite failures: the agent wakes when nobody asked, or it stays idle when the user clearly tried to start it. A false accept can record private background speech, start an unwanted support flow, or send partial audio into ASR. A false reject makes the product feel broken, especially when the user repeats the activation phrase and then gives the actual command before the system is ready.

The pain is spread across the stack. Developers debug intent classification even though the input was polluted by an accidental wake. SREs see activation latency spikes, higher abandoned-session counts, and audio pipelines that spend CPU on background speech. Product teams see poor first-turn conversion on speakerphone, far-field microphones, and accented pronunciations. Compliance teams care because the wake decision often defines when the system is allowed to retain audio.

For 2026-era voice agents, wake-word detection is not a small pre-processing step. It gates a multi-step pipeline: microphone capture, noise suppression, wake decision, ASR, LLM planning, tool calls, and TTS response. One bad activation boundary can create a full trace with no real user intent behind it. Common symptoms include wake events without a following command, repeated “are you there?” utterances, high false wakes during TV or call-center background audio, and transcripts whose first words are missing because the wake detector opened the stream late.
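The gating role described above can be sketched in a few lines. This is an illustrative skeleton, not a FutureAGI API: every name here is a hypothetical stand-in for real pipeline components.

```python
# Illustrative sketch of a wake-gated voice pipeline.
# All function and variable names are hypothetical stand-ins.

def run_pipeline(audio_frames, wake_detector, threshold=0.8):
    """Only open the ASR/LLM path once the wake detector fires."""
    for frame in audio_frames:
        score = wake_detector(frame)
        if score >= threshold:
            # Everything downstream (ASR, planning, tools, TTS)
            # is gated on this single decision.
            return {"wake_score": score, "stream_open": True}
    # No activation: nothing is recorded or processed downstream.
    return {"stream_open": False}

# Toy detector: treats the frame value itself as the wake score.
frames = [0.1, 0.2, 0.95, 0.4]
print(run_pipeline(frames, wake_detector=lambda f: f))
# {'wake_score': 0.95, 'stream_open': True}
```

The point of the sketch is the asymmetry: a wrong decision at this one boundary either produces an entire trace with no user intent behind it, or silently drops a real request.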

How FutureAGI Handles Wake-Word Detection

FutureAGI’s approach is to treat wake-word detection as an activation-gate signal in a voice workflow, not as an isolated keyword score. There is no dedicated WakeWordDetection evaluator in the current inventory, so the practical FutureAGI workflow uses adjacent surfaces: simulate-sdk Persona, Scenario, and LiveKitEngine; traceAI livekit instrumentation; and the voice evaluators ASRAccuracy and AudioQualityEvaluator.

A production team can build a small wake-word regression suite with household noise, contact-center background speech, second speakers, Bluetooth delay, and near-miss phrases. LiveKitEngine runs those scenarios against the same voice stack used in staging. Each run stores the expected wake label, wake decision, activation latency, captured audio window, ASR transcript after wake, and final agent outcome. If the wake phrase is “Nora”, a negative case might include “no, run it later” spoken near the microphone; a positive case might include a soft-spoken user saying “Nora, cancel my appointment.”
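A regression suite like this is just labeled audio windows. One minimal way to represent the cases (the file paths, field names, and schema below are illustrative assumptions, not the LiveKitEngine scenario format):

```python
from dataclasses import dataclass

@dataclass
class WakeCase:
    """One labeled audio window in a wake-word regression suite."""
    audio_path: str
    expected_wake: bool  # ground-truth label for this window
    note: str = ""

# Hypothetical cases mirroring the "Nora" examples in the text.
suite = [
    WakeCase("cases/nora-cancel-appointment.wav", True, "soft-spoken positive"),
    WakeCase("cases/no-run-it-later.wav", False, "near-miss negative"),
    WakeCase("cases/tv-background.wav", False, "household noise"),
]

positives = [c for c in suite if c.expected_wake]
print(len(positives))  # 1
```

Each run would then pair the `expected_wake` label with the observed wake decision, activation latency, and downstream transcript for that window.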

FutureAGI then connects the wake decision to what happened next. ASRAccuracy helps catch late opens that drop the first command words. AudioQualityEvaluator helps separate model failure from low signal-to-noise audio. Unlike a Porcupine-only or openWakeWord-only report, the engineer can inspect whether a wake error actually caused a failed voice-agent trace. The next action is concrete: tighten the threshold, add negative examples, route low-confidence wake events to confirmation, alert on false-accept-rate drift, or block a release when noisy-room false rejects exceed the approved baseline.
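The "route low-confidence wake events to confirmation" action above amounts to a three-way decision rather than a single threshold. A minimal sketch, with threshold values that are purely illustrative and should be tuned per device cohort:

```python
def route_wake_event(score, accept=0.90, confirm=0.60):
    """Three-way routing: accept, ask for confirmation, or reject.

    Thresholds are illustrative; tune them per device cohort.
    """
    if score >= accept:
        return "open_stream"
    if score >= confirm:
        # e.g. a brief earcon or an explicit "did you call me?" prompt
        return "ask_confirmation"
    return "ignore"

print(route_wake_event(0.95))  # open_stream
print(route_wake_event(0.72))  # ask_confirmation
print(route_wake_event(0.30))  # ignore
```

The middle band trades a small amount of friction for a large cut in false accepts, which matters most in noisy rooms where scores cluster near the threshold.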

How to Measure or Detect Wake-Word Detection

Measure wake-word detection as activation quality plus downstream damage:

  • False accept rate: share of negative audio windows that incorrectly trigger the agent.
  • False reject rate: share of valid activation phrases that fail to wake the system.
  • Activation latency: time from wake phrase end to ASR capture start, sliced by device, codec, noise level, and locale.
  • Captured-prefix loss: percentage of positive cases where the command after the wake phrase loses its first words.
  • ASRAccuracy: FutureAGI evaluator for transcript fidelity after wake; a late trigger often lowers the command transcript score.
  • AudioQualityEvaluator: FutureAGI evaluator that helps explain whether poor audio quality, not wake logic, caused the miss.
  • User proxies: repeated wake attempts, immediate hang-up rate, manual tap-to-talk fallback, and “device woke by itself” feedback.
A spot check on post-wake transcript fidelity might look like this (assuming the fi.evals SDK exposes ASRAccuracy with this call shape; adjust to your installed version):

```python
from fi.evals import ASRAccuracy

# Score the transcript captured after the wake decision against the
# reference command, so a late stream open shows up as dropped first words.
asr = ASRAccuracy()
result = asr.evaluate(
    audio_path="wake-tests/nora-cancel-appointment.wav",
    ground_truth="Nora, cancel my appointment.",
)
print(result.score)
```
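The first two metrics in the list above, false accept rate and false reject rate, reduce to simple counting over labeled wake events. A minimal sketch:

```python
def wake_rates(events):
    """events: list of (expected_wake, fired) boolean pairs."""
    fa = sum(1 for exp, fired in events if not exp and fired)
    fr = sum(1 for exp, fired in events if exp and not fired)
    negatives = sum(1 for exp, _ in events if not exp)
    positives = sum(1 for exp, _ in events if exp)
    return {
        # Share of negative windows that incorrectly triggered the agent.
        "false_accept_rate": fa / negatives if negatives else 0.0,
        # Share of valid activation phrases that failed to wake the system.
        "false_reject_rate": fr / positives if positives else 0.0,
    }

events = [(True, True), (True, False), (False, False), (False, True)]
print(wake_rates(events))
# {'false_accept_rate': 0.5, 'false_reject_rate': 0.5}
```

In practice these rates should be computed per cohort (device, noise level, locale) before any aggregation, for the reasons discussed under Common Mistakes.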

Do not optimize only for the lowest false accept rate. A threshold that never wakes is privacy-friendly but unusable. Set separate release gates for quiet rooms, car audio, speakerphone, and far-field microphones.
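Separate release gates per environment can be expressed as a small config check. The cohort names and numeric limits below are assumptions for illustration, not FutureAGI defaults:

```python
# Illustrative per-cohort release gates; the limits are assumptions,
# not recommended or FutureAGI-default values.
GATES = {
    "quiet_room":   {"max_far": 0.01, "max_frr": 0.03},
    "speakerphone": {"max_far": 0.02, "max_frr": 0.08},
    "far_field":    {"max_far": 0.02, "max_frr": 0.10},
}

def passes_gate(cohort, far, frr):
    """Block a release when either rate exceeds the cohort's limit."""
    gate = GATES[cohort]
    return far <= gate["max_far"] and frr <= gate["max_frr"]

print(passes_gate("quiet_room", far=0.005, frr=0.02))  # True
print(passes_gate("far_field", far=0.05, frr=0.02))    # False
```

Keeping the limits per cohort prevents a quiet-room improvement from masking a far-field regression in a single blended number.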

Common Mistakes

  • Treating wake-word detection as ASR. ASR transcribes speech; the wake detector decides whether the speech should open the agent path.
  • Testing only the exact phrase. Near-misses, brand names, TV audio, and repeated background words drive most false accepts.
  • Using one threshold for every device. Far-field microphones, earbuds, browser audio, and phone calls have different noise and latency profiles.
  • Ignoring late-open errors. A wake can be correct but still start ASR too late, dropping the first command words.
  • Reporting aggregate accuracy. False accepts and false rejects have different costs; combine them only after cohort-level review.

Frequently Asked Questions

What is wake-word detection?

Wake-word detection is the voice-AI process that listens for a specific activation phrase before a voice agent starts recording, transcribing, or acting. It protects privacy, lowers false starts, and gives engineers a measurable entry point for voice-agent reliability.

How is wake-word detection different from voice activity detection?

Voice activity detection decides whether speech is present. Wake-word detection decides whether the detected speech matches an approved activation phrase closely enough to start the assistant or route the call.
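The two stages compose naturally: VAD runs first and cheaply, and the wake matcher only runs on frames that contain speech. A minimal sketch with hypothetical model stand-ins:

```python
def two_stage_gate(frame, vad, wake, vad_thresh=0.5, wake_thresh=0.8):
    """Stage 1: is anyone speaking? Stage 2: is it the wake phrase?

    `vad` and `wake` are stand-ins for real models; scores and
    thresholds here are illustrative.
    """
    if vad(frame) < vad_thresh:
        return "silence"          # VAD: no speech present
    if wake(frame) < wake_thresh:
        return "speech_no_wake"   # speech, but not the activation phrase
    return "wake"                 # both stages agree: open the agent path

print(two_stage_gate("frame", vad=lambda f: 0.9, wake=lambda f: 0.2))
# speech_no_wake
```

Most false accepts live in the `speech_no_wake` band, which is why near-miss phrases matter more in testing than pure silence or noise.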

How do you measure wake-word detection?

In FutureAGI, measure wake-word behavior with LiveKitEngine simulations, false-accept and false-reject rates, activation latency, ASRAccuracy, AudioQualityEvaluator, and sampled wake-event traces by noise and speaker cohort.