How is voicemail detection different from voice activity detection?

Voice activity detection identifies speech versus non-speech in an audio stream. Voicemail detection uses speech, silence, greeting phrases, beep timing, and call state to classify the answer type.

How do you measure voicemail detection?

FutureAGI teams measure it with LiveKitEngine simulations, custom answer-type trace fields such as `call.answer_type` and `voicemail.beep_timestamp_ms`, plus ASRAccuracy, CustomerAgentTerminationHandling, and TaskCompletion. Track false-human and false-voicemail rates by carrier, locale, and scenario.

What Is Voicemail Detection? FutureAGI Guide (2026)

Q: What is voicemail detection?

Voicemail detection is the voice-AI process of deciding whether an outbound call reached a live person, voicemail greeting, beep, carrier intercept, or silence before the agent acts.

What Is Voicemail Detection?

Voicemail detection is a voice-AI classification step that decides whether a call reached a live person, a voicemail greeting, a post-greeting beep, a carrier message, or silence. It shows up in the telephony/audio stage of a voice-agent trace before the LLM, tool policy, or TTS system acts. In FutureAGI, teams treat it as a routing and evaluation signal, then test it with LiveKitEngine scenarios and adjacent metrics such as ASRAccuracy and TaskCompletion.

Why Voicemail Detection Matters in Production Voice Agents

A wrong answer-type label sends the call down the wrong branch. False-human detection makes an agent start its live script while a mailbox greeting is still playing, so the transcript contains the agent talking to a recorder. False-voicemail detection is worse for revenue: a real person is marked unreachable, the agent hangs up or leaves a generic message, and the CRM records a failed contact.

The pain is shared across engineering, operations, and compliance. Developers see clean LLM prompts but confusing call outcomes because the first branch was wrong. SREs see short-call spikes, retry storms, long silence windows, and p99 answer-to-first-audio drift after a telephony, ASR, or voice-activity-detection change. Product teams see lower connect-to-conversation rates on mobile, carrier-specific, or international cohorts. Compliance teams worry about agents leaving policy-sensitive content in recorded mailboxes or failing to disclose identity on live pickup.

This matters more for 2026 voice agents because answer type often triggers a multi-step agent workflow. The detection result can decide whether to speak, wait for the beep, summarize the call, schedule a retry, update a sales sequence, or suppress further outreach. Common trace symptoms include machine labels followed by live user speech, human labels followed by voicemail phrases, TTS playback before beep time, high hang-up rate within five seconds, and repeated “hello?” turns after the agent has already committed to the wrong path.
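A few of the trace symptoms above can be checked mechanically. The following is a minimal sketch, not a FutureAGI API: the `CallTrace` shape, the phrase list, and the "repeated hello" heuristic for live speech are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical phrase list; a real deployment would tune this per locale.
VOICEMAIL_PHRASES = ("leave a message", "after the tone", "is not available")


@dataclass
class CallTrace:
    answer_type: str                          # label chosen by the detector
    user_transcript: str                      # far-end ASR text after the label was set
    tts_start_ms: Optional[int] = None        # when the agent began speaking
    beep_timestamp_ms: Optional[int] = None   # when the beep was detected, if any


def trace_symptoms(trace: CallTrace) -> List[str]:
    """Return the names of suspicious patterns found in one call trace."""
    text = trace.user_transcript.lower()
    symptoms = []
    # Human label, but the far end sounds like a mailbox greeting.
    if trace.answer_type == "human" and any(p in text for p in VOICEMAIL_PHRASES):
        symptoms.append("human_label_then_voicemail_phrase")
    # Voicemail label, but someone keeps saying "hello" (assumed proxy for live speech).
    if trace.answer_type == "voicemail" and text.count("hello") >= 2:
        symptoms.append("machine_label_then_live_speech")
    # The agent started TTS playback before the detected beep.
    if (
        trace.answer_type == "voicemail"
        and trace.tts_start_ms is not None
        and trace.beep_timestamp_ms is not None
        and trace.tts_start_ms < trace.beep_timestamp_ms
    ):
        symptoms.append("tts_before_beep")
    return symptoms
```

Counting these symptom names per carrier or locale gives a rough early-warning signal before a full labeled evaluation is available.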

How FutureAGI Handles Voicemail Detection

FutureAGI does not claim a dedicated VoicemailDetection evaluator in its catalog. Instead, it treats voicemail detection as a voice workflow invariant backed by simulation, trace evidence, and downstream evals. The closest real surfaces are simulate-sdk LiveKitEngine, traceAI livekit, ASRAccuracy, CustomerAgentTerminationHandling, and TaskCompletion.

A practical setup starts with Persona and Scenario definitions for live pickup, short voicemail greeting, delayed beep, no-beep mailbox, carrier intercept, long silence, noisy human pickup, and voicemail that begins with human-like speech. LiveKitEngine runs the scenarios against the live voice agent and captures audio, transcript, timing, and TestCaseResult outcomes. The application records span fields such as `call.answer_type`, `call.answer_type_confidence`, `voicemail.beep_timestamp_ms`, `agent.first_audio_after_answer_ms`, and `agent.action`.
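One way to keep the scenario list and span fields consistent is to define them once in application code. This sketch deliberately avoids the LiveKitEngine and traceAI APIs, whose signatures are not shown here; the `SCENARIOS` identifiers and the `AnswerTypeSpan` class are hypothetical, and the resulting dictionary is meant to be attached to whatever span object the tracing layer provides.

```python
from dataclasses import dataclass
from typing import Optional

# Scenario names mirror the list in the text; the identifiers are illustrative.
SCENARIOS = [
    "live_pickup",
    "short_voicemail_greeting",
    "delayed_beep",
    "no_beep_mailbox",
    "carrier_intercept",
    "long_silence",
    "noisy_human_pickup",
    "voicemail_with_human_like_speech",
]


@dataclass
class AnswerTypeSpan:
    answer_type: str
    answer_type_confidence: float
    first_audio_after_answer_ms: int
    action: str
    beep_timestamp_ms: Optional[int] = None  # unset when no beep was detected

    def to_attributes(self) -> dict:
        """Flatten to the dotted field names used in the text, skipping unset values."""
        attrs = {
            "call.answer_type": self.answer_type,
            "call.answer_type_confidence": self.answer_type_confidence,
            "voicemail.beep_timestamp_ms": self.beep_timestamp_ms,
            "agent.first_audio_after_answer_ms": self.first_audio_after_answer_ms,
            "agent.action": self.action,
        }
        return {key: value for key, value in attrs.items() if value is not None}
```

Omitting the beep field when no beep was heard keeps "no beep" distinguishable from "beep at 0 ms" in later queries.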

Unlike simple silence-threshold detection, this workflow checks whether the branch was correct and whether the call outcome stayed correct. ASRAccuracy verifies that greeting phrases and beep-adjacent transcript evidence were captured accurately. CustomerAgentTerminationHandling is useful when the expected behavior is to end, retry, or leave a compliant message. TaskCompletion catches cases where the classifier was technically right but the agent still failed the business goal.

The engineer’s next action is concrete: block the release if the false-voicemail rate exceeds 1% on live-pickup scenarios, alert when p99 answer-classification latency exceeds the limit set by the speech policy, or route uncertain calls to a short clarification turn instead of updating the CRM.
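The release gate and the uncertain-call routing described above could look roughly like this. It is a sketch under stated assumptions: the 1% limit comes from the text, while the 0.8 confidence floor, the label names, and the action names are hypothetical placeholders.

```python
def route_call(answer_type: str, confidence: float, min_confidence: float = 0.8) -> str:
    """Pick the agent's next action; uncertain calls get a clarification turn."""
    if confidence < min_confidence or answer_type == "unknown":
        # Ask a short question instead of committing to a branch or updating the CRM.
        return "clarification_turn"
    if answer_type == "human":
        return "start_live_script"
    if answer_type in ("voicemail_greeting", "beep"):
        return "voicemail_branch"
    # Carrier intercept or silence: assumed policy is to end the call and retry later.
    return "end_call_and_schedule_retry"


def release_blocked(false_voicemail: int, live_pickup_total: int, max_rate: float = 0.01) -> bool:
    """True when the false-voicemail rate on live-pickup scenarios exceeds max_rate."""
    if live_pickup_total == 0:
        raise ValueError("no live-pickup scenarios were run")
    return false_voicemail / live_pickup_total > max_rate
```

For example, 3 false-voicemail labels across 200 live-pickup runs is 1.5%, so `release_blocked(3, 200)` would stop the release.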

How to Measure Voicemail Detection

Measure voicemail detection as a classifier plus a downstream workflow test:

Answer-type confusion matrix: predicted versus reference labels across human, voicemail greeting, beep, carrier intercept, silence, and unknown.
False-human rate: voicemail or carrier messages mislabeled as a live person.
False-voicemail rate: live people mislabeled as voicemail; track this separately because it hides lost conversations.
Beep timing error: milliseconds between detected beep and reference beep; late detection truncates messages, and early detection talks over greetings.
p99 answer-classification latency: time from call answer to stable branch decision, sliced by carrier, locale, and audio quality.
Trace fields: `call.answer_type`, `call.answer_type_confidence`, `voicemail.beep_timestamp_ms`, `agent.action`, and `termination_reason`.
FutureAGI evals: ASRAccuracy returns speech-to-text accuracy, while TaskCompletion checks whether the expected call outcome happened.

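```python
from fi.evals import ASRAccuracy

# Compare the ASR transcript of the greeting (prediction) with the reference text.
result = ASRAccuracy().evaluate(
    prediction="please leave a message after the tone",
    reference="please leave a message after the tone",
)
# Print the resulting accuracy score.
print(result.score)
```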

Use user-feedback proxies too: callback rate, repeat-dial rate, human-transfer rate, and QA tags for “agent spoke to voicemail.”
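The classifier metrics listed above can be computed from labeled simulation results with plain Python. This is a minimal sketch that assumes each result is a `(reference, predicted)` label pair; the label strings and function names are illustrative, and the percentile uses the nearest-rank method.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (reference label, predicted label)

NON_HUMAN = {"voicemail_greeting", "beep", "carrier_intercept"}
VOICEMAIL = {"voicemail_greeting", "beep"}


def confusion_matrix(pairs: List[Pair]) -> Dict[Pair, int]:
    """Count how often each (reference, predicted) combination occurs."""
    return dict(Counter(pairs))


def false_human_rate(pairs: List[Pair]) -> float:
    """Share of voicemail or carrier references that were predicted as human."""
    predicted = [p for r, p in pairs if r in NON_HUMAN]
    return sum(p == "human" for p in predicted) / len(predicted) if predicted else 0.0


def false_voicemail_rate(pairs: List[Pair]) -> float:
    """Share of human references that were predicted as voicemail."""
    predicted = [p for r, p in pairs if r == "human"]
    return sum(p in VOICEMAIL for p in predicted) / len(predicted) if predicted else 0.0


def beep_timing_error_ms(detected_ms: int, reference_ms: int) -> int:
    """Signed error: positive means the beep was detected late."""
    return detected_ms - reference_ms


def p99(values: List[float]) -> float:
    """Nearest-rank 99th percentile, e.g. for answer-classification latency."""
    if not values:
        raise ValueError("p99 needs at least one value")
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]
```

Running the same functions on subsets filtered by carrier, locale, or scenario gives the sliced views the list calls for.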

Common Mistakes

Most voicemail detection mistakes come from optimizing the dialer metric while ignoring the conversation branch that follows.

Treating silence as voicemail. Long pauses can be network delay, cautious human pickup, speakerphone latency, or accessibility-device behavior.
Ignoring the beep. Detecting a mailbox greeting is not enough; the agent must wait for the message window.
Optimizing only false-human rate. That protects mailboxes but can hide live contacts marked unreachable.
Testing one carrier path. Mobile voicemail, enterprise PBX greetings, SIP trunks, and carrier intercepts produce different audio signatures.
Letting the agent speak too early. A fast greeting can still be wrong if TTS starts before the answer type stabilizes.
Not separating compliance by branch. Live pickup disclosure and voicemail message policy often have different required text.
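Two of these mistakes, treating silence as voicemail and speaking before the answer type stabilizes, can be reduced with a simple debounce over per-window labels. The sketch below is one possible approach, not a FutureAGI component: the function name, the three-window requirement, and the choice that silence alone never commits a branch are assumptions, and a real system would pair it with a separate timeout for prolonged silence.

```python
from typing import Iterable, Optional


def stable_answer_type(labels: Iterable[str], required_consecutive: int = 3) -> Optional[str]:
    """Return the first non-silence label seen `required_consecutive` times in a row.

    Returns None while the decision is still unstable, so the caller keeps
    listening instead of starting TTS.
    """
    current: Optional[str] = None
    run = 0
    for label in labels:
        if label == current:
            run += 1
        else:
            current, run = label, 1
        # Silence never commits a branch here; it may be network delay or a cautious pickup.
        if label != "silence" and run >= required_consecutive:
            return current
    return None
```

With labels `["silence", "silence", "silence", "human", "human", "human"]` the function returns `"human"` only after the third consecutive human window, and a sequence that alternates between labels returns `None`.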