What Is DTMF?
An in-band telephony signaling standard that encodes keypad presses as pairs of audio tones, used for IVR navigation and caller authentication.
DTMF — Dual-Tone Multi-Frequency — is the in-band telephony signaling standard that encodes telephone keypad presses (0-9, *, #, A-D) as pairs of audio tones in the 697-1633 Hz range. Each key produces a unique two-tone combination that the receiving system decodes deterministically. DTMF is the technology behind every “press 1 for sales” IVR and is still used in voice-AI agent stacks for fallback navigation, caller authentication, and interop with legacy CCaaS infrastructure. FutureAGI does not generate or decode DTMF directly, but evaluates voice agents that handle DTMF-driven flows through ASRAccuracy, AudioQualityEvaluator, and LiveKitEngine simulation.
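The keypad-to-tone mapping is a fixed 4×4 grid: four low-group row frequencies (697–941 Hz) crossed with four high-group column frequencies (1209–1633 Hz). A minimal sketch of that grid and of synthesizing one digit as the sum of its two sine tones (the sample rate and duration here are illustrative choices, not part of the standard):

```python
import math

# Standard DTMF grid: low-group tones index rows, high-group tones index columns (Hz).
LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]

# Map each key to its unique (low, high) tone pair.
DTMF = {
    key: (LOW[r], HIGH[c])
    for r, row in enumerate(KEYS)
    for c, key in enumerate(row)
}

def dtmf_samples(key, sample_rate=8000, duration_s=0.1):
    """Synthesize one DTMF digit as the sum of its two sine tones."""
    low, high = DTMF[key]
    n = int(sample_rate * duration_s)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]
```

Because every key maps to exactly one tone pair, a decoder only has to detect which two frequencies are present — which is why DTMF decoding is deterministic where ASR is probabilistic.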
Why DTMF Matters in Production Voice-AI Systems
Most voice-AI agents shipping in 2026 are hybrid: they accept both speech and DTMF, often on the same call. DTMF is the deterministic fallback when ASR is unreliable — a customer in a noisy environment, a caller with a heavy accent the ASR was not trained on, a regulatory flow that requires deterministic input (“enter your account number using your keypad”). When DTMF handling is broken, the agent fails the calls where it is needed most.
Voice engineers feel DTMF failure as broken IVR transitions: the user presses 2, the agent stays in the previous menu state, and the caller hangs up. SREs see it as time-to-first-audio spikes and unusual silence-after-tone gaps in the trace. CX teams see it as task-completion drops on calls that include keypad input. Compliance leads care because regulated industries (banking, telehealth) often require DTMF-only entry for sensitive identifiers — an account number recovered via ASR is unacceptable in many audit frameworks.
In 2026 voice-agent stacks, DTMF is also a security boundary. A DTMF-driven authentication flow is harder to spoof than a voice-only flow because the tones carry no biometric signal. Voice agents that fall back to DTMF for sensitive input therefore need to handle the transition cleanly — the wrong handoff between speech and tone is a source of caller frustration and a verification gap.
How FutureAGI Handles DTMF in Voice-Agent Evaluation
FutureAGI does not encode or decode DTMF tones directly — that work happens in the SIP, WebRTC, or telephony layer (LiveKit, Pipecat, vendor SDK). FutureAGI’s role is to evaluate the voice agent that operates above that layer. The simulation surface is LiveKitEngine, which runs scripted scenarios where a synthetic caller mixes spoken input and DTMF tone sequences. A Persona object can describe a caller who reaches a “press 1 to confirm” prompt and presses the wrong digit, or who ignores the prompt and speaks instead.
A real evaluation flow: a banking voice agent is tested against a Scenario containing 50 personas covering keypad-based account lookup. LiveKitEngine runs each persona through the agent and captures the transcript plus the DTMF event timeline. FutureAGI evaluators score the result: ASRAccuracy for the spoken portions of the call, AudioQualityEvaluator for end-to-end audio fidelity, and a CustomEvaluation rubric that grades whether the agent handled DTMF input correctly (right route, right confirmation, right escalation). TaskCompletion then grades the overall trajectory.
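The shape of that flow can be sketched in outline. The Persona, Scenario, and engine-run names below follow the prose, but the dataclass fields and the run helper are illustrative stand-ins, not FutureAGI's actual API:

```python
from dataclasses import dataclass, field

# Illustrative stand-ins for the objects named above; the real FutureAGI /
# LiveKitEngine classes and their fields may differ.
@dataclass
class Persona:
    name: str
    spoken_turns: list          # what the synthetic caller says
    dtmf_sequence: str          # keys the caller presses, e.g. "3" at a confirm prompt

@dataclass
class Scenario:
    personas: list = field(default_factory=list)

def run_scenario(engine_run, scenario):
    """Run each persona through the agent, collecting transcript + DTMF timeline."""
    return [engine_run(p) for p in scenario.personas]

# Two of the banking personas: one presses the wrong digit at the
# "press 1 to confirm" prompt, one speaks instead of pressing a key.
scenario = Scenario(personas=[
    Persona("wrong-digit", ["I want my balance"], dtmf_sequence="3"),
    Persona("speaks-instead", ["yes, confirm"], dtmf_sequence=""),
])
```

The point of the structure is that each persona carries both modalities, so a single run exercises the speech-to-tone handoff rather than testing each input path in isolation.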
FutureAGI’s approach is to treat DTMF as one event class among many in the voice-agent trace, not as a separate system. The same eval pipeline that scores spoken input scores DTMF-driven branches; cohort-level dashboards split by input modality so the team can see DTMF-specific failure rates. We’ve found this is the only way to catch agents that score well on speech-only evals but quietly fail on calls that mix modalities.
How to Measure or Detect DTMF-Related Issues in a Voice Agent
Measure DTMF-handling quality through trace and simulation signals:
- fi.evals.ASRAccuracy — runs on the spoken portions of a call; isolates whether the speech side of the transition is healthy.
- fi.evals.AudioQualityEvaluator — checks end-to-end audio fidelity, including any DTMF tone bleed or echo into the speech path.
- LiveKitEngine simulation — scripted scenarios that inject DTMF events and capture agent behavior.
- TaskCompletion on voice trajectories — confirms the agent reached the goal regardless of input modality.
- DTMF-success-rate dashboard — fraction of DTMF events that result in the expected state transition; track per route.
- Time-to-first-audio after a DTMF event — long gaps signal handoff lag in the agent runtime.
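The DTMF-success-rate metric above can be computed directly from the tone-event timeline. The event schema here (key, route, expected and observed state) is a hypothetical example, not a FutureAGI log format:

```python
# Each DTMF event records the key pressed, the route it belongs to, and the
# expected vs. observed menu state after the press. Example data only.
events = [
    {"key": "1", "route": "sales",   "expected": "sales_menu",   "observed": "sales_menu"},
    {"key": "2", "route": "support", "expected": "support_menu", "observed": "main_menu"},
    {"key": "1", "route": "sales",   "expected": "sales_menu",   "observed": "sales_menu"},
]

def dtmf_success_rate(events, route=None):
    """Fraction of DTMF events whose observed state matches the expected transition,
    optionally restricted to one route."""
    subset = [e for e in events if route is None or e["route"] == route]
    if not subset:
        return None
    return sum(e["expected"] == e["observed"] for e in subset) / len(subset)

overall = dtmf_success_rate(events)            # 2 of 3 transitions succeeded
sales_only = dtmf_success_rate(events, "sales")  # per-route view
```

Tracking the rate per route (as the bullet suggests) is what surfaces a single broken menu branch that an aggregate number would average away.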
```python
from fi.evals import ASRAccuracy, AudioQualityEvaluator

asr = ASRAccuracy()
audio = AudioQualityEvaluator()

# Score the spoken prompt against the expected transcript.
result_asr = asr.evaluate(
    response="enter your account number",
    expected_response="enter your account number",
)

# Score end-to-end audio fidelity for the segment containing the DTMF tones.
result_audio = audio.evaluate(input="/audio/call_dtmf_segment.wav")

print(result_asr.score, result_audio.score)
```
Common Mistakes
- Treating DTMF as out-of-scope for evaluation. Hybrid agents that touch DTMF in production must be tested with DTMF-aware scenarios.
- Logging keypad input only as decoded digits. Without the tone-event timeline, you cannot debug timing issues between speech and DTMF handoffs.
- No fallback path when DTMF decode fails. A noisy line can corrupt the tones; agents should fall back to confirmation prompts, not silently drop the input.
- Mixing DTMF with sensitive ASR transcription. Reading an account number via ASR is unacceptable in regulated flows; route those segments to DTMF-only.
- Skipping cohort splits by modality. Speech-only eval scores will mask DTMF-specific failures.
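The cohort split in the last bullet can be as simple as grouping call-level outcomes by input modality. The call records here are made-up illustrative data:

```python
# Illustrative call-level records: each call is tagged with its dominant
# input modality and whether the task completed.
calls = [
    {"id": 1, "modality": "speech", "task_completed": True},
    {"id": 2, "modality": "dtmf",   "task_completed": False},
    {"id": 3, "modality": "mixed",  "task_completed": True},
    {"id": 4, "modality": "dtmf",   "task_completed": False},
]

def completion_by_modality(calls):
    """Task-completion rate per input modality."""
    buckets = {}
    for call in calls:
        buckets.setdefault(call["modality"], []).append(call["task_completed"])
    return {m: sum(v) / len(v) for m, v in buckets.items()}
```

A speech-only aggregate over this data would look healthy; the per-modality view exposes that every DTMF-driven call failed, which is exactly the masking the bullet warns about.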
Frequently Asked Questions
What is DTMF?
DTMF — Dual-Tone Multi-Frequency — is the audio signaling standard that encodes telephone keypad presses (0-9, *, #, A-D) as pairs of tones. It is the basis of every IVR menu and is the standard way callers send numerical input over a phone line.
How is DTMF different from voice input?
Voice input transmits human speech as audio for an ASR system to transcribe. DTMF transmits structured digits as tones; the system decodes them deterministically with no ASR involvement, which makes it more reliable for short structured input like account numbers.
How does DTMF fit into a voice-AI agent?
Modern voice agents accept either spoken or DTMF input depending on the call leg. FutureAGI evaluates how well the agent handles DTMF fallbacks using LiveKitEngine simulation, ASRAccuracy on the surrounding speech, and TaskCompletion on the overall trajectory.