What Is a Self-Service Chatbot?
A conversational AI system that lets users resolve their own questions or tasks without escalating to a human agent.
A self-service chatbot is a conversational AI system that lets a user resolve a question or task on their own, without escalating to a human agent. Modern self-service chatbots are LLM-powered, grounded by retrieval, and increasingly agentic — they call tools to look up account state, execute refunds, reschedule appointments. They sit in front of a contact center, web app, or product, and they’re judged by containment rate (sessions resolved without human handoff), grounded-answer rate, resolution rate, and customer-satisfaction score. They fail loudly when they hallucinate or escalate too aggressively.
Why It Matters in Production LLM and Agent Systems
A self-service chatbot is the public face of an AI deployment, so every failure is a user-facing one. Hallucinated refund policies, wrong-account lookups, and infinite tool-call loops translate directly into refunds issued in error, mis-routed tickets, and customer churn. Containment rate is the headline business metric: every percentage point of containment the bot loses is a percentage point of additional human-agent cost.
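As a back-of-the-envelope illustration (the volumes and per-contact cost below are hypothetical), the cost of a containment regression is straightforward to estimate:
# Hypothetical volumes and costs; substitute your own numbers
monthly_sessions = 100_000
cost_per_human_contact = 6.50   # fully loaded agent cost per handled ticket
containment_before = 0.78
containment_after = 0.61        # after a regression

extra_handoffs = monthly_sessions * (containment_before - containment_after)
extra_cost = extra_handoffs * cost_per_human_contact
print(f"{extra_handoffs:,.0f} extra handoffs, about ${extra_cost:,.0f}/month in agent cost")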
The pain is felt unevenly. A product lead watches CSAT drop 8 points after a model swap and cannot tell whether it’s grounding, tone, or escalation logic. An SRE sees p99 latency spike when a tool starts throttling and the bot retries silently for 30 seconds before giving up. A compliance lead is asked, mid-audit, “how do you know this bot doesn’t give bad financial advice?” and has no logged evaluation to point to.
In 2026-era stacks, self-service chatbots are no longer single-turn QA bots. They are multi-turn agents with memory across the conversation, tool calls into CRMs and payment systems, and handoffs to specialised sub-agents. Evaluation has to follow that structure: per-turn grounding, trajectory-level resolution, conversation-level coherence, and explicit guardrails on the actions the bot is allowed to take. Single-shot answer-relevance scores miss most of the failure surface.
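To make that concrete, here is a minimal sketch of the trace structure such an evaluation attaches to; the field names are hypothetical, and the point is simply what each evaluation level needs captured:
from dataclasses import dataclass, field

@dataclass
class Turn:
    user_message: str
    assistant_reply: str
    retrieved_chunks: list = field(default_factory=list)   # per-turn grounding is scored against these
    tool_calls: list = field(default_factory=list)         # action guardrails inspect these

@dataclass
class Conversation:
    user_goal: str                                          # trajectory-level resolution is scored against this
    turns: list = field(default_factory=list)               # conversation-level coherence spans all of these
    handed_off: bool = False                                # feeds containment rate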
How FutureAGI Handles Self-Service Chatbots
FutureAGI’s approach is to evaluate the bot at three levels along the same conversation trace. At the turn level, Groundedness and ContextRelevance score whether each answer is supported by retrieved knowledge-base content. At the trajectory level, TaskCompletion and ConversationResolution score whether the user’s actual goal was resolved across the multi-turn session, not just whether the last reply was relevant. At the conversation level, ConversationCoherence and CustomerAgentContextRetention check that the bot didn’t lose state between turns or contradict itself.
For action-taking bots, ToolSelectionAccuracy and FunctionCallAccuracy score whether the bot picked the correct CRM call or refund-API invocation given the conversation state, and ActionSafety flags any action that should have escalated to a human.
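A minimal sketch of attaching the per-turn and trajectory levels to the same trace. It assumes each evaluator follows the same evaluate(input=..., output=..., context=...) call and score/reason result shown in the minimal Python example further down; the conversation structure here is illustrative:
from fi.evals import ConversationResolution, Groundedness

# Illustrative trace: per-turn retrieval context plus the overall goal and transcript
conversation = {
    "user_goal": "Move $200 from checking to savings",
    "transcript": "user: move $200 to savings\nassistant: Done, confirmation #881.",
    "turns": [
        {
            "user": "move $200 to savings",
            "assistant": "Done, confirmation #881.",
            "retrieved": ["Internal transfers between own accounts post immediately."],
        },
    ],
}

grounding = Groundedness()
resolution = ConversationResolution()

# Per-turn: is each reply supported by what was retrieved for that turn?
per_turn = [
    grounding.evaluate(input=t["user"], output=t["assistant"], context=t["retrieved"])
    for t in conversation["turns"]
]

# Trajectory-level: did the user's goal actually get resolved across the session?
result = resolution.evaluate(
    input=conversation["user_goal"],
    output=conversation["transcript"],
    context=[c for t in conversation["turns"] for c in t["retrieved"]],
)
print(result.score, result.reason)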
Concretely: a banking team running a self-service balance-and-transfer bot on traceAI-openai-agents instruments every conversation, samples 5% into an eval cohort, runs ConversationResolution, Groundedness, and ToolSelectionAccuracy on each, and dashboards containment-rate-by-intent. When containment for the “transfer money” intent drops from 78% to 61% after a model swap, FutureAGI’s trace view points to a single planner step where the smaller model started escalating prematurely: fix that one step, recover containment, and there is no need to roll back the whole release.
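A minimal sketch of the 5% sampling step, assuming each conversation carries a stable ID; hashing the ID keeps the cohort deterministic, so the same conversation lands in (or out of) the cohort on every rerun:
import hashlib

def in_eval_cohort(conversation_id: str, sample_rate: float = 0.05) -> bool:
    # Map the ID to a stable value in [0, 1) and compare against the sample rate
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 < sample_rate

eval_cohort = [cid for cid in ("conv-001", "conv-002", "conv-003") if in_eval_cohort(cid)]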
How to Measure or Detect It
Track signals at the conversation, turn, and tool levels:
- ConversationResolution: returns whether the user’s goal was resolved across the full conversation; the canonical containment proxy.
- TaskCompletion: returns 0–1 plus a reason for whether the assigned task was completed end-to-end.
- Groundedness: per-turn grounding against retrieved KB content; surfaces hallucinated policy or pricing.
- CustomerAgentHumanEscalation: scores whether escalation decisions were appropriate; too eager wastes containment, too late frustrates users.
- Containment rate (dashboard signal): fraction of sessions that closed without human-agent handoff, sliced by intent.
- Average turns to resolution: paired with containment, surfaces bots that “win” by exhausting users. A sketch computing both dashboard signals follows this list.
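Both dashboard signals can be computed straight from session logs. A minimal sketch, assuming hypothetical session records that each carry an intent label, a handoff flag, and a turn count:
from collections import defaultdict

# Hypothetical session records derived from conversation traces
sessions = [
    {"intent": "transfer_money", "handed_off": False, "turns": 4},
    {"intent": "transfer_money", "handed_off": True, "turns": 9},
    {"intent": "check_balance", "handed_off": False, "turns": 2},
]

by_intent = defaultdict(list)
for s in sessions:
    by_intent[s["intent"]].append(s)

for intent, group in by_intent.items():
    contained = [s for s in group if not s["handed_off"]]
    containment = len(contained) / len(group)
    avg_turns = sum(s["turns"] for s in contained) / max(len(contained), 1)
    print(f"{intent}: containment {containment:.0%}, avg turns to resolution {avg_turns:.1f}")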
Minimal Python for the evaluator-level signals (placeholder inputs shown for illustration):
from fi.evals import ConversationResolution, Groundedness

# Placeholder inputs; in production these come from the logged conversation trace
user_goal = "Cancel order #1234 and refund the charge"
full_conversation = "user: I need to cancel order #1234\nassistant: Done, your refund is on its way."
knowledge_base_chunks = ["Orders can be cancelled and refunded within 30 days of purchase."]

resolution = ConversationResolution()
grounding = Groundedness()  # run per turn with the same evaluate() pattern

result = resolution.evaluate(
    input=user_goal,
    output=full_conversation,
    context=knowledge_base_chunks,
)
print(result.score, result.reason)
Common Mistakes
- Optimising containment without checking grounding. A bot that confidently makes things up has high containment and high churn. Track both.
- Single-turn answer-relevance as the only metric. Misses state loss, contradictions, and unresolved goals across the session.
- No guardrail on action-taking tools. A self-service bot with unguarded refund authority is one prompt-injection away from a financial incident; a minimal guardrail sketch follows this list.
- Skipping per-intent evaluation. Aggregate metrics hide the one intent where the bot fails 40% of the time.
- Treating CSAT as the eval. CSAT is a trailing signal; evaluator scores catch failures hours before users complain.
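A minimal guardrail sketch for the unguarded-refund case above. The function names, threshold, and escalation hook are hypothetical; the pattern is simply that anything outside the automated policy is forced to a human:
MAX_AUTOMATED_REFUND = 100.00   # hypothetical policy limit

def issue_refund(amount: float) -> str:
    # Placeholder for the real payment-API call
    return f"refund of ${amount:.2f} issued"

def escalate_to_human(reason: str) -> str:
    # Placeholder for handing the session to an agent queue
    return f"escalated: {reason}"

def guarded_refund(amount: float, order_is_refundable: bool) -> str:
    # Anything outside the automated policy goes to a human instead of the payment API
    if not order_is_refundable:
        return escalate_to_human("refund requested on a non-refundable order")
    if amount > MAX_AUTOMATED_REFUND:
        return escalate_to_human(f"${amount:.2f} exceeds the automated refund limit")
    return issue_refund(amount)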
Frequently Asked Questions
What is a self-service chatbot?
A self-service chatbot is a conversational AI that lets users resolve issues themselves — order lookups, password resets, account changes — without handing off to a human, usually built on an LLM with retrieval and tool calls.
How is a self-service chatbot different from a regular chatbot?
A regular chatbot answers questions; a self-service chatbot also takes action — books, refunds, updates — and is measured by containment and resolution, not just response quality.
How do you evaluate a self-service chatbot?
FutureAGI runs ConversationResolution and TaskCompletion across full conversation traces, plus Groundedness on retrieval-backed answers, all attached to a Dataset for cohort-level tracking.