What Is a Denial-of-Service (DoS) Attack?
An attack that exhausts a service's compute, memory, network, or quota resources so legitimate users cannot get a response.
What Is a Denial-of-Service (DoS) Attack?
A denial-of-service (DoS) attack is an attempt to exhaust a service’s compute, memory, network, or quota resources so legitimate users cannot get a response. Classic DoS floods a server with traffic; the distributed variant (DDoS) does it from many sources. For LLM systems, the more interesting variants don’t flood — a single crafted prompt can fill the context window, trigger expensive reasoning, fan out tool calls, or burn the model-provider quota. DoS is a textbook LLM security threat in the OWASP LLM Top 10. FutureAGI’s surface is detection, attribution, and rate-controlled mitigation in production.
Why It Matters in Production LLM and Agent Systems
LLM-flavoured DoS hits the budget and the user experience at the same time. A prompt that contains a 100k-token wall of repeated text consumes a context window, forces a long completion, and may be 100× the cost of a normal request. A jailbreak that pushes an agent into an infinite-loop of tool calls fans out compute until a timeout finally fires. A retriever-side DoS sends millions of similar queries that warm the embedding cache out from under legitimate traffic.
The pain is shared across roles. SREs see p99 latency spikes, error budget burn, and unexpected cloud bills. Finance and platform leads see model-provider quota exhaustion and rate-limit errors that look like outages. Compliance leads worry about service-availability commitments under SLA. Product managers see angry users when a DoS-style attack degrades the experience for everyone on the same model deployment.
In 2026-era agent stacks the blast radius compounds. A single DoS pattern can starve every multi-agent workflow sharing the same upstream model. Step-level evals plus per-cohort cost dashboards are the only way to localise the attack before the bill arrives.
How FutureAGI Handles Denial-of-Service Attacks
FutureAGI is not a network-layer DDoS filter — that is your CDN’s job. We cover the LLM-flavoured surface where DoS shows up as runaway compute, runaway tokens, or runaway tool calls. The Agent Command Center exposes rate-limiting per route, per model, and per user cohort, so a flooding pattern hits a quota long before it bills out. tool-timeout policies kill long-running tool calls; model fallback switches to a cheaper backend when the primary is rate-limited; cost-optimized routing lets engineers pin a cost ceiling per route.
In traces, runaway-cost is a tracked failure mode: when token-cost-per-trace crosses a per-cohort threshold or when an agent crosses a max-step ceiling, an alert fires with the offending trace pinned. PromptInjection is layered as a pre-guardrail to catch the obvious context-flooding payloads (long repeated tokens, malformed instructions designed to force long reasoning). The combination — guardrail at the edge, rate-limit at the gateway, cost-monitor in the trace — is what differentiates a defended LLM system from one that is one bad prompt away from a billing incident. Unlike Cloudflare’s L7 DDoS layer, FutureAGI sees the semantic DoS surface: tokens, tool calls, and reasoning loops.
How to Measure or Detect It
DoS-relevant signals you can wire into a FutureAGI workflow:
runaway-cost— token-cost-per-trace alert; standard dashboard signal in the gateway.tool-timeoutrate — fraction of tool calls killed by timeout; alerts on attack patterns and bugs alike.infinite-loop-agentdetector — agent step count past a ceiling.PromptInjectionevaluator — flags context-flooding payloads as part of injection.eval-fail-rate-by-cohort— segment cost and latency anomalies by route or user tier.- Rate-limit error counts at the gateway — a leading indicator of attack traffic.
Minimal Python:
from fi.evals import PromptInjection
injection = PromptInjection()
result = injection.evaluate(
input=suspicious_payload,
output=None,
context="dos detection",
)
Common Mistakes
- Relying only on a network rate-limiter. L7 rate-limiting misses single-prompt DoS that hides inside one well-formed HTTP request. Add semantic limits.
- No per-cohort cost ceiling. A global ceiling protects the company; a per-cohort ceiling stops one tenant from starving others.
- Letting tool calls run unbounded. A missing
tool-timeoutis a DoS amplifier — a single hung tool call holds the agent loop open. - Not monitoring quota error rate. Provider-side rate-limit errors are an early-warning signal; alert before they hit production.
- Treating DoS as a separate concern from prompt injection. Many DoS payloads are prompt-injection variants; wire them into the same evaluator pipeline.
Frequently Asked Questions
What is a denial-of-service attack?
A denial-of-service (DoS) attack tries to exhaust a service's compute, memory, network, or quota resources so legitimate users cannot get a response, by flooding the service or by submitting expensive single requests.
How is an LLM denial-of-service different from a classic DoS?
Classic DoS targets bandwidth or CPU. LLM-flavoured DoS targets the context window, token budget, tool-call fanout, or model quota — a single crafted prompt can be more damaging than a flood of trivial requests.
How do you mitigate denial-of-service against LLM systems?
Wire `rate-limiting`, `tool-timeout`, and `model fallback` in the Agent Command Center, alert on `runaway-cost` in traces, and use the FutureAGI evaluator stack to flag obvious context-flooding payloads via `PromptInjection`.