What Is a Pre-Guardrail?
An input-side policy check that blocks, redacts, rewrites, or escalates risky LLM requests before model inference or tool execution.
A pre-guardrail is an input-side compliance control that runs before an LLM or agent receives a request. In production, it appears in the gateway request path, where it inspects user prompts, retrieved context, tool inputs, and metadata for prompt injection, jailbreaks, PII, or unsafe instructions. FutureAGI treats pre-guardrails as route-level policies inside Agent Command Center, so risky traffic can be blocked, redacted, rewritten, or escalated before model inference or tool execution starts.
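As a concrete sketch, the route-level policy reduces to a function that maps detector findings to one of those four actions. All names below are hypothetical stand-ins, not the Agent Command Center API:

```python
from dataclasses import dataclass, field

# Hypothetical detector output; a real gateway would populate this
# from its injection, PII, and policy detectors.
@dataclass
class Findings:
    injection: bool = False
    pii_spans: tuple = ()
    needs_review: bool = False

def pre_guardrail_decision(findings: Findings) -> str:
    """Map detector findings to one of the four route-level actions."""
    if findings.injection:
        return "block"      # stop before model inference or tool execution
    if findings.needs_review:
        return "escalate"   # route to a human review queue
    if findings.pii_spans:
        return "redact"     # strip sensitive spans, then proceed
    return "allow"
```

For example, `pre_guardrail_decision(Findings(injection=True))` returns `"block"` before any provider is called.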
Why It Matters in Production LLM and Agent Systems
The concrete failure mode is letting hostile or noncompliant input reach a system that can reason, call tools, and write state. A direct prompt injection can tell a support agent to ignore policy. An indirect prompt injection hidden in a web page or retrieved document can override the agent after retrieval. A user may paste a payment card, patient identifier, or contract text into a chat box that was never approved for that data class.
The first people to feel this are developers and SREs. They see traces with odd system-prompt references, tool calls that do not match user intent, rising escalation tickets, and sudden token-cost spikes from injected instructions that expand the task. Compliance teams feel it later, when audit logs show that sensitive input reached a model provider or a privileged tool before any policy decision was recorded. End users feel it as broken trust: the agent refuses legitimate work, follows a malicious instruction, or exposes data it should never have processed.
Pre-guardrails matter more for 2026-era agentic pipelines than for single-turn chat because the input boundary keeps moving. The input is not just the user’s message. It includes retrieval chunks, tool observations, memory, uploaded files, and messages passed between agents. A pre-guardrail gives each boundary a synchronous allow, block, redact, or escalate decision before the next model step compounds the mistake.
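In sketch form (hypothetical names, assuming any boolean risk check), the same screening has to cover every boundary, not just the user message:

```python
# Hypothetical: run one screening function over every input boundary
# before the next model step sees any of it.
def screen_boundaries(user_msg, retrieval_chunks, tool_observations,
                      memory_items, is_risky):
    """Return the inputs that pass, plus an audit log of dropped items."""
    blocked = []

    def keep(label, items):
        safe = []
        for item in items:
            if is_risky(item):
                blocked.append((label, item))  # record what was dropped, and where
            else:
                safe.append(item)
        return safe

    passed = {
        "user": keep("user", [user_msg]),
        "retrieval": keep("retrieval", retrieval_chunks),
        "tools": keep("tools", tool_observations),
        "memory": keep("memory", memory_items),
    }
    return passed, blocked
```

A risky retrieval chunk is then dropped at its own boundary, with a log entry naming the boundary, instead of reaching the model.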
How FutureAGI Handles Pre-Guardrails
FutureAGI handles pre-guardrails through Agent Command Center, the gateway surface named by the gateway:pre-guardrail anchor. A route can attach a pre-guardrail chain before provider routing, fallback, semantic-cache lookup, or tool execution. The chain can use exact detectors for known formats and fi.evals evaluator classes for model-aware checks: ProtectFlash for low-latency prompt-injection screening, PromptInjection for deeper injection classification, PII for sensitive identifiers, and DataPrivacyCompliance for policy fit.
A real workflow: an enterprise assistant route receives a user prompt plus retrieved account notes. Agent Command Center runs pre-guardrail: [ProtectFlash, PII, DataPrivacyCompliance] before the selected model sees the request. If ProtectFlash flags an indirect injection in a retrieved note, the gateway blocks that context chunk and logs the reason. If PII flags a national identifier in the user prompt, the action can redact the span, ask the user to remove sensitive data, or escalate to a human review queue. The trace records the route, guardrail name, decision, reason, and action.
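A minimal sketch of that chain logic, with toy stand-in detectors (the real route attaches evaluator classes such as ProtectFlash and PII; the trace fields mirror the ones named above):

```python
from datetime import datetime, timezone

def run_chain(route, text, detectors):
    """Run detectors in order; record route, guardrail, decision, reason."""
    trace = []
    for name, detect, action in detectors:
        failed, reason = detect(text)
        decision = action if failed else "pass"
        trace.append({
            "route": route,
            "guardrail": name,
            "decision": decision,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if failed:
            return decision, trace  # first failing guardrail decides the action
    return "allow", trace

# Toy detectors: (name, detect(text) -> (failed, reason), action on failure).
detectors = [
    ("ProtectFlash", lambda t: ("ignore all rules" in t, "injection phrase"), "block"),
    ("PII", lambda t: (any(c.isdigit() for c in t), "identifier-like digits"), "redact"),
]
```

Because each step appends a trace entry, a blocked request still leaves a complete audit record of which guardrail fired and why.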
FutureAGI’s approach is to keep the policy decision close to the traffic boundary, then reuse the same evaluator names in offline regression tests. Unlike a one-off Guardrails AI validator embedded in one service, a gateway pre-guardrail can be updated once and applied consistently across agents, providers, and routes. Engineers then inspect failed traces, tune thresholds on a labeled cohort, and promote the new policy only when false positives and p99 latency stay inside budget.
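The promotion step can be sketched as a simple gate over a labeled cohort; the thresholds below are illustrative, not FutureAGI defaults:

```python
def promote_policy(labeled_results, latencies_ms,
                   max_fp_rate=0.02, p99_budget_ms=150):
    """Promote only if false positives and p99 latency stay inside budget.

    labeled_results: list of (blocked: bool, legitimate: bool) per request.
    latencies_ms: per-request latency with the new pre-guardrail chain.
    """
    legit = [blocked for blocked, legitimate in labeled_results if legitimate]
    fp_rate = sum(legit) / len(legit) if legit else 0.0
    p99 = sorted(latencies_ms)[int(0.99 * (len(latencies_ms) - 1))]
    return fp_rate <= max_fp_rate and p99 <= p99_budget_ms
```

A policy that blocks too much legitimate traffic, or blows the latency budget, fails the gate and stays in review.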
How to Measure or Detect a Pre-Guardrail
Measure a pre-guardrail as an operational control, not as a single accuracy score:
- ProtectFlash result — returns a lightweight prompt-injection signal for low-latency blocking before expensive inference.
- PromptInjection result — identifies direct and indirect instruction attacks that try to override the system or developer policy.
- PII fire-rate — counts prompts or context chunks containing sensitive identifiers per 1,000 requests.
- False-positive rate by route — sample blocked requests, label them, and compute how often legitimate traffic was stopped.
- p99 latency added — compare the request path with and without the pre-guardrail chain; track it per model route.
- Audit-log completeness — every block, redact, rewrite, or escalation should have a recorded decision and reason.
from fi.evals import ProtectFlash, PII

injection = ProtectFlash()
privacy = PII()

def pre_guardrail(user_prompt):
    # Block on a prompt-injection signal before any inference cost is paid.
    if injection.evaluate(input=user_prompt).score == "Failed":
        return "block"
    # Redact when sensitive identifiers are detected in the prompt.
    if privacy.evaluate(input=user_prompt).score == "Failed":
        return "redact"
    return "allow"
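Assuming per-request decision logs with a boolean PII flag (field name hypothetical), the fire-rate and latency-added metrics can be computed directly:

```python
def pii_fire_rate_per_1000(logs):
    """logs: list of dicts with a 'pii_flagged' boolean per request."""
    if not logs:
        return 0.0
    return 1000.0 * sum(1 for r in logs if r["pii_flagged"]) / len(logs)

def p99_latency_added(with_guardrail_ms, without_guardrail_ms):
    """Compare p99 latency of the request path with and without the chain."""
    def p99(samples):
        samples = sorted(samples)
        return samples[int(0.99 * (len(samples) - 1))]
    return p99(with_guardrail_ms) - p99(without_guardrail_ms)
```

Tracking both per route makes the trade-off explicit: a stricter chain that raises the fire-rate must still keep the added p99 inside the route's budget.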
Common Mistakes
- Checking only the top-level user prompt. Retrieved documents, memory, tool observations, and agent handoffs can carry the risky instruction.
- Using one universal threshold. A public demo route and a healthcare workflow should not share the same privacy and injection tolerance.
- Blocking without an audit reason. Compliance teams need the input class, detector, action, timestamp, and route to defend the decision.
- Treating pre-guardrails as prompt engineering. Prompt text can request safe behavior; a gateway policy can enforce it before inference.
- Ignoring p99 latency. A guardrail that adds 600 ms to every chat turn will be bypassed by product teams under launch pressure.
Frequently Asked Questions
What is a pre-guardrail?
A pre-guardrail is an input-side compliance and safety check that runs before an LLM or agent receives a request, so unsafe prompts, PII, jailbreaks, or tool inputs can be blocked or rewritten early.
How is a pre-guardrail different from a post-guardrail?
A pre-guardrail checks the request before inference or tool execution. A post-guardrail checks the model response before it reaches the user or the next system.
How do you measure a pre-guardrail?
Use FutureAGI evaluator classes such as ProtectFlash, PromptInjection, PII, and DataPrivacyCompliance, then track block-rate, false-positive rate, p99 latency added, and audit-log completeness.