What Is PII Detection?
Runtime and offline detection of personally identifiable information in LLM inputs, retrieved context, tool outputs, and generated responses.
PII detection is the process of finding personally identifiable information in LLM inputs, retrieved context, tool outputs, traces, and model responses. It is an AI compliance control, not just a text-classification task, because the same identifier can move across an eval pipeline, gateway route, agent tool call, or production trace. FutureAGI anchors this control with eval:PII, the PII evaluator surface used to flag, block, redact, and audit personal-data exposure before it reaches users or logs.
Why It Matters in Production LLM and Agent Systems
PII leaks rarely look like one obvious broken form field. They show up when a support agent summarizes a CRM note and includes a phone number, when a RAG pipeline retrieves a billing PDF into context, or when an agent calls a lookup_user tool and echoes a row that belonged to another customer. The failure mode is sensitive-information disclosure: the model did not invent a secret, but it moved one across a boundary where it did not belong.
The pain splits across teams. Compliance needs an audit trail for GDPR, HIPAA, SOC 2, or internal privacy policy. Security needs proof that raw prompts are not copied into an analytics store. Product sees blocked responses and asks whether the policy is too strict. Developers get the hard part: separating true PII from harmless names in examples, test accounts, and public business data.
Common symptoms include rising redaction counts, PII failures concentrated on one route, trace payloads with email or phone patterns, and customer tickets mentioning unexpected personal details. In 2026-era agent pipelines, the problem gets harder because PII can enter through tools, memory, retrieval, file uploads, or agent-to-agent handoffs. A single endpoint scanner cannot see that whole path; detection has to run at model boundaries and in offline regression evals.
How FutureAGI Handles PII Detection
The anchor surface for this term is eval:PII, exposed in FutureAGI as the PII evaluator in fi.evals. A common workflow starts with a labeled dataset containing clean prompts, direct identifiers, indirect identifiers, and tricky business text such as “call Alex from Acme.” The team runs PII and DataPrivacyCompliance in the eval pipeline at /platform/guard, sets a metric-threshold for acceptable fail rate, then promotes the same control to Agent Command Center as a pre-guardrail and post-guardrail.
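The offline step in this workflow comes down to computing a fail rate over an eval dataset and gating promotion on a threshold. The sketch below is a minimal illustration under stated assumptions: `detect_pii` is a hypothetical regex stand-in for the `PII` evaluator (it is not the `fi.evals` API), and `max_fail_rate=0.05` is an illustrative threshold, not a recommended value.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a PII evaluator: flags emails, US-style phone
# numbers, and SSN-shaped strings. A real detector covers far more formats.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> bool:
    """True if any pattern matches, i.e. the evaluator 'fails' this text."""
    return any(p.search(text) for p in PII_PATTERNS.values())

@dataclass
class EvalReport:
    total: int
    flagged: int

    @property
    def fail_rate(self) -> float:
        return self.flagged / self.total if self.total else 0.0

def run_eval(outputs: list[str]) -> EvalReport:
    return EvalReport(total=len(outputs), flagged=sum(detect_pii(o) for o in outputs))

def can_promote(report: EvalReport, max_fail_rate: float) -> bool:
    """Gate promotion to a runtime guardrail on an acceptable fail rate."""
    return report.fail_rate <= max_fail_rate

outputs = [
    "Your appointment is confirmed for Tuesday.",
    "Call Alex from Acme about the renewal.",   # business text, should stay clean
    "Reach the patient at 555-867-5309.",       # direct identifier
    "Refund sent to jane.doe@example.com.",     # direct identifier
]
report = run_eval(outputs)
print(report.fail_rate, can_promote(report, max_fail_rate=0.05))  # -> 0.5 False
```

The business sentence passes only because the toy detector ignores names; a production evaluator has to make that call from context, which is why the labeled set includes cases like it.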
Real example: a healthcare support agent uses traceAI-langchain instrumentation, a retrieval step, and a scheduling tool. PII can enter through the patient message, the retrieved note, or the tool output. FutureAGI’s approach is to score each candidate boundary: user input before the model, tool output before it is added to context, and final model output before streaming to the user. The route records the guardrail decision, evaluator name, and trace ID so the engineer can debug a false positive without exposing the raw identifier broadly.
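One way to record a per-boundary decision without spreading the raw identifier is to log a keyed fingerprint of each match next to the trace ID. This is a sketch, not FutureAGI’s implementation: the email-only detector, the `GuardrailDecision` record, and `FINGERPRINT_KEY` are all hypothetical. A keyed HMAC is used instead of a plain hash so that common emails cannot be recovered by hashing a dictionary of candidates.

```python
import hashlib
import hmac
import re
import uuid
from dataclasses import dataclass, field

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
FINGERPRINT_KEY = b"replace-with-a-managed-secret"  # hypothetical; load from a secret store

@dataclass
class GuardrailDecision:
    trace_id: str
    boundary: str   # "user_input", "tool_output", or "model_output"
    evaluator: str
    action: str     # "allow" or "redact"
    # Keyed fingerprints let an engineer match a false positive across spans
    # without the raw identifier appearing in the decision log.
    match_fingerprints: list[str] = field(default_factory=list)

def fingerprint(value: str) -> str:
    return hmac.new(FINGERPRINT_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def check_boundary(text: str, boundary: str, trace_id: str) -> tuple[str, GuardrailDecision]:
    matches = EMAIL.findall(text)
    decision = GuardrailDecision(
        trace_id=trace_id,
        boundary=boundary,
        evaluator="toy_email_detector",
        action="redact" if matches else "allow",
        match_fingerprints=[fingerprint(m) for m in matches],
    )
    return EMAIL.sub("[EMAIL]", text), decision

trace_id = str(uuid.uuid4())
for boundary, text in [
    ("user_input", "Can I move my appointment?"),
    ("tool_output", "patient_email=sam@example.org, slot=10:30"),
    ("model_output", "Done, I moved it to 10:30."),
]:
    cleaned, decision = check_boundary(text, boundary, trace_id)
    print(decision.boundary, decision.action, cleaned)  # only tool_output is redacted
```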
Unlike regex-only scanners or default Microsoft Presidio patterns, this workflow checks generated responses and tool outputs in the same reliability loop as other LLM evals. If PII fail rate spikes after a prompt change, the engineer rolls back the prompt version, tightens the post-guardrail action from audit to redact, or adds a regression case before the next release.
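The rollback-or-tighten decision can be sketched as a fail-rate comparison between prompt versions plus an ordered action ladder. The `max_increase` tolerance and the `block` rung above `redact` are assumptions for illustration; the text only names the audit-to-redact step.

```python
# Hypothetical ladder: the text names audit -> redact; "block" is an assumed top rung.
ACTION_LADDER = ["audit", "redact", "block"]

def fail_rate(flags: list[bool]) -> float:
    return sum(flags) / len(flags) if flags else 0.0

def is_spike(baseline: list[bool], candidate: list[bool], max_increase: float = 0.02) -> bool:
    """True when the candidate's fail rate exceeds the baseline's by more than max_increase."""
    return fail_rate(candidate) - fail_rate(baseline) > max_increase

def tighten(action: str) -> str:
    """Move one rung up the ladder, staying at the top if already there."""
    i = ACTION_LADDER.index(action)
    return ACTION_LADDER[min(i + 1, len(ACTION_LADDER) - 1)]

baseline = [True] * 2 + [False] * 98    # prompt v1: 2% flagged
candidate = [True] * 10 + [False] * 90  # prompt v2: 10% flagged
action = "audit"
if is_spike(baseline, candidate):
    action = tighten(action)
print(action)  # -> redact
```

An absolute-difference check is noisy on small samples; with few evaluated items, a two-proportion significance test is a sturdier trigger.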
How to Measure or Detect It
Measure PII detection as a classifier plus a runtime control:
- `PII` evaluator fail rate: percentage of evaluated inputs or outputs flagged for personal data, sliced by route, model, dataset version, and source step.
- Precision and recall: test against labeled clean and PII-bearing examples; high recall matters for SSNs, health identifiers, and account numbers.
- False-positive review rate: percentage of blocked or redacted outputs later marked clean by reviewers; this is the metric that keeps guardrails usable.
- Guardrail latency: p95 and p99 added by `pre-guardrail` and `post-guardrail` checks before the response reaches the user.
- Audit-log completeness: every block, redact, or escalate decision should include evaluator, route, trace ID, action, and reason.
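```python
from fi.evals import PII

# Score one model response with the PII evaluator before it is delivered or logged.
response_text = "Thanks! We'll text the confirmation to 555-867-5309."
pii = PII()
result = pii.evaluate(output=response_text)
print(result.score, result.reason)
```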
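The classifier-side metrics in the measurement list can be computed from labeled predictions, reviewer outcomes, and per-request timings. This sketch assumes plain Python lists as inputs and uses the nearest-rank definition of a percentile; monitoring backends may interpolate differently, so small discrepancies against a dashboard are expected.

```python
import math

def precision_recall(predicted: list[bool], labels: list[bool]) -> tuple[float, float]:
    """Precision and recall of PII flags against human labels (True = contains PII)."""
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum(y and not p for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def false_positive_review_rate(actioned: int, reviewed_clean: int) -> float:
    """Share of blocked or redacted items that reviewers later marked clean."""
    return reviewed_clean / actioned if actioned else 0.0

def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

predicted = [True, True, False, True, False]
labels = [True, False, False, True, True]
print(precision_recall(predicted, labels))   # precision = recall = 2/3 here
print(false_positive_review_rate(40, 6))     # -> 0.15
latencies_ms = [12.0, 15.0, 18.0, 22.0, 95.0]
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))  # -> 95.0 95.0
```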
| Boundary | Trigger | Default action | Latency budget |
|---|---|---|---|
| User input pre-store | Direct identifier patterns | Redact before trace persist | < 30 ms |
| Retrieved context / tool output | Tenant-foreign or sensitive class | Block, route to fallback | < 60 ms |
| Model response pre-delivery | Generated identifier | Redact or block by class | < 100 ms |
| Audit-log sampling | New formats, obfuscation | Add to regression set | async, no SLA |
| Cross-tenant memory | Foreign tenant identifier | Hard block + page on-call | < 50 ms |
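Keeping the table as configuration lets the same numbers drive alerts. The sketch below transcribes the rows into a dictionary; the boundary keys and action names are hypothetical, and treating each latency budget as a p95 target is an assumption, since the table does not say which percentile it means.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BoundaryPolicy:
    trigger: str
    default_action: str
    latency_budget_ms: Optional[float]  # None means async, no SLA

# Rows transcribed from the boundary table; key and action names are illustrative.
POLICIES = {
    "user_input_pre_store": BoundaryPolicy(
        "direct identifier patterns", "redact_before_trace_persist", 30),
    "retrieved_context_or_tool_output": BoundaryPolicy(
        "tenant-foreign or sensitive class", "block_and_fallback", 60),
    "model_response_pre_delivery": BoundaryPolicy(
        "generated identifier", "redact_or_block_by_class", 100),
    "audit_log_sampling": BoundaryPolicy(
        "new formats, obfuscation", "add_to_regression_set", None),
    "cross_tenant_memory": BoundaryPolicy(
        "foreign tenant identifier", "hard_block_and_page_on_call", 50),
}

def over_budget(boundary: str, observed_p95_ms: float) -> bool:
    """True when the observed p95 breaks the table's '< budget' target."""
    budget = POLICIES[boundary].latency_budget_ms
    return budget is not None and observed_p95_ms >= budget

print(over_budget("model_response_pre_delivery", 140.0))  # -> True
```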
External calibration: Gray Swan’s AgentHarm (110 agentic-harm prompts spanning PII exfiltration, doxxing, and credential leakage) shows frontier models leak PII on 8-22% of attack prompts before guardrails, dropping below 2% with judge-model detection in front of the gateway. BeaverTails (~333K labeled QA pairs across 14 harm categories including privacy violations) is the broader anchor for false-positive budgets when calibrating per-class thresholds.
Detection placement playbook
A useful 2026 PII-detection deployment runs at four boundaries, not one:
- Boundary 1: user input, before the model sees it. This catches direct identifiers the user typed and gives you the option to redact before the trace is even stored.
- Boundary 2: retrieved context and tool outputs, before they enter the prompt. This is the most-missed boundary: a clean user request can still pull a customer record from a CRM that exposes another tenant’s data.
- Boundary 3: the generated response, before delivery. This is the classic post-guardrail position.
- Boundary 4: audit-log sampling, after delivery. Continuous monitoring here catches new identifier formats that the in-line detector misses.
Each boundary has a different action policy. Boundary 1 usually redacts before storage. Boundary 2 blocks the tool result and routes the agent to a fallback response. Boundary 3 redacts or blocks based on identifier class. Boundary 4 generates regression cases for the next release of the detector.
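A minimal sketch of these per-boundary policies, assuming a two-class regex detector (`ssn`, `email`), a hypothetical `BLOCK_CLASSES` set for Boundary 3, and a fixed fallback reply. For brevity Boundary 4 reuses the in-line patterns; in practice the audit sampler would run a slower, broader detector, which is what lets it find formats the in-line one misses.

```python
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}
BLOCK_CLASSES = {"ssn"}  # hypothetical: classes that Boundary 3 blocks instead of redacting
FALLBACK_REPLY = "I can't share that information."

def found_classes(text: str) -> set[str]:
    return {name for name, p in PATTERNS.items() if p.search(text)}

def redact(text: str) -> str:
    for name, p in PATTERNS.items():
        text = p.sub(f"[{name.upper()}]", text)
    return text

def apply_policy(boundary: str, text: str, regression_cases: list[str]) -> str:
    classes = found_classes(text)
    if not classes:
        return text
    if boundary == "user_input":     # Boundary 1: redact before storage
        return redact(text)
    if boundary == "tool_output":    # Boundary 2: discard the tool result, answer with fallback
        return FALLBACK_REPLY
    if boundary == "model_output":   # Boundary 3: block or redact by identifier class
        return FALLBACK_REPLY if classes & BLOCK_CLASSES else redact(text)
    if boundary == "audit_sample":   # Boundary 4: feed the next detector release
        regression_cases.append(text)
        return text
    raise ValueError(f"unknown boundary: {boundary}")

cases: list[str] = []
print(apply_policy("model_output", "Your SSN is 123-45-6789.", cases))  # -> fallback reply
print(apply_policy("model_output", "Email a@b.com for help.", cases))   # -> Email [EMAIL] for help.
```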
A second 2026-specific note concerns indirect identifiers. A ZIP code, employer, role, and timestamp can together identify a person even when no direct identifier is present. The PII evaluator is calibrated for direct identifiers; for indirect-identifier risk, pair it with DataPrivacyCompliance, which evaluates re-identification risk over combinations of fields. We’ve seen healthcare and HR teams use this pairing to catch policy violations that a direct-identifier detector missed entirely.
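DataPrivacyCompliance’s internal method is not described here, so the sketch below uses a different, well-known technique as a stand-in: a k-anonymity check, which flags any record whose combination of quasi-identifiers is shared by fewer than k records. The quasi-identifier columns and `k` are illustrative assumptions.

```python
from collections import Counter

# Illustrative quasi-identifier columns; pick them from your own data model.
QUASI_IDENTIFIERS = ("zip", "employer", "role")

def qi_key(record: dict[str, str]) -> tuple[str, ...]:
    return tuple(record[q] for q in QUASI_IDENTIFIERS)

def risky_records(records: list[dict[str, str]], k: int = 5) -> list[dict[str, str]]:
    """Records whose quasi-identifier combination appears fewer than k times."""
    counts = Counter(qi_key(r) for r in records)
    return [r for r in records if counts[qi_key(r)] < k]

records = [
    {"zip": "94110", "employer": "General Hospital", "role": "nurse"},
    {"zip": "94110", "employer": "General Hospital", "role": "nurse"},
    {"zip": "94110", "employer": "General Hospital", "role": "nurse"},
    {"zip": "94110", "employer": "General Hospital", "role": "CFO"},
]
print(risky_records(records, k=2))  # only the CFO row: one person fits that combination
```

Timestamps usually need bucketing (for example, to the day or week) before they are useful as a quasi-identifier; raw timestamps make almost every record unique.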
Common Mistakes
- Scanning only user prompts. Retrieved context, memory, tool outputs, and model responses leak PII more often than the first user message.
- Treating names as always sensitive. A person’s name may be public business data or private customer data; label context matters for precision.
- Saving raw traces before detection. If the observability store receives unfiltered PII, it becomes part of the compliance scope.
- Using one threshold for every identifier. SSNs, payment data, phone numbers, and names need different actions and false-positive tolerance.
- No detector regression set. PII formats change across countries, products, and user segments; rerun known clean and known risky examples before policy changes.
- Letting the detector run only synchronously. A blocking detector on every request adds latency and creates an outage when the detector itself is down; pair it with asynchronous sampling at the audit-log boundary so coverage survives detector incidents.
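The pairing in the last point can be sketched as a guard that fails open on detector errors and pushes the affected trace into an async audit queue, plus deterministic hash-based sampling so the audit boundary sees a stable fraction of traffic. Failing open is an assumption that suits low-risk routes; a route carrying SSNs or payment data may be better served by failing closed. `guard`, `sampled`, and `audit_queue` are hypothetical names.

```python
import hashlib
import queue
from typing import Callable

# Trace IDs waiting for asynchronous audit-log detection (hypothetical sink).
audit_queue: queue.Queue = queue.Queue()

def sampled(trace_id: str, rate: float) -> bool:
    """Deterministic sampling: a given trace ID always gets the same decision."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < rate * 2**64

def guard(text: str, trace_id: str, detector: Callable[[str], bool],
          sample_rate: float = 0.05) -> tuple[str, str]:
    try:
        flagged = detector(text)
    except Exception:
        # Detector outage: serve the request (fail open) but force async audit.
        audit_queue.put(trace_id)
        return text, "detector_unavailable"
    if sampled(trace_id, sample_rate):
        audit_queue.put(trace_id)
    if flagged:
        return "[REDACTED]", "redacted"  # whole-text redaction, for brevity
    return text, "allowed"

def broken_detector(text: str) -> bool:
    raise RuntimeError("detector down")

print(guard("hello", "trace-1", broken_detector))  # -> ('hello', 'detector_unavailable')
```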
Frequently Asked Questions
What is PII detection?
PII detection finds personally identifiable information in LLM inputs, context, tool outputs, traces, and responses. FutureAGI anchors it with the `PII` evaluator so teams can block, redact, or audit risky personal-data exposure.
How is PII detection different from PII redaction?
PII detection identifies sensitive personal data; PII redaction removes or masks it. In production LLM systems, detection usually triggers a guardrail action such as block, redact, escalate, or audit.
How do you measure PII detection?
Measure `PII` evaluator fail rate, precision, recall, false-positive rate, and redaction latency by route. FutureAGI teams usually track these beside audit-log completeness.