Security

What Is Script Injection?

A vulnerability where untrusted text reaches executable script, HTML, templates, or tool arguments and can run as code.

Script injection is a security vulnerability where untrusted text is inserted into executable script, HTML, templates, or tool arguments and later runs as code. In LLM and agent systems, this failure mode shows up in eval pipelines, production traces, generated UI, code tools, and gateway guardrails. FutureAGI uses CodeInjectionDetector to detect CWE-94-style code-injection risk and keeps trace evidence showing which prompt, retrieval chunk, or tool output introduced the executable payload.

Why it matters in production LLM/agent systems

Script injection turns generated text into executable behavior. A support assistant might render a retrieved help-center article that contains a <script> tag. A coding agent might paste user-controlled text into a Node task. A report generator might write attacker-controlled HTML into an admin dashboard. The model may only be producing text, but the surrounding product decides whether that text is displayed, evaluated, compiled, or passed to a tool.

Ignoring it creates two named failures. Client-side execution happens when generated or retrieved content reaches a browser, email renderer, markdown preview, notebook, or internal admin UI without safe escaping. Tool-side code execution happens when an agent forwards user text into scripts, templates, shell wrappers, browser automation, or code-interpreter inputs. In both cases, the trace can look successful: the model answered, the tool returned, and latency stayed normal.
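
Client-side execution at the render boundary can often be neutralized with ordinary output escaping. A minimal standard-library sketch (an illustration, not a complete sanitizer):

```python
import html

# Untrusted markup from a retrieved article or model output
untrusted = "<h1>Status</h1><script>fetch('/admin/export')</script>"

# Escape before the text reaches any HTML renderer: the payload
# is now displayed as text instead of executing as script.
safe = html.escape(untrusted)
print(safe)
```

Real renderers also need context-aware escaping (attributes, URLs, JavaScript contexts), which `html.escape` alone does not cover.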

Developers see odd rendered markup, failed CSP reports, or unexpected callbacks. SREs see spikes in blocked requests, browser errors, tool retries, or outbound network calls from unexpected spans. Security and compliance teams need the exact source span because blame could sit in the user prompt, a RAG chunk, a third-party page, or an LLM-written tool argument. End users feel the harm as account takeover, data exposure, altered pages, or agent actions they never requested.

Agentic systems raise the risk because every step can transform text into execution. A typical agent workflow may retrieve a page, summarize it, write HTML, call a browser, run generated code, and store the result in memory. One unsafe boundary is enough.

How FutureAGI handles script injection

FutureAGI handles script injection as an execution-boundary problem. The core surface is the CodeInjectionDetector security detector, applied wherever text may become code: generated scripts, template variables, browser-rendered HTML, plugin payloads, code-interpreter input, and tool arguments. For broader reporting, teams can roll those findings into InjectionSecurityScore by route, model, customer, and release.

A real workflow: a LangChain agent that builds internal troubleshooting pages is instrumented with traceAI-langchain. The trace captures the user request, retrieved support chunks, tool spans, and agent.trajectory.step before the page-builder tool runs. Before the HTML is saved or rendered, an Agent Command Center pre-guardrail sends the proposed markup and tool argument through CodeInjectionDetector. If the detector flags executable script, the route blocks the tool call, stores the evaluator result on the trace, and returns a fallback response asking for human review.
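
A simplified sketch of such a pre-guardrail, with a naive pattern check standing in for the real CodeInjectionDetector call (the function names and decision shape here are illustrative, not the FutureAGI API):

```python
import re

def flags_executable(markup: str) -> bool:
    # Hypothetical stand-in for the detector: a crude pattern check
    # for script tags, javascript: URLs, and event-handler attributes.
    return bool(re.search(r"<script\b|javascript:|onerror\s*=", markup, re.I))

def pre_guardrail(markup: str) -> dict:
    if flags_executable(markup):
        # Block the tool call, keep the evidence, and fall back to review.
        return {"action": "block", "fallback": "Flagged for human review.",
                "evidence": markup}
    return {"action": "allow", "html": markup}

print(pre_guardrail("<h1>Reset the router</h1>"))
print(pre_guardrail("<script>fetch('/admin/export')</script>"))
```

The real workflow would attach the detector result to the trace span instead of returning it inline, so the evidence survives for incident review.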

FutureAGI’s approach is to attach security evidence to the same trace the engineer debugs, not to treat code safety as a separate ticket queue. Compared with Semgrep rules that scan committed code, this catches generated script and tool payloads that never land in a repository. Compared with a plain OWASP LLM Top 10 checklist, it preserves the payload, source span, route, and detector result needed for incident review.

The next engineering action is concrete: quarantine the source chunk, escape or strip executable tags at the renderer, narrow the tool schema, add the payload to a regression dataset, and fail the release if CodeInjectionDetector findings exceed the approved threshold.
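
The release-gate step might be sketched like this; the finding format and threshold name are hypothetical, not a FutureAGI API:

```python
# Hypothetical release gate: fail the build when detector findings
# on the regression dataset exceed an approved threshold.
APPROVED_THRESHOLD = 0

def release_gate(findings: list, threshold: int = APPROVED_THRESHOLD) -> bool:
    # A finding is any regression-dataset item the detector flagged.
    flagged = [f for f in findings if f.get("flagged")]
    return len(flagged) <= threshold

findings = [
    {"payload": "<script>fetch('/admin/export')</script>", "flagged": True},
    {"payload": "<h1>Status</h1>", "flagged": False},
]
print(release_gate(findings))  # False: one finding exceeds the zero threshold
```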

How to measure or detect it

Use layered signals because the payload can enter as text and execute later:

  • CodeInjectionDetector — detects code-injection vulnerabilities associated with CWE-94 in generated code, template output, and tool-facing payloads.
  • InjectionSecurityScore — aggregates injection-related findings so teams can compare route, release, or customer risk.
  • Trace boundary fields — inspect source URL, retrieved chunk id, tool.input, tool.output, rendered HTML, and agent.trajectory.step.
  • Dashboard signal — track detector-fail-rate-by-route, guardrail-block-rate, CSP violation count, tool retry count, and reviewed false-positive rate.
  • User-feedback proxy — watch reports of altered UI, unexpected redirects, pages opening new network calls, or agents running code the user did not request.

A minimal usage sketch, assuming the fi.evals import path shown below:

from fi.evals import CodeInjectionDetector

# Benign markup with an executable payload smuggled in
html = "<h1>Status</h1><script>fetch('/admin/export')</script>"

# Evaluate the markup before it is rendered or stored
result = CodeInjectionDetector().evaluate(input=html)
print(result)

Alert on the boundary, not only the final page. A blocked script in a low-traffic admin route may matter more than a larger number of harmless markdown false positives in a public chat route. Slice by renderer, tool name, model, prompt version, and source corpus.

Common mistakes

Most script-injection bugs come from trusting model text after it has moved into a new runtime.

  • Escaping only chat UI output. The same text may later enter markdown previews, email templates, notebooks, reports, or admin dashboards.
  • Treating prompt injection and script injection as one bug. One attacks model instructions; the other reaches executable runtime surfaces.
  • Scanning checked-in code only. Generated HTML, tool payloads, and notebook cells may never touch the repository.
  • Allowing free-form tool arguments. A narrow schema prevents attacker text from becoming script, selector, URL, or code parameters.
  • Blocking without preserving evidence. Incident review needs source span, payload, detector result, renderer, route, and prompt version.
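
Narrowing a tool schema, as the list above suggests, can be as simple as an action allowlist plus a character-constrained selector. A hypothetical sketch (field names and patterns are illustrative):

```python
import re

# Only actions the tool is designed to perform
ALLOWED_ACTIONS = {"open_page", "click", "read_text"}

# Selectors restricted to a safe character set: no quotes, colons,
# or parentheses that could smuggle javascript: URLs or script.
SAFE_SELECTOR = re.compile(r"^[A-Za-z0-9#.\-\[\]= _]+$")

def validate_tool_args(args: dict) -> dict:
    if args.get("action") not in ALLOWED_ACTIONS:
        raise ValueError("action not in allowlist")
    selector = args.get("selector", "")
    if not SAFE_SELECTOR.match(selector):
        raise ValueError("selector contains disallowed characters")
    # Return only the validated fields; drop anything free-form.
    return {"action": args["action"], "selector": selector}

print(validate_tool_args({"action": "click", "selector": "#export-btn"}))
```

The key design choice is allowlisting: the schema names what is permitted rather than trying to enumerate every dangerous string.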

Frequently Asked Questions

What is script injection?

Script injection is a security vulnerability where untrusted text is inserted into executable script, HTML, templates, or tool arguments and later runs as code in an LLM or agent workflow.

How is script injection different from prompt injection?

Prompt injection tries to change model behavior through instructions in text. Script injection changes runtime behavior by getting untrusted text rendered or executed as code.

How do you measure script injection?

Use FutureAGI's CodeInjectionDetector on generated code, tool arguments, and rendered HTML boundaries. Track InjectionSecurityScore, trace source span, and block decisions by route.