What Is Code Injection (LLM)?

A security failure where untrusted LLM content reaches an interpreter, template, plugin, or tool path as executable code.

Code injection in LLM systems is a security failure where model output, retrieved content, or tool arguments are executed as code because validation, sandboxing, or tool boundaries failed. It is an AI security failure mode that appears in eval pipelines, production traces, code-generation agents, and RAG tools when untrusted text reaches interpreters, templates, plugins, or build steps. FutureAGI detects it with CodeInjectionDetector before rollout and monitors regressions after deployment.

Figure: Code injection detection flow across LLM-generated code, tool calls, and runtime traces.

Why code injection matters in production LLM and agent systems

Code injection turns an LLM’s language channel into a program channel. A support agent that writes Python snippets may pass a generated expression to eval. A data agent may splice a retrieved field into a Jinja, SQL, or shell-adjacent template. A code assistant may produce a patch that includes an unsafe dynamic import or executable string. The result is not just bad text; it can become unauthorized execution, secret exposure, lateral movement, or corrupted build artifacts.
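To make the first failure concrete, here is a minimal sketch of the unsafe eval pattern and a literal-only alternative. It is illustrative rather than taken from FutureAGI's docs, and `safe_numeric_eval` is a hypothetical helper name:

import ast

# UNSAFE pattern: a model-generated expression flows straight into the interpreter.
#   result = eval(model_output)  # CWE-94: "__import__('os').system(...)" executes

def safe_numeric_eval(expr: str) -> float:
    """Parse a model-generated value without executing code.

    ast.literal_eval accepts only literals, so a payload such as
    "__import__('os').system('id')" raises ValueError instead of running.
    """
    value = ast.literal_eval(expr)
    if not isinstance(value, (int, float)):
        raise ValueError(f"expected a number, got {type(value).__name__}")
    return float(value)

print(safe_numeric_eval("41.5"))  # 41.5
# safe_numeric_eval("__import__('os').system('id')")  -> raises ValueError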

The pain lands on different teams at different layers. Developers see tests that pass on normal prompts but fail under adversarial payloads. SREs see unusual tool-call frequency, unexpected subprocess spans, container restarts, or p99 latency spikes from injected loops. Security and compliance teams see missing audit evidence: nobody can explain which prompt, retrieved document, or tool result introduced the payload. End users see account takeover, data exfiltration, or a broken workflow.

Agentic systems make the risk sharper because one bad string can be copied across steps. A planner reads content, a tool builds a command, a worker executes code, and a summarizer hides the causal chain. Unlike a static Semgrep rule that inspects source text, LLM code injection often depends on runtime context, prompt history, retrieved content, and tool permissions.

How FutureAGI handles code injection

FutureAGI models code injection as both a dataset eval issue and a trace issue. CodeInjectionDetector is the security detector for code injection vulnerabilities (CWE-94). A practical workflow is to collect the prompt, retrieved context, generated code, tool arguments, and execution result into an eval dataset. The engineer runs CodeInjectionDetector against generated snippets and serialized tool payloads, then compares eval-fail-rate-by-cohort across prompt version, model, tool, and route.
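A minimal sketch of that comparison step, assuming a flat list of eval rows; the field names are illustrative, not a FutureAGI schema:

from collections import defaultdict

# One row per (prompt, retrieved context, generated code, tool args, result).
rows = [
    {"prompt_version": "v3", "model": "gpt-4o", "tool": "python_executor",
     "route": "support", "finding": True},
    {"prompt_version": "v3", "model": "gpt-4o", "tool": "sql_runner",
     "route": "analytics", "finding": False},
]

def fail_rate_by_cohort(rows, keys=("prompt_version", "model", "tool", "route")):
    """Compute eval-fail-rate-by-cohort from detector findings."""
    totals, fails = defaultdict(int), defaultdict(int)
    for row in rows:
        cohort = tuple(row[k] for k in keys)
        totals[cohort] += 1
        fails[cohort] += int(row["finding"])
    return {cohort: fails[cohort] / totals[cohort] for cohort in totals}

for cohort, rate in sorted(fail_rate_by_cohort(rows).items()):
    print(cohort, f"{rate:.0%}")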

In production, the same case should be visible in traces. With traceAI-langchain, the team attaches spans around retrievers, code-generation tools, sandboxes, and tool-call execution. The workflow records the trace id next to each eval row, along with tool name, generated-code hash, sandbox policy id, user or tenant cohort, and model route. If the detector flags a sample, the owner can answer whether the payload arrived from a user message, a retrieved document, a memory record, or a model rewrite.
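One way to keep that correlation queryable is a simple join keyed on trace id. The sketch below assumes hypothetical field names, not the traceAI-langchain span schema:

# Join detector findings to trace metadata by trace id.
eval_rows = [
    {"trace_id": "a1b2", "finding": True, "generated_code_hash": "9f3c"},
]
trace_index = {
    "a1b2": {"tool": "python_executor", "sandbox_policy_id": "sbx-strict",
             "tenant": "acme", "model_route": "gpt-4o/primary",
             "payload_source": "retrieved_document"},
}

for row in eval_rows:
    if not row["finding"]:
        continue
    span = trace_index.get(row["trace_id"], {})
    print(f"trace={row['trace_id']} tool={span.get('tool')} "
          f"source={span.get('payload_source')} policy={span.get('sandbox_policy_id')}")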

FutureAGI’s approach is to separate detection from response. Detection produces evidence and cohorts; response belongs in Agent Command Center as a pre-guardrail, post-guardrail, model fallback, or route block. A high-risk cohort can trigger an alert, quarantine the sample into a regression eval, and require approval before the vulnerable prompt or tool schema ships again.
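As a sketch of that separation, detection emits evidence and a response hook decides what happens next; `respond_to_finding` and its return value are hypothetical, not Agent Command Center's API:

def respond_to_finding(row: dict, regression_set: list, notify) -> str:
    """Quarantine a flagged sample and gate the next release.

    Detection already produced the evidence (the eval row); this hook only
    decides the response: alert, pin as a regression eval, block the rollout.
    """
    regression_set.append(row)                    # permanent regression eval
    notify(f"CWE-94 finding in cohort {row['route']}")
    return "block_release"                        # require approval to ship

quarantine: list = []
respond_to_finding({"route": "support", "trace_id": "a1b2"}, quarantine, print)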

How to measure or detect code injection

Use several signals together; a single regex cannot cover runtime code paths, model rewrites, and agent tool chains.

  • CodeInjectionDetector findings — detects CWE-94 code injection vulnerabilities in generated or assembled code; track finding rate by model, prompt, and tool.
  • Trace correlation — capture trace id, tool name, code-generation span, sandbox execution span, and route; traceAI-langchain helps with LangChain agents.
  • Dashboard signals — watch eval-fail-rate-by-cohort, unexpected tool-execution count, sandbox-deny rate, and p99 latency jumps after prompt changes.
  • Feedback proxies — monitor security escalations, source-control review comments, and user reports of unrequested file, workflow, or account changes.
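The snippet below wires the detector into a release gate. The `evaluate` call and the shape of its result are assumptions based on the detector's described behavior, not a documented signature: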
from fi.evals import CodeInjectionDetector

detector = CodeInjectionDetector()

# Evaluate generated snippets, serialized tool arguments, and code patches.
# NOTE: evaluate() and .findings are an assumed shape, not a documented API.
for sample in ["<generated snippet>", "<serialized tool args>", "<code patch>"]:
    result = detector.evaluate(sample)
    if result.findings:  # any CWE-94 finding fails the release gate
        raise SystemExit("release gate failed: CWE-94 code injection finding")

Common mistakes

Most failures come from testing the visible answer while ignoring the execution boundary. Good teams also miss provenance: they know a payload is present, but not whether the source was user input, RAG context, session memory, or a model rewrite. Without that split, every mitigation becomes a broad blocklist.

  • Scanning prompts but not retrieved documents; indirect payloads often arrive from tickets, pages, comments, or memory records.
  • Checking final answers only; the dangerous string may appear in a hidden tool argument or temporary code file.
  • Allowing eval, dynamic imports, or template execution because a sandbox exists; policy boundaries still need negative tests (see the sketch after this list).
  • Mixing code-generation quality evals with security evals; a syntactically correct patch can still create CWE-94 exposure.
  • Using one threshold for every tool; code executors, browser automation, and file writers need different failure budgets.
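A hedged sketch of such a negative test, with `sandbox_execute` and `SandboxDenied` as stand-in names for your real sandbox boundary:

import pytest

class SandboxDenied(Exception):
    """Raised when the sandbox policy blocks an execution request."""

def sandbox_execute(code: str) -> None:
    # Stand-in for the real sandbox boundary; swap in your executor.
    banned = ("__import__", "eval(", "exec(", "__globals__")
    if any(token in code for token in banned):
        raise SandboxDenied(code)
    # real execution would happen here

PAYLOADS = [
    "__import__('os').system('cat /etc/passwd')",
    "eval(input())",
    "{{ self.__init__.__globals__['os'].system('id') }}",  # template escape
]

@pytest.mark.parametrize("payload", PAYLOADS)
def test_sandbox_denies_dynamic_execution(payload):
    with pytest.raises(SandboxDenied):
        sandbox_execute(payload)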

Frequently Asked Questions

What is code injection in LLM systems?

Code injection in LLM systems happens when model-generated text, retrieved content, or tool arguments are treated as executable code without adequate validation. FutureAGI tracks this risk with `CodeInjectionDetector` and regression tests before release.

How is code injection different from prompt injection?

Prompt injection manipulates model instructions. Code injection is narrower and more dangerous at runtime: the manipulated content reaches an interpreter, template, plugin, or tool path as executable code.

How do you measure code injection?

Use FutureAGI's `CodeInjectionDetector` to detect CWE-94-style code execution risk, then track findings by prompt version, model, tool, and trace. Production teams also monitor sandbox-deny rate, unexpected tool executions, and eval-fail-rate-by-cohort.