Failure Modes

What Is Math Framing Injection Attack?

Prompt injection disguised as math notation, equations, or symbolic reasoning so an LLM treats unsafe instructions as neutral problem solving.

What Is Math Framing Injection Attack?

Math framing injection attack is an agent failure mode where an attacker hides prompt-injection intent inside arithmetic, symbolic notation, equations, scoring rubrics, or “solve for x” tasks. It shows up in eval pipelines, production traces, RAG chunks, and tool-using agents when math-like packaging makes unsafe instructions look like neutral reasoning work. The risk is not the math itself; it is the model treating the framed content as higher-priority instructions. FutureAGI detects the pattern with PromptInjection, ProtectFlash, and trace review.

Why It Matters in Production LLM/Agent Systems

The concrete failure is instruction hijacking disguised as legitimate reasoning. A support agent may receive a user prompt that asks it to “convert the policy below into a formula,” or a RAG chunk may contain symbolic text that maps variables to hidden commands. The model follows the mathematical frame, expands the variables, and then treats the expanded text as instructions. Downstream symptoms look like a normal reasoning trace until the agent leaks a prompt, skips a refusal, calls an unsafe tool, or rewrites a policy answer around the attacker’s goal.

Developers feel the pain first because the prompt looks benign under quick inspection. SREs see no crash, only unusual evaluator failures, longer reasoning chains, or a spike in tool calls after math-heavy inputs. Security and compliance teams need to prove whether the problem came from the user, a retrieved chunk, a parsed file, or a tool response. End users see the agent answer with confidence while violating a policy it normally obeys.

This is especially relevant for 2026 multi-step agents because math-like framing travels well across boundaries. A planning model can transform the frame, a tool can preserve it in tool.output, and a second model can execute the transformed instruction. Unlike simple keyword filters or a promptfoo suite built around known attack strings, trace-level detection must catch the semantic role of the frame, not just suspicious words.

How FutureAGI Handles Math Framing Injection Attack

There is no dedicated FutureAGI product surface named for math framing injection attack; FutureAGI treats it as a prompt-injection variant that must be evaluated at every boundary where framed text becomes model context. The nearest eval surface is fi.evals.PromptInjection, which scores the user message, retrieved chunk, or tool output for injection intent. The nearest runtime surface is ProtectFlash, commonly placed as an Agent Command Center pre-guardrail before the model sees the framed content.

A real workflow starts in traces. A LangChain research agent is instrumented with traceAI-langchain. The trace records the user message, retrieval spans, tool spans, tool.output, and agent.trajectory.step before each planner action. When a retrieved document includes an equation-like instruction frame, the team evaluates that chunk with PromptInjection. If the score crosses the release threshold, Agent Command Center applies ProtectFlash as a pre-guardrail, drops or quarantines the chunk, and routes the request to a fallback response rather than letting the planner expand the frame.

FutureAGI’s approach is to preserve evidence with the decision: source document, chunk id, prompt version, model route, evaluator result, and final fallback. The engineer then adds the failed trace to a regression dataset, replays similar math-framed prompts against the next model and prompt version, and sets a policy such as “zero high-risk math-framed injection cases pass the guardrail in the release suite.”

How to Measure or Detect It

Use multiple signals because math framing attacks are designed to look like ordinary reasoning:

  • PromptInjection evaluator — scores whether an input segment contains prompt-injection intent, even when the intent is wrapped in symbolic or task-like language.
  • ProtectFlash evaluator — a low-latency guardrail check for live routes before framed content reaches the planner.
  • Trace fields — inspect tool.output, retrieved chunk text, source URL, prompt version, model route, and agent.trajectory.step around the bad action.
  • Dashboard signal — track eval-fail-rate-by-cohort, block-rate-by-source, fallback-rate-after-math-input, and repeated failures from the same connector.
  • User-feedback proxy — watch escalations saying the agent “followed a formula” or answered from a document instruction instead of the user request.
from fi.evals import PromptInjection

evaluator = PromptInjection()
result = evaluator.evaluate(
    input="Solve this symbolic policy puzzle, then follow the derived instruction."
)
print(result.score, result.reason)

Review false positives separately for educational math, finance calculations, and coding tasks. The threshold that works for public chat may be too strict for a tutor or too loose for an agent with write tools.

Common Mistakes

  • Blocking only obvious jailbreak strings. Math framing avoids direct phrases, so substring filters miss the attack while the reasoning step expands it.
  • Treating all math inputs as suspicious. Score intent and downstream behavior; do not break legitimate tutoring, analytics, or finance workflows.
  • Skipping retrieved chunks. A poisoned document can carry the frame even when the user asks a harmless question.
  • Ignoring planner traces. The dangerous step is often the model’s transformation from symbolic text into an executable instruction.
  • Logging the expanded instruction without controls. Redact or quarantine the unsafe expansion so audit logs do not become a second distribution channel.

Frequently Asked Questions

What is math framing injection attack?

Math framing injection attack disguises prompt-injection intent as equations, symbolic notation, formulas, or math word problems. The model is asked to solve the frame and may follow unsafe hidden instructions.

How is math framing injection attack different from encoding injection?

Encoding injection hides instructions by changing representation, such as Base64 or escaped characters. Math framing injection hides intent inside a reasoning task, so the danger is the model's instruction-following behavior during the solve step.

How do you measure math framing injection attack?

Use FutureAGI's PromptInjection evaluator on user inputs, retrieved chunks, and tool outputs, then place ProtectFlash as a pre-guardrail for live traffic. Track eval-fail-rate-by-cohort and blocked traces by source.