What Is Link Injection?
An LLM attack that inserts unauthorized or misleading URLs into model outputs, citations, messages, or agent actions.
What Is Link Injection?
Link injection is an LLM security failure where a model or agent inserts unauthorized, misleading, or attacker-controlled URLs into generated answers, citations, emails, tickets, or tool calls. It appears in eval pipelines, production traces, and gateway guardrails when user input, retrieved content, or tool output steers the model toward a link the application did not approve. FutureAGI measures link behavior with ContainsValidLink, then pairs trace review with PromptInjection or ProtectFlash for hostile-link vectors.
Why Link Injection Matters in Production LLM and Agent Systems
Link injection turns generated text into a delivery channel. The visible response may look helpful, while the actual link sends a user to a phishing page, a credential-harvesting form, an attacker-controlled “documentation” site, or a URL that encodes private context in query parameters. The named failure modes are phishing-link insertion, citation hijacking, and exfiltration through URLs.
Developers feel it when output tests pass but screenshots or Markdown renderers hide the real href. SREs see abnormal outbound-link counts, higher redirect rates, or a sudden cluster of valid links from domains never seen in the baseline. Security teams need to know whether the URL came from user input, a retrieved chunk, a tool output, or a model hallucination. Product teams hear the incident after a user clicks.
This is sharper for 2026-era agentic systems because agents do not only answer. They draft emails, update tickets, post summaries, open browser sessions, and call tools whose outputs become the next prompt. One poisoned help article can make a support agent cite the wrong reset link across thousands of replies. One malicious tool output can put a tracking URL into a customer-facing ticket. If the trace cannot separate approved source links from generated links, teams lose the evidence needed to block the route and replay the incident.
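The exfiltration-through-URLs vector can be caught heuristically. A minimal sketch (the helper names and thresholds are illustrative, not FutureAGI features) that flags URLs whose query parameters carry long, high-entropy payloads of the kind that encode private context:

```python
# Heuristic sketch: flag URLs whose query parameters look like encoded
# payloads. Thresholds are illustrative starting points, not tuned values.
import math
from urllib.parse import urlparse, parse_qsl

def _entropy(s: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    counts = {c: s.count(c) for c in set(s)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def suspicious_query_params(url: str, min_len: int = 40,
                            min_entropy: float = 4.0):
    """Return (name, value) pairs that look like opaque encoded payloads."""
    pairs = parse_qsl(urlparse(url).query)
    return [(k, v) for k, v in pairs
            if len(v) >= min_len and _entropy(v) >= min_entropy]
```

A flagged parameter is only a lead: the trace still has to show which source span put that URL into the output.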
How FutureAGI Handles Link Injection
FutureAGI anchors link-injection review to a specific eval surface: ContainsValidLink. The ContainsValidLink evaluator checks whether generated text contains a link that returns a 2xx status code. That is a narrow but useful contract: if the product promised a valid docs link, the eval can catch missing, malformed, or dead links. It does not prove the link is safe, approved, or policy-compliant, so engineers pair it with trace fields and guardrails.
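For intuition, the availability contract this eval enforces can be approximated in a few lines. This is an illustrative sketch, not FutureAGI's implementation; the URL regex and the `resolves_2xx` helper are assumptions:

```python
# Sketch of the availability contract: "the text contains at least one
# URL that answers with a 2xx status code." Not a safety check.
import re
import urllib.request

URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")

def resolves_2xx(url: str, timeout: float = 5.0) -> bool:
    """True if a HEAD request to the URL returns a 2xx status."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

def contains_valid_link(text: str) -> bool:
    """Availability only: at least one extracted URL is live."""
    return any(resolves_2xx(u) for u in URL_PATTERN.findall(text))
```

Note what the sketch cannot say: a phishing page also returns 200, which is exactly why availability must be paired with an authorization check.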
A real workflow: a LangChain support copilot is instrumented with traceAI-langchain. The trace records tool.output, retrieved source URL, rendered Markdown, final message.content, route, and guardrail verdict. A retrieved FAQ includes a hostile instruction telling the model to replace the billing link with https://login-example.invalid/reset. In the eval run, ContainsValidLink marks whether the emitted URL resolves. In production, Agent Command Center applies ProtectFlash as a pre-guardrail on retrieved text and a post-guardrail that blocks unapproved domains before the email or ticket is sent.
FutureAGI’s approach is to separate link availability from link authorization. Unlike a Google Safe Browsing lookup or a Lakera Guard check placed only at the user-input boundary, this workflow also inspects the generated output, the source chunk, and the route decision. The engineer’s next step is concrete: alert on valid-but-unapproved links, add the trace to a regression dataset, tighten the domain policy, and re-run PromptInjection plus ContainsValidLink before the prompt or retrieval policy ships again.
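The authorization half can be sketched as a per-route domain check. The `APPROVED_DOMAINS` map and `unapproved_links` helper below are hypothetical; real policies would live in gateway or guardrail configuration:

```python
# Sketch: authorization is a policy decision per route, separate from
# availability. APPROVED_DOMAINS here is a hypothetical example policy.
from urllib.parse import urlparse

APPROVED_DOMAINS = {
    "support": {"docs.example.com", "help.example.com"},
    "billing": {"billing.example.com"},
}

def unapproved_links(urls: list[str], route: str) -> list[str]:
    """Return links whose host is outside the route's approved set.
    A URL can be live (2xx) and still land here — availability
    is not authorization."""
    allowed = APPROVED_DOMAINS.get(route, set())
    return [u for u in urls if urlparse(u).hostname not in allowed]
```

Anything this returns on a production trace is the alert condition described above: a valid-but-unapproved link worth blocking and adding to the regression dataset.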
How to Measure or Detect Link Injection
Use multiple signals because a live URL can still be hostile:
- ContainsValidLink — checks whether text contains a link that resolves with a 2xx status code; use it for valid-link contracts, not domain trust.
- PromptInjection and ProtectFlash — flag instructions around retrieved text, tool output, or user input that attempt to force a replacement link.
- Trace fields — inspect tool.output, source URL, rendered Markdown text, final href, route, guardrail verdict, and approver.
- Dashboard signals — track unapproved-domain rate, redirect-rate-by-domain, valid-link-fail-rate, and post-guardrail-block-rate.
- User-feedback proxy — monitor reports about suspicious links, wrong domains, broken reset links, and unexpected redirects.
from fi.evals import ContainsValidLink

evaluator = ContainsValidLink()
# score reflects whether the emitted link resolved with a 2xx status
result = evaluator.evaluate(
    input="Read the setup guide: https://example.com"
)
print(result.score, result.reason)
Alert on changes by route and source type. A low global rate can hide one poisoned connector, one prompt version, or one customer workspace with a compromised knowledge base.
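That per-route breakdown can be sketched with hypothetical event records carrying `route`, `source_type`, and `approved` fields; map them onto whatever your tracing backend actually stores:

```python
# Sketch: unapproved-domain rate per (route, source_type), so a low
# global rate cannot hide one poisoned connector or prompt version.
from collections import defaultdict

def unapproved_rates(events):
    """events: iterable of dicts with route, source_type, approved (bool)."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for e in events:
        key = (e["route"], e["source_type"])
        totals[key] += 1
        if not e["approved"]:
            bad[key] += 1
    return {k: bad[k] / totals[k] for k in totals}
```

Alerting on each key independently surfaces the single compromised knowledge base that a global average would wash out.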
Common Mistakes
Most incidents come from confusing link availability with link authorization.
- Treating HTTP 200 as safe. A live attacker page can resolve cleanly while still stealing credentials or misleading users.
- Checking only user prompts. Retrieved documents, email bodies, browser pages, and tool outputs can inject replacement links after input filtering.
- Trusting Markdown display text. Log and compare the visible anchor text, raw href, expanded redirect target, and source span separately.
- Using one allowlist for every workflow. Documentation, login, payment, file-download, and support-ticket links need different approved domains.
- Skipping regression after a blocked link. Add the source chunk, rendered output, and guardrail verdict to an eval dataset before editing prompts.
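The display-text-versus-href mistake can be checked mechanically. A sketch for Markdown-style links; the regex and the `mismatched_links` helper are illustrative, not part of any SDK:

```python
# Sketch: a link whose visible anchor text names one host while its raw
# href points at another is a classic injection tell.
import re
from urllib.parse import urlparse

MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def mismatched_links(markdown: str):
    """Return (display_text, href) pairs where the anchor text shows a
    URL for a different host than the real href."""
    out = []
    for text, href in MD_LINK.findall(markdown):
        shown = re.search(r"https?://([^/\s]+)", text)
        if shown and shown.group(1) != urlparse(href).hostname:
            out.append((text, href))
    return out
```

Plain-word anchor text ("click here") passes this check, which is why the raw href and expanded redirect target still need to be logged and compared separately.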
Frequently Asked Questions
What is link injection?
Link injection is an LLM security failure where a model or agent inserts unauthorized, misleading, or attacker-controlled URLs into generated answers, citations, emails, tickets, or tool calls.
How is link injection different from prompt injection?
Prompt injection changes the model's instructions. Link injection is the output or action-level result where the system emits an unapproved URL, often because prompt injection or poisoned retrieval steered it there.
How do you measure link injection?
Use FutureAGI's ContainsValidLink to verify expected generated links return a 2xx status, then track unapproved-domain rate, PromptInjection or ProtectFlash flags, and guardrail blocks in traces.