What Is a Link Injection Data Privacy Attack?
An exfiltration attack that tricks an LLM into rendering a link or markdown image whose URL embeds sensitive conversation data, leaking it via client auto-fetch.
What Is a Link Injection Data Privacy Attack?
A link injection data privacy attack uses an LLM as a side-channel to exfiltrate conversation data. The attacker plants instructions — usually inside a document the LLM retrieves — that tell the model to render a markdown image or link of the form . When the chat client renders the response, the embedded image URL auto-fetches, and the attacker’s server receives the data appended to the URL. The model’s text answer might look harmless. The harm is in the network request the rendered output triggers. It is the privacy-leak cousin of indirect prompt injection.
Why It Matters in Production LLM and Agent Systems
Through early 2026 multiple chat-based LLM products have been shown vulnerable to this class. Researchers demonstrated extracting user emails, API keys pasted earlier in the conversation, and document contents through markdown image references injected via retrieved documents and tool outputs. The pattern has shipped to production multiple times because the model’s response to a “harmless” query is itself a benign-looking link — only the rendering client, not the eval pipeline, sees the data leak.
The pain falls hardest on RAG and agent products that embed external content. A customer-support chatbot retrieves a help article that an attacker uploaded; the article instructs the model to summarise the user’s prior messages and append them to a tracking URL. A coding assistant fetches an npm package README; the README instructs the model to embed the developer’s environment variables in a status-image URL. A research agent browses a poisoned web page; the page tells the agent to encode its memory contents into an image href.
Symptoms in logs are subtle: a sudden spike in URLs containing query parameters, response patterns that include image markdown the user did not request, and outbound requests from the chat client to domains never seen before. Most current eval suites do not look for this — they grade the text, not the network shape it implies.
How FutureAGI Handles Link Injection
FutureAGI’s approach is to score both inputs (retrieved content) and outputs (model responses) for injection patterns, and to enforce sanitisation at the response layer. fi.evals.PromptInjection runs against retrieved documents and tool outputs as a pre-guardrail, flagging content that contains markdown-link instructions or steganographic prompts. ProtectFlash runs at the user-input layer for direct cases. On the response side, a post-guardrail scans the model’s output for: outbound URLs whose parameters look like base64 or query-encoded sensitive data, image references to domains not on an allow-list, and markdown structures that suggest exfiltration intent. PII evaluator runs in parallel — if the response embeds detected PII inside a URL, the post-guardrail blocks or rewrites.
Concretely: a RAG team configures the Agent Command Center with pre-guardrail: PromptInjection on retrieved-context spans (traceAI-langchain tags every retrieved chunk), and post-guardrail: PII + ContentSafety on the response. URLs in the response are passed through a sanitiser that strips query parameters and rewrites image references to a domain allow-list. Weekly red-teaming via simulate-sdk’s Persona includes “indirect injection via poisoned document” personas — a test cohort whose pass rate is a release gate.
Unlike pure content-safety filters, link-injection defence requires output structure analysis (URL shape, markdown nesting), not just text classification.
How to Measure or Detect It
Detection signals span input, output, and client layers:
fi.evals.PromptInjection: scores retrieved-context spans for injection signatures.fi.evals.ProtectFlash: low-latency input-sidepre-guardrail.fi.evals.PII: detects sensitive content embedded in any response field, including URLs.- OTel attribute
llm.output.urls: list of URLs emitted in each response — query-able for unusual destinations. - Outbound-URL allow-list violation rate: counts responses with URLs outside an approved domain list.
- Persona red-team via simulate-sdk: synthetic poisoned-document personas run weekly.
from fi.evals import PromptInjection, PII
injection = PromptInjection()
pii = PII()
result_input = injection.evaluate(
input=retrieved_document_chunk
)
result_output = pii.evaluate(
output=model_response_markdown
)
print(result_input.score, result_output.score)
Common Mistakes
- Scoring only user input. Link injection lives in retrieved content; the user prompt looks fine.
- Ignoring markdown structure. A plain-text scanner sees a URL; the attack lives in the image syntax that triggers auto-fetch.
- No outbound-URL allow-list on the chat client. Even with model-side defence, a client that auto-fetches arbitrary domains is the actual leak surface.
- Treating it as a generic prompt-injection problem. The mitigation is output-shape analysis plus allow-listing, not just input filtering.
- Skipping retrieved-content scoring. RAG pipelines that score only the final response will miss every indirect-injection variant.
Frequently Asked Questions
What is a link injection data privacy attack?
It is an exfiltration technique where an attacker — usually via indirect prompt injection in retrieved content — tricks the LLM into emitting a markdown link or image whose URL encodes sensitive conversation data. The user's client auto-fetches the URL, leaking data to the attacker's server.
How is link injection different from regular prompt injection?
Prompt injection makes the model do something the operator did not intend. Link injection is a specific exfiltration variant: the attacker does not care about the model's text answer, only about the side-channel created by a rendered URL the client will fetch.
How do you defend against it?
Sanitize LLM markdown output to strip or rewrite suspicious URLs, run PromptInjection on retrieved content, block external image references in chat clients, and use post-guardrails that scan responses for PII embedded in URL parameters.