Prompt Injection
An attack that smuggles malicious instructions into a model's context, hijacking its behavior.
Prompt injection is the most prevalent vulnerability class in LLM-powered apps. An attacker plants instructions in content the model later reads a webpage, an email, a database row and the model follows those instructions instead of (or in addition to) the system prompt.
Indirect prompt injection (via retrieved content) is harder to defend against than direct injection (via user input). Defenses include input sanitization, output validation, instruction hierarchy, sandboxing of tool calls, and never giving an LLM more authority than it actually needs.
No model is fully prompt-injection-proof in 2026. Building safe LLM apps means assuming the model will sometimes follow attacker instructions and designing the surrounding system so the blast radius is small.