AI Definition

Prompt Injection

An attack that smuggles malicious instructions into a model's context, hijacking its behavior.

Prompt injection is the most prevalent vulnerability class in LLM-powered apps. An attacker plants instructions in content the model later reads a webpage, an email, a database row and the model follows those instructions instead of (or in addition to) the system prompt.

Indirect prompt injection (via retrieved content) is harder to defend against than direct injection (via user input). Defenses include input sanitization, output validation, instruction hierarchy, sandboxing of tool calls, and never giving an LLM more authority than it actually needs.

No model is fully prompt-injection-proof in 2026. Building safe LLM apps means assuming the model will sometimes follow attacker instructions and designing the surrounding system so the blast radius is small.

Related concepts

Guardrails

Programmatic checks around an LLM that prevent unsafe, off-topic, or non-compliant outputs.

Evals (AI Evaluation)

Automated tests that measure how well an AI system performs on representative inputs.

LLM (Large Language Model)

A neural network trained on huge amounts of text to predict and generate language.

Want help applying this in production?

Our engineers ship AI features into production every week. Tell us what you're building.

Get a Free Quote Contact Us