Prompt Injection: Defending LLM-Powered Apps

Why input filtering is not enough, and the architectural patterns that actually contain untrusted model output.

Prompt injection is not a weird prompt problem. It is a data-flow problem where untrusted text reaches a component with authority to act.

Separate instructions from data

Applications should preserve a hard distinction between developer instructions, user requests, retrieved content, and tool results. The model may see all of them, but the application should not give all of them equal authority.

message-policy.ts

type MessageSource = "developer" | "user" | "retrieval" | "tool";

export function canRequestTool(source: MessageSource) {
  return source === "developer" || source === "user";
}

Use tool sandboxes

Tools should be scoped by design. A model that summarizes a document should not also be able to send email, update billing, or delete a repository without an explicit user confirmation.

WARNING

Do not treat model confidence as authorization. The application owns permission checks, audit logs, and confirmation prompts.

mid article - 300x250 responsive

Contain retrieved text

Retrieved documents are attacker-controlled if an attacker can influence your index. Treat search results like web input: quote them, cite them, and keep them away from privileged instructions.

Design for refusal paths

The safest agent workflows include boring stops. When a tool call is high impact or ambiguous, ask for confirmation and show the exact action before executing it.

ConfirmAction.tsx

export function ConfirmAction({ command }: { command: string }) {
  return (
    <form>
      <p>Review before execution:</p>
      <code>{command}</code>
      <button type="submit">Approve</button>
    </form>
  );
}

Strong LLM security is mostly ordinary application security: least privilege, clear boundaries, careful logging, and skeptical treatment of untrusted data.

Threat model the agent loop

Every LLM application has a loop: input enters, context is assembled, the model produces output, and the application may call tools. Prompt injection becomes dangerous when untrusted text influences a privileged step in that loop. Draw the loop and mark which data sources are trusted, untrusted, or mixed.

High-risk sources:

Retrieved web pages and documents.
Customer-uploaded files.
Tool output from external systems.
Prior conversation history.
Emails, tickets, and chat messages imported into the prompt.

Practical controls

Use allowlisted tools, scoped credentials, confirmation for high-impact actions, and structured outputs with validation. The model can propose an action, but the application should decide whether that action is authorized.

tool-policy.ts

const highImpactTools = new Set(["send_email", "delete_repo", "create_invoice"]);

export function requiresApproval(toolName: string) {
  return highImpactTools.has(toolName);
}

Monitoring and incident response

Log tool proposals, tool approvals, rejected actions, retrieval sources, and policy decisions. If a user reports unexpected behavior, you need to reconstruct whether the model saw hostile content and whether the application blocked or allowed the resulting action.

TIP

Prompt injection defense is strongest when model output is treated like a request, not a command.

Red-team prompts to test

Test the application with hostile retrieved content, not just hostile user chat. Put instructions inside documents, tickets, web pages, and tool outputs. Ask the model to ignore prior rules, exfiltrate hidden context, or call a tool for a reason that sounds helpful. Then verify whether policy code blocks the action.

Data boundaries

Separate secrets, system prompts, retrieved context, and user-visible output. Do not put API keys, raw credentials, or sensitive internal policies into context unless the model absolutely needs them. If the model never sees a secret, prompt injection cannot reveal it.

Product design choices

Make high-risk actions reviewable. Show the exact email, command, repository change, or transaction before execution. Users should approve the final action, not a vague summary. This protects both the user and the product when model behavior is uncertain.

Evaluation strategy

Prompt injection defenses should be tested continuously because the application changes faster than any static prompt. Build an evaluation set with hostile documents, hostile user messages, poisoned tool outputs, and confusing multi-step tasks. Run it whenever prompts, tools, retrieval sources, or approval flows change.

Useful evaluation outcomes:

The model identifies untrusted instructions inside retrieved content.
High-impact tools require explicit approval.
Secrets are never included in user-visible output.
Tool arguments are validated against application policy.
Logs capture the reason an action was blocked.

What security teams should own

Security should not own every prompt, but it should own the control model: tool permissions, audit logging, data classification, approval thresholds, and incident response. Product teams can then move quickly inside guardrails that are understandable and testable.

The healthiest LLM apps make trust boundaries visible in code. Instead of relying on a magic system prompt, they use normal engineering controls: typed tool schemas, allowlists, scoped credentials, explicit confirmations, and telemetry that explains why an action happened.