
LLM09: Misinformation — Hallucinations as a Security Risk

In 2023, the lawyers representing Roberto Mata in Mata v. Avianca submitted a brief that cited six precedent cases supporting their argument. When opposing counsel could not locate the cases, the judge ordered the lawyers to produce copies. The cases did not exist. The attorneys had used ChatGPT to research case law, and ChatGPT had generated plausible-sounding but entirely fabricated citations, complete with realistic case names, court identifiers, and page numbers. The lawyers were sanctioned $5,000 and publicly reprimanded. The incident became a widely cited example of the practical risk of treating LLM output as factual in consequential contexts. Security researchers have since documented a related phenomenon, now called “slopsquatting”: LLMs hallucinating non-existent Python package names in code generation output. When those package names are registered on PyPI by opportunistic attackers, users who run the LLM-suggested pip install command install malicious software. The attack requires no exploitation of a vulnerability — only that the LLM confidently names a package that doesn’t exist, and that someone installs it.

OWASP LLM09 describes the risk that LLMs generate plausible but false information that is then used as the basis for decisions in security-sensitive contexts. Unlike the other LLM Top 10 risks, misinformation does not require an attacker. The LLM itself is the source of the incorrect information, and the harm comes from over-reliance: treating generated text as factual because it is fluent, well-formatted, and confident.

The security dimension of misinformation is distinct from the general accuracy problem:

Hallucinated package names (slopsquatting). When a developer asks an LLM to suggest dependencies for a Python project, the LLM may name packages that don’t exist on PyPI. If an attacker registers those names with malicious code, the next developer who follows the suggestion installs malware. Research from 2024 found that popular LLMs hallucinate package names in a measurable percentage of code generation responses, creating a systematic supply chain attack surface.

Hallucinated CVE details. LLMs queried about specific CVE numbers may generate plausible-sounding vulnerability descriptions, affected version ranges, and mitigations for CVEs that either don’t exist or have different characteristics. A security engineer who acts on hallucinated CVE information — applying a patch to the wrong version, closing a ticket prematurely — can leave a real vulnerability unfixed.

Hallucinated security guidance. LLMs queried about security practices may produce confident guidance that is outdated, incorrect, or directly harmful. A question like “Is it safe to use MD5 for password hashing?” posed to a model that hallucinates an affirmative answer creates real risk if the response is acted on without verification.

Misinformation in threat assessments. In automated security workflows — log analysis, alert triage, threat modeling — LLM-generated summaries that contain fabricated details can lead to incorrect incident response decisions.

The exploit: slopsquatting in code generation

# A developer asks an LLM for help setting up a Python project:
#
# User: "What packages do I need for a Python project that does JWT auth,
# rate limiting, and structured logging?"
#
# LLM response (illustrative hallucination):
# "Install the following: flask, pyjwt, flask-limiter, structlog,
# flask-request-validator, py-ratelimiter, loguru-structured"
#
# "py-ratelimiter" and "flask-request-validator" may not exist on PyPI.
# An attacker who registers these names publishes malicious packages.
# VULNERABLE: blindly installing LLM-suggested packages
import subprocess, sys

def install_llm_suggested_packages(package_list: list[str]) -> None:
    for pkg in package_list:
        # VULNERABLE: no existence or integrity check before install
        subprocess.run(
            [sys.executable, "-m", "pip", "install", pkg],
            check=True,
        )

# A developer runs this with the LLM's package list, installing a malicious package.

The exploit: hallucinated CVE in automated triage

# VULNERABLE: automated security triage that trusts LLM CVE analysis
import openai

client = openai.OpenAI()

def triage_cve(cve_id: str) -> dict:
    # VULNERABLE: LLM used as authoritative source for CVE details
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": f"Describe the vulnerability for {cve_id}: affected versions, "
                f"severity, and recommended mitigation.",
            }
        ],
        max_tokens=500,
    )
    # VULNERABLE: treating LLM output as ground truth for security decisions
    return {
        "cve": cve_id,
        "analysis": response.choices[0].message.content,
        "source": "llm",  # VULNERABLE: no authoritative source verification
        "action": "auto-close if LLM says low severity",  # VULNERABLE: automated action on LLM claim
    }

M1: Ground outputs with Retrieval-Augmented Generation from authoritative sources


For factual queries — CVE details, package documentation, security standards — always retrieve from a verified, authoritative source and cite it:

import httpx
from openai import OpenAI

client = OpenAI()

def get_cve_details_grounded(cve_id: str) -> dict:
    # SAFE: fetch authoritative data from NVD first
    nvd_url = f"https://services.nvd.nist.gov/rest/json/cves/2.0?cveId={cve_id}"
    response = httpx.get(nvd_url, timeout=10.0)
    response.raise_for_status()
    nvd_data = response.json()

    vulnerabilities = nvd_data.get("vulnerabilities", [])
    if not vulnerabilities:
        return {"cve": cve_id, "found": False, "source": "nvd"}

    cve_item = vulnerabilities[0]["cve"]
    description = cve_item["descriptions"][0]["value"]
    cvss_score = (
        cve_item.get("metrics", {})
        .get("cvssMetricV31", [{}])[0]
        .get("cvssData", {})
        .get("baseScore", "N/A")
    )

    # SAFE: LLM used only for summarizing verified data, not as primary source
    summary_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": f"Summarize this CVE description for an engineer in 2 sentences:\n{description}",
            }
        ],
        max_tokens=150,
    )
    return {
        "cve": cve_id,
        "found": True,
        "cvss_score": cvss_score,
        "description": description,  # SAFE: authoritative NVD text
        "summary": summary_response.choices[0].message.content,  # LLM summary only
        "source": "nvd.nist.gov",  # SAFE: cite the source
    }
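One practical consequence, sketched below with a hypothetical caller and a placeholder CVE identifier: if NVD has no record of the identifier, treat the CVE number itself as suspect rather than falling back to the LLM’s description.

# Hypothetical caller of get_cve_details_grounded (defined above); the CVE ID is a placeholder.
details = get_cve_details_grounded("CVE-2099-0001")
if not details["found"]:
    # The identifier is unknown to NVD, which may mean it was hallucinated.
    # Do NOT fall back to asking the LLM to describe it.
    raise ValueError(f"{details['cve']} not found in NVD; verify the identifier before triage")
print(details["cvss_score"], details["summary"])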

M2: Validate package names before installation


Before installing any LLM-suggested package, verify its existence on PyPI and check for known vulnerabilities:

import httpx
import subprocess
import sys

def verify_package_exists(package_name: str) -> bool:
    """Check if a package exists on PyPI before installing it."""
    # SAFE: verify existence via PyPI JSON API
    response = httpx.get(
        f"https://pypi.org/pypi/{package_name}/json",
        timeout=5.0,
    )
    return response.status_code == 200

def safe_install_suggested_package(package_name: str) -> dict:
    # SAFE: existence check before install
    if not verify_package_exists(package_name):
        return {
            "package": package_name,
            "status": "not_found",
            "action": "skipped — package does not exist on PyPI",
        }

    # SAFE: run pip-audit check before installing
    audit_result = subprocess.run(
        ["pip-audit", "--requirement", "/dev/stdin"],
        input=f"{package_name}\n".encode(),
        capture_output=True,
    )
    if audit_result.returncode != 0:
        return {
            "package": package_name,
            "status": "vulnerable",
            "action": "skipped — known vulnerabilities found",
            "details": audit_result.stdout.decode(),
        }

    subprocess.run([sys.executable, "-m", "pip", "install", package_name], check=True)
    return {"package": package_name, "status": "installed"}

M3: Return citations and confidence signals alongside LLM responses


Design your application to surface the sources behind LLM-generated claims, and make uncertainty explicit:

from pydantic import BaseModel
from openai import OpenAI
from typing import Optional

client = OpenAI()

class GroundedResponse(BaseModel):
    answer: str
    confidence: str  # "high" | "medium" | "low"
    requires_verification: bool
    suggested_source: Optional[str] = None

def answer_security_question(question: str) -> GroundedResponse:
    # SAFE: structured output with explicit confidence and verification flag
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a security information assistant. "
                    "For questions about specific CVEs, packages, or version numbers, "
                    "always set requires_verification=true and suggest an authoritative source. "
                    "Set confidence='low' for any time-sensitive or specific factual claim."
                ),
            },
            {"role": "user", "content": question},
        ],
        response_format=GroundedResponse,
        max_tokens=400,
    )
    result = response.choices[0].message.parsed

    # SAFE: flag responses that need verification before action
    if result.requires_verification:
        print(f"⚠ Verify before acting: {result.suggested_source or 'authoritative source required'}")
    return result
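A caller can then gate on those signals before acting on the answer. The sketch below is hypothetical glue code: it routes low-confidence or verification-required responses to the grounded NVD lookup from M1 instead of trusting the generated text (the CVE identifier is a placeholder).

# Hypothetical caller: act only on high-confidence answers that need no verification.
result = answer_security_question("What versions are affected by CVE-2099-0001?")
if result.requires_verification or result.confidence != "high":
    # Fall back to an authoritative source (get_cve_details_grounded from M1).
    verified = get_cve_details_grounded("CVE-2099-0001")
else:
    print(result.answer)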

M4: Human review gates for security-relevant automated decisions


Never automate security-relevant decisions — alert closures, patch approvals, dependency updates — based solely on LLM output:

from enum import Enum
from dataclasses import dataclass

class TriageAction(str, Enum):
    ESCALATE = "escalate"
    INVESTIGATE = "investigate"
    CLOSE = "close"

@dataclass
class TriageResult:
    alert_id: str
    llm_analysis: str
    llm_suggested_action: TriageAction
    human_approved: bool = False  # SAFE: requires explicit human approval

def triage_security_alert(alert_id: str, alert_data: dict) -> TriageResult:
    # ... LLM analysis ...
    suggested = TriageAction.INVESTIGATE  # from LLM

    # SAFE: LLM suggestion recorded but not automatically acted upon
    result = TriageResult(
        alert_id=alert_id,
        llm_analysis="[LLM generated analysis]",
        llm_suggested_action=suggested,
        human_approved=False,  # SAFE: requires human review before any action
    )
    return result

def execute_triage_action(result: TriageResult) -> None:
    if not result.human_approved:  # SAFE: gate on human approval
        raise RuntimeError("Triage action requires human approval before execution.")
    # ... execute action ...
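In practice the approval bit is flipped only by a human reviewer, outside the automated path. A minimal sketch of the intended flow, with a placeholder alert ID and payload:

# Placeholder alert ID and alert data, for illustration only.
result = triage_security_alert("ALERT-0001", {"source": "ids", "signature": "..."})
try:
    execute_triage_action(result)  # raises: no human has approved yet
except RuntimeError as exc:
    print(exc)
# Only after an analyst reads result.llm_analysis and agrees with the suggested action:
result.human_approved = True
execute_triage_action(result)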

LLM09 is primarily a runtime risk — whether an LLM hallucinates cannot be determined by inspecting Python source code. LLMArmor does not scan for misinformation risk.

However, LLMArmor can detect code patterns that make misinformation more likely to cause harm: LLM output used directly in security decisions without a human approval gate (LLM08 overlap), and missing max_tokens limits that enable unexpectedly verbose hallucinations (LLM10 overlap).
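For illustration, the kind of pattern such a scan targets looks like the hypothetical snippet below: an unbounded completion whose raw text directly drives a security action (close_alert and alert_id are stand-ins, not part of any real API).

# Hypothetical anti-pattern, shown only to illustrate what a static scan looks for.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Should alert {alert_id} be closed?"}],
    # no max_tokens limit: verbose hallucinations are unbounded (LLM10 overlap)
)
if "close" in response.choices[0].message.content.lower():
    close_alert(alert_id)  # automated security action on raw LLM output, no human gate (LLM08 overlap)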

For misinformation-specific tooling:

  • Garak — probes for hallucination, factuality, and sycophancy behaviors using adversarial prompts (see the example run after this list)
  • Promptfoo — test suites for factual accuracy against known-answer benchmarks
  • Ragas — RAG evaluation framework with faithfulness and answer relevancy metrics
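As a concrete starting point for the first of these, a Garak run aimed at package-name hallucination might look like the commands below; the flags and probe name are assumptions that vary across Garak versions, so confirm them with garak --list_probes before relying on this exact invocation.

pip install garak
garak --model_type openai --model_name gpt-4o --probes packagehallucination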
To run LLMArmor’s checks for those related patterns against your own codebase:
pip install llmarmor
llmarmor scan ./src
Frequently asked questions

What is slopsquatting?
Slopsquatting is a supply chain attack that exploits LLM hallucinations in code generation. When an LLM generates code that references a Python (or npm, etc.) package that doesn't exist, an attacker can register that package name on the public repository with malicious code. Any developer who follows the LLM's suggestion and installs the package installs the attacker's code. The term was coined by security researchers studying the prevalence of hallucinated package names in LLM-generated Python code.
How is LLM hallucination a security risk rather than just an accuracy problem?
Hallucination becomes a security risk when the output influences a security-relevant decision: installing a suggested package, applying a patch based on a described CVE, closing a security alert based on an LLM's assessment, or following security guidance from an LLM without verification. In each case, acting on hallucinated information without verification can leave a real vulnerability unaddressed or introduce a new one.
What is the Mata v. Avianca case and why does it matter for LLM security?
In 2023, lawyers in Mata v. Avianca submitted a legal brief citing six precedent cases generated by ChatGPT that did not exist. The judge ordered the lawyers to produce copies of the cases, which they could not do. The lawyers were sanctioned for failing to verify LLM output before filing. The case is frequently cited as a canonical example of the harm caused by treating LLM-generated factual claims as authoritative without independent verification.
Does RAG eliminate hallucinations?
RAG reduces hallucinations for questions where the answer appears in the retrieved context — the model has a factual grounding to cite. But RAG does not eliminate hallucinations: the model can still ignore, misinterpret, or extrapolate beyond retrieved content. For security-critical facts, always cite the source and require human verification for decisions, even when using RAG.
How can I test my LLM application for misinformation risk?
Use an evaluation framework like Promptfoo or Ragas with a test suite of questions whose correct answers you know. Measure factual accuracy across repeated runs (LLMs are non-deterministic — a question answered correctly once may be answered incorrectly later). For package-name hallucination, generate a set of dependency suggestions and check each name against PyPI programmatically. Use Garak for adversarial probing of hallucination and sycophancy behaviors.
Is LLM09 covered by LLMArmor?
No. Misinformation is a runtime behavioral property of the model, not a structural code pattern. LLMArmor cannot determine whether an LLM will hallucinate. It can detect code patterns that make hallucination more harmful — such as automated security decisions based on LLM output without a human gate — but those detections are reported under LLM08. For misinformation-specific coverage, use evaluation tools like Garak, Promptfoo, or Ragas.