Skip to content

LLM Security Best Practices: A Production Checklist

In April 2023, Samsung Electronics discovered that employees in its semiconductor division had pasted proprietary source code and internal meeting notes into ChatGPT while asking it for help debugging and summarization. The data was transmitted to OpenAI’s servers and — under the terms of service at the time — could be used to improve future model versions. Samsung banned internal use of generative AI tools within weeks. The incident required no technical exploit: no injections, no API key theft, no novel vulnerabilities. Engineers simply used the most convenient tool available, and corporate data left the building. Production LLM security is not only about stopping attackers. It is about building systems where doing the right thing is also the easy thing.

Before writing a single line of security code, map your threat model. LLM applications have a different attack surface than traditional web apps:

  • Prompt injection (LLM01): Users or upstream content cause the LLM to override its instructions.
  • Sensitive data exposure (LLM02): API keys, PII, or trade secrets enter prompts and are transmitted to third-party model providers, stored in logs, or extracted by attackers.
  • Insecure output handling (LLM05): LLM output reaches dangerous sinks — eval(), SQL queries, shell commands, rendered HTML — enabling secondary injection attacks.
  • Excessive agency (LLM08): Agents with over-broad tool access are hijacked via prompt injection to perform unintended actions.
  • Unbounded consumption (LLM10): Missing rate limits allow cost-based denial-of-service attacks that drain API budgets.
  • Insider data leakage: Legitimate users (employees, customers) send proprietary data to third-party LLM providers without realizing it.

Input Threats

Prompt injection, jailbreaks, and data leakage via user queries. Mitigate with validation, sanitization, and data classification.

Output Threats

Hallucinations reaching security decisions, LLM output in SQL/eval/HTML. Mitigate with output validation and encoding.

Runtime Threats

Agent tool abuse, excessive API costs, behavioral drift. Mitigate with guardrails, limits, and anomaly detection.

Compliance Risks

PII in logs, cross-border data transfer, GDPR/HIPAA exposure. Mitigate with data governance and provider audit.

Every input that reaches an LLM should pass through a validation layer that enforces type, length, character set, and allowlist constraints before the model call.

import re
import unicodedata
from pydantic import BaseModel, Field, field_validator
from typing import Literal
ALLOWED_PERSONAS = frozenset({"support", "sales", "technical", "onboarding"})
class LLMRequest(BaseModel):
"""Validated, sanitized request to the LLM layer."""
user_message: str = Field(min_length=1, max_length=4000)
persona: Literal["support", "sales", "technical", "onboarding"] = "support"
user_id: str = Field(pattern=r"^[a-zA-Z0-9_-]{1,64}$")
@field_validator("user_message")
@classmethod
def sanitize_message(cls, v: str) -> str:
# SAFE: normalize Unicode to prevent homoglyph injection
v = unicodedata.normalize("NFKC", v)
# SAFE: strip ChatML and Llama special tokens that could alter parsing
v = re.sub(r"<\|im_(start|end)\|>", "", v)
v = re.sub(r"\[INST\]|\[/INST\]", "", v)
# SAFE: strip null bytes and non-printable control characters
v = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", v)
return v.strip()
# VULNERABLE: raw request data forwarded to LLM
def handle_chat_vulnerable(data: dict) -> str:
import openai
return openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"You are a {data['persona']} assistant."}, # VULNERABLE
{"role": "user", "content": data["message"]}, # VULNERABLE: no validation
],
).choices[0].message.content
# SAFE: validated request via Pydantic model
def handle_chat_safe(data: dict) -> str:
import openai
req = LLMRequest(**data) # SAFE: raises ValidationError on bad input
return openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"You are a {req.persona} assistant."}, # SAFE: enum-validated
{"role": "user", "content": req.user_message}, # SAFE: sanitized
],
).choices[0].message.content

LLM output is a taint source. Before passing it to downstream systems, validate it for the expected format and encode it for the target context.

import html
import json
import re
from typing import Optional
# Patterns that suggest a successful prompt injection or data leak
SENSITIVE_OUTPUT_PATTERNS = [
r"sk-[a-zA-Z0-9]{20,}", # OpenAI API key
r"(?i)(system\s+prompt|my\s+instructions)\s*:",
r"(?i)I\s+was\s+(told|instructed|programmed)\s+to",
r"(?i)(access[_\s]token|bearer\s+token)",
r"\b\d{3}-\d{2}-\d{4}\b", # SSN pattern
]
_output_patterns = [re.compile(p) for p in SENSITIVE_OUTPUT_PATTERNS]
def filter_llm_output(raw_output: str, expected_format: str = "text") -> Optional[str]:
"""
Validates LLM output before use. Returns None if the output looks anomalous.
expected_format: "text" | "json" | "html_fragment"
"""
# SAFE: check for suspicious patterns indicating injection or data leak
for pattern in _output_patterns:
if pattern.search(raw_output):
return None # SAFE: discard anomalous output, log separately
if expected_format == "json":
try:
json.loads(raw_output) # SAFE: validate JSON structure
except json.JSONDecodeError:
return None
if expected_format == "html_fragment":
# SAFE: escape before use in HTML context
return html.escape(raw_output)
return raw_output
# VULNERABLE: LLM output used as SQL literal
def search_vulnerable(llm_response: str) -> list:
conn = db.connect()
return conn.execute(f"SELECT * FROM products WHERE name = '{llm_response}'").fetchall() # VULNERABLE
# SAFE: LLM output always parameterized
def search_safe(llm_response: str) -> list:
from sqlalchemy import text
conn = db.connect()
return conn.execute(text("SELECT * FROM products WHERE name = :name"), {"name": llm_response}).fetchall()

Without rate limits, a single API endpoint can be abused to consume hundreds of dollars in API credits in minutes. Apply per-user token budgets and daily spending caps.

import time
from collections import defaultdict
from threading import Lock
from fastapi import FastAPI, Request, HTTPException
import openai
app = FastAPI()
# Per-user rate state: (token count, window start)
_rate_state: dict[str, tuple[int, float]] = defaultdict(lambda: (0, time.time()))
_rate_lock = Lock()
MAX_TOKENS_PER_MINUTE = 10_000 # per user
MAX_INPUT_TOKENS = 2_000 # per request
DAILY_SPEND_CAP_USD = 5.0 # per user
def check_rate_limit(user_id: str, requested_tokens: int) -> None:
"""Raises HTTPException if user exceeds rate limits."""
with _rate_lock:
used, window_start = _rate_state[user_id]
now = time.time()
if now - window_start > 60:
# Reset window
_rate_state[user_id] = (requested_tokens, now)
return
if used + requested_tokens > MAX_TOKENS_PER_MINUTE:
raise HTTPException(status_code=429, detail="Rate limit exceeded")
_rate_state[user_id] = (used + requested_tokens, window_start)
@app.post("/chat")
async def chat(request: Request, body: dict):
user_id = body.get("user_id", "anonymous")
message = body.get("message", "")
# SAFE: enforce per-request token budget
if len(message) > MAX_INPUT_TOKENS * 4: # rough chars-to-tokens estimate
raise HTTPException(status_code=400, detail="Input too long")
check_rate_limit(user_id, len(message) // 4) # SAFE: pre-call rate check
client = openai.OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": message}],
max_tokens=512, # SAFE: hard per-call output token limit
)
return {"reply": response.choices[0].message.content}

Runtime guardrails intercept model inputs and outputs and enforce policy constraints — blocking harmful content, ensuring responses stay on-topic, redacting PII. Several mature open-source options exist:

  • NeMo Guardrails (NVIDIA): declarative Colang rules for dialog flow, topic restrictions, and output validation.
  • LLM Guard (ProtectAI): modular scanners for prompt injection, PII, toxicity, and secrets in both inputs and outputs.
# SAFE: LLM Guard scanner for input and output
from llm_guard import scan_prompt, scan_output
from llm_guard.input_scanners import PromptInjection, Secrets, TokenLimit
from llm_guard.output_scanners import Sensitive, NoRefusal
import openai
input_scanners = [
PromptInjection(), # SAFE: ML-based injection detection
Secrets(), # SAFE: blocks API keys and credentials in prompts
TokenLimit(limit=2000), # SAFE: rejects oversized inputs
]
output_scanners = [
Sensitive(), # SAFE: PII detection in model output
NoRefusal(), # SAFE: flags refusal responses for monitoring
]
def guarded_chat(user_input: str) -> str:
sanitized_input, results_valid, results_score = scan_prompt(input_scanners, user_input)
if not all(results_valid.values()):
return "I can't process that request." # SAFE: blocked by guardrail
client = openai.OpenAI()
raw_output = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": sanitized_input}],
).choices[0].message.content
sanitized_output, results_valid, _ = scan_output(output_scanners, user_input, raw_output)
return sanitized_output # SAFE: PII redacted from output

Log LLM interactions in a structured format. Do not log raw user messages if they may contain PII — log metadata (length, user ID, session ID, latency, token counts) and redacted content.

import logging
import json
import re
import time
from openai import OpenAI
logger = logging.getLogger("llm.interactions")
PII_PATTERNS = [
(re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'), "[EMAIL]"),
(re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), "[SSN]"),
(re.compile(r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'), "[CARD]"),
(re.compile(r'sk-[a-zA-Z0-9]{20,}'), "[API_KEY]"),
]
def redact(text: str) -> str:
for pattern, replacement in PII_PATTERNS:
text = pattern.sub(replacement, text)
return text
def logged_chat(user_message: str, user_id: str, session_id: str) -> str:
client = OpenAI()
start = time.monotonic()
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": user_message}],
max_tokens=512,
)
latency_ms = int((time.monotonic() - start) * 1000)
output = response.choices[0].message.content
# SAFE: log metadata and redacted content — not raw user input
logger.info(json.dumps({
"event": "llm_call",
"user_id": user_id,
"session_id": session_id,
"input_len": len(user_message),
"input_redacted": redact(user_message[:500]), # SAFE: redacted and truncated
"output_len": len(output),
"prompt_tokens": response.usage.prompt_tokens,
"completion_tokens": response.usage.completion_tokens,
"latency_ms": latency_ms,
"model": response.model,
}))
return output

Establish a baseline for normal LLM usage and alert on deviations: unusually long inputs, high-frequency requests from a single user, outputs that contain API key patterns, or a sudden spike in refusals (which may indicate jailbreak attempts).

from dataclasses import dataclass, field
from collections import deque
import statistics
@dataclass
class UsageMetrics:
user_id: str
recent_input_lengths: deque = field(default_factory=lambda: deque(maxlen=100))
recent_latencies_ms: deque = field(default_factory=lambda: deque(maxlen=100))
request_count_1min: int = 0
def record(self, input_len: int, latency_ms: int) -> list[str]:
"""Records a request and returns a list of triggered anomaly alerts."""
alerts = []
self.recent_input_lengths.append(input_len)
self.recent_latencies_ms.append(latency_ms)
self.request_count_1min += 1
# SAFE: alert on unusually long inputs (possible injection scaffolding)
if input_len > 3000:
alerts.append(f"oversized_input:{input_len}")
# SAFE: alert on high request frequency
if self.request_count_1min > 50:
alerts.append(f"high_frequency:{self.request_count_1min}")
# SAFE: alert if mean input length spikes >3x the baseline
if len(self.recent_input_lengths) >= 20:
mean = statistics.mean(self.recent_input_lengths)
if input_len > mean * 3:
alerts.append(f"input_spike:{input_len:.0f}x{mean:.0f}")
return alerts

Red teaming means actively attempting to break your application’s security properties before an attacker does. For LLM applications, this includes:

  1. Prompt injection testing: Send a battery of direct and indirect injection payloads. Does the model leak its system prompt? Follow injected instructions? Use garak for automated probe suites covering 100+ injection patterns.
  2. Jailbreak testing: Attempt to bypass content filters using known jailbreak techniques. Microsoft’s PyRIT (Python Risk Identification Toolkit) automates multi-turn jailbreak attempts using an adversarial LLM.
  3. Tool abuse testing: For agents with tool access, verify that injected instructions cannot trigger unintended tool calls. Test each tool in isolation with adversarial inputs.
  4. Data leakage testing: Verify that PII entered by one user cannot be retrieved by another. Check that system prompts are not disclosed under any framing.
Terminal window
# Install garak for automated LLM probe testing
pip install garak
# Run injection probes against an OpenAI-compatible endpoint
garak --model_type openai --model_name gpt-4o \
--probes injection \
--report_prefix ./red-team-results
# Run with Microsoft PyRIT for multi-turn jailbreak simulation
pip install pyrit
python -m pyrit.cli --target openai --model gpt-4o --attack crescendo

GDPR. If your LLM application processes personal data of EU residents, every prompt that includes that data constitutes processing under GDPR. This applies to data sent to third-party providers (OpenAI, Anthropic, Google). Ensure you have a Data Processing Agreement with your provider, minimize the personal data included in prompts, and honor right-to-erasure requests — which means auditing whether PII was included in prompts that may have contributed to model training.

HIPAA. Sending protected health information (PHI) in prompts to general-purpose LLM providers is a HIPAA violation unless the provider has executed a Business Associate Agreement (BAA) covering that use case. OpenAI’s Enterprise tier and Azure OpenAI both offer BAAs. Evaluate whether the risk of PHI in prompts can be eliminated by redacting or generalizing the data before the LLM call.

AI Act (EU, 2024). High-risk AI systems require conformity assessments, transparency documentation, and human oversight mechanisms. If your LLM application is used in hiring, credit scoring, law enforcement, or other high-risk categories, review the AI Act requirements for mandatory human oversight and explainability.

# SAFE: PII redaction before sending to third-party LLM
import re
from typing import Any
REDACTION_RULES: list[tuple[re.Pattern, str]] = [
(re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'), "[EMAIL]"),
(re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), "[SSN]"),
(re.compile(r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'), "[CARD_NUMBER]"),
(re.compile(r'\b(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'), "[PHONE]"),
# Add domain-specific PHI patterns here
]
def redact_before_llm(text: str) -> tuple[str, list[dict[str, Any]]]:
"""
Redacts PII from text before sending to an LLM provider.
Returns the redacted text and a log of what was redacted (for audit).
"""
audit_log = []
for pattern, placeholder in REDACTION_RULES:
matches = pattern.findall(text)
if matches:
audit_log.append({"pattern_type": placeholder, "match_count": len(matches)})
text = pattern.sub(placeholder, text)
return text, audit_log # SAFE: return redacted text and audit record

No single tool covers the full LLM security stack. Use a combination:

CategoryToolWhat it doesLimitation
Static analysisLLMArmorDetects LLM security anti-patterns in Python source code at CI timePython only; static only — cannot test live model behavior
Dynamic red-teaminggarak (NVIDIA)Automated probe runner for injection, jailbreak, and hallucinationRequires a live model endpoint
Dynamic red-teamingPyRIT (Microsoft)Multi-turn adversarial LLM attacks; jailbreak simulationRequires orchestrator LLM; adds cost
Runtime guardrailsNeMo GuardrailsDeclarative dialog flow and output policy enforcementConfig language learning curve
Runtime guardrailsLLM GuardModular scanners for injection, PII, secrets, toxicityAdds latency per call
LLM eval / adversarial testingpromptfooConfig-driven LLM testing with adversarial assertionsEval-focused; not a scanner
ObservabilityLangfusePrompt tracing, token usage, latency, and eval scoresOSS core; enterprise features paid
ObservabilityOpenTelemetryStandard tracing instrumentation for LLM pipelinesRequires custom spans for LLM context
What are the most important LLM security best practices for a production application?
In priority order: (1) Input validation and sanitization on every user-supplied string before it reaches the LLM. (2) Credential hygiene — API keys in environment variables, never in source code or prompts. (3) PII redaction before sending data to third-party providers. (4) Rate limiting and per-user token budgets to prevent cost-based DoS. (5) Minimal tool surface for agents — grant only the specific tools a task requires. (6) Output validation before LLM responses reach SQL, shell, eval(), or rendered HTML. (7) Structured logging with PII redaction for post-incident investigation.
How do I prevent employees from sending sensitive data to ChatGPT?
Technical controls: use enterprise LLM agreements that prohibit training on submitted data (OpenAI Enterprise, Azure OpenAI), deploy PII redaction middleware between your internal tools and external LLM APIs, consider on-premises or VPC-hosted models for the most sensitive workloads. Policy controls: publish a clear acceptable-use policy for AI tools, train employees on what data categories cannot be submitted to external models, and audit LLM usage logs for policy violations. The Samsung incident shows that employees will use the most convenient tool — make the secure option equally convenient.
What is the difference between LLM guardrails and input validation?
Input validation is deterministic and runs before the LLM call — it checks length, character sets, allowlists, and known-bad patterns using code. Guardrails are typically ML-based and can run before (scanning for injection, PII, or harmful content) or after (scanning the model's output for policy violations) the LLM call. Guardrails handle the semantic-level policy enforcement that deterministic validation cannot; input validation provides a fast, cheap first gate. Both are needed.
How often should I red team my LLM application?
Run automated probes (garak, promptfoo) in a staging CI pipeline on every change to your system prompt, model version, or agent tool configuration. Run a manual red-team exercise at least quarterly, or when you add significant new capabilities to the application. One-time pre-launch red teaming is insufficient — model behavior can change with provider updates, and attacker techniques evolve continuously.
Does GDPR apply to data sent in LLM prompts?
Yes, if the prompt contains personal data of EU residents. Sending that data to a third-party LLM provider constitutes processing under GDPR Article 4. You need a legal basis for the processing, a Data Processing Agreement with the provider, and you must honor data subject rights including erasure. In practice, this means redacting unnecessary PII before the LLM call, choosing providers that offer GDPR-compliant DPAs, and maintaining an audit log of what data was sent.
What is the best open-source LLM security scanner?
The right choice depends on what you need. For static analysis of Python source code (catching anti-patterns at CI time with no model required), LLMArmor is purpose-built for LLM security. For dynamic red-teaming with automated probes, garak (NVIDIA) has the broadest probe library. For multi-turn adversarial jailbreak simulation, PyRIT (Microsoft) supports more complex attack strategies. For eval-driven adversarial testing, promptfoo provides a config-driven framework. Use LLMArmor and at least one dynamic tool together — they find different classes of issues.
How do I set token limits without degrading application quality?
Start by measuring your 95th-percentile input and output token counts in production. Set your input limit at 2x the 95th percentile (to handle legitimate edge cases while blocking extreme inputs). Set your output limit based on the longest legitimate response your application needs — for most chat applications, 512–1024 tokens is sufficient. For document summarization, 2048 may be appropriate. Review rejection rates weekly and adjust. A hard per-call limit combined with a per-user-per-minute budget prevents both individual oversized requests and sustained abuse.
Can I use LLMArmor to detect all OWASP LLM Top 10 risks?
No, and LLMArmor does not claim to. It performs static analysis of Python source code and has strong coverage for LLM01 (Prompt Injection structural patterns), LLM08 (Excessive Agency — agent tool scope and iteration limits), and partial coverage for LLM02 (credential hygiene), LLM05 (unsafe output sinks), and LLM10 (missing rate limits). LLM03 (Supply Chain), LLM04 (Model Poisoning), and LLM09 (Misinformation) are out of scope for any static analysis tool. Dynamic behavioral risks require dynamic tools.