In April 2023, Samsung Electronics discovered that employees in its semiconductor division had pasted proprietary source code and internal meeting notes into ChatGPT while asking it for help with debugging and summarization. The data was transmitted to OpenAI’s servers and — under the terms of service at the time — could be used to improve future model versions. Samsung banned internal use of generative AI tools within weeks. The incident required no technical exploit: no injections, no API key theft, no novel vulnerabilities. Engineers simply used the most convenient tool available, and corporate data left the building. Production LLM security is not only about stopping attackers. It is about building systems where doing the right thing is also the easy thing.
Before writing a single line of security code, map your threat model. LLM applications have a different attack surface than traditional web apps:
Input Threats
Prompt injection, jailbreaks, and data leakage via user queries. Mitigate with validation, sanitization, and data classification.
Output Threats
Hallucinations reaching security decisions, LLM output in SQL/eval/HTML. Mitigate with output validation and encoding.
Runtime Threats
Agent tool abuse, excessive API costs, behavioral drift. Mitigate with guardrails, limits, and anomaly detection.
Compliance Risks
PII in logs, cross-border data transfer, GDPR/HIPAA exposure. Mitigate with data governance and provider audit.
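One way to keep this map actionable is to encode it next to the code it protects, so reviews and tests can reference the same categories. The sketch below is purely illustrative; the structure and names are assumptions, not part of any framework discussed here, and the entries simply restate the four categories above.

```python
# Illustrative only: a threat-model map kept alongside the application code.
THREAT_MODEL = {
    "input": {
        "threats": ["prompt injection", "jailbreaks", "data leakage via user queries"],
        "mitigations": ["validation", "sanitization", "data classification"],
    },
    "output": {
        "threats": ["hallucinations reaching security decisions", "LLM output in SQL/eval/HTML"],
        "mitigations": ["output validation", "context-aware encoding"],
    },
    "runtime": {
        "threats": ["agent tool abuse", "excessive API costs", "behavioral drift"],
        "mitigations": ["guardrails", "rate limits", "anomaly detection"],
    },
    "compliance": {
        "threats": ["PII in logs", "cross-border data transfer", "GDPR/HIPAA exposure"],
        "mitigations": ["data governance", "provider audit"],
    },
}
```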
Every input that reaches an LLM should pass through a validation layer that enforces type, length, character set, and allowlist constraints before the model call.
```python
import re
import unicodedata
from pydantic import BaseModel, Field, field_validator
from typing import Literal

ALLOWED_PERSONAS = frozenset({"support", "sales", "technical", "onboarding"})

class LLMRequest(BaseModel):
    """Validated, sanitized request to the LLM layer."""
    user_message: str = Field(min_length=1, max_length=4000)
    persona: Literal["support", "sales", "technical", "onboarding"] = "support"
    user_id: str = Field(pattern=r"^[a-zA-Z0-9_-]{1,64}$")

    @field_validator("user_message")
    @classmethod
    def sanitize_message(cls, v: str) -> str:
        # SAFE: normalize Unicode to prevent homoglyph injection
        v = unicodedata.normalize("NFKC", v)
        # SAFE: strip ChatML and Llama special tokens that could alter parsing
        v = re.sub(r"<\|im_(start|end)\|>", "", v)
        v = re.sub(r"\[INST\]|\[/INST\]", "", v)
        # SAFE: strip null bytes and non-printable control characters
        v = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", v)
        return v.strip()

# VULNERABLE: raw request data forwarded to LLM
def handle_chat_vulnerable(data: dict) -> str:
    import openai
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {data['persona']} assistant."},  # VULNERABLE
            {"role": "user", "content": data["message"]},  # VULNERABLE: no validation
        ],
    ).choices[0].message.content

# SAFE: validated request via Pydantic model
def handle_chat_safe(data: dict) -> str:
    import openai
    req = LLMRequest(**data)  # SAFE: raises ValidationError on bad input
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {req.persona} assistant."},  # SAFE: enum-validated
            {"role": "user", "content": req.user_message},  # SAFE: sanitized
        ],
    ).choices[0].message.content
```

LLM output is a taint source. Before passing it to downstream systems, validate it for the expected format and encode it for the target context.
```python
import html
import json
import re
from typing import Optional

# Patterns that suggest a successful prompt injection or data leak
SENSITIVE_OUTPUT_PATTERNS = [
    r"sk-[a-zA-Z0-9]{20,}",  # OpenAI API key
    r"(?i)(system\s+prompt|my\s+instructions)\s*:",
    r"(?i)I\s+was\s+(told|instructed|programmed)\s+to",
    r"(?i)(access[_\s]token|bearer\s+token)",
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
]

_output_patterns = [re.compile(p) for p in SENSITIVE_OUTPUT_PATTERNS]

def filter_llm_output(raw_output: str, expected_format: str = "text") -> Optional[str]:
    """
    Validates LLM output before use. Returns None if the output looks anomalous.
    expected_format: "text" | "json" | "html_fragment"
    """
    # SAFE: check for suspicious patterns indicating injection or data leak
    for pattern in _output_patterns:
        if pattern.search(raw_output):
            return None  # SAFE: discard anomalous output, log separately

    if expected_format == "json":
        try:
            json.loads(raw_output)  # SAFE: validate JSON structure
        except json.JSONDecodeError:
            return None

    if expected_format == "html_fragment":
        # SAFE: escape before use in HTML context
        return html.escape(raw_output)

    return raw_output
```
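The filter sits between the model call and whatever consumes the response, and a `None` result is treated as a hard failure rather than forwarded. A minimal usage sketch, with the surrounding names illustrative rather than taken from the examples above:

```python
# Illustrative call site for filter_llm_output; `raw` stands in for a real model response
raw = '{"query": "laptops under 1200"}'
validated = filter_llm_output(raw, expected_format="json")
if validated is None:
    # SAFE: never forward anomalous output; surface a generic error instead
    raise ValueError("LLM output failed validation")
```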
```python
# VULNERABLE: LLM output used as SQL literal
def search_vulnerable(llm_response: str) -> list:
    conn = db.connect()
    return conn.execute(
        f"SELECT * FROM products WHERE name = '{llm_response}'"  # VULNERABLE
    ).fetchall()

# SAFE: LLM output always parameterized
def search_safe(llm_response: str) -> list:
    from sqlalchemy import text
    conn = db.connect()
    return conn.execute(
        text("SELECT * FROM products WHERE name = :name"),
        {"name": llm_response},
    ).fetchall()
```

Without rate limits, a single API endpoint can be abused to consume hundreds of dollars in API credits in minutes. Apply per-user token budgets and daily spending caps.
```python
import time
from collections import defaultdict
from threading import Lock
from fastapi import FastAPI, Request, HTTPException
import openai

app = FastAPI()

# Per-user rate state: (token count, window start)
_rate_state: dict[str, tuple[int, float]] = defaultdict(lambda: (0, time.time()))
_rate_lock = Lock()

MAX_TOKENS_PER_MINUTE = 10_000  # per user
MAX_INPUT_TOKENS = 2_000        # per request
DAILY_SPEND_CAP_USD = 5.0       # per user

def check_rate_limit(user_id: str, requested_tokens: int) -> None:
    """Raises HTTPException if user exceeds rate limits."""
    with _rate_lock:
        used, window_start = _rate_state[user_id]
        now = time.time()
        if now - window_start > 60:  # Reset window
            _rate_state[user_id] = (requested_tokens, now)
            return
        if used + requested_tokens > MAX_TOKENS_PER_MINUTE:
            raise HTTPException(status_code=429, detail="Rate limit exceeded")
        _rate_state[user_id] = (used + requested_tokens, window_start)

@app.post("/chat")
async def chat(request: Request, body: dict):
    user_id = body.get("user_id", "anonymous")
    message = body.get("message", "")

    # SAFE: enforce per-request token budget
    if len(message) > MAX_INPUT_TOKENS * 4:  # rough chars-to-tokens estimate
        raise HTTPException(status_code=400, detail="Input too long")

    check_rate_limit(user_id, len(message) // 4)  # SAFE: pre-call rate check

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
        max_tokens=512,  # SAFE: hard per-call output token limit
    )
    return {"reply": response.choices[0].message.content}
```

Runtime guardrails intercept model inputs and outputs and enforce policy constraints — blocking harmful content, ensuring responses stay on-topic, redacting PII. Several mature open-source options exist:
```python
# SAFE: LLM Guard scanners for input and output
from llm_guard import scan_prompt, scan_output
from llm_guard.input_scanners import PromptInjection, Secrets, TokenLimit
from llm_guard.output_scanners import Sensitive, NoRefusal
import openai

input_scanners = [
    PromptInjection(),       # SAFE: ML-based injection detection
    Secrets(),               # SAFE: blocks API keys and credentials in prompts
    TokenLimit(limit=2000),  # SAFE: rejects oversized inputs
]
output_scanners = [
    Sensitive(),   # SAFE: PII detection in model output
    NoRefusal(),   # SAFE: flags refusal responses for monitoring
]

def guarded_chat(user_input: str) -> str:
    sanitized_input, results_valid, results_score = scan_prompt(input_scanners, user_input)
    if not all(results_valid.values()):
        return "I can't process that request."  # SAFE: blocked by guardrail

    client = openai.OpenAI()
    raw_output = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": sanitized_input}],
    ).choices[0].message.content

    sanitized_output, results_valid, _ = scan_output(output_scanners, user_input, raw_output)
    return sanitized_output  # SAFE: PII redacted from output
```

Log LLM interactions in a structured format. Do not log raw user messages if they may contain PII — log metadata (length, user ID, session ID, latency, token counts) and redacted content.
```python
import logging
import json
import re
import time
from openai import OpenAI

logger = logging.getLogger("llm.interactions")

PII_PATTERNS = [
    (re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'), "[EMAIL]"),
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), "[SSN]"),
    (re.compile(r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'), "[CARD]"),
    (re.compile(r'sk-[a-zA-Z0-9]{20,}'), "[API_KEY]"),
]

def redact(text: str) -> str:
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

def logged_chat(user_message: str, user_id: str, session_id: str) -> str:
    client = OpenAI()
    start = time.monotonic()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        max_tokens=512,
    )
    latency_ms = int((time.monotonic() - start) * 1000)
    output = response.choices[0].message.content

    # SAFE: log metadata and redacted content — not raw user input
    logger.info(json.dumps({
        "event": "llm_call",
        "user_id": user_id,
        "session_id": session_id,
        "input_len": len(user_message),
        "input_redacted": redact(user_message[:500]),  # SAFE: redacted and truncated
        "output_len": len(output),
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_ms": latency_ms,
        "model": response.model,
    }))
    return output
```

Establish a baseline for normal LLM usage and alert on deviations: unusually long inputs, high-frequency requests from a single user, outputs that contain API key patterns, or a sudden spike in refusals (which may indicate jailbreak attempts).
```python
from dataclasses import dataclass, field
from collections import deque
import statistics

@dataclass
class UsageMetrics:
    user_id: str
    recent_input_lengths: deque = field(default_factory=lambda: deque(maxlen=100))
    recent_latencies_ms: deque = field(default_factory=lambda: deque(maxlen=100))
    request_count_1min: int = 0

    def record(self, input_len: int, latency_ms: int) -> list[str]:
        """Records a request and returns a list of triggered anomaly alerts."""
        alerts = []
        self.recent_input_lengths.append(input_len)
        self.recent_latencies_ms.append(latency_ms)
        self.request_count_1min += 1

        # SAFE: alert on unusually long inputs (possible injection scaffolding)
        if input_len > 3000:
            alerts.append(f"oversized_input:{input_len}")

        # SAFE: alert on high request frequency
        if self.request_count_1min > 50:
            alerts.append(f"high_frequency:{self.request_count_1min}")

        # SAFE: alert if mean input length spikes >3x the baseline
        if len(self.recent_input_lengths) >= 20:
            mean = statistics.mean(self.recent_input_lengths)
            if input_len > mean * 3:
                alerts.append(f"input_spike:{input_len:.0f}x{mean:.0f}")

        return alerts
```

Red teaming means actively attempting to break your application’s security properties before an attacker does. For LLM applications, this includes automated probe runs against the live endpoint and multi-turn jailbreak simulation:
```bash
# Install garak for automated LLM probe testing
pip install garak

# Run injection probes against an OpenAI-compatible endpoint
garak --model_type openai --model_name gpt-4o \
  --probes injection \
  --report_prefix ./red-team-results

# Run with Microsoft PyRIT for multi-turn jailbreak simulation
pip install pyrit
python -m pyrit.cli --target openai --model gpt-4o --attack crescendo
```

GDPR. If your LLM application processes personal data of EU residents, every prompt that includes that data constitutes processing under GDPR. This applies to data sent to third-party providers (OpenAI, Anthropic, Google). Ensure you have a Data Processing Agreement with your provider, minimize the personal data included in prompts, and honor right-to-erasure requests — which means auditing whether PII was included in prompts that may have contributed to model training.
HIPAA. Sending protected health information (PHI) in prompts to general-purpose LLM providers is a HIPAA violation unless the provider has executed a Business Associate Agreement (BAA) covering that use case. OpenAI’s Enterprise tier and Azure OpenAI both offer BAAs. Evaluate whether the risk of PHI in prompts can be eliminated by redacting or generalizing the data before the LLM call.
AI Act (EU, 2024). High-risk AI systems require conformity assessments, transparency documentation, and human oversight mechanisms. If your LLM application is used in hiring, credit scoring, law enforcement, or other high-risk categories, review the AI Act requirements for mandatory human oversight and explainability.
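In practice, human oversight means the model's output is treated as a recommendation that a person must approve before it takes effect. The following is a minimal sketch of that gate under stated assumptions; the class, field, and method names are illustrative and are not prescribed by the AI Act or by any framework mentioned here.

```python
# Illustrative sketch: hold high-risk LLM recommendations for human sign-off.
from dataclasses import dataclass, field

@dataclass
class ReviewQueue:
    pending: list[dict] = field(default_factory=list)

    def submit(self, subject_id: str, llm_recommendation: str) -> None:
        # SAFE: the model's output is advisory; nothing is acted on until a reviewer approves
        self.pending.append({
            "subject_id": subject_id,
            "recommendation": llm_recommendation,
            "status": "awaiting_human_review",
        })

    def approve(self, index: int, reviewer_id: str) -> dict:
        # SAFE: record who made the final decision, for the audit trail
        decision = self.pending.pop(index)
        decision.update({"status": "approved", "reviewer": reviewer_id})
        return decision
```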
```python
# SAFE: PII redaction before sending to third-party LLM
import re
from typing import Any

REDACTION_RULES: list[tuple[re.Pattern, str]] = [
    (re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'), "[EMAIL]"),
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), "[SSN]"),
    (re.compile(r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'), "[CARD_NUMBER]"),
    (re.compile(r'\b(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'), "[PHONE]"),
    # Add domain-specific PHI patterns here
]

def redact_before_llm(text: str) -> tuple[str, list[dict[str, Any]]]:
    """
    Redacts PII from text before sending to an LLM provider.
    Returns the redacted text and a log of what was redacted (for audit).
    """
    audit_log = []
    for pattern, placeholder in REDACTION_RULES:
        matches = pattern.findall(text)
        if matches:
            audit_log.append({"pattern_type": placeholder, "match_count": len(matches)})
        text = pattern.sub(placeholder, text)
    return text, audit_log  # SAFE: return redacted text and audit record
```

No single tool covers the full LLM security stack. Use a combination:
| Category | Tool | What it does | Limitation |
|---|---|---|---|
| Static analysis | LLMArmor | Detects LLM security anti-patterns in Python source code at CI time | Python only; static only — cannot test live model behavior |
| Dynamic red-teaming | garak (NVIDIA) | Automated probe runner for injection, jailbreak, and hallucination | Requires a live model endpoint |
| Dynamic red-teaming | PyRIT (Microsoft) | Multi-turn adversarial LLM attacks; jailbreak simulation | Requires orchestrator LLM; adds cost |
| Runtime guardrails | NeMo Guardrails | Declarative dialog flow and output policy enforcement | Config language learning curve |
| Runtime guardrails | LLM Guard | Modular scanners for injection, PII, secrets, toxicity | Adds latency per call |
| LLM eval / adversarial testing | promptfoo | Config-driven LLM testing with adversarial assertions | Eval-focused; not a scanner |
| Observability | Langfuse | Prompt tracing, token usage, latency, and eval scores | OSS core; enterprise features paid |
| Observability | OpenTelemetry | Standard tracing instrumentation for LLM pipelines | Requires custom spans for LLM context |