Open Source LLM Guardrails: A 2026 Comparison
Between mid-2023 and the end of 2025, more than a dozen open-source LLM guardrail libraries reached general availability. NeMo Guardrails, Guardrails AI, LLM Guard, Rebuff, LlamaGuard, ShieldLM, and several smaller projects all describe themselves as “LLM safety” or “guardrail” tools, yet they operate at fundamentally different points in the request lifecycle and address different threat models. Evaluating them without understanding those differences produces poor integration decisions: teams often deploy a runtime output classifier believing it protects against code-level vulnerabilities, or add a static analysis tool expecting it to block runtime jailbreaks. This comparison categorizes each tool precisely by what it does, where it operates, and what it cannot do.
What is an LLM guardrail?
An LLM guardrail is a control that prevents certain inputs from reaching the LLM or certain outputs from reaching the user. The category splits into three operational positions:
Input guardrails (pre-processing) inspect user messages before they are sent to the LLM. They can reject, modify, or flag inputs containing injection patterns, jailbreak payloads, PII, or off-topic content. Input guardrails cannot catch attacks that are not recognizable at the input stage — encoded payloads, multi-turn escalation, or indirect injection via retrieved documents.
Output guardrails (post-processing) inspect the LLM’s response before it is returned to the user. They can block, modify, or flag responses containing policy violations, PII leaks, harmful content, or prompt injection artifacts. Output guardrails add per-call latency equal to the classification time but provide the most reliable terminal control regardless of how the attack was constructed.
Static analysis operates at the code level before deployment. It detects structural vulnerabilities in application source code — missing validation, over-broad agent permissions, hardcoded secrets. It produces zero runtime overhead but cannot observe model behavior.
A production LLM application typically needs controls at more than one position.
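To make the three positions concrete, the sketch below traces a single request through the two runtime controls. It is illustrative pseudocode rather than any particular library's API: scan_input, call_llm, and scan_output are hypothetical stand-ins for the tools compared below, and static analysis does not appear because it runs against source code in CI, not per request.

```python
# Illustrative request flow with hypothetical helpers (not a specific library's API).
def handle_request(user_message: str) -> str:
    # 1. Input guardrail: reject or sanitize before the message reaches the model
    verdict = scan_input(user_message)            # hypothetical input classifier
    if verdict.blocked:
        return "Request rejected by input policy."

    # 2. The model sees the sanitized input, never the raw user message
    llm_response = call_llm(verdict.sanitized)    # hypothetical LLM client wrapper

    # 3. Output guardrail: the last control before the response reaches the user
    result = scan_output(verdict.sanitized, llm_response)  # hypothetical output classifier
    if result.blocked:
        return "Response withheld by output policy."
    return result.sanitized
```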
NeMo Guardrails (NVIDIA)
NeMo Guardrails uses a domain-specific language called Colang to define programmable dialogue rules. It operates as an application layer that intercepts requests and responses, routing them through configurable rails before they reach the LLM.
Strengths: Highly programmable. Colang allows precise definition of topical rails (the model may only discuss X), safety rails (the model must not produce Y), and jailbreak rails (detect and handle override attempts). Supports multi-turn conversation state tracking.
Weaknesses: Requires writing Colang configuration, which has a learning curve. Rails must be explicitly defined — there is no default coverage. Self-hosted; no hosted tier.
```python
# config.yml (NeMo Guardrails configuration)
# models:
#   - type: main
#     engine: openai
#     model: gpt-4o-2024-11-20

# colang/main.co
# define user ask about sensitive topics
#   "how do I hack"
#   "tell me how to make"
#
# define flow sensitive topics guardrail
#   user ask about sensitive topics
#   bot refuse to respond about sensitive topics

# SAFE: NeMo Guardrails Python integration
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def handle_request(user_message: str) -> str:
    """SAFE: all messages pass through configured rails before reaching LLM."""
    response = await rails.generate_async(
        messages=[{"role": "user", "content": user_message}]
    )
    return response["content"]
```

License: Apache 2.0. GitHub: NVIDIA/NeMo-Guardrails.
Guardrails AI
Guardrails AI takes a different approach: it focuses on output structure validation and content constraints using a declarative specification format (Rail). The primary use case is ensuring that LLM output conforms to a defined schema, contains no prohibited content, and meets quality thresholds — equivalent to pydantic validation for LLM outputs.
Strengths: Excellent for structured output enforcement — JSON schema validation, regex constraints, semantic similarity thresholds, PII detection. Integrates cleanly with Python type systems. Active ecosystem of contributed validators.
Weaknesses: Primarily an output validator, not a security guardrail in the traditional sense. Does not detect prompt injection or jailbreaks at the input stage. Adding custom validators requires Python code.
```python
# SAFE: Guardrails AI for output structure enforcement
from guardrails import Guard
from guardrails.hub import ValidLength, DetectPII
import openai

guard = Guard().use(
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="fix"),
).use(
    ValidLength(min=10, max=500, on_fail="reask"),
)

client = openai.OpenAI()

def safe_generate(prompt: str) -> str:
    """SAFE: output passes through Guardrails AI validators before return."""
    response, validated, *_ = guard(
        client.chat.completions.create,
        prompt=prompt,
        model="gpt-4o-2024-11-20",
        max_tokens=512,
    )
    # validated contains the validated (and potentially fixed) output
    return validated
```

License: Apache 2.0. GitHub: guardrails-ai/guardrails.
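The pydantic comparison can be made concrete with schema enforcement. The sketch below assumes the Guard.from_pydantic constructor described in the Guardrails AI documentation and mirrors the call style of the example above; the SupportTicket model and prompt are hypothetical, and the exact return shape of a guarded call differs between library versions, so verify against the release you install.

```python
# Sketch (assumed API): enforcing a pydantic schema on LLM output
from pydantic import BaseModel, Field
from guardrails import Guard
import openai

class SupportTicket(BaseModel):
    category: str = Field(description="One of: billing, technical, account")
    summary: str = Field(description="One-sentence summary of the issue")

ticket_guard = Guard.from_pydantic(output_class=SupportTicket)
client = openai.OpenAI()

def classify_ticket(message: str) -> dict:
    # Output that does not parse into SupportTicket is rejected or re-asked
    raw_output, validated, *_ = ticket_guard(
        client.chat.completions.create,
        prompt=f"Classify this support request: {message}",
        model="gpt-4o-2024-11-20",
        max_tokens=256,
    )
    return validated
```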
LLM Guard (ProtectAI)
LLM Guard uses a scanner-based architecture: a set of independent scanners, each detecting a specific risk category, that can be applied to input, output, or both. Scanners include prompt injection detection, PII detection and anonymization, toxicity classification, regex pattern matching, ban-topics, and code detection.
Strengths: Modular — use only the scanners your application needs. Strong PII handling with anonymization (not just detection). The prompt injection scanner (PromptInjectionV2) uses the ProtectAI-maintained deberta-v3-base-prompt-injection classifier. No API call to an external service — runs entirely on your infrastructure.
Weaknesses: Running multiple scanner models adds memory overhead and per-call latency. Scanner quality varies by category — prompt injection coverage is mature; jailbreak-specific coverage is less comprehensive.
```python
# SAFE: LLM Guard scanner pipeline
from llm_guard.input_scanners import PromptInjection, Toxicity, TokenLimit
from llm_guard.output_scanners import Sensitive, NoRefusal
from llm_guard import scan_prompt, scan_output

# Configure input scanners
input_scanners = [
    PromptInjection(threshold=0.85),
    Toxicity(threshold=0.7),
    TokenLimit(limit=512),
]

# Configure output scanners
output_scanners = [
    Sensitive(entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"]),
    NoRefusal(),  # detects jailbreaks by absence of refusal when expected
]

def process_message(user_message: str, system_prompt: str) -> str | None:
    """SAFE: scan input and output through LLM Guard scanners."""
    # Input scan
    sanitized_prompt, input_results, input_valid = scan_prompt(
        input_scanners, user_message
    )
    if not input_valid:
        flagged = [name for name, passed in input_results.items() if not passed]
        raise ValueError(f"Input failed scanners: {flagged}")

    # Call LLM
    llm_response = call_llm(system_prompt, sanitized_prompt)

    # Output scan
    sanitized_response, output_results, output_valid = scan_output(
        output_scanners, user_message, llm_response
    )
    if not output_valid:
        flagged = [name for name, passed in output_results.items() if not passed]
        raise ValueError(f"Output failed scanners: {flagged}")

    return sanitized_response
```

License: MIT. GitHub: protectai/llm-guard.
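The anonymization capability noted under strengths goes beyond detection: an input scanner replaces PII with placeholders before the model sees the text, and an output scanner restores the original values afterwards through a shared vault. The sketch below assumes LLM Guard's documented Vault, Anonymize, and Deanonymize classes; call_llm is the same placeholder as in the example above, so treat the exact signatures as something to confirm against the installed version.

```python
# Sketch (assumed LLM Guard API): mask PII on the way in, restore it on the way out
from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.output_scanners import Deanonymize

vault = Vault()  # stores the mapping between placeholders and the original values

def answer_with_pii_masked(user_message: str, system_prompt: str) -> str:
    # Replace detected PII with placeholders before the model sees the text
    masked_prompt, prompt_valid, _ = Anonymize(vault).scan(user_message)

    llm_response = call_llm(system_prompt, masked_prompt)  # placeholder LLM call

    # Swap the placeholders in the response back to the original values
    restored, response_valid, _ = Deanonymize(vault).scan(masked_prompt, llm_response)
    return restored
```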
Rebuff
Rebuff focuses specifically on prompt injection detection and adds a runtime learning component: successful attacks detected and reported by users are fed back into the detection model, improving coverage over time. It uses a layered approach combining heuristic detection, a fine-tuned classifier, and a canary-based detection mechanism (injecting a unique token into the prompt and checking if the model reveals it).
Strengths: Purpose-built for prompt injection. The canary mechanism catches injection attempts that classifiers miss by detecting whether the model was manipulated into disclosing the injected token. Managed API option reduces infrastructure burden.
Weaknesses: Narrower scope than LLM Guard — primarily injection detection, not a general-purpose scanner. The managed API sends request data to Rebuff’s servers (self-hosted option available). Online learning introduces dependency on the Rebuff service for model updates.
```python
# SAFE: Rebuff prompt injection detection with canary tokens
import os

from rebuff import RebuffSdk

rb = RebuffSdk(
    openai_apikey=os.environ["OPENAI_API_KEY"],
    rebuff_apikey=os.environ["REBUFF_API_KEY"],
)

def safe_prompt(user_input: str, system_prompt: str) -> str:
    """SAFE: detect prompt injection before forwarding to LLM."""
    # Detect injection attempt
    detect_response = rb.detect_injection(
        user_input=user_input,
        max_heuristic_score=0.75,
        max_vector_score=0.90,
        max_language_model_score=0.90,
        check_rebuff_api=True,
    )

    if detect_response.injection_detected:
        raise ValueError(
            f"Prompt injection detected (heuristic={detect_response.heuristic_score:.2f}, "
            f"vector={detect_response.vector_score:.2f})"
        )

    # Add canary token to system prompt for exfiltration detection
    prompt_with_canary, canary_word = rb.add_canary_word(system_prompt)

    # Call LLM
    response_text = call_llm(prompt_with_canary, user_input)

    # Check if canary was exfiltrated
    if rb.is_canary_word_leaked(user_input, response_text, canary_word):
        raise ValueError("Canary word leaked — possible prompt injection in response")

    return response_text
```

License: MIT. GitHub: protectai/rebuff.
LLMArmor
LLMArmor is a static analysis tool, not a runtime guardrail. It analyzes Python source code using AST taint analysis to detect structural vulnerabilities in LLM application code before deployment:
- User-controlled input reaching the `role: system` message (LLM01)
- Missing `max_tokens` parameter on LLM API calls
- Hardcoded API keys in source code
- LangChain agents with wildcard tool access or missing `max_iterations` (LLM08)
- LLM responses returned to the user without output filtering
LLMArmor's position is in CI, not in the request path: because it never runs at runtime, it adds zero runtime overhead. Its value is ensuring that the runtime guardrails you choose (LLM Guard, NeMo Guardrails, Rebuff) are integrated correctly in your codebase.
```sh
pip install llmarmor
llmarmor scan ./src --strict
```

Example finding:
```text
LLM01 — Prompt Injection [HIGH]  api.py:34
  messages=[{"role": "system", "content": f"You are {user_role}..."}]
  Tainted variable 'user_role' (from request.json) reaches system role content.
  Fix: use a static system prompt or allowlist-validated template.
```

License: MIT. GitHub: llmarmor/llmarmor.
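As a companion to that finding, a remediation matching the suggested fix keeps the system prompt static and validates the only dynamic value against an allowlist. The role names and helper below are hypothetical illustrations, not LLMArmor output.

```python
# Hypothetical fix: never interpolate untrusted input into the system role;
# accept only values from a fixed allowlist.
ALLOWED_ROLES = {"support_agent", "billing_agent", "triage_agent"}

def build_messages(user_role: str, user_message: str) -> list[dict]:
    if user_role not in ALLOWED_ROLES:
        raise ValueError(f"Unknown role: {user_role!r}")
    return [
        {"role": "system", "content": f"You are a {user_role} for the support desk."},
        {"role": "user", "content": user_message},
    ]
```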
Comparison table
| Tool | Type | Open Source | Primary Strength | Key Limitation | License |
|---|---|---|---|---|---|
| NeMo Guardrails | Input + Output (runtime) | Yes | Programmable Colang rules, topical rails | Requires Colang config authorship | Apache 2.0 |
| Guardrails AI | Output (runtime) | Yes | Structured output validation, pydantic-style | Not a security guardrail; no input injection detection | Apache 2.0 |
| LLM Guard | Input + Output (runtime) | Yes | Modular scanners, strong PII handling | Memory overhead from multiple models | MIT |
| Rebuff | Input (runtime) | Yes | Canary-based injection detection, online learning | Narrow scope (injection only), API dependency | MIT |
| LLMArmor | Static (pre-deploy) | Yes | CI integration, zero runtime overhead, structural vulnerability detection | Cannot observe runtime model behavior | MIT |
How to combine tools effectively
No single tool in this table covers the full risk surface. An effective production posture combines:
- LLMArmor in CI (static, pre-deploy): catches structural code vulnerabilities before merge — missing validation, over-broad agent permissions, hardcoded secrets.
- LLM Guard or NeMo Guardrails at runtime (input + output): LLM Guard for a scanner-based approach with minimal configuration; NeMo Guardrails for applications requiring programmable dialogue control and topical rails (see the sketch after this list).
- Rebuff for injection-specific detection (input, runtime): add Rebuff's canary mechanism if your application processes attacker-controlled content (RAG over public data, user-submitted documents, email processing).
- garak for adversarial testing (pre-release): run systematic probe-based red teaming before major releases to validate that the runtime guardrails are actually effective against the attack categories they claim to cover.
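As a rough illustration of that layering, the sketch below composes the runtime controls from the earlier examples: the LLM Guard input_scanners and output_scanners lists, the Rebuff rb client, and the call_llm placeholder. LLMArmor and garak do not appear because they operate outside the request path, and the composition itself is an assumption about integration order rather than a prescribed pattern.

```python
# Sketch: layered runtime defense reusing the objects configured in the examples above
from llm_guard import scan_prompt, scan_output

def guarded_request(user_message: str, system_prompt: str) -> str:
    # Layer 1: LLM Guard input scanners (injection classifier, toxicity, token limit)
    sanitized, _, input_valid = scan_prompt(input_scanners, user_message)
    if not input_valid:
        raise ValueError("Input rejected by LLM Guard")

    # Layer 2: Rebuff classifier check plus a canary token in the system prompt
    if rb.detect_injection(user_input=sanitized).injection_detected:
        raise ValueError("Input rejected by Rebuff")
    prompt_with_canary, canary_word = rb.add_canary_word(system_prompt)

    llm_response = call_llm(prompt_with_canary, sanitized)

    # Layer 3: canary exfiltration check, then LLM Guard output scanners
    if rb.is_canary_word_leaked(sanitized, llm_response, canary_word):
        raise ValueError("Canary leak detected; response discarded")
    final_response, _, output_valid = scan_output(output_scanners, sanitized, llm_response)
    if not output_valid:
        raise ValueError("Response rejected by LLM Guard")
    return final_response
```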
Frequently asked questions
- What is an LLM guardrail?
- An LLM guardrail is a control that prevents certain inputs from reaching an LLM or certain outputs from reaching the user. Guardrails operate at different positions: input guardrails inspect user messages before LLM processing; output guardrails inspect responses before delivery; static analysis tools detect structural vulnerabilities in application code before deployment. The term is used loosely in the ecosystem to describe all three types.
- What is the difference between NeMo Guardrails and Guardrails AI?
- NeMo Guardrails (NVIDIA) is a programmable dialogue control system using the Colang DSL. It intercepts conversational turns and routes them through configurable safety, topical, and jailbreak rails. Guardrails AI is an output validation library using a declarative spec format. It validates that LLM output conforms to a schema, contains no PII, meets length constraints, and satisfies custom validators — analogous to pydantic validation for LLM responses. The tools solve different problems: NeMo addresses behavioral control; Guardrails AI addresses output structure and content constraints.
- What is the best open source LLM safety library in 2026?
- There is no single best library — the right choice depends on what you're protecting against. For runtime input/output scanning with strong PII handling: LLM Guard. For programmable topical and behavioral rails: NeMo Guardrails. For structured output validation: Guardrails AI. For prompt injection detection with canary tokens: Rebuff. For pre-deployment static analysis in CI: LLMArmor. Most production applications benefit from combining two or three of these at different positions in the request lifecycle.
- Does adding a guardrail library make my LLM application secure?
- Not automatically. Runtime guardrails have known bypass techniques — classifiers can be evaded with encoded inputs, heuristic filters can be bypassed by paraphrasing, and input-only guardrails cannot detect indirect injection via retrieved content. A guardrail library that is not configured correctly, not tested adversarially, and not integrated at the right point in the request lifecycle may provide a false sense of security with minimal actual protection. Test your guardrails with garak or a custom payload corpus against the specific attack categories they claim to cover.
- How does LLM Guard compare to Rebuff?
- LLM Guard is a general-purpose scanner library with modules for prompt injection, PII, toxicity, code detection, and regex patterns — it covers a broad range of risk categories. Rebuff is narrowly focused on prompt injection detection and adds a canary token mechanism (injecting a unique token into the prompt and checking if the model reveals it in the response) that catches injection attempts classifiers miss. For a RAG application processing attacker-controlled content, Rebuff's canary approach provides complementary coverage to LLM Guard's classifier-based injection scanner.
- Can I use LLMArmor as a runtime guardrail?
- No. LLMArmor is a static analysis tool that runs against source code, not against live requests. It operates in CI (pre-deployment) and detects structural vulnerabilities in application code. It does not inspect runtime inputs or outputs and has zero request-path overhead. Its role is to ensure that runtime guardrails are correctly integrated in your codebase — for example, detecting that an LLM response is returned to the user without passing through an output scanner.
- What license are these guardrail libraries released under?
- NeMo Guardrails: Apache 2.0. Guardrails AI: Apache 2.0. LLM Guard: MIT. Rebuff: MIT. LLMArmor: MIT. All five are permissively licensed and suitable for commercial use without copyleft requirements. Verify current license status in each project's repository before production deployment, as licenses can change between versions.