Open Source LLM Guardrails: A 2026 Comparison
Between mid-2023 and the end of 2025, more than a dozen open-source LLM guardrail libraries reached general availability. NeMo Guardrails, Guardrails AI, LLM Guard, Rebuff, LlamaGuard, ShieldLM, and several smaller projects all describe themselves as “LLM safety” or “guardrail” tools, yet they operate at fundamentally different points in the request lifecycle and address different threat models. Evaluating them without understanding those differences produces poor integration decisions: teams often deploy a runtime output classifier believing it protects against code-level vulnerabilities, or add a static analysis tool expecting it to block runtime jailbreaks. This comparison categorizes each tool precisely by what it does, where it operates, and what it cannot do.
What is an LLM guardrail?
An LLM guardrail is a control that prevents certain inputs from reaching the LLM or certain outputs from reaching the user. The category splits into three operational positions:
Input guardrails (pre-processing) inspect user messages before they are sent to the LLM. They can reject, modify, or flag inputs containing injection patterns, jailbreak payloads, PII, or off-topic content. Input guardrails cannot catch attacks that are not recognizable at the input stage — encoded payloads, multi-turn escalation, or indirect injection via retrieved documents.
Output guardrails (post-processing) inspect the LLM’s response before it is returned to the user. They can block, modify, or flag responses containing policy violations, PII leaks, harmful content, or prompt injection artifacts. Output guardrails add per-call latency equal to the classification time but provide the most reliable terminal control regardless of how the attack was constructed.
Static analysis operates at the code level before deployment. It detects structural vulnerabilities in application source code — missing validation, over-broad agent permissions, hardcoded secrets. It produces zero runtime overhead but cannot observe model behavior.
A production LLM application typically needs controls at more than one position.
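To make the three positions concrete, the sketch below traces a single request through the two runtime controls. It is illustrative pseudocode rather than any particular library's API: scan_input, call_llm, and scan_output are hypothetical stand-ins for the tools compared below, and static analysis does not appear because it runs against source code in CI, not per request.

```python
# Illustrative request flow with hypothetical helpers (not a specific library's API).
def handle_request(user_message: str) -> str:
    # 1. Input guardrail: reject or sanitize before the message reaches the model
    verdict = scan_input(user_message)            # hypothetical input classifier
    if verdict.blocked:
        return "Request rejected by input policy."

    # 2. The model sees the sanitized input, never the raw user message
    llm_response = call_llm(verdict.sanitized)    # hypothetical LLM client wrapper

    # 3. Output guardrail: the last control before the response reaches the user
    result = scan_output(verdict.sanitized, llm_response)  # hypothetical output classifier
    if result.blocked:
        return "Response withheld by output policy."
    return result.sanitized
```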
NeMo Guardrails (NVIDIA)
NeMo Guardrails uses a domain-specific language called Colang to define programmable dialogue rules. It operates as an application layer that intercepts requests and responses, routing them through configurable rails before they reach the LLM.
Strengths: Highly programmable. Colang allows precise definition of topical rails (the model may only discuss X), safety rails (the model must not produce Y), and jailbreak rails (detect and handle override attempts). Supports multi-turn conversation state tracking.
Weaknesses: Requires writing Colang configuration, which has a learning curve. Rails must be explicitly defined — there is no default coverage. Self-hosted; no hosted tier.
```python
# config.yml (NeMo Guardrails configuration)
# models:
#   - type: main
#     engine: openai
#     model: gpt-4o-2024-11-20

# colang/main.co
# define user ask about sensitive topics
#   "how do I hack"
#   "tell me how to make"
#
# define flow sensitive topics guardrail
#   user ask about sensitive topics
#   bot refuse to respond about sensitive topics

# SAFE: NeMo Guardrails Python integration
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def handle_request(user_message: str) -> str:
    """SAFE: all messages pass through configured rails before reaching LLM."""
    response = await rails.generate_async(
        messages=[{"role": "user", "content": user_message}]
    )
    return response["content"]
```

License: Apache 2.0. GitHub: NVIDIA/NeMo-Guardrails.
Guardrails AI
Guardrails AI takes a different approach: it focuses on output structure validation and content constraints using a declarative specification format (Rail). The primary use case is ensuring that LLM output conforms to a defined schema, contains no prohibited content, and meets quality thresholds — equivalent to pydantic validation for LLM outputs.
Strengths: Excellent for structured output enforcement — JSON schema validation, regex constraints, semantic similarity thresholds, PII detection. Integrates cleanly with Python type systems. Active ecosystem of contributed validators.
Weaknesses: Primarily an output validator, not a security guardrail in the traditional sense. Does not detect prompt injection or jailbreaks at the input stage. Adding custom validators requires Python code.
```python
# SAFE: Guardrails AI for output structure enforcement
from guardrails import Guard
from guardrails.hub import ValidLength, DetectPII
import openai

guard = Guard().use(
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="fix"),
).use(
    ValidLength(min=10, max=500, on_fail="reask"),
)

client = openai.OpenAI()

def safe_generate(prompt: str) -> str:
    """SAFE: output passes through Guardrails AI validators before return."""
    response, validated, *_ = guard(
        client.chat.completions.create,
        prompt=prompt,
        model="gpt-4o-2024-11-20",
        max_tokens=512,
    )
    # validated contains the validated (and potentially fixed) output
    return validated
```

License: Apache 2.0. GitHub: guardrails-ai/guardrails.
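The pydantic comparison can be made concrete with schema enforcement. The sketch below assumes the Guard.from_pydantic constructor described in the Guardrails AI documentation and mirrors the call style of the example above; the SupportTicket model and prompt are hypothetical, and the exact return shape of a guarded call differs between library versions, so verify against the release you install.

```python
# Sketch (assumed API): enforcing a pydantic schema on LLM output
from pydantic import BaseModel, Field
from guardrails import Guard
import openai

class SupportTicket(BaseModel):
    category: str = Field(description="One of: billing, technical, account")
    summary: str = Field(description="One-sentence summary of the issue")

ticket_guard = Guard.from_pydantic(output_class=SupportTicket)
client = openai.OpenAI()

def classify_ticket(message: str) -> dict:
    # Output that does not parse into SupportTicket is rejected or re-asked
    raw_output, validated, *_ = ticket_guard(
        client.chat.completions.create,
        prompt=f"Classify this support request: {message}",
        model="gpt-4o-2024-11-20",
        max_tokens=256,
    )
    return validated
```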
LLM Guard (ProtectAI)
LLM Guard uses a scanner-based architecture: a set of independent scanners, each detecting a specific risk category, that can be applied to input, output, or both. Scanners include prompt injection detection, PII detection and anonymization, toxicity classification, regex pattern matching, ban-topics, and code detection.
Strengths: Modular — use only the scanners your application needs. Strong PII handling with anonymization (not just detection). The prompt injection scanner (PromptInjectionV2) uses the ProtectAI-maintained deberta-v3-base-prompt-injection classifier. No API call to an external service — runs entirely on your infrastructure.
Weaknesses: Running multiple scanner models adds memory overhead and per-call latency. Scanner quality varies by category — prompt injection coverage is mature; jailbreak-specific coverage is less comprehensive.
```python
# SAFE: LLM Guard scanner pipeline
from llm_guard.input_scanners import PromptInjection, Toxicity, TokenLimit
from llm_guard.output_scanners import Sensitive, NoRefusal
from llm_guard import scan_prompt, scan_output

# Configure input scanners
input_scanners = [
    PromptInjection(threshold=0.85),
    Toxicity(threshold=0.7),
    TokenLimit(limit=512),
]

# Configure output scanners
output_scanners = [
    Sensitive(entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"]),
    NoRefusal(),  # detects jailbreaks by absence of refusal when expected
]

def process_message(user_message: str, system_prompt: str) -> str | None:
    """SAFE: scan input and output through LLM Guard scanners."""
    # Input scan
    sanitized_prompt, input_results, input_valid = scan_prompt(
        input_scanners, user_message
    )
    if not input_valid:
        flagged = [name for name, passed in input_results.items() if not passed]
        raise ValueError(f"Input failed scanners: {flagged}")

    # Call LLM
    llm_response = call_llm(system_prompt, sanitized_prompt)

    # Output scan
    sanitized_response, output_results, output_valid = scan_output(
        output_scanners, user_message, llm_response
    )
    if not output_valid:
        flagged = [name for name, passed in output_results.items() if not passed]
        raise ValueError(f"Output failed scanners: {flagged}")

    return sanitized_response
```

License: MIT. GitHub: protectai/llm-guard.
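The anonymization capability noted under strengths goes beyond detection: an input scanner replaces PII with placeholders before the model sees the text, and an output scanner restores the original values afterwards through a shared vault. The sketch below assumes LLM Guard's documented Vault, Anonymize, and Deanonymize classes; call_llm is the same placeholder as in the example above, so treat the exact signatures as something to confirm against the installed version.

```python
# Sketch (assumed LLM Guard API): mask PII on the way in, restore it on the way out
from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.output_scanners import Deanonymize

vault = Vault()  # stores the mapping between placeholders and the original values

def answer_with_pii_masked(user_message: str, system_prompt: str) -> str:
    # Replace detected PII with placeholders before the model sees the text
    masked_prompt, prompt_valid, _ = Anonymize(vault).scan(user_message)

    llm_response = call_llm(system_prompt, masked_prompt)  # placeholder LLM call

    # Swap the placeholders in the response back to the original values
    restored, response_valid, _ = Deanonymize(vault).scan(masked_prompt, llm_response)
    return restored
```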
Rebuff
Rebuff focuses specifically on prompt injection detection and adds a runtime learning component: successful attacks detected and reported by users are fed back into the detection model, improving coverage over time. It uses a layered approach combining heuristic detection, a fine-tuned classifier, and a canary-based detection mechanism (injecting a unique token into the prompt and checking if the model reveals it).
Strengths: Purpose-built for prompt injection. The canary mechanism catches injection attempts that classifiers miss by detecting whether the model was manipulated into disclosing the injected token. Managed API option reduces infrastructure burden.
Weaknesses: Narrower scope than LLM Guard — primarily injection detection, not a general-purpose scanner. The managed API sends request data to Rebuff’s servers (self-hosted option available). Online learning introduces dependency on the Rebuff service for model updates.
```python
# SAFE: Rebuff prompt injection detection with canary tokens
import os

from rebuff import RebuffSdk

rb = RebuffSdk(
    openai_apikey=os.environ["OPENAI_API_KEY"],
    rebuff_apikey=os.environ["REBUFF_API_KEY"],
)

def safe_prompt(user_input: str, system_prompt: str) -> str:
    """SAFE: detect prompt injection before forwarding to LLM."""
    # Detect injection attempt
    detect_response = rb.detect_injection(
        user_input=user_input,
        max_heuristic_score=0.75,
        max_vector_score=0.90,
        max_language_model_score=0.90,
        check_rebuff_api=True,
    )

    if detect_response.injection_detected:
        raise ValueError(
            f"Prompt injection detected (heuristic={detect_response.heuristic_score:.2f}, "
            f"vector={detect_response.vector_score:.2f})"
        )

    # Add canary token to system prompt for exfiltration detection
    prompt_with_canary, canary_word = rb.add_canary_word(system_prompt)

    # Call LLM
    response_text = call_llm(prompt_with_canary, user_input)

    # Check if canary was exfiltrated
    if rb.is_canary_word_leaked(user_input, response_text, canary_word):
        raise ValueError("Canary word leaked — possible prompt injection in response")

    return response_text
```

License: MIT. GitHub: protectai/rebuff.
LLMArmor
LLMArmor is a static analysis tool, not a runtime guardrail. It analyzes Python source code using AST taint analysis to detect structural vulnerabilities in LLM application code before deployment:
- User-controlled input reaching the `role: system` message (LLM01)
- Missing `max_tokens` parameter on LLM API calls
- Hardcoded API keys in source code
- LangChain agents with wildcard tool access or missing `max_iterations` (LLM08)
- LLM responses returned to the user without output filtering
LLMArmor's position is in CI, not in the request path: because it never runs at runtime, it adds zero runtime overhead. Its value is ensuring that the runtime guardrails you choose (LLM Guard, NeMo Guardrails, Rebuff) are integrated correctly in your codebase.
```sh
pip install llmarmor
llmarmor scan ./src --strict
```

Example finding:
```text
LLM01 — Prompt Injection [HIGH]  api.py:34
  messages=[{"role": "system", "content": f"You are {user_role}..."}]
  Tainted variable 'user_role' (from request.json) reaches system role content.
  Fix: use a static system prompt or allowlist-validated template.
```

License: MIT. GitHub: llmarmor/llmarmor.
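As a companion to that finding, a remediation matching the suggested fix keeps the system prompt static and validates the only dynamic value against an allowlist. The role names and helper below are hypothetical illustrations, not LLMArmor output.

```python
# Hypothetical fix: never interpolate untrusted input into the system role;
# accept only values from a fixed allowlist.
ALLOWED_ROLES = {"support_agent", "billing_agent", "triage_agent"}

def build_messages(user_role: str, user_message: str) -> list[dict]:
    if user_role not in ALLOWED_ROLES:
        raise ValueError(f"Unknown role: {user_role!r}")
    return [
        {"role": "system", "content": f"You are a {user_role} for the support desk."},
        {"role": "user", "content": user_message},
    ]
```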
Comparison table
| Tool | Type | Open Source | Primary Strength | Key Limitation | License |
|---|---|---|---|---|---|
| NeMo Guardrails | Input + Output (runtime) | Yes | Programmable Colang rules, topical rails | Requires Colang config authorship | Apache 2.0 |
| Guardrails AI | Output (runtime) | Yes | Structured output validation, pydantic-style | Not a security guardrail; no input injection detection | Apache 2.0 |
| LLM Guard | Input + Output (runtime) | Yes | Modular scanners, strong PII handling | Memory overhead from multiple models | MIT |
| Rebuff | Input (runtime) | Yes | Canary-based injection detection, online learning | Narrow scope (injection only), API dependency | MIT |
| LLMArmor | Static (pre-deploy) | Yes | CI integration, zero runtime overhead, structural vulnerability detection | Cannot observe runtime model behavior | MIT |
How to combine tools effectively
No single tool in this table covers the full risk surface. An effective production posture combines:
- LLMArmor in CI (static, pre-deploy): catches structural code vulnerabilities before merge — missing validation, over-broad agent permissions, hardcoded secrets.
- LLM Guard or NeMo Guardrails at runtime (input + output): LLM Guard for a scanner-based approach with minimal configuration; NeMo Guardrails for applications requiring programmable dialogue control and topical rails (see the sketch after this list).
- Rebuff for injection-specific detection (input, runtime): add Rebuff's canary mechanism if your application processes attacker-controlled content (RAG over public data, user-submitted documents, email processing).
- garak for adversarial testing (pre-release): run systematic probe-based red teaming before major releases to validate that the runtime guardrails are actually effective against the attack categories they claim to cover.
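As a rough illustration of that layering, the sketch below composes the runtime controls from the earlier examples: the LLM Guard input_scanners and output_scanners lists, the Rebuff rb client, and the call_llm placeholder. LLMArmor and garak do not appear because they operate outside the request path, and the composition itself is an assumption about integration order rather than a prescribed pattern.

```python
# Sketch: layered runtime defense reusing the objects configured in the examples above
from llm_guard import scan_prompt, scan_output

def guarded_request(user_message: str, system_prompt: str) -> str:
    # Layer 1: LLM Guard input scanners (injection classifier, toxicity, token limit)
    sanitized, _, input_valid = scan_prompt(input_scanners, user_message)
    if not input_valid:
        raise ValueError("Input rejected by LLM Guard")

    # Layer 2: Rebuff classifier check plus a canary token in the system prompt
    if rb.detect_injection(user_input=sanitized).injection_detected:
        raise ValueError("Input rejected by Rebuff")
    prompt_with_canary, canary_word = rb.add_canary_word(system_prompt)

    llm_response = call_llm(prompt_with_canary, sanitized)

    # Layer 3: canary exfiltration check, then LLM Guard output scanners
    if rb.is_canary_word_leaked(sanitized, llm_response, canary_word):
        raise ValueError("Canary leak detected; response discarded")
    final_response, _, output_valid = scan_output(output_scanners, sanitized, llm_response)
    if not output_valid:
        raise ValueError("Response rejected by LLM Guard")
    return final_response
```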
Frequently asked questions
- What is an LLM guardrail?
- An LLM guardrail is a control that prevents certain inputs from reaching an LLM or certain outputs from reaching the user. Guardrails operate at different positions: input guardrails inspect user messages before LLM processing; output guardrails inspect responses before delivery; static analysis tools detect structural vulnerabilities in application code before deployment. The term is used loosely in the ecosystem to describe all three types.
- What is the difference between NeMo Guardrails and Guardrails AI?
- NeMo Guardrails (NVIDIA) is a programmable dialogue control system using the Colang DSL. It intercepts conversational turns and routes them through configurable safety, topical, and jailbreak rails. Guardrails AI is an output validation library using a declarative spec format. It validates that LLM output conforms to a schema, contains no PII, meets length constraints, and satisfies custom validators — analogous to pydantic validation for LLM responses. The tools solve different problems: NeMo addresses behavioral control; Guardrails AI addresses output structure and content constraints.
- What is the best open source LLM safety library in 2026?
- There is no single best library — the right choice depends on what you're protecting against. For runtime input/output scanning with strong PII handling: LLM Guard. For programmable topical and behavioral rails: NeMo Guardrails. For structured output validation: Guardrails AI. For prompt injection detection with canary tokens: Rebuff. For pre-deployment static analysis in CI: LLMArmor. Most production applications benefit from combining two or three of these at different positions in the request lifecycle.
- Does adding a guardrail library make my LLM application secure?
- Not automatically. Runtime guardrails have known bypass techniques — classifiers can be evaded with encoded inputs, heuristic filters can be bypassed by paraphrasing, and input-only guardrails cannot detect indirect injection via retrieved content. A guardrail library that is not configured correctly, not tested adversarially, and not integrated at the right point in the request lifecycle may provide a false sense of security with minimal actual protection. Test your guardrails with garak or a custom payload corpus against the specific attack categories they claim to cover.
- How does LLM Guard compare to Rebuff?
- LLM Guard is a general-purpose scanner library with modules for prompt injection, PII, toxicity, code detection, and regex patterns — it covers a broad range of risk categories. Rebuff is narrowly focused on prompt injection detection and adds a canary token mechanism (injecting a unique token into the prompt and checking if the model reveals it in the response) that catches injection attempts classifiers miss. For a RAG application processing attacker-controlled content, Rebuff's canary approach provides complementary coverage to LLM Guard's classifier-based injection scanner.
- Can I use LLMArmor as a runtime guardrail?
- No. LLMArmor is a static analysis tool that runs against source code, not against live requests. It operates in CI (pre-deployment) and detects structural vulnerabilities in application code. It does not inspect runtime inputs or outputs and has zero request-path overhead. Its role is to ensure that runtime guardrails are correctly integrated in your codebase — for example, detecting that an LLM response is returned to the user without passing through an output scanner.
- What license are these guardrail libraries released under?
- NeMo Guardrails: Apache 2.0. Guardrails AI: Apache 2.0. LLM Guard: MIT. Rebuff: MIT. LLMArmor: MIT. All five are permissively licensed and suitable for commercial use without copyleft requirements. Verify current license status in each project's repository before production deployment, as licenses can change between versions.