
LLMArmor vs garak: Static Analysis vs LLM Red-Teaming

garak is an open-source LLM vulnerability scanner developed and maintained by NVIDIA. It takes a dynamic, generative approach: it queries a running LLM with hundreds of attack probes and evaluates the responses. LLMArmor takes a static analysis approach: it scans your Python source code for security misconfigurations before your app ever runs.

Both tools address LLM security, but they answer different questions.

| Dimension | LLMArmor | garak |
|---|---|---|
| Approach | Static source-code analysis | Dynamic model probing |
| When it runs | At commit / CI time (pre-deploy) | Against a running model endpoint |
| What it needs | Your Python source files | A live LLM (API key or local model) |
| Primary standards | OWASP LLM Top 10 | Extensive internal taxonomy + OWASP LLM |
| Languages supported | Python | Model-agnostic (any LLM API) |
| Cost per run | Free (zero API calls) | Incurs LLM API cost per probe |
| Speed | Seconds (no network calls) | Minutes to hours, depending on probes |
| Output | SARIF, JSON, Markdown, grouped terminal | JSON report, per-probe pass/fail |
| SARIF / GitHub Code Scanning | ✅ Built-in | ❌ Not natively |
| License | MIT | Apache 2.0 |

garak is purpose-built for testing whether a specific model or system prompt is vulnerable to a wide range of jailbreaks, prompt leakage attacks, and data extraction techniques. Its probe library covers dozens of attack families — DAN variants, encoding bypasses, role-play attacks — and is continuously expanded by NVIDIA and the community.

If your threat model is “can an adversarial user manipulate my deployed chatbot into producing harmful output?”, garak is the right tool to answer that.

LLMArmor answers a different question: “does my code have security misconfigurations that an attacker could exploit?” It finds issues such as the following (illustrated in the code sketch after this list):

  • Prompt injection vectors introduced by developers (f-string interpolation of user input into system prompts)
  • Hardcoded API keys accidentally committed to source
  • LLM outputs passed unsanitized to eval(), subprocess, or SQL queries
  • Agent tools with wildcard access or disabled approval gates
  • Missing max_tokens limits that could lead to runaway costs
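To make these concrete, here is a minimal sketch of what such misconfigurations look like in application code. It assumes the OpenAI Python client; the function names, model choice, and placeholder key are illustrative assumptions, not patterns LLMArmor is guaranteed to flag verbatim.

```python
# Hypothetical app code showing the kinds of misconfigurations a static
# scanner looks for. The OpenAI client calls are real; the surrounding
# function and variable names are made up for illustration.
import subprocess

from openai import OpenAI

# (1) Hardcoded API key committed to source -- should come from a secret store.
client = OpenAI(api_key="sk-live-EXAMPLE-DO-NOT-COMMIT")


def answer(user_input: str) -> str:
    # (2) Prompt injection vector: untrusted input interpolated directly
    #     into the system prompt via an f-string.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "system",
            "content": f"You are a helpful bot. Context: {user_input}",
        }],
        # (4) No max_tokens set -- an adversarial prompt can drive up cost.
    )
    return response.choices[0].message.content


def run_suggested_command(user_input: str) -> None:
    # (3) LLM output passed unsanitized to a shell -- insecure output handling.
    subprocess.run(answer(user_input), shell=True)
```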

These findings exist in your code before a single request is made. garak cannot find them because they require reading the source, not querying the model.

Both tools can surface prompt injection concerns, but from different angles:

  • LLMArmor detects the code pattern that makes injection possible (e.g., f"system: {user_input}"; a safer structure is sketched after this list)
  • garak probes whether the running system actually produces harmful output when injected
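A minimal sketch of that contrast, assuming a chat-style messages API: the message dict layout mirrors common chat-completion clients, and the SYSTEM_PROMPT constant and builder functions are hypothetical.

```python
# The injection-prone pattern versus a safer structure for the same request.

SYSTEM_PROMPT = "You are a support assistant. Answer only questions about billing."


def build_messages_vulnerable(user_input: str) -> list[dict]:
    # Flagged pattern: untrusted input interpolated into the system prompt,
    # so a user can override the instructions ("ignore the above and ...").
    return [{"role": "system", "content": f"{SYSTEM_PROMPT}\nUser says: {user_input}"}]


def build_messages_safer(user_input: str) -> list[dict]:
    # Safer structure: instructions stay in the system message and untrusted
    # input is confined to the user message.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

Separating the roles does not make the system immune to injection; it simply removes the direct interpolation, which is the layer a static scanner can check. Whether the model then resists adversarial input is the question garak answers.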

Running both gives you defense-in-depth: fix the code issues LLMArmor finds, then verify your deployed system’s resilience with garak.

Choose LLMArmor when:

  • You want to block security issues in CI/CD before they reach production
  • You need SARIF integration with GitHub Code Scanning
  • You’re auditing OWASP LLM Top 10 compliance in source code
  • Your team doesn’t have a running model endpoint yet
  • You want fast, free scans with no API cost
Choose garak when:

  • You want to red-team a deployed model or system prompt
  • You need coverage of jailbreak techniques and model-level attacks
  • You’re evaluating whether a third-party model is safe to use in your product
  • You need a broad probe library beyond OWASP LLM Top 10

Use both. They complement each other:

  1. Run LLMArmor in CI on every pull request to catch code-level misconfigurations early.
  2. Run garak periodically against your staging or production endpoint to verify model-level resilience.

Neither tool replaces the other.