LLMArmor vs garak: Static Analysis vs LLM Red-Teaming
garak is an open-source LLM vulnerability scanner developed and maintained by NVIDIA. It takes a dynamic, generative approach: it queries a running LLM with hundreds of attack probes and evaluates the responses. LLMArmor takes a static analysis approach: it scans your Python source code for security misconfigurations before your app ever runs.
Both tools address LLM security, but they answer different questions.
At a glance
Section titled “At a glance”| Dimension | LLMArmor | garak |
|---|---|---|
| Approach | Static source-code analysis | Dynamic model probing |
| When it runs | At commit / CI time (pre-deploy) | Against a running model endpoint |
| What it needs | Your Python source files | A live LLM (API key or local model) |
| Primary standards | OWASP LLM Top 10 | Extensive internal taxonomy + OWASP LLM |
| Languages supported | Python | Model-agnostic (any LLM API) |
| Cost per run | Free — zero API calls | Incurs LLM API cost per probe |
| Speed | Seconds (no network calls) | Minutes to hours depending on probes |
| Output | SARIF, JSON, Markdown, grouped terminal | JSON report, per-probe pass/fail |
| SARIF / GitHub Code Scanning | ✅ Built-in | ❌ Not natively |
| License | MIT | Apache 2.0 |
What garak does well
Section titled “What garak does well”garak is purpose-built for testing whether a specific model or system prompt is vulnerable to a wide range of jailbreaks, prompt leakage attacks, and data extraction techniques. Its probe library covers dozens of attack families — DAN variants, encoding bypasses, role-play attacks — and is continuously expanded by NVIDIA and the community.
If your threat model is “can an adversarial user manipulate my deployed chatbot into producing harmful output?”, garak is the right tool to answer that.
What LLMArmor does well
Section titled “What LLMArmor does well”LLMArmor answers a different question: “does my code have security misconfigurations that an attacker could exploit?” It finds:
- Prompt injection vectors introduced by developers (f-string interpolation of user input into system prompts)
- Hardcoded API keys accidentally committed to source
- LLM outputs passed unsanitized to
eval(),subprocess, or SQL queries - Agent tools with wildcard access or disabled approval gates
- Missing
max_tokenslimits that could lead to runaway costs
These findings exist in your code before a single request is made. garak cannot find them because they require reading the source, not querying the model.
Overlap
Section titled “Overlap”Both tools can surface prompt injection concerns, but from different angles:
- LLMArmor detects the code pattern that makes injection possible (e.g.,
f"system: {user_input}") - garak probes whether the running system actually produces harmful output when injected
Running both gives you defense-in-depth: fix the code issues LLMArmor finds, then verify your deployed system’s resilience with garak.
When to choose LLMArmor
Section titled “When to choose LLMArmor”- You want to block security issues in CI/CD before they reach production
- You need SARIF integration with GitHub Code Scanning
- You’re auditing OWASP LLM Top 10 compliance in source code
- Your team doesn’t have a running model endpoint yet
- You want fast, free scans with no API cost
When to choose garak
Section titled “When to choose garak”- You want to red-team a deployed model or system prompt
- You need coverage of jailbreak techniques and model-level attacks
- You’re evaluating whether a third-party model is safe to use in your product
- You need a broad probe library beyond OWASP LLM Top 10
Recommendation
Section titled “Recommendation”Use both. They complement each other:
- Run LLMArmor in CI on every pull request to catch code-level misconfigurations early.
- Run garak periodically against your staging or production endpoint to verify model-level resilience.
Neither tool replaces the other.