
LLMArmor vs garak: Static Analysis vs LLM Red-Teaming

garak is an open-source LLM vulnerability scanner developed and maintained by NVIDIA. It takes a dynamic, generative approach: it queries a running LLM with hundreds of attack probes and evaluates the responses. LLMArmor takes a static analysis approach: it scans your Python source code for security misconfigurations before your app ever runs.

Both tools address LLM security, but they answer different questions.

| Dimension | LLMArmor | garak |
|---|---|---|
| Approach | Static source-code analysis | Dynamic model probing |
| When it runs | At commit / CI time (pre-deploy) | Against a running model endpoint |
| What it needs | Your Python source files | A live LLM (API key or local model) |
| Primary standards | OWASP LLM Top 10 | Extensive internal taxonomy + OWASP LLM |
| Languages supported | Python | Model-agnostic (any LLM API) |
| Cost per run | Free (zero API calls) | Incurs LLM API cost per probe |
| Speed | Seconds (no network calls) | Minutes to hours, depending on probes |
| Output | SARIF, JSON, Markdown, grouped terminal | JSON report, per-probe pass/fail |
| SARIF / GitHub Code Scanning | ✅ Built-in | ❌ Not natively |
| License | MIT | Apache 2.0 |

garak is purpose-built for testing whether a specific model or system prompt is vulnerable to a wide range of jailbreaks, prompt leakage attacks, and data extraction techniques. Its probe library covers dozens of attack families — DAN variants, encoding bypasses, role-play attacks — and is continuously expanded by NVIDIA and the community.

If your threat model is “can an adversarial user manipulate my deployed chatbot into producing harmful output?”, garak is the right tool to answer that.

LLMArmor answers a different question: “does my code have security misconfigurations that an attacker could exploit?” It finds issues such as the following (illustrated in the code sketch after this list):

  • Prompt injection vectors introduced by developers (f-string interpolation of user input into system prompts)
  • Hardcoded API keys accidentally committed to source
  • LLM outputs passed unsanitized to eval(), subprocess, or SQL queries
  • Agent tools with wildcard access or disabled approval gates
  • Missing max_tokens limits that could lead to runaway costs
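To make these concrete, here is a minimal sketch of what such misconfigurations look like in application code. It assumes the OpenAI Python client; the function names, model choice, and placeholder key are illustrative assumptions, not patterns LLMArmor is guaranteed to flag verbatim.

```python
# Hypothetical app code showing the kinds of misconfigurations a static
# scanner looks for. The OpenAI client calls are real; the surrounding
# function and variable names are made up for illustration.
import subprocess

from openai import OpenAI

# (1) Hardcoded API key committed to source -- should come from a secret store.
client = OpenAI(api_key="sk-live-EXAMPLE-DO-NOT-COMMIT")


def answer(user_input: str) -> str:
    # (2) Prompt injection vector: untrusted input interpolated directly
    #     into the system prompt via an f-string.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "system",
            "content": f"You are a helpful bot. Context: {user_input}",
        }],
        # (4) No max_tokens set -- an adversarial prompt can drive up cost.
    )
    return response.choices[0].message.content


def run_suggested_command(user_input: str) -> None:
    # (3) LLM output passed unsanitized to a shell -- insecure output handling.
    subprocess.run(answer(user_input), shell=True)
```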

These findings exist in your code before a single request is made. garak cannot find them because they require reading the source, not querying the model.

Both tools can surface prompt injection concerns, but from different angles:

  • LLMArmor detects the code pattern that makes injection possible (e.g., f"system: {user_input}"; a safer structure is sketched after this list)
  • garak probes whether the running system actually produces harmful output when injected
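A minimal sketch of that contrast, assuming a chat-style messages API: the message dict layout mirrors common chat-completion clients, and the SYSTEM_PROMPT constant and builder functions are hypothetical.

```python
# The injection-prone pattern versus a safer structure for the same request.

SYSTEM_PROMPT = "You are a support assistant. Answer only questions about billing."


def build_messages_vulnerable(user_input: str) -> list[dict]:
    # Flagged pattern: untrusted input interpolated into the system prompt,
    # so a user can override the instructions ("ignore the above and ...").
    return [{"role": "system", "content": f"{SYSTEM_PROMPT}\nUser says: {user_input}"}]


def build_messages_safer(user_input: str) -> list[dict]:
    # Safer structure: instructions stay in the system message and untrusted
    # input is confined to the user message.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

Separating the roles does not make the system immune to injection; it simply removes the direct interpolation, which is the layer a static scanner can check. Whether the model then resists adversarial input is the question garak answers.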

Running both gives you defense-in-depth: fix the code issues LLMArmor finds, then verify your deployed system’s resilience with garak.

Choose LLMArmor when:

  • You want to block security issues in CI/CD before they reach production
  • You need SARIF integration with GitHub Code Scanning
  • You’re auditing OWASP LLM Top 10 compliance in source code
  • Your team doesn’t have a running model endpoint yet
  • You want fast, free scans with no API cost
Choose garak when:

  • You want to red-team a deployed model or system prompt
  • You need coverage of jailbreak techniques and model-level attacks
  • You’re evaluating whether a third-party model is safe to use in your product
  • You need a broad probe library beyond OWASP LLM Top 10

Use both. They complement each other:

  1. Run LLMArmor in CI on every pull request to catch code-level misconfigurations early.
  2. Run garak periodically against your staging or production endpoint to verify model-level resilience.

Neither tool replaces the other.