Skip to content

Free LLM Security Scanners for Startups

Startups shipping LLM features move fast by necessity. Security review is expensive, time-consuming, and easy to defer — especially when the team has no dedicated security engineer, the product is pre-launch, or the user base is still small enough that risk feels theoretical. The problem is that LLM-specific vulnerabilities compound over time. Code patterns that accept tainted user input in system prompts, agents with wildcard tool access, and API keys stored in source code all become harder to remediate as the application grows. A startup that defers LLM security until Series A will be fixing technical debt under pressure instead of shipping features.

The good news: a reasonable LLM security baseline costs nothing in licensing fees. Three open-source tools — LLMArmor for static analysis, Garak for pre-release dynamic scanning, and Rebuff for runtime detection — cover the primary attack surfaces of a public-facing LLM application. This post walks through how to set up all three and what to do with their output.

The threat model for a startup LLM application

Section titled “The threat model for a startup LLM application”

Before choosing tools, it helps to be precise about what you are actually defending against.

A typical startup LLM application exposes a public-facing API or chat interface. Users submit arbitrary text. That text is incorporated into prompts, sometimes alongside retrieved documents from a vector database, and sent to an LLM API. Responses are returned to users or used to drive agent actions. The team is small — often no dedicated security person — and the codebase is moving fast.

The primary risks in this configuration are:

Prompt injection (OWASP LLM01): Users submit text that manipulates the LLM’s behavior beyond the intended scope — causing it to reveal system prompts, bypass content policies, or act on behalf of the attacker rather than the user.

Indirect prompt injection via RAG: If the application retrieves documents and includes them in LLM context, an attacker who can write to the document store (or whose content gets indexed from the web) can embed instructions that hijack the model’s behavior.

Credential exposure (OWASP LLM02): API keys hardcoded in source code or committed to version control — a common early-stage shortcut — are scraped by automated tools within hours of a public push.

Unbounded inference cost (OWASP LLM10): No max_tokens limit means a single malicious or accidental request can exhaust a free-tier API quota or run up a significant bill.

Excessive agent permissions (OWASP LLM08): Agents built with wildcard tool access during rapid prototyping are never scoped back down — a tool that was added for debugging becomes a permanent attack surface.

Tool 1: LLMArmor — static analysis in CI

Section titled “Tool 1: LLMArmor — static analysis in CI”

LLMArmor scans Python source code at commit time. It finds the structural code patterns that produce LLM vulnerabilities: tainted user input in system prompts, hardcoded credentials, missing max_tokens, agents with over-broad tool access.

Installation:

Terminal window
pip install llmarmor

One-off scan of your project:

Terminal window
llmarmor scan ./src

GitHub Actions — run on every pull request:

.github/workflows/llm-security.yml
name: LLM Security Scan
on:
pull_request:
paths:
- "**.py"
jobs:
llmarmor:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install llmarmor
- run: llmarmor scan ./src --fail-on HIGH

This workflow runs on any pull request that touches a Python file and fails the PR if any HIGH or CRITICAL findings are present.

What to do with the output:

LLMArmor findings include a severity (CRITICAL, HIGH, MEDIUM, LOW), a rule ID (e.g., LLM01, LLM02), a file path and line number, and a remediation note. For a first scan:

  1. Fix all CRITICAL findings before the next deployment.
  2. Fix HIGH findings within the current sprint.
  3. Create tracked issues for MEDIUM findings and address them in the next iteration.
  4. Review LOW findings for context; many can be suppressed if the pattern is intentional and documented.

Limitations: LLMArmor currently analyzes Python only. If your application uses Node.js, Go, or another language for the LLM integration layer, LLMArmor will not scan those files. It also cannot detect runtime behavioral vulnerabilities — issues that depend on the model’s response to specific inputs rather than the structure of the code.

Tool 2: Garak — dynamic scanning before major releases

Section titled “Tool 2: Garak — dynamic scanning before major releases”

Garak (NVIDIA, open source) sends adversarial prompts to a running model and checks whether the responses violate expected safety properties. It finds model behavioral vulnerabilities that static analysis cannot see: jailbreak susceptibility, system prompt leakage, toxic content generation.

Installation:

Terminal window
pip install garak

Scan your OpenAI-backed application:

Terminal window
# Run before each major release or when switching underlying models
garak --model_type openai \
--model_name gpt-4o \
--probes promptinject,dan,atkgen \
--report_prefix ./security/garak_prescan

Scan a local or self-hosted model:

Terminal window
garak --model_type huggingface \
--model_name mistralai/Mistral-7B-Instruct-v0.2 \
--probes promptinject,continuation \
--report_prefix ./security/garak_local

Garak outputs a JSON results file and a human-readable summary. Each row in the summary shows a probe name, the number of probes sent, and a pass rate. A pass rate below 100% means some fraction of probes produced an unsafe response.

What to do with the output:

A Garak finding does not mean you have a zero-day — it means the model responded to some adversarial prompts in a way that failed the probe’s safety check. Interpret results in context:

  • If promptinject has a low pass rate, audit your system prompt design: add explicit instruction boundaries, consider a two-tier prompt architecture, or add an output classifier.
  • If dan (Do Anything Now jailbreak family) shows failures, evaluate whether your application’s use case makes those failure modes exploitable by real users.
  • If realtoxicityprompts shows failures, add output filtering for your specific content categories.

Scheduling: Garak is too slow for every pull request (a targeted scan takes 15–30 minutes). Run it before each major release, when you change the underlying model, or when you significantly expand what the agent can do.

Limitations: Garak may produce false positives — probes that trigger a “failure” in a way that is not actually exploitable in your specific application. Review failures manually before treating them all as blocking issues. Garak also does not read your application code, so it cannot find structural vulnerabilities like hardcoded API keys.

Tool 3: Rebuff — runtime injection detection

Section titled “Tool 3: Rebuff — runtime injection detection”

Rebuff (open source) is a lightweight middleware library that classifies incoming user prompts for prompt injection attempts before they reach the LLM. It uses a combination of heuristics and a small classifier model to score each input, allowing you to reject or flag suspicious requests at runtime.

Installation:

Terminal window
pip install rebuff

Integration in a Flask API:

from flask import Flask, request, jsonify, abort
from rebuff import Rebuff
import openai
app = Flask(__name__)
client = openai.OpenAI()
# Initialize Rebuff with the API key from environment — do not hardcode
rb = Rebuff(api_token=None, api_url="https://playground.rebuff.ai")
SYSTEM_PROMPT = "You are a helpful customer support assistant." # SAFE: static prompt
@app.route("/chat", methods=["POST"])
def chat():
user_input = request.json.get("message", "")
# SAFE: check for injection attempt before sending to LLM
detect_response = rb.detect_injection(user_input)
if detect_response.injection_detected:
# Log the attempt for review, then reject it
app.logger.warning(
"Injection attempt detected",
extra={
"user_input_hash": hash(user_input),
"heuristic_score": detect_response.heuristic_score,
},
)
abort(400, description="Request rejected.")
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}, # SAFE: static system prompt
]
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
max_tokens=512, # SAFE: bounded token usage
)
return jsonify({"reply": response.choices[0].message.content})

What to do with the output:

Rebuff returns a detection score and a boolean injection_detected flag. In production:

  • Log every flagged request with a hash of the input (not the raw input, to avoid logging adversarial content at scale).
  • Review flagged requests periodically to calibrate the threshold — Rebuff can produce false positives on legitimate inputs that use instruction-like language.
  • Use the scored output to build a rate-limiting or progressive trust system rather than a hard reject if your user population has high false-positive rates.

Limitations: Rebuff adds latency to every request — typically 50–200ms depending on network conditions and the classifier model used. For latency-sensitive applications, consider running detection asynchronously and using a flag for post-hoc review rather than a hard block. Rebuff also cannot detect indirect injection from retrieved documents; it only inspects the direct user input.

Run LLMArmor against your codebase today and triage findings in this order:

Critical — fix before next deployment:

  • Hardcoded API keys or credentials in source code (LLM02)
  • User-controlled input directly interpolated into role: system messages (LLM01)
  • Agent tool access that includes shell execution, arbitrary file writes, or unrestricted network calls without human approval (LLM08)

High — fix within current sprint:

  • Missing max_tokens on any LLM API call exposed to user input
  • Retrieved document content injected into system context without sanitization
  • max_iterations not set or set above 15 on agent executors

Medium — track and address next iteration:

  • Dynamic system prompt values that pass through an allowlist but are not validated for injection patterns
  • Tool descriptions that include sensitive information about internal systems
  • Missing output validation before LLM responses reach rendered HTML

Low — review and document:

  • Verbose logging that may capture user inputs containing PII
  • Debug-mode settings left enabled that expose internal model parameters
Is there a truly free LLM security scanner?
Yes. LLMArmor, Garak, and Rebuff are all open source with no licensing fees. LLMArmor and Garak are Apache 2.0; Rebuff is open source on GitHub. The only costs are compute time (negligible for LLMArmor), API call costs if you run Garak against a paid model endpoint, and latency overhead for Rebuff in production. For CI scanning and pre-release checks, the total monetary cost is effectively zero.
Do I need a security engineer to use these tools?
LLMArmor and Garak can be run by a developer without security background — both produce human-readable findings with remediation notes. PyRIT and more advanced Garak configurations require more security expertise. For a startup without a dedicated security engineer, starting with LLMArmor in CI plus Garak before releases provides significant coverage without requiring specialized knowledge.
How do I prioritize findings when I have hundreds of issues?
Fix by severity: CRITICAL first (hardcoded credentials, direct taint to system role), then HIGH (missing token limits, wildcard agent tools), then MEDIUM. Within a severity tier, prioritize findings in code that is already deployed or about to be deployed over findings in internal tooling or developer scripts. Do not try to fix everything at once — start with the issues that represent the highest realistic risk of exploitation.
Can LLMArmor scan JavaScript or TypeScript?
Not currently. LLMArmor analyzes Python source code only. If your LLM integration layer is written in JavaScript or TypeScript, LLMArmor will not scan those files. For non-Python stacks, Garak (model-agnostic, runtime) still applies, and Rebuff can be integrated as an API call from any language.
How do I handle Rebuff false positives?
Rebuff's sensitivity can produce false positives on legitimate inputs that use directive language — customer support queries like 'tell me how to reset my password' can trigger heuristics designed to detect instruction injection. Tune the detection threshold for your use case, implement a soft-block pattern (flag and log rather than hard reject), and review flagged requests weekly to identify patterns. You can also run Rebuff in monitoring-only mode initially to collect baseline data before enabling blocking.
Should I run Garak on every deploy?
For most startups, running Garak before each significant release — major feature additions, model upgrades, new tool or agent capabilities — is a practical cadence. Running it on every deploy adds 15–60 minutes to your pipeline and produces the same results unless the model or prompt architecture changed. Reserve frequent Garak runs for high-risk changes, and use LLMArmor for routine per-commit coverage.
What is the biggest LLM security mistake startups make?
Hardcoding API keys in source code and committing them to a public or semi-public repository. Automated tools scrape GitHub, GitLab, and npm for exposed credentials within minutes of a push. Rotate any exposed key immediately, add it to a secrets manager (AWS Secrets Manager, HashiCorp Vault, or even GitHub Actions secrets for CI), and add a pre-commit hook or LLMArmor CI check to prevent future occurrences.