How do I prevent LLMArmor from blocking PRs on a large existing codebase?

Start with --fail-on CRITICAL to only block on the highest-severity findings. Upload SARIF for all findings so HIGH and MEDIUM are visible without blocking. Create tracked issues for the HIGH backlog and tighten to --fail-on HIGH once the backlog is cleared. This phased approach lets the team adopt the gate without immediately blocking all development.

What permissions does the GitHub Actions workflow need?

The workflow needs contents: read to check out the repository and security-events: write to upload SARIF results to GitHub Code Scanning. Both are set at the job level with the permissions key. No other permissions are required — LLMArmor does not make network calls or access external services.

Can I run LLMArmor on a monorepo with multiple Python services?

Yes. Pass multiple paths: llmarmor scan ./services/chat ./services/rag ./services/agent . Or use a glob: llmarmor scan ./services to recursively scan all Python files. Use --exclude to skip test directories or generated code that should not be analyzed: llmarmor scan ./services --exclude tests,migrations,generated .

How do I add LLMArmor findings to GitHub pull request reviews?

Upload the SARIF file using the github/codeql-action/upload-sarif@v3 action with if: always() . GitHub Code Scanning automatically adds inline annotations to the pull request diff for any finding that touches a changed line. Findings on unchanged lines appear in the Security tab rather than as PR annotations.

Should I run Garak on every pull request?

No. Garak sends hundreds of adversarial prompts to a running model, which takes 15–60 minutes and incurs API costs. Running this on every PR is impractical and adds no value unless the model, system prompt, or tool configuration changed. Run Garak before major releases, when switching underlying models, or on a weekly schedule. Use LLMArmor for per-PR static coverage.

What does SARIF stand for and why does it matter?

SARIF stands for Static Analysis Results Interchange Format. It is an open JSON standard (OASIS) for representing static analysis findings. GitHub Code Scanning, GitLab SAST, and most modern CI platforms accept SARIF as input. Using SARIF output from LLMArmor means findings appear natively in the GitHub PR interface, Security tab, and alert tracking system — rather than requiring developers to read raw CLI output or a separate report file.

How do I suppress a specific finding in CI without disabling the whole rule?

Add a # noqa: RULE_ID comment to the specific line: for example, # noqa: LLM01 suppresses the prompt injection rule for that line only. Add a comment explaining why the suppression is justified — this makes the exception visible during code review. You can also use a .llmarmorignore file to exclude entire files or directories, which is appropriate for test fixtures or intentionally vulnerable example code.

CI/CD Integration: Automate LLM Security Testing

Security checks that do not run automatically do not run at all. This is true of dependency audits, true of SAST scanners, and especially true of LLM security analysis. Under deadline pressure, manual security steps are the first things dropped. When a vulnerability is found six months later in production, the explanation is usually not that the team was negligent — it is that the process depended on a human remembering to do something that no automated gate required.

Automating LLMArmor in your CI/CD pipeline takes the decision out of the equation. Every pull request that touches a Python file gets scanned. Every finding is visible in the pull request interface before merge. HIGH and CRITICAL findings block the merge until they are addressed or explicitly suppressed with a documented justification.

This post provides complete, copy-paste configurations for GitHub Actions, GitLab CI, and pre-commit hooks, along with guidance on SARIF upload, severity thresholds, and how to complement static analysis with dynamic scanning in a practical CI/CD workflow.

Why automate LLM security?

LLM-specific vulnerabilities have a property that makes shift-left analysis particularly effective: they are structural code patterns. A system prompt built by interpolating request.args values is vulnerable not because of what a user has submitted yet — it is vulnerable because of how the code is written. Static analysis can find this at the moment the code is authored, before the application handles a single request.

The same is true of hardcoded API keys, agents with unbounded iteration counts, and LLM output routed directly to eval(). These are code-level decisions, visible in the AST before the code is deployed, and automatable in CI without spinning up any infrastructure or making any API calls.

The alternative — periodic manual security reviews, pre-launch audits, “we will add security later” — consistently results in vulnerabilities that accumulate undetected and become progressively harder to remediate as the codebase grows around them.

GitHub Actions workflow

The following workflow scans every pull request that modifies Python files, fails the PR on HIGH or CRITICAL findings, and uploads results to GitHub Code Scanning as SARIF output for display in the Security tab.

name: LLM Security Scan

on:
  pull_request:
    paths:
      - "**.py"
  push:
    branches:
      - main
    paths:
      - "**.py"

jobs:
  llmarmor:
    name: LLMArmor Static Analysis
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # required for SARIF upload to Code Scanning

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Install LLMArmor
        run: pip install llmarmor

      - name: Run LLMArmor scan
        run: |
          llmarmor scan ./src \
            --output-format sarif \
            --output-file llmarmor-results.sarif \
            --fail-on HIGH
        # exit code 0 = no HIGH/CRITICAL findings
        # exit code 1 = HIGH or CRITICAL findings present → fails the job

      - name: Upload SARIF to GitHub Code Scanning
        if: always()  # upload even if scan found issues (so findings appear in PR)
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: llmarmor-results.sarif
          category: llmarmor

Key decisions in this workflow:

paths: ["**.py"] ensures the workflow only triggers when Python files change, avoiding unnecessary runs on documentation-only PRs.
--fail-on HIGH exits with code 1 when any HIGH or CRITICAL finding is present. Adjust to --fail-on CRITICAL if you want to allow HIGH findings to merge without blocking.
if: always() on the SARIF upload ensures findings are visible in the GitHub Security tab and PR annotations even when the scan job fails. Without this, a failed scan would prevent the SARIF file from being uploaded.
security-events: write permission is required for the SARIF upload action to post findings to Code Scanning.

GitLab CI snippet

For GitLab-hosted repositories, the equivalent pipeline configuration:

# .gitlab-ci.yml — add this job to your existing pipeline
llmarmor:
  stage: test
  image: python:3.11-slim
  rules:
    - changes:
        - "**/*.py"
  script:
    - pip install llmarmor --quiet
    - mkdir -p security-reports
    - |
      llmarmor scan ./src \
        --output-format sarif \
        --output-file security-reports/llmarmor.sarif \
        --fail-on HIGH
  artifacts:
    when: always
    paths:
      - security-reports/llmarmor.sarif
    reports:
      sast: security-reports/llmarmor.sarif  # integrates with GitLab SAST view
    expire_in: 30 days
  allow_failure: false

GitLab’s SAST view ingests SARIF reports from the reports: sast artifact key, displaying findings in the merge request Security tab when GitLab Ultimate is available. On lower tiers, the SARIF artifact is still downloadable for review.

Pre-commit hook

A pre-commit hook catches findings before a commit is even created, providing the earliest possible feedback. This is especially useful during active development when running CI on every commit would be too slow.

repos:
  - repo: local
    hooks:
      - id: llmarmor
        name: LLMArmor LLM Security Scan
        language: system
        entry: llmarmor
        args: ["scan", "--fail-on", "HIGH"]
        types: [python]
        pass_filenames: true
        # Runs llmarmor scan on only the staged Python files — fast per-commit check

Install the hook after adding the configuration:

pip install pre-commit llmarmor
pre-commit install

With this configuration, LLMArmor runs on the staged Python files when you run git commit. If any HIGH or CRITICAL findings are present in the files you are committing, the commit is aborted and the findings are printed. Fix the issues and commit again, or use git commit --no-verify to bypass (and accept that CI will catch it anyway).

SARIF output explained

SARIF is a JSON-based format that describes static analysis findings in a structured, tool-independent way. A SARIF file produced by LLMArmor includes:

Rules: Each rule (e.g., LLM01, LLM08) is defined once with its ID, name, default severity, and a link to the full rule documentation.
Results: Each finding lists the rule ID, a message, the file path, the line and column range, and the severity.
Tool metadata: The scanner name, version, and configuration used for the run.

When uploaded to GitHub Code Scanning, this data drives:

PR annotations: findings appear as inline comments on the exact lines in the diff where they were found, visible to reviewers without leaving the pull request.
Security alerts: persistent alerts in the repository’s Security tab, tracked across commits so you can see when a finding was introduced and when it was resolved.
Trend data: Code Scanning tracks the number of open alerts over time, useful for measuring progress on a backlog of findings.

PR-blocking thresholds

LLMArmor’s --fail-on flag controls which severity levels cause a non-zero exit code:

# Block PR on HIGH or CRITICAL findings (recommended default)
llmarmor scan ./src --fail-on HIGH

# Block only on CRITICAL findings (more permissive — allow HIGH to merge)
llmarmor scan ./src --fail-on CRITICAL

# Never block — scan in warn-only mode, upload SARIF for visibility
llmarmor scan ./src --fail-on NONE

Recommended threshold policy:

Finding Severity	Recommended Action
CRITICAL	Block PR, must be fixed or suppressed before merge
HIGH	Block PR, must be fixed or suppressed before merge
MEDIUM	Warn in CI output and SARIF, tracked as backlog issue, does not block PR
LOW	Reported in SARIF only, reviewed periodically

For teams adopting LLMArmor on an existing codebase with a large backlog of findings, use --fail-on CRITICAL initially to avoid blocking all PRs immediately. Progressively tighten to --fail-on HIGH as the HIGH backlog is resolved.

To suppress a known false positive without blocking CI, add an inline comment to the source file:

# SAFE: this value is validated against ALLOWED_ROLES before interpolation
content = f"You are a {validated_role} assistant."  # noqa: LLM01

Suppression comments should be reviewed in code review like any other exception — the reviewer should confirm that the documented justification is accurate.

Complementary automation

LLMArmor handles static code analysis. A complete LLM security automation stack combines it with two additional automated checks:

Dependency vulnerability scanning with pip-audit:

# Add to the same GitHub Actions job or a parallel job
- name: Run pip-audit
  run: |
    pip install pip-audit
    pip-audit --requirement requirements.txt --format json \
      --output pip-audit-results.json

pip-audit checks your Python dependencies against the OSV vulnerability database and flags packages with known CVEs. This covers LLM-specific supply chain risks (OWASP LLM03) that LLMArmor does not — a compromised or vulnerable version of langchain, transformers, or openai in your requirements.txt.

Scheduled Garak dynamic scans:

Do not add Garak to every PR pipeline — a full Garak probe sweep takes 15–60 minutes and produces the same results unless the model or prompt architecture changed. Instead, run it on a schedule or as a release gate:

name: Garak Dynamic Scan (Weekly)

on:
  schedule:
    - cron: "0 2 * * 1"  # Monday 02:00 UTC
  workflow_dispatch:       # also allow manual trigger

jobs:
  garak:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Garak
        run: pip install garak
      - name: Run Garak probe sweep
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          garak --model_type openai \
                --model_name gpt-4o \
                --probes promptinject,dan,atkgen \
                --report_prefix ./security/garak_$(date +%Y%m%d)
      - name: Upload Garak report
        uses: actions/upload-artifact@v4
        with:
          name: garak-report-${{ github.run_id }}
          path: ./security/garak_*.report.jsonl
          retention-days: 90

This gives you continuous static coverage on every PR via LLMArmor, continuous dependency coverage via pip-audit, and periodic dynamic coverage via Garak — without adding 60 minutes to every developer’s pull request wait time.

For full CI/CD integration documentation including advanced configuration options, see the CI/CD Integration Guide. For an introduction to the LLMArmor rule set before setting up CI, see LLM Security in GitHub Actions.

Frequently asked questions

How do I prevent LLMArmor from blocking PRs on a large existing codebase?: Start with --fail-on CRITICAL to only block on the highest-severity findings. Upload SARIF for all findings so HIGH and MEDIUM are visible without blocking. Create tracked issues for the HIGH backlog and tighten to --fail-on HIGH once the backlog is cleared. This phased approach lets the team adopt the gate without immediately blocking all development.
What permissions does the GitHub Actions workflow need?: The workflow needs contents: read to check out the repository and security-events: write to upload SARIF results to GitHub Code Scanning. Both are set at the job level with the permissions key. No other permissions are required — LLMArmor does not make network calls or access external services.
Can I run LLMArmor on a monorepo with multiple Python services?: Yes. Pass multiple paths: llmarmor scan ./services/chat ./services/rag ./services/agent. Or use a glob: llmarmor scan ./services to recursively scan all Python files. Use --exclude to skip test directories or generated code that should not be analyzed: llmarmor scan ./services --exclude tests,migrations,generated.
How do I add LLMArmor findings to GitHub pull request reviews?: Upload the SARIF file using the github/codeql-action/upload-sarif@v3 action with if: always(). GitHub Code Scanning automatically adds inline annotations to the pull request diff for any finding that touches a changed line. Findings on unchanged lines appear in the Security tab rather than as PR annotations.
Should I run Garak on every pull request?: No. Garak sends hundreds of adversarial prompts to a running model, which takes 15–60 minutes and incurs API costs. Running this on every PR is impractical and adds no value unless the model, system prompt, or tool configuration changed. Run Garak before major releases, when switching underlying models, or on a weekly schedule. Use LLMArmor for per-PR static coverage.
What does SARIF stand for and why does it matter?: SARIF stands for Static Analysis Results Interchange Format. It is an open JSON standard (OASIS) for representing static analysis findings. GitHub Code Scanning, GitLab SAST, and most modern CI platforms accept SARIF as input. Using SARIF output from LLMArmor means findings appear natively in the GitHub PR interface, Security tab, and alert tracking system — rather than requiring developers to read raw CLI output or a separate report file.
How do I suppress a specific finding in CI without disabling the whole rule?: Add a # noqa: RULE_ID comment to the specific line: for example, # noqa: LLM01 suppresses the prompt injection rule for that line only. Add a comment explaining why the suppression is justified — this makes the exception visible during code review. You can also use a .llmarmorignore file to exclude entire files or directories, which is appropriate for test fixtures or intentionally vulnerable example code.

LLM Security Scanners Compared Full comparison of LLM security tools across categories.

CI/CD Integration Guide Full documentation for LLMArmor CI/CD configuration options.

LLM Security in GitHub Actions Detailed walkthrough of the GitHub Actions setup and Code Scanning integration.

First LLM Security Scan in 5 Minutes Install LLMArmor and run your first scan before setting up CI.