Promptfoo

An open-source CLI and library for evaluating and red-teaming LLM applications — replace trial-and-error prompt engineering with systematic evals, vulnerability scanning, and CI/CD integration.

22.9Kstars
2Kforks
MIT License
TypeScript

Promptfoo treats LLM application quality as a testing problem rather than a guessing game: instead of manually eyeballing model outputs, you define test cases and assertions, run them across prompts/models/configurations, and get structured pass/fail results comparable in a CI pipeline the way unit tests would be.

Beyond correctness evals, Promptfoo has a dedicated red-teaming mode for probing LLM applications for vulnerabilities — prompt injection, jailbreaks, data leakage, and other adversarial failure modes — treating security testing as part of the same evaluation workflow rather than a separate concern.

The project was acquired by OpenAI but remains MIT-licensed and fully open source per the project’s own announcement, distributed as an npm package (CLI and library) with active development and a large community.

What You Get

  • A CLI and library for defining test cases and assertions against LLM outputs, run consistently across prompts and models
  • Dedicated red-teaming and vulnerability scanning for prompt injection, jailbreaks, and other adversarial LLM risks
  • CI/CD integration so prompt and model regressions are caught automatically like any other test suite
  • Side-by-side comparison of outputs across different prompts, models, or configurations

Common Use Cases

  • Regression-testing prompts and model changes in CI before deploying an LLM-powered feature
  • Red-teaming an LLM application for prompt injection and jailbreak vulnerabilities before launch
  • Comparing output quality across different models or prompt variations systematically instead of manual spot-checking
  • Building a documented, repeatable eval suite for an LLM application instead of relying on ad hoc testing

Under The Hood

Architecture Promptfoo’s src/ separates assertions (the pluggable pass/fail check logic), evaluate.ts (the core eval-running engine), codeScan (likely static analysis for vulnerability detection), commands (CLI entry points), and a database layer for storing eval results, with a separate app/ directory for a web UI on top of the CLI/library core. This split lets the same evaluation engine serve CLI users, library consumers, and a browsable results UI without duplicating the core logic.

Tech Stack TypeScript throughout, distributed as an npm package usable both as a CLI tool and as a library, with a web app component for browsing eval results. It integrates with CI/CD systems as a test-runner-style tool rather than requiring a hosted service.

Code Quality Very active, consistently maintained commit history and a large contributor/community base (per GitHub activity and Discord presence) reflect a mature, production-used tool — reinforced by its acquisition into OpenAI while remaining open source, which typically implies continued investment rather than a stalled side project.

What Makes It Unique Promptfoo treats LLM correctness evals and adversarial red-teaming as the same underlying workflow (define cases, run them, get structured results) rather than requiring separate tools for quality testing versus security testing — letting teams catch both a broken prompt and an exploitable jailbreak in the same CI step.

Self-Hosting

Licensing Model MIT licensed — the project explicitly states it “remains open source and MIT licensed” after being acquired by OpenAI.

Self-Hosting Restrictions None found for the open-source CLI/library and its eval-running functionality.

License Key Required No.

Join founders buildingwith open source

Opinionated takes, migration guides, cost-saving tips, and insights from the open source ecosystem.

Subscribe on Substack

No spam. Unsubscribe anytime.

Join 750+ subscribers
No spam. Unsubscribe anytime.

Search