Opslane

Opslane's /verify plugin turns a pasted PM ticket into a browser-driven acceptance-criteria check inside Claude Code, screenshotting and video-recording every pass or fail before you push.

115 stars
3 forks
MIT License
Shell

Opslane is a Claude Code plugin, distributed under the name “verify,” that adds two slash commands — /verify-setup and /verify — for self-serve QA on a pull request before it’s pushed. /verify-setup auto-detects your local dev server’s port, indexes user-facing routes and any existing Playwright or Cypress test selectors, and confirms the Playwright MCP server is wired up. /verify then reads a pasted ticket or spec file, extracts concrete testable acceptance criteria, asks about anything ambiguous, and drives a real browser — clicking, typing, and navigating like a person — against your running app via Playwright MCP.

Every acceptance criterion gets a verdict (pass, fail, blocked, unclear, error, timeout, auth_expired, or spec_unclear) plus a screenshot. Anything that doesn’t pass also keeps a full Playwright video and trace so a developer can replay exactly what the agent saw with npx playwright show-trace. Passing runs delete their recordings automatically to avoid clutter. Everything lands in a single-file report.html alongside a machine-readable verdicts.json, all inside a gitignored .verify/ directory in the user’s own repo.
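For readers who want to consume verdicts.json from a script, a minimal sketch follows. The verdict labels come from the list above, but the file layout (a top-level "criteria" list whose entries carry "id" and "verdict" keys) and the function name non_passing are assumptions made for illustration, not the plugin's documented schema.

```python
import json
from pathlib import Path

# Verdict labels as listed in the text; everything else here is hypothetical.
VERDICTS = {
    "pass", "fail", "blocked", "unclear",
    "error", "timeout", "auth_expired", "spec_unclear",
}

def non_passing(verdicts_path):
    """Return (ac_id, verdict) pairs for every criterion that did not pass."""
    data = json.loads(Path(verdicts_path).read_text(encoding="utf-8"))
    results = []
    for entry in data.get("criteria", []):  # "criteria" is an assumed key
        verdict = entry.get("verdict")
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
        if verdict != "pass":
            results.append((entry.get("id"), verdict))
    return results
```

A script like this could gate a push on an empty result, though the real file may be shaped differently and should be inspected first.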

The project has gone through a real architectural pivot: per its CHANGELOG, v1.1.0 (April 2026) tore out an earlier standalone CLI package and hosted SaaS auth backend in favor of this zero-install, zero-server design that relies entirely on the official Playwright MCP server. Opslane, the company behind it, went through Y Combinator’s Summer 2024 batch.

It’s an early-stage project — created in March 2026, no CI, no automated test suite, and a single maintainer-driven history so far — but development is active (dozens of commits a month) and the README’s own hard constraints are described as “battle-tested from 15+ real verification runs,” suggesting real dogfooding rather than a purely theoretical design.

What You Get

  • /verify-setup command - a one-time setup skill that auto-detects your dev server port from package.json scripts, .env* files, or framework configs, pings it, and writes .verify/config.json.
  • Automatic app indexing - walks Next.js, Remix, React Router, SvelteKit, Nuxt, and Express/Hono/Fastify route conventions, collects any existing Playwright or Cypress test selectors, and writes both into .verify/app.json so verification can navigate directly instead of guessing.
  • /verify command - a turn-based skill that reads a pasted ticket or discovered spec file, flags ambiguous acceptance criteria, asks clarifying questions one at a time, and only then extracts concrete testable ACs.
  • Playwright MCP-driven verification - drives your actual running app (not a mock) via the official Playwright MCP server, with zero test code to write or maintain.
  • Per-AC evidence trail - a screenshot on every acceptance criterion, plus a full video recording and Playwright trace.zip retained for anything that doesn’t pass, deleted automatically on pass.
  • Self-contained HTML report - a single-file report.html embedding verdict cards, reasoning, steps taken, and evidence, generated after every run alongside a machine-readable verdicts.json.
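The port detection described for /verify-setup is carried out by the LLM following skill instructions rather than by shipped code, so the following is only a rough sketch of what that lookup order could look like if written out. The regular expressions, the choice of the "dev" and "start" scripts, and the name detect_port are all assumptions, and framework config files are left out.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns: "--port 5173", "--port=5173", "-p 4000", "PORT=3001".
PORT_IN_SCRIPT = re.compile(r"(?:--port[ =]|-p\s+|PORT=)(\d{2,5})")
PORT_IN_ENV = re.compile(r"^\s*PORT\s*=\s*(\d{2,5})\s*$", re.MULTILINE)

def detect_port(project_dir):
    """Guess a dev server port from package.json scripts, then .env* files."""
    root = Path(project_dir)
    pkg = root / "package.json"
    if pkg.is_file():
        scripts = json.loads(pkg.read_text(encoding="utf-8")).get("scripts", {})
        for name in ("dev", "start"):  # assumed script names
            match = PORT_IN_SCRIPT.search(scripts.get(name, ""))
            if match:
                return int(match.group(1))
    for env_file in sorted(root.glob(".env*")):
        if env_file.is_file():
            match = PORT_IN_ENV.search(env_file.read_text(encoding="utf-8"))
            if match:
                return int(match.group(1))
    return None  # nothing found; a caller would have to ask the user
```

The function only guesses a port; checking that a server actually answers on it, as the setup step is described as doing, would be a separate step.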

Common Use Cases

  • Pre-push PR sanity check - a developer finishes a feature, runs /verify, and gets a pass/fail verdict against the ticket’s acceptance criteria before opening the pull request.
  • Catching UI copy mismatches - the README’s own worked example (a “Save Draft” confirmation toast reading “Saved” instead of “Draft saved”) shows how it flags small text discrepancies a quick eyeball check would miss.
  • Teams without a dedicated QA process - solo developers or small teams with no e2e test suite get an approximation of manual ticket-based testing without hiring anyone or writing Playwright specs.
  • Debugging a failing acceptance criterion - an engineer opens .verify/runs/<run_id>/report.html or replays the trace with npx playwright show-trace .../trace.zip to see exactly what the browser agent observed.
  • Re-verifying after adding routes or auth - re-running /verify-setup after a new login flow or route changes refreshes .verify/app.json so verification doesn’t drift from the real app.

Under The Hood

Architecture: Opslane’s /verify ships as a pure Claude Code plugin with no application code at all. Two one-line command files (commands/verify.md, commands/verify-setup.md) each delegate to a markdown skill (skills/verify/SKILL.md, skills/verify-setup/SKILL.md) that encodes the entire control flow as a turn-based conversation the host LLM follows step by step: spec intake, pre-flight, ambiguity interpretation, clarification, then AC extraction and Playwright-driven verification. The plugin manifest (.claude-plugin/plugin.json) auto-registers the official Playwright MCP server with pinned flags, so browser automation is wired up on install rather than through a separate claude mcp add step. A single PostToolUse hook (.claude/hooks/sync-skill.sh) copies the skill files to ~/.claude/skills/ after every edit so the in-repo copy stays the single source of truth. There is no server, no database, and no compiled logic: state is just JSON files (.verify/config.json, .verify/app.json, .verify/runs/<run_id>/verdicts.json) written directly into the user’s working tree, and behavioral rules like the per-AC command budget or mandatory recording bookends are enforced only by the LLM reading and obeying the prose instructions, not by any code path. Changing this core abstraction means editing the SKILL.md conversation flow itself, which has no compiler or type system to catch inconsistencies between what one turn promises and what the next turn expects.
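To make the sync step concrete, here is a Python analogue of what a file-copying hook of this kind might do. The real hook is a Bash script whose contents are not reproduced here; the destination layout (one subdirectory per skill) and the function name sync_skills are assumptions.

```python
import shutil
from pathlib import Path

def sync_skills(repo_root, dest_root):
    """Copy each skills/<name>/SKILL.md to <dest_root>/<name>/SKILL.md."""
    copied = []
    for skill_file in sorted(Path(repo_root).glob("skills/*/SKILL.md")):
        # Assumed layout: the installed copy mirrors the in-repo skill name.
        target_dir = Path(dest_root) / skill_file.parent.name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "SKILL.md"
        shutil.copy2(skill_file, target)  # overwrite, so the repo copy wins
        copied.append(target)
    return copied
```

Called with the repository root and Path.home() / ".claude" / "skills", this overwrites the installed copies on each run, which is one simple way to keep the in-repo files authoritative.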

Tech Stack: There is no conventional application stack to speak of; the repository has no package manifest, no Dockerfile, and no build tooling. Distribution happens through Claude Code’s own plugin marketplace mechanism rather than a package registry or container image. The one piece of standalone code, a Bash PostToolUse hook, does simple file-copying. The SKILL.md files themselves embed small Python one-liners for reading JSON config and a larger inline Python script, run via Bash at report time, that renders a single-file HTML report using the standard library. Browser automation is entirely outsourced to the alpha channel of the official Playwright MCP server, launched on demand with storage-state, isolation, and devtools capability flags baked into the plugin manifest. Per the project’s own changelog, this is a deliberate simplification from an earlier architecture that shipped a standalone CLI package and a hosted SaaS auth backend, both since removed in favor of this dependency-light, skill-only design.
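As an illustration of how a single-file report can be produced with nothing but the standard library, consider the sketch below. It is not the project's inline script; the input shape (dicts with "id", "verdict", and "reasoning" keys), the markup, and the name render_report are assumptions.

```python
import html

def render_report(run_id, verdicts):
    """Render verdict dicts into one self-contained HTML string."""
    cards = []
    for item in verdicts:
        # Escape every dynamic value so ticket text cannot inject markup.
        cards.append(
            '<section class="card">'
            f"<h2>{html.escape(item['id'])}: {html.escape(item['verdict'])}</h2>"
            f"<p>{html.escape(item.get('reasoning', ''))}</p>"
            "</section>"
        )
    return (
        '<!doctype html><html><head><meta charset="utf-8">'
        f"<title>Verify run {html.escape(run_id)}</title></head>"
        f"<body>{''.join(cards)}</body></html>"
    )
```

Writing the returned string to a file yields a report that opens without a server or external assets. Embedding screenshots would need extra work, for example base64 data URIs, which this sketch omits.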

Code Quality: There are no test files, no source files in any typed or compiled language, and no CI configuration anywhere in the repository; a full file listing turns up only Markdown, a couple of JSON manifests, one image, and a short Bash script. There is consequently nothing resembling a linter, formatter, or type system enforcing consistency. The closest analogue to error handling is prose instructing the LLM to retry a failed Playwright command once and then record a timeout or error verdict and move on. The Bash hook itself has minimal error handling beyond directory creation before its copy calls. This reads less as a quality gap than a category mismatch, since the project’s actual “code” is the skill prompt text, and the maintainers’ own backlog explicitly lists building eval cases for the /verify skill’s output as planned but not yet started work.
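The retry-once rule exists only as prose for the LLM, but written as code it might look roughly like the following. This is a sketch under stated assumptions: the function name run_with_retry, the "ok" return value for a completed step, and the mapping of TimeoutError to the timeout verdict are all hypothetical.

```python
def run_with_retry(action, retries=1):
    """Run a browser step; retry once on failure, then give up with a verdict.

    Returns "ok" when the step completes (the pass/fail judgment happens
    elsewhere), or "timeout" / "error" once the retries are exhausted.
    """
    outcome = "error"
    for _ in range(retries + 1):
        try:
            action()
            return "ok"
        except TimeoutError:
            outcome = "timeout"  # assumed mapping to the timeout verdict
        except Exception:
            outcome = "error"    # any other failure becomes an error verdict
    return outcome
```

The returned verdict reflects the last attempt, so a step that times out and then raises a different exception is recorded as error.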

What Makes It Unique: The interesting technical bet is packaging an entire product as a versioned, git-synced prompt asset rather than a compiled artifact. The plugin ships no binaries of its own and outsources all actual browser control to the general-purpose Playwright MCP server. The LLM itself resolves spec ambiguity through a dedicated interpreter turn that asks one clarifying question at a time before locking acceptance criteria, and then makes the pass/fail judgment directly. Extensive historical planning documents in the repository show this wasn’t the starting design: earlier iterations built a multi-stage pipeline with a standalone browse binary, parallel browse daemons, a database-aware test-data seeder, a deterministic route resolver, and a hosted auth server, all deliberately torn out in favor of the current single-skill-pair design once Playwright MCP became available. That pivot toward radical simplicity, and the convention of always recording video and trace evidence per acceptance criterion while discarding it on a pass, are the most novel parts. The underlying idea of an LLM-driven browser agent checking acceptance criteria against a spec is an increasingly common pattern in the broader AI-QA space, not something unique to this project.
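The record-always, discard-on-pass convention can be expressed as a small cleanup routine. The sketch below is illustrative only: the per-criterion directory, the file suffixes treated as recordings (.webm for video, .zip for traces), and the name prune_evidence are assumptions rather than the plugin's actual layout.

```python
from pathlib import Path

# Assumed suffixes for video and trace files; screenshots are never removed.
RECORDING_SUFFIXES = {".webm", ".zip"}

def prune_evidence(ac_dir, verdict):
    """Delete recordings for a passing criterion; keep everything otherwise."""
    removed = []
    if verdict != "pass":
        return removed  # non-passing runs keep video and trace for replay
    for path in sorted(Path(ac_dir).iterdir()):
        if path.is_file() and path.suffix in RECORDING_SUFFIXES:
            path.unlink()
            removed.append(path.name)
    return removed
```

Keying deletion on the verdict alone keeps the rule easy to audit: anything other than "pass" leaves the evidence directory untouched.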

Self-Hosting

Licensing Model: MIT licensed. All functionality (both the /verify and /verify-setup skills, and the auto-wired Playwright MCP server) is available for free in any Claude Code installation, with no license keys or paid tier required.

Self-Hosting Restrictions: None found. There is no server component to self-host; verification runs entirely inside the user’s own Claude Code session against their own local dev server.

Enterprise Features: None found in the repository or README. The CHANGELOG confirms an earlier hosted SaaS auth backend and standalone CLI package were both removed in v1.1.0, leaving no paid or cloud-only tier.

Cloud vs Self-Hosted: Not applicable. The project has no cloud offering; it is a local-only Claude Code plugin.

License Key Required: No.
