Skyvern
Skyvern (YC S2023) automates browser-based workflows by pairing LLMs with computer vision, letting agents click, fill, and extract data on sites they've never seen, without brittle XPath selectors that break on every layout change.
Skyvern is an open-source AI agent, born out of Y Combinator’s Summer 2023 batch, that automates browser-based workflows by combining large language models with computer vision instead of hand-coded DOM selectors. Traditional RPA scripts rely on XPath or CSS selectors that break the moment a website’s layout changes; Skyvern instead uses a swarm of agents, inspired by task-driven autonomous-agent designs like BabyAGI and AutoGPT, to visually comprehend a page and decide what to click, fill, or extract next through Playwright.
The project ships as a Playwright-compatible SDK first and foremost: page.act(), page.extract(), page.validate(), and page.prompt() add AI reasoning directly onto Playwright’s Page object, and every standard Playwright action (click, fill, select, upload) accepts an optional natural-language prompt fallback so existing automation scripts can adopt AI incrementally rather than being rewritten from scratch. On top of the SDK sits a full application — a FastAPI backend, a task/workflow engine, and a React UI — that lets both developers and non-technical users define multi-step workflows (browser tasks, data extraction, validation, loops, HTTP requests, custom code blocks) through a no-code builder.
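The selector-with-prompt-fallback idea can be illustrated without a browser. The sketch below is not Skyvern's SDK: `ToyPage`, `keyword_resolver`, and `ElementNotFound` are hypothetical stand-ins, and the keyword match is only a placeholder for whatever the real LLM and vision step does. It shows the control flow only: try the selector first, and consult the prompt when the selector no longer matches.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ElementNotFound(Exception):
    """Raised when neither the selector nor the prompt locates an element."""


def keyword_resolver(prompt: str, elements: dict[str, str]) -> Optional[str]:
    """Hypothetical stand-in for AI resolution: match prompt words to labels."""
    words = set(prompt.lower().split())
    for selector, label in elements.items():
        if words & set(label.lower().split()):
            return selector
    return None


@dataclass
class ToyPage:
    """Toy page model: maps selectors to visible element labels."""
    elements: dict[str, str]
    resolve_prompt: Callable[[str, dict[str, str]], Optional[str]] = keyword_resolver
    log: list[str] = field(default_factory=list)

    def click(self, selector: Optional[str] = None, prompt: Optional[str] = None) -> str:
        # 1. Pure-selector path: cheap and deterministic when the layout is stable.
        if selector is not None and selector in self.elements:
            self.log.append(f"selector:{selector}")
            return self.elements[selector]
        # 2. Fallback path: only used when the selector is missing or stale.
        if prompt is not None:
            resolved = self.resolve_prompt(prompt, self.elements)
            if resolved is not None:
                self.log.append(f"prompt:{resolved}")
                return self.elements[resolved]
        raise ElementNotFound(f"selector={selector!r}, prompt={prompt!r}")


# The site renamed "#submit" to "#submit-v2"; the old script still works
# because the prompt describes the intent rather than the markup.
page = ToyPage({"#submit-v2": "Submit order"})
clicked = page.click("#submit", prompt="submit the order")
```

The design point is that the prompt is optional: a script that never passes one behaves exactly like a plain selector-based script, which is what makes incremental adoption possible.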
Skyvern can be run three ways: pip install skyvern for the Python/TypeScript SDK against Skyvern Cloud or a local browser, pip install "skyvern[all]" plus skyvern quickstart for a full local server backed by SQLite by default (or Postgres via --postgres), or Docker Compose for a fully containerized Postgres+API+UI stack. The project reports SOTA results on the WebBench benchmark (64.4% overall accuracy) and claims the top score specifically on WRITE tasks — filling forms, logging in, downloading files — the RPA-adjacent workloads Skyvern is built for.
What You Get
- A Playwright-compatible Python and TypeScript SDK (skyvern, @skyvern/client) that adds page.act(), page.extract(), page.validate(), and page.prompt() AI commands directly onto the standard Playwright Page object
- A no-code workflow builder supporting browser tasks, browser actions, data extraction, validation, for-loops, file parsing, email sending, HTTP request blocks, and custom code blocks, chained into repeatable multi-step automations
- A self-hostable full stack (FastAPI backend, task/workflow engine, React UI) deployable via pip install "skyvern[all]" with a bundled SQLite database, or via Docker Compose with Postgres for production use
- Live browser-viewport streaming so you can watch and intervene in what the agent is doing in real time, useful for debugging brittle flows
- Built-in authentication handling, including TOTP-based 2FA (QR, email, and SMS codes) and password manager integrations with Bitwarden and a custom HTTP credential service
- Native Model Context Protocol (MCP) server support plus Zapier, Make.com, and n8n integrations for wiring Skyvern workflows into other tools without custom glue code
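For readers unfamiliar with what a TOTP code is, the standard algorithm (RFC 6238, HMAC-SHA1 variant) fits in a few lines of standard-library Python. This is a generic sketch of the published algorithm, not Skyvern's own 2FA code, and the function name `totp` is chosen here for illustration.

```python
import base64
import hashlib
import hmac
import struct


def totp(secret_b32: str, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = unix_time // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test secret: the ASCII string "12345678901234567890".
rfc_secret = base64.b32encode(b"12345678901234567890").decode()
code_at_59s = totp(rfc_secret, 59, digits=8)  # RFC 6238 test vector: "94287082"
```

An automation agent that holds the shared secret can compute the current code itself instead of waiting for a human to read it off a phone, which is why TOTP suits unattended login flows.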
Common Use Cases
- Downloading invoices or statements from dozens of different vendor portals that each have a unique, unscripted UI
- Automating job applications by having the agent navigate application forms and submit them on a candidate’s behalf
- Filling out government forms or registering accounts on public-sector websites that lack APIs
- Automating procurement or purchasing workflows on e-commerce and B2B ordering sites, from adding items to cart through checkout
- Running one workflow definition across many structurally different websites (e.g. many vendor portals) instead of maintaining a bespoke scraper per site
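The last use case, one workflow definition reused across many sites, can be pictured as plain data plus a loop. The sketch below is a hypothetical model: `Block`, `Workflow`, `run_everywhere`, and the stub executor are invented for illustration, the URLs are placeholders, and none of this reflects Skyvern's actual workflow schema. The point is that the definition contains intent in natural language and no site-specific selectors.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Block:
    kind: str    # illustrative labels such as "task", "extraction", "validation"
    prompt: str  # natural-language instruction, identical for every site


@dataclass(frozen=True)
class Workflow:
    name: str
    blocks: tuple[Block, ...]


# An executor runs one block against one site and returns some result text.
Executor = Callable[[str, Block], str]


def run_everywhere(workflow: Workflow, urls: list[str], execute: Executor) -> dict[str, list[str]]:
    """Run the same ordered blocks against every URL and collect per-site results."""
    return {url: [execute(url, block) for block in workflow.blocks] for url in urls}


def stub_executor(url: str, block: Block) -> str:
    """Placeholder for the real agent; just records what would have run where."""
    return f"{block.kind}@{url}"


invoice_workflow = Workflow(
    name="download-latest-invoice",
    blocks=(
        Block("task", "Log in and open the billing page"),
        Block("extraction", "Extract the latest invoice number and amount"),
        Block("validation", "Confirm the invoice date is in the current month"),
    ),
)
results = run_everywhere(
    invoice_workflow,
    ["https://vendor-a.example.com", "https://vendor-b.example.com"],
    stub_executor,
)
```

Adding a new vendor portal in this model means appending a URL, not writing a new scraper.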
Under The Hood
Architecture
Skyvern is organized into clearly separated layers documented in the maintainers’ own CLAUDE.md: an agent system (skyvern/forge/agent.py plus agent_functions.py) that runs the LLM-powered navigation loop, a public library (skyvern/library/) exposing the user-facing Skyvern class and Playwright-page wrappers, a browser engine (skyvern/webeye/) handling Playwright automation, DOM/vision scraping, and action execution, a workflow/services layer (skyvern/services/) orchestrating multi-step runs, and a FastAPI-based API layer (skyvern/forge/, with an internal SDK under forge/sdk/ for DB access, routes, and the executor). Data flows from a user-created task or workflow, through the agent’s LLM analysis of screenshots and DOM state, into Playwright-driven browser actions, with results captured and persisted before the workflow orchestrator advances to the next step. The core agent loop file is large and clearly carries a lot of the system’s decision-making logic in one place, which is a natural pressure point in an otherwise modular, directory-separated codebase.
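The data flow described above (observe the page, let the model choose an action, execute it, persist the result, repeat) can be sketched as a small loop. This is an illustrative skeleton under stated assumptions, not the contents of skyvern/forge/agent.py: `Observation`, `Action`, `StepRecord`, and `run_task` are hypothetical names, and the observe, decide, and act callables stand in for the browser engine and the LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    screenshot: bytes  # what the vision side would look at
    dom_summary: str   # condensed DOM state passed alongside the image


@dataclass
class Action:
    kind: str          # "click", "fill", "extract", or "done"
    target: str = ""
    value: str = ""


@dataclass
class StepRecord:
    observation: Observation
    action: Action


def run_task(
    observe: Callable[[], Observation],
    decide: Callable[[Observation, list[StepRecord]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> list[StepRecord]:
    """Observe, decide, act, record; stop on a "done" action or the step budget."""
    history: list[StepRecord] = []
    for _ in range(max_steps):
        observation = observe()
        action = decide(observation, history)
        # Persist the step before acting so the record survives a failed action.
        history.append(StepRecord(observation, action))
        if action.kind == "done":
            break
        act(action)
    return history


# Scripted stand-ins for the LLM and the browser, for demonstration only.
script = iter([Action("fill", "#email", "a@example.com"), Action("click", "#submit"), Action("done")])
performed: list[Action] = []
history = run_task(lambda: Observation(b"", "<form>"), lambda obs, past: next(script), performed.append)
```

The `max_steps` budget is a design choice in this sketch: a model-driven loop needs a hard stop so that a confused agent cannot run indefinitely.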
Tech Stack
The backend is Python 3.11-3.13 built on FastAPI, SQLAlchemy with Alembic migrations, and Playwright for browser control, with uv and a Hatchling build backend managing dependencies split into local, server, and cloud dependency groups. LLM access is abstracted through LiteLLM so Skyvern can target OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Gemini, or Ollama, and it ships an MCP server via fastmcp. Persistence defaults to SQLite for the pip-installed quickstart path and Postgres for the Docker Compose stack; the cloud-only dependency group layers in Temporal for workflow orchestration, Redis, Stripe billing, and OpenTelemetry instrumentation. The frontend is a separate skyvern-frontend React application built and linted independently with npm.
Code Quality
The repository includes an extensive test suite — hundreds of test files spread across tests/unit, tests/smoke_tests, and tests/sdk, using pytest with async support and class-based test organization with descriptive method names. Errors flow through a typed SkyvernException hierarchy rather than bare exceptions. The project enforces mypy type checking (with the SQLAlchemy mypy plugin), Ruff for linting and formatting, isort for import ordering, and pre-commit hooks, all wired into a GitHub Actions CI pipeline that spins up a real Postgres service container and runs both the Python test suite and the frontend’s own npm-based checks.
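A typed exception hierarchy pays off when callers need to react differently to different failures. In the sketch below only the base name `SkyvernException` comes from the text above; the base class is redefined locally as a stand-in, and the subclasses, `describe_failure`, and the test class are hypothetical. The test class follows the class-based, descriptive-method-name style the paragraph describes and can be collected by pytest or called directly.

```python
class SkyvernException(Exception):
    """Local stand-in for the project's base exception type."""

    def __init__(self, message: str = "") -> None:
        self.message = message or type(self).__name__
        super().__init__(self.message)


class ElementLookupError(SkyvernException):
    """Hypothetical subclass: a selector matched nothing on the page."""

    def __init__(self, selector: str) -> None:
        self.selector = selector
        super().__init__(f"No element matched {selector!r}")


class StepLimitExceeded(SkyvernException):
    """Hypothetical subclass: the agent ran out of its step budget."""


def describe_failure(exc: Exception) -> str:
    """Map an exception to a recovery strategy, most specific type first."""
    if isinstance(exc, ElementLookupError):
        return "retry-with-prompt"
    if isinstance(exc, SkyvernException):
        return "fail-task"
    return "crash"  # anything untyped is treated as a bug, not a task failure


class TestDescribeFailure:
    def test_element_lookup_error_triggers_prompt_retry(self) -> None:
        assert describe_failure(ElementLookupError("#submit")) == "retry-with-prompt"

    def test_other_typed_errors_fail_the_task(self) -> None:
        assert describe_failure(StepLimitExceeded()) == "fail-task"

    def test_untyped_errors_are_not_swallowed(self) -> None:
        assert describe_failure(ValueError("boom")) == "crash"
```

Checking the most specific type first matters: reversing the first two branches would route every lookup error to "fail-task".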
What Makes It Unique
Skyvern’s core bet is replacing brittle, selector-based automation with vision-and-LLM-driven interaction, but it doesn’t force an all-or-nothing switch: its SDK layers AI directly onto Playwright, supporting pure-selector, pure-natural-language, and selector-with-AI-fallback modes on the same page object, so an existing Playwright test or script can adopt AI incrementally. Combined with a reported state-of-the-art result on the WebBench benchmark, and a specific claim of leading performance on WRITE-style tasks (form filling, login, downloads) rather than just read/navigation tasks, this positions Skyvern closer to a hybrid automation toolkit than a pure autonomous-agent demo.
Self-Hosting
Licensing Model
Skyvern is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0), a copyleft license. Unlike MIT or Apache-2.0, AGPL requires that if you modify Skyvern and offer it as a network service, you must make your modified source available to users of that service. This is worth factoring in before building a closed-source SaaS on top of a modified fork.
Self-Hosting Restrictions
No ee/, enterprise/, or pro/ directories, and no license_check/isEnterprise/FEATURE_FLAGS-style gating were found anywhere in the source tree. All SDK commands, the workflow engine, and the self-hosted server appear to be the same code that powers the cloud product.
Enterprise Features
No separate paid on-prem tier is documented; the README’s advanced features (workflows, 2FA, MCP, Zapier/Make/n8n integrations) are presented as available regardless of deployment mode.
Cloud vs Self-Hosted
Skyvern Cloud is a managed hosted version, and per the README it specifically bundles anti-bot detection mechanisms, a proxy network, and CAPTCHA solvers alongside the ability to run multiple Skyvern instances in parallel. These infrastructure-heavy capabilities are meaningfully harder to replicate in a self-hosted deployment.
License Key Required
No. No license-key mechanism was found in the codebase for either the SDK or the self-hosted server.
Related Apps
n8n (license: Other)
Automation · No Code Platforms
Code when you need it, UI when you don't: the workflow automation platform built for technical teams who refuse to choose.
claw-code (license: MIT)
AI Agents · AI Code Assistants
A Rust-built CLI agent harness for Claude AI with persistent sessions, MCP tool integration, plugin hooks, and multi-provider support, designed to run autonomous coding workflows without human babysitting.
AutoGPT
Automation · Productivity · AI Assistants
Build, deploy, and run autonomous AI agents that automate complex multi-step workflows using a visual block-based graph editor.