Open-source local AI agent that operates your entire computer — GUI, browser, shell, and messaging — from a single instruction, using your own models.
Understudy is a local-first, open-source AI agent runtime for macOS that gives you a single instruction-driven interface for controlling every surface of your computer. Rather than forcing you to live inside a web app or a proprietary SaaS product, Understudy runs on your machine and dispatches tasks through GUI automation, managed Playwright browser sessions, a bash shell, semantic web search, and eight built-in messaging channel adapters — all from the same agent loop.
What distinguishes Understudy from comparable projects is its teach-by-demonstration pipeline. You show it a task once, and the system extracts the intent rather than the pixel coordinates, producing a generalized skill that survives UI redesigns, window resizing, and application switches. That skill can subsequently be invoked with plain natural language and will automatically upgrade deterministic steps (file downloads, browser navigation) to faster execution routes while keeping genuinely complex steps agentic.
For longer pipelines, Understudy introduces a three-artifact composition model. Skills are agentic sub-tasks that make their own decisions within quality gates. Workers are deterministic scripted sub-tasks that follow a fixed sequence and emit structured output. Playbooks orchestrate workers and skills as independent child sessions, each with its own context window. A single pipeline can therefore combine scripted reliability with genuine autonomy without one stage's context leaking into another's.
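The composition model can be pictured with a small TypeScript sketch. Every name here (Stage, ChildSession, runPlaybook) is an illustrative assumption rather than Understudy's actual API; the point is only that each stage receives a fresh session and that structured output is the sole thing passed between stages.

```typescript
// Hypothetical types for illustration; these names are not taken from the Understudy codebase.
type StageOutput = Record<string, unknown>;

interface ChildSession {
  id: string;
  context: string[]; // messages visible to this stage only
}

interface Stage {
  kind: "skill" | "worker";
  name: string;
  run(session: ChildSession, input: StageOutput): StageOutput;
}

interface PlaybookResult {
  output: StageOutput;
  sessions: ChildSession[];
}

// Run stages in order. Each stage gets a brand-new session, so only the
// structured output of the previous stage crosses the boundary.
function runPlaybook(stages: Stage[], initial: StageOutput = {}): PlaybookResult {
  const sessions: ChildSession[] = [];
  let carry = initial;
  stages.forEach((stage, index) => {
    const session: ChildSession = { id: `${stage.name}-${index}`, context: [] };
    carry = stage.run(session, carry);
    sessions.push(session);
  });
  return { output: carry, sessions };
}

// A worker follows a fixed script and emits structured output.
const fetchReport: Stage = {
  kind: "worker",
  name: "fetch-report",
  run(session, input) {
    session.context.push("ran fixed download script");
    return { ...input, file: "report.csv" };
  },
};

// A skill would normally call a model; a stub stands in for that decision here.
const summarize: Stage = {
  kind: "skill",
  name: "summarize",
  run(session, input) {
    session.context.push(`deciding how to summarize ${String(input.file)}`);
    return { ...input, summary: `summary of ${String(input.file)}` };
  },
};

const playbookResult = runPlaybook([fetchReport, summarize]);
```

In this sketch the summarize stage never sees the worker's context entries, only the file field it emitted, which is the isolation property the playbook design is meant to provide.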
Understudy is model-agnostic by design. It ships with adapters for OpenAI, Anthropic Claude, Google Gemini, and other providers, and requires only your own API keys. There is no subscription, no Understudy-hosted cloud in the data path, and no vendor lock-in; model requests go directly to the provider you configure. The runtime is published as an npm package and controlled from the command line.
Architecture
Understudy is organized as a pnpm monorepo with a clearly layered package structure: a core package owns the session orchestrator, tool registry, trust engine, system prompt builder, and workflow crystallization pipeline; a gui package encapsulates the native macOS GUI runtime with screenshot grounding; a gateway package exposes a WebSocket and REST API so external clients and messaging channels can drive sessions; and a channels package contains per-messenger adapter modules. At runtime, the orchestrator assembles tools from the registry, wraps them in a policy pipeline and watchdog, builds a system prompt from declarative parameter sections, and delegates execution to a swappable runtime adapter — either an embedded in-process adapter or an ACP protocol adapter for external runtimes. The three-artifact composition model (Skill / Worker / Playbook) is a genuine architectural boundary: playbooks spawn each stage as an independent child session with its own context window, preventing context bleed between long pipeline stages while preserving structured output contracts between them.
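As a rough illustration of the assemble-then-wrap step, the sketch below registers tools and applies a list of wrappers to each one. The Tool, ToolWrapper, and ToolRegistry shapes are assumptions made for this example, not the project's real interfaces, and the audit wrapper is a synchronous stand-in for the watchdog, which in practice would need timeouts and async handling.

```typescript
// Hypothetical shapes for illustration; not Understudy's actual interfaces.
interface Tool {
  name: string;
  execute(args: Record<string, unknown>): string;
}

type ToolWrapper = (tool: Tool) => Tool;

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // Return every tool with the wrappers applied.
  // The first wrapper in the list ends up outermost.
  assemble(wrappers: ToolWrapper[]): Tool[] {
    return Array.from(this.tools.values()).map((tool) =>
      wrappers.reduceRight((inner, wrap) => wrap(inner), tool),
    );
  }
}

// Policy wrapper: blocks the shell tool before it runs (a deliberately crude rule).
const denyShell: ToolWrapper = (tool) => ({
  name: tool.name,
  execute(args) {
    if (tool.name === "bash") return "denied by policy";
    return tool.execute(args);
  },
});

// Audit wrapper: records each call that actually reaches the tool.
const calls: string[] = [];
const audit: ToolWrapper = (tool) => ({
  name: tool.name,
  execute(args) {
    calls.push(tool.name);
    return tool.execute(args);
  },
});

const registry = new ToolRegistry();
registry.register({ name: "bash", execute: () => "ran shell command" });
registry.register({ name: "echo", execute: (args) => String(args.text) });

const assembled = registry.assemble([denyShell, audit]);
const bashResult = assembled[0].execute({ command: "ls" });
const echoResult = assembled[1].execute({ text: "hi" });
```

Because the policy wrapper sits outermost, a denied call never reaches the audit layer or the tool itself, so only the echo call is recorded.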
Tech Stack
The entire codebase is TypeScript on Node.js 20+ using ESM throughout, bundled with esbuild and type-checked with project references. The agent intelligence layer is provided by the @mariozechner/pi-agent-core and @mariozechner/pi-ai packages, which supply the provider-agnostic model abstraction and multi-provider API adapters. GUI automation uses native macOS input tooling supplemented by screenshot-based grounding. Browser automation is built on Playwright with an additional Chrome extension relay for attaching to live browser sessions. Messaging adapters use GrammY for Telegram, discord.js for Discord, Baileys for WhatsApp, and the Slack Bolt SDK; all are listed as optional dependencies so the core installs lean. Configuration and skill definitions use YAML frontmatter parsed at runtime. Testing runs on Vitest with V8 coverage.
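To make the frontmatter convention concrete, here is a minimal sketch that splits a document into a metadata block and a body. It is an assumption-laden illustration: the field names in the sample are hypothetical, the parser handles only flat key: value pairs, and a real implementation would more likely hand the block to a full YAML library.

```typescript
interface ParsedDoc {
  meta: Record<string, string>;
  body: string;
}

// Split "---\n<key: value lines>\n---\n<body>" into metadata and body.
// Only flat string values are supported in this sketch.
function parseFrontmatter(text: string): ParsedDoc {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) return { meta: {}, body: text };

  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not key: value pairs
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] };
}

// Hypothetical skill file; the keys are invented for this example.
const sample = "---\nname: download-report\nroute: browser\n---\nOpen the dashboard.\n";
const parsed = parseFrontmatter(sample);
```

Text without a leading frontmatter block is returned unchanged as the body with empty metadata, which keeps plain files usable.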
Code Quality
The repository has extensive test coverage across unit, integration, and end-to-end layers — over 140 test files spanning CLI commands, gateway workflows, GUI grounding, scheduling, browser automation, and crystallization pipelines, with separate synthetic and live E2E modes. TypeScript strict mode is enforced via tsconfig project references. Linting uses oxlint. CI runs on GitHub Actions. Error handling is explicit throughout: policy evaluation returns typed PolicyDecision unions; the trust engine logs rate-limit violations rather than swallowing them silently; tool results pass through a context guard that detects and recovers from context overflow. Inline documentation is moderate — key abstractions like TrustEngine, RuntimePolicyPipeline, and WorkflowCrystallization have JSDoc-level comments, while lower-level utility modules are largely self-documenting through clear naming.
What Makes It Unique
The most technically novel aspect of Understudy is its intent-extraction teach pipeline combined with route-optimization crystallization. When a user demonstrates a workflow, the system does not record pixel coordinates or a replay script; instead, it uses the agent to extract the semantic intent of each step and annotate the execution route (GUI, browser automation, shell). On replay, steps are re-evaluated: if a GUI drag-and-drop can be replaced by a shell command or API call, the agent substitutes the faster route automatically. Combined with the three-artifact composition model, in which a playbook can mix strictly scripted workers with genuinely agentic skills in the same pipeline, this gives teams a path to progressively harden automation without rewriting workflows from scratch. The eight-channel messaging dispatch system, treating messaging apps as a first-class trigger and notification surface rather than a bolt-on integration, is also uncommon in open-source local agent runtimes.
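The route substitution idea can be sketched as follows. The TaughtStep shape and the cost ordering are assumptions made for illustration; the text does not document how Understudy actually ranks routes, only that a faster route replaces a slower one when the intent allows it.

```typescript
type Route = "gui" | "browser" | "shell";

// Hypothetical record of one demonstrated step, stored as intent rather than coordinates.
interface TaughtStep {
  intent: string;
  demonstratedRoute: Route;
  availableRoutes: Route[]; // routes judged able to achieve the same intent
}

// Assumed ordering: a lower number means a faster, more deterministic route.
const routeCost: Record<Route, number> = { shell: 1, browser: 2, gui: 3 };

// Pick the cheapest route among the demonstrated one and its alternatives.
function chooseRoute(step: TaughtStep): Route {
  const candidates: Route[] = [step.demonstratedRoute, ...step.availableRoutes];
  return candidates.reduce((best, route) =>
    routeCost[route] < routeCost[best] ? route : best,
  );
}

const moveFile: TaughtStep = {
  intent: "move the exported file into the reports folder",
  demonstratedRoute: "gui",
  availableRoutes: ["gui", "shell"],
};

const pickColor: TaughtStep = {
  intent: "pick a color in the design tool",
  demonstratedRoute: "gui",
  availableRoutes: [],
};

const moveFileRoute = chooseRoute(moveFile);
const pickColorRoute = chooseRoute(pickColor);
```

Here the file move is upgraded to the shell route, while the color pick stays on the GUI because no alternative route is available.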
Understudy is released under the MIT License, which is one of the most permissive open-source licenses available. You are free to use, modify, distribute, and incorporate it into commercial products without restriction, with no copyleft requirements that would affect your own code. The only obligation is to retain the copyright and license notice in any distribution. There are no dual-licensing tiers, no license keys, and no enforcement mechanisms in the codebase.
Running Understudy yourself means taking on the full operational responsibility for the host macOS machine. The agent has broad system access by design — it can control any desktop application, run arbitrary shell commands, send messages through your personal messaging apps, and manage browser sessions. You are responsible for configuring the policy pipeline (allow/deny/require-approval rules) to match your risk tolerance, keeping Node.js and optional dependencies patched, and securing the API keys for whichever LLM provider you use. There is no persistent server process to manage beyond keeping the CLI available; sessions are ephemeral by default, though the workflow crystallization database accumulates on disk over time and is your responsibility to back up.
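One way to think about the allow/deny/require-approval rules is as a first-match rule list that yields a typed decision. The sketch below is an assumption about shape only: the PolicyDecision name appears in the project description, but its fields, the PolicyRule type, the tool names, and the default-deny fallback are all invented here for illustration.

```typescript
// Assumed shape of a typed decision union; the real fields may differ.
type PolicyDecision =
  | { kind: "allow" }
  | { kind: "deny"; reason: string }
  | { kind: "require-approval"; prompt: string };

// Hypothetical rule: a tool name (or "*" for any tool) mapped to a decision.
interface PolicyRule {
  tool: string;
  decision: PolicyDecision;
}

// First matching rule wins; anything unmatched is denied by default.
function evaluate(rules: PolicyRule[], tool: string): PolicyDecision {
  for (const rule of rules) {
    if (rule.tool === tool || rule.tool === "*") return rule.decision;
  }
  return { kind: "deny", reason: "no matching rule" };
}

// The switch covers every variant, so the compiler flags a missing case.
function describe(decision: PolicyDecision): string {
  switch (decision.kind) {
    case "allow":
      return "run";
    case "deny":
      return `blocked: ${decision.reason}`;
    case "require-approval":
      return `ask user: ${decision.prompt}`;
  }
}

const rules: PolicyRule[] = [
  { tool: "bash", decision: { kind: "require-approval", prompt: "Run shell command?" } },
  { tool: "web_search", decision: { kind: "allow" } },
  { tool: "*", decision: { kind: "deny", reason: "not allowlisted" } },
];

const bashOutcome = describe(evaluate(rules, "bash"));
const searchOutcome = describe(evaluate(rules, "web_search"));
const messageOutcome = describe(evaluate(rules, "send_message"));
```

Defaulting to deny when nothing matches is a conservative choice for an agent with shell and messaging access; a more permissive setup would end the list with an allow rule instead.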
Because Understudy has no hosted cloud tier, there is no managed upgrade path, no SLA, and no vendor support channel beyond the community Discord and GitHub Issues. You do not get uptime monitoring, automated backups of your crystallized skills, or an operations team on call. The project is early — Layer 5 (proactive autonomy) is not yet implemented, and Layers 3 and 4 are partially complete — so expect API churn between versions. The trade-off is complete data sovereignty: no usage telemetry is sent to the Understudy project, and your conversation history and trained skills remain entirely on your local machine.