monty

Name: monty
Rating: 5 (7828 reviews)

Run LLM-generated Python code safely inside your agent, with no containers, no CPython dependency, and sub-microsecond startup.

7.8K stars

379 forks

MIT License

Rust


Monty is a minimal, secure Python interpreter written entirely in Rust, purpose-built for one job: executing Python code produced by AI agents. Instead of spinning up Docker containers, spawning CPython subprocesses, or risking direct host execution, Monty runs a curated subset of Python inside a hermetic sandbox that boots in under one microsecond and gives developers precise control over every external call the code can make.

The interpreter implements its own execution engine on top of Ruff’s Python parser: source is parsed, lowered to an internal bytecode-like form, and run on a stack-based VM, so it has zero dependency on CPython or any C extension. Filesystem reads, environment variable lookups, and network calls are all routed through explicit host callbacks that developers register; everything else is blocked by default. Memory usage, stack depth, and wall-clock time can all be capped per run with a typed ResourceTracker interface.
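The deny-by-default callback model described above can be illustrated with a minimal sketch in plain Python. This is not Monty's API: HostCallbacks, ExternalCallBlocked, and the "getenv" handler are hypothetical names invented for the example, and the environment lookup is backed by a fixed dictionary rather than the real environment.

```python
from typing import Any, Callable


class ExternalCallBlocked(Exception):
    """Raised when sandboxed code requests a capability the host never registered."""


class HostCallbacks:
    """Hypothetical allow-list: only registered names are reachable from the sandbox."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def dispatch(self, name: str, *args: Any) -> Any:
        handler = self._handlers.get(name)
        if handler is None:
            # Deny by default: an unregistered capability is an error, not a fallback.
            raise ExternalCallBlocked(f"external call {name!r} is not registered")
        return handler(*args)


callbacks = HostCallbacks()
# The host decides exactly what "reading an environment variable" means.
callbacks.register("getenv", lambda key: {"REGION": "eu-west-1"}.get(key))

region = callbacks.dispatch("getenv", "REGION")

try:
    callbacks.dispatch("open", "/etc/passwd")
    blocked = False
except ExternalCallBlocked:
    blocked = True
```

The design point is that the sandbox never holds a reference to the real filesystem or environment; it can only name a capability and wait for the host to answer.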

Monty ships as a Rust library, a Python package (pydantic-monty), and a JavaScript/TypeScript package (@pydantic/monty), each backed by the same Rust worker-pool runtime. The Python and JS bindings run Monty workers as isolated subprocesses, so even a memory-safety violation triggered by adversarial code kills only the worker — the host process stays alive and receives a MontyCrashedError. A WebAssembly build is also available for browser environments where subprocess isolation is impossible.
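To make the isolation boundary concrete, here is a rough sketch of the idea using only Python's standard library. It is an assumption-laden illustration, not Monty's worker pool: the worker here is an ordinary Python subprocess, and WorkerCrashed and run_in_worker are hypothetical names standing in for the error and entry point a real binding would provide.

```python
import subprocess
import sys


class WorkerCrashed(Exception):
    """Hypothetical stand-in for the error a host sees when its worker dies."""


def run_in_worker(code: str, timeout: float = 10.0) -> str:
    """Run `code` in a separate process and return its stripped stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        # The worker died, but the host is still running and can report it.
        raise WorkerCrashed(f"worker exited with status {proc.returncode}")
    return proc.stdout.strip()


ok = run_in_worker("print(6 * 7)")

try:
    # os._exit skips all cleanup, simulating an abrupt worker death.
    run_in_worker("import os; os._exit(3)")
    crashed = False
except WorkerCrashed:
    crashed = True
```

Because the failure is confined to the child process, the host can turn it into an ordinary exception and continue serving other runs.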

Built by the Pydantic team and designed to power code-mode in Pydantic AI, Monty represents a practical answer to programmatic tool calling: let the LLM write Python instead of issuing sequential JSON tool calls, execute that Python safely inside the agent loop, and handle the results with the same type-checked Python machinery you already use.

What You Get

A hermetic Python sandbox that blocks filesystem, network, and environment access by default, with all external calls routed through explicit host-provided callbacks
Sub-microsecond cold-start execution (< 1 µs from code to result) and runtime performance comparable to CPython, making it viable inside tight agent loops
Built-in type checking via the bundled ty type checker (from Astral, the team behind Ruff), enabling pre-execution validation of LLM-written code against developer-supplied type stubs
Serializable interpreter state — MontyRun and RunProgress can be dumped to bytes and restored later, enabling checkpoint/resume across database rows or message queues
Worker-pool process isolation in the Python and JS bindings so a crash in adversarial code never brings down the host process
Resource limits (memory, allocations, stack depth, execution time) configurable per run through a typed ResourceTracker trait
Multi-language bindings — use Monty from Rust, Python, JavaScript/TypeScript, or WebAssembly without any CPython dependency
A curated standard-library subset (sys, os, typing, asyncio, re, datetime, json) plus support for modern Python type hints and async/await
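The per-run resource limits listed above can be pictured as a small object that the interpreter consults on every allocation, call, and time check. The sketch below is written in Python for readability and is not the ResourceTracker trait itself: LimitTracker, its hook names, and ResourceLimitExceeded are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


class ResourceLimitExceeded(Exception):
    """Raised when a run goes past one of its configured budgets."""


@dataclass
class LimitTracker:
    """Hypothetical per-run budget that an interpreter loop would call into."""

    max_allocations: int
    max_depth: int
    max_seconds: float
    allocations: int = 0
    depth: int = 0
    started: float = field(default_factory=time.monotonic)

    def on_allocate(self) -> None:
        self.allocations += 1
        if self.allocations > self.max_allocations:
            raise ResourceLimitExceeded("allocation limit exceeded")

    def on_call(self) -> None:
        self.depth += 1
        if self.depth > self.max_depth:
            raise ResourceLimitExceeded("stack depth limit exceeded")

    def on_return(self) -> None:
        self.depth -= 1

    def check_time(self) -> None:
        if time.monotonic() - self.started > self.max_seconds:
            raise ResourceLimitExceeded("time limit exceeded")


tracker = LimitTracker(max_allocations=3, max_depth=2, max_seconds=1.0)
for _ in range(3):
    tracker.on_allocate()  # within budget

try:
    tracker.on_allocate()  # fourth allocation goes over the limit of 3
    limit_hit = None
except ResourceLimitExceeded as exc:
    limit_hit = str(exc)
```

Keeping the budget in one object per run is what makes limits configurable per execution rather than per process.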

Common Use Cases

Programmatic tool calling in AI agents — let the LLM write Python that calls your registered functions instead of issuing sequential JSON tool-call requests
Code-mode reasoning in Pydantic AI — the interpreter will power Pydantic AI’s code-mode where the LLM orchestrates multi-step tasks as a Python script rather than a chat dialogue
Safe evaluation of LLM-generated data-transformation code — run user-described ETL logic, formula evaluation, or report formatting without exposing the host environment
Agent workflow checkpointing — serialize mid-execution interpreter state to a database and resume on a different machine or after a process restart
In-browser or edge-compute AI code execution — the WASM build runs Monty inside a browser tab or Cloudflare Worker where subprocesses are unavailable
Batch scoring and analytics over LLM-written Python scripts — run hundreds of sandboxed sessions in parallel using the built-in worker pool without per-run container spin-up cost
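The checkpointing use case above amounts to storing an opaque snapshot in a database row and loading it back later. A minimal sketch follows, assuming a JSON-encoded dictionary as a stand-in for Monty's binary snapshot; the table layout, the save_checkpoint and load_checkpoint helpers, and the fields of the paused state are all hypothetical.

```python
import json
import sqlite3


def save_checkpoint(db: sqlite3.Connection, run_id: str, state: dict) -> None:
    """Store the serialized state as a BLOB keyed by run id."""
    blob = json.dumps(state).encode("utf-8")
    db.execute(
        "INSERT OR REPLACE INTO checkpoints (run_id, state) VALUES (?, ?)",
        (run_id, blob),
    )
    db.commit()


def load_checkpoint(db: sqlite3.Connection, run_id: str) -> dict:
    """Fetch and decode the state for a run, or raise KeyError if absent."""
    row = db.execute(
        "SELECT state FROM checkpoints WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    return json.loads(row[0].decode("utf-8"))


# An in-memory database keeps the example self-contained.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (run_id TEXT PRIMARY KEY, state BLOB)")

# Hypothetical paused state: where execution stopped and what it is waiting for.
paused = {"pc": 2, "stack": [4], "waiting_on": "fetch_price"}
save_checkpoint(db, "run-42", paused)
restored = load_checkpoint(db, "run-42")
```

Since the row holds everything needed to continue, the process that resumes the run does not have to be the one that started it.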

Under The Hood

Architecture

Monty is organized as a Cargo workspace of focused crates: monty (the core interpreter), monty-pool (the worker-pool runtime), monty-proto (protobuf IPC between host and worker processes), monty-python and monty-js (PyO3 and napi-rs bindings), monty-type-checking (ty integration), and monty-typeshed (bundled type stubs). The core interpreter in crates/monty/src/ follows a clean pipeline: parse.rs calls Ruff’s parser, prepare.rs lowers the AST into an internal bytecode-like representation in bytecode/, and run.rs exposes MontyRun as the serializable entry point. Execution proceeds through a stack-based VM in bytecode/ that dispatches to individual expression, statement, and built-in handlers. Heap management lives in heap/ with stable_heap.rs and free_list.rs providing a typed arena; all object graph mutations go through a HeapReader / DropWithHeap lifetime protocol enforced at compile time via Rust’s borrow checker, eliminating entire classes of memory safety bugs without unsafe. External function calls surface as RunProgress::FunctionCall variants, pausing the VM and returning control to the host; this is the seam that both enables sandboxing and supports snapshotting.
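The suspension seam described above can be sketched with a toy stack machine. This is an illustration in Python, not Monty's Rust implementation: TinyVM, its three opcodes, and the FunctionCall and Complete classes are invented for the example and only loosely mirror the RunProgress variants named in the text.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class FunctionCall:
    """The VM paused because the program asked for an external function."""
    name: str
    args: list[Any]


@dataclass
class Complete:
    """The program finished; `value` is what was left on top of the stack."""
    value: Any


@dataclass
class TinyVM:
    code: list[tuple]
    pc: int = 0
    stack: list[Any] = field(default_factory=list)

    def run(self) -> Union[FunctionCall, Complete]:
        while self.pc < len(self.code):
            op, *operands = self.code[self.pc]
            self.pc += 1
            if op == "push":
                self.stack.append(operands[0])
            elif op == "mul":
                right, left = self.stack.pop(), self.stack.pop()
                self.stack.append(left * right)
            elif op == "call_external":
                name, argc = operands
                args = [self.stack.pop() for _ in range(argc)][::-1]
                # Hand control back to the host instead of performing the call.
                return FunctionCall(name, args)
            else:
                raise ValueError(f"unknown opcode {op!r}")
        return Complete(self.stack.pop())

    def resume(self, return_value: Any) -> Union[FunctionCall, Complete]:
        # The host supplies the result; execution continues where it paused.
        self.stack.append(return_value)
        return self.run()


program = [
    ("push", "widget"),
    ("call_external", "fetch_price", 1),
    ("push", 3),
    ("mul",),
]
vm = TinyVM(program)
progress = vm.run()                      # pauses at the external call
paused_on = (progress.name, progress.args)
result = vm.resume(4)                    # the host decides the price is 4
```

While paused, the whole machine is just a program counter and a stack of plain values, which is why a design like this lends itself to both sandboxing (the host answers every external call) and snapshotting (the paused state is ordinary data).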

Tech Stack

The project is pure Rust (edition 2024, MSRV 1.95) with no C or CPython FFI in the core crate. Ruff’s ruff_python_parser, ruff_python_ast, and ruff_python_stdlib are pinned at a specific git revision to ensure deterministic parser output; ty_python_semantic and related crates provide the type-checking engine. Serialization uses serde + postcard for compact binary snapshots and prost for the protobuf IPC protocol between pool host and worker subprocesses. The Python binding is built with maturin + PyO3 (no CPython runtime dependency at execution time); the JS binding uses napi-rs and ships platform-specific npm packages plus a WASM sub-path. CI runs on GitHub Actions with codspeed for performance regression tracking and codecov for coverage reporting. The dev toolchain uses uv for Python dependencies and ruff + basedpyright for linting and type checking Python glue code.

Code Quality

The test suite is unusually comprehensive for an experimental interpreter: 489 Python test case files in crates/monty/test_cases/ cover argument validation, arithmetic edge cases, type coercions, async/await, exception handling, and error messages. Each file is a runnable Python snippet whose output is snapshot-tested with insta. CI enforces Clippy pedantic lints (with explicit allow-list exceptions documented in Cargo.toml), Ruff formatting and import sorting on all Python glue code, and strict basedpyright + mypy stubtest checks on the public Python API. Error handling is explicit throughout: the VM returns typed RunResult/MontyException values; no panics are used on normal error paths. CodSpeed tracks performance regressions on every push. The repository has abundant inline documentation, Rust doc-comments on all public items, and three worked examples (web_scraper, sql_playground, expense_analysis) demonstrating real-world integration patterns.

What Makes It Unique

Monty’s central innovation is implementing a Python execution environment in safe Rust with zero CPython dependency: not as a transpiler or a sandboxed CPython fork, but as a purpose-built interpreter that treats every external call as an explicit suspension point. This design achieves three goals that container-based and CPython-based approaches struggle to combine: sub-microsecond cold start (no interpreter bootstrap), hard sandbox boundaries enforced at the language implementation level (not by OS-level container primitives), and serializable mid-execution state for checkpoint/resume. The integration of ty type checking into the same binary means developers can validate LLM-generated code against their host API’s type stubs before committing to execution. The multi-language embedding story (identical behavior whether called from Rust, Python, or JavaScript, including a WASM build) is a direct consequence of the CPython-free architecture and makes Monty well suited to edge deployment.

Self-Hosting

Licensing Model

MIT licensed: all features are available in self-hosted and embedded deployments with no restrictions, license keys, or paid tiers required.
