OpenSandbox

Secure, fast, and extensible sandbox runtime for AI agents with multi-language SDKs and Docker/Kubernetes runtimes.

11.8K stars

987 forks

Apache License 2.0

Python

OpenSandbox is a general-purpose sandbox platform designed for AI applications that need safe, reproducible execution environments. Originally built by Alibaba Group, it provides a unified control plane for creating and managing sandboxes across Docker and Kubernetes backends, making it suitable for everything from local development to large-scale distributed AI agent deployments.

The platform ships multi-language SDKs in Python, Java/Kotlin, JavaScript/TypeScript, C#/.NET, and Go — all implementing the same Sandbox Protocol, which defines lifecycle management and execution APIs in a vendor-neutral way. This means teams can write agent code once and run it against any conforming runtime implementation. The built-in MCP server further allows MCP-capable clients like Claude Code and Cursor to create and operate sandboxes without custom integration code.

OpenSandbox focuses on security through layered isolation: it supports gVisor, Kata Containers, and Firecracker microVM as secure container runtimes, an egress proxy with per-sandbox network policy enforcement, a DNS proxy for outbound traffic control, and a credential vault that injects secrets into workloads without exposing them directly. These features address a core challenge in AI agent deployment — you need fast iteration but cannot allow untrusted code to reach arbitrary infrastructure.
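To make the per-sandbox network policy idea concrete, here is a minimal sketch of a default-deny domain allow-list check. The `EgressPolicy` class, its field name, and the `*.` wildcard convention are assumptions made for illustration; they are not taken from the OpenSandbox policy schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical per-sandbox policy: deny by default, allow listed domains."""

    allowed_domains: frozenset = field(default_factory=frozenset)

    def allows(self, hostname: str) -> bool:
        # Normalize case and a trailing root dot before comparing.
        host = hostname.rstrip(".").lower()
        for pattern in self.allowed_domains:
            pattern = pattern.lower()
            if pattern.startswith("*."):
                # Wildcard matches subdomains only, not the bare domain.
                if host.endswith(pattern[1:]):
                    return True
            elif host == pattern:
                return True
        return False


policy = EgressPolicy(frozenset({"pypi.org", "*.githubusercontent.com"}))
print(policy.allows("raw.githubusercontent.com"))  # allowed by the wildcard
print(policy.allows("example.com"))                # not listed, so denied
```

A check like this only decides whether a name may be resolved or contacted; actual enforcement would still have to happen outside the workload, as the paragraph above describes.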

The project is listed in the CNCF Landscape and holds an OpenSSF Best Practices badge, which indicates that it meets that program's criteria for documentation, testing, and vulnerability disclosure. It covers a broad range of AI scenarios: coding agents (with examples for Claude Code, Gemini CLI, OpenAI Codex, and Qwen), browser automation (Chrome, Playwright), desktop environments via VNC, and reinforcement learning training workloads, all with a consistent SDK and API surface.

What You Get

Multi-language SDKs — First-class clients in Python, Java/Kotlin, JavaScript/TypeScript, C#/.NET, and Go, all sharing the same Sandbox Protocol so code is portable across runtimes
Docker and Kubernetes runtimes — Run sandboxes locally with Docker for development and switch to the Kubernetes controller for distributed, large-scale scheduling without changing application code
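Secure container runtime support — Built-in integration with gVisor, Kata Containers, and Firecracker microVM for stronger isolation between sandbox workloads and the host than a shared-kernel container provides (a user-space kernel in gVisor, hardware virtualization in Kata Containers and Firecracker)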
Egress network control — Per-sandbox outbound network policy enforcement via a built-in egress proxy with DNS interception and nftables-based traffic filtering
Credential vault — Secure secret injection that passes credentials to sandbox workloads at runtime without embedding them in images or environment variables visible to the workload itself
Code Interpreter SDK — A dedicated SDK for multi-language code execution (Python, etc.) built on top of the sandbox layer with session-based REPL-style evaluation
MCP server — An out-of-the-box MCP server that exposes sandbox creation, command execution, and file operations to MCP-capable agents like Claude Code and Cursor
CLI tool (osb) — A terminal CLI for the full sandbox workflow: create, run commands, transfer files, inspect diagnostics, and manage egress policy from the shell
Kubernetes Operator — A custom controller and task executor for scheduling sandbox workloads as Kubernetes custom resources with Helm chart support
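One way to picture the credential vault item above is a proxy that swaps placeholders for real secrets just before a request leaves the sandbox network. The sketch below assumes a made-up `{{secret:NAME}}` placeholder syntax and a plain dictionary as the vault; the real mechanism may look quite different.

```python
import re

# Hypothetical placeholder syntax, chosen only for this example.
PLACEHOLDER = re.compile(r"\{\{secret:([A-Z0-9_]+)\}\}")


def inject_secrets(headers: dict, vault: dict) -> dict:
    """Return a copy of headers with placeholders replaced by vault values.

    Raises KeyError when a placeholder names a secret the vault does not
    hold, so a request is never forwarded with an unresolved placeholder.
    """

    def resolve(match):
        return vault[match.group(1)]

    return {name: PLACEHOLDER.sub(resolve, value) for name, value in headers.items()}


vault = {"API_TOKEN": "s3cr3t"}
# The workload only ever writes the placeholder, never the secret itself.
outbound = inject_secrets({"Authorization": "Bearer {{secret:API_TOKEN}}"}, vault)
print(outbound)
```

Because substitution happens on the proxy side, the secret value never has to exist in the sandbox's environment or filesystem.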

Common Use Cases

AI coding agents — Run LLM-driven coding CLIs (Claude Code, Gemini CLI, OpenAI Codex) inside isolated sandboxes so generated code cannot escape to the host environment
Code execution services — Build user-facing code playgrounds or notebook backends where untrusted Python, JavaScript, or other code runs safely in containerized interpreters
Agent evaluation harnesses — Spin up sandboxes programmatically for each evaluation run, capture outputs, and tear down cleanly — eliminating state pollution between test cases
Browser automation at scale — Launch Chromium or Playwright sandboxes with VNC and DevTools access for AI-driven web scraping, testing, or GUI agent tasks
Reinforcement learning training — Run RL training workloads (e.g., DQN, policy gradient) in isolated sandboxes with checkpoint support to prevent training jobs from interfering with each other
Multi-agent orchestration — Integrate with LangGraph or Google ADK to give each agent its own sandboxed execution environment, connected via the sandbox API rather than shared compute
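The evaluation-harness pattern in the list above (one sandbox per run, torn down afterwards) can be sketched with a context manager. `FakeSandbox` is a stand-in written for this example and is not the OpenSandbox SDK; a real harness would create and delete sandboxes through the SDK client instead.

```python
from contextlib import contextmanager


class FakeSandbox:
    """Stand-in for a real sandbox client, used only for illustration."""

    def __init__(self):
        self.files = {}
        self.closed = False

    def run(self, case: str) -> str:
        # Pretend the case leaves a scratch file behind.
        self.files[case] = "scratch"
        return f"ran {case} with {len(self.files)} file(s)"

    def close(self) -> None:
        self.closed = True


@contextmanager
def fresh_sandbox(factory=FakeSandbox):
    sandbox = factory()
    try:
        yield sandbox
    finally:
        sandbox.close()  # teardown runs even if the case raises


def evaluate(cases):
    results = {}
    for case in cases:
        # A new sandbox per case means no files carry over between cases.
        with fresh_sandbox() as sandbox:
            results[case] = sandbox.run(case)
    return results


results = evaluate(["case-a", "case-b"])
print(results)
```

Each case reports a single file because it starts from an empty sandbox, which is the state-isolation property the harness is meant to provide.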

Under The Hood

Architecture

OpenSandbox follows a layered, protocol-first architecture in which a vendor-neutral Sandbox Protocol specification sits above pluggable runtime implementations. The FastAPI-based lifecycle server acts as the central control plane, routing requests through a thin API layer into an abstract SandboxService interface, behind which the Docker and Kubernetes implementations are swapped via a factory pattern at startup. Orthogonal concerns (egress network policy, ingress proxying, credential injection, and command execution via execd) are deployed as independent sidecar components rather than baked into the server, which keeps the control plane stateless and the runtime surface composable. The Kubernetes path adds a custom operator with a task executor that schedules sandboxes as Kubernetes custom resources, enabling horizontal scale without changing the API contract. Separating lifecycle control, execution, and network concerns at component boundaries is a deliberate design choice that favors operational flexibility.
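The interface-plus-factory arrangement described above might look roughly like the following. Only the name `SandboxService` comes from the description; the method names, the backend classes, and `make_service` are assumptions, and the backends here just keep toy in-memory records instead of calling Docker or Kubernetes.

```python
from abc import ABC, abstractmethod


class SandboxService(ABC):
    """Lifecycle interface; the method names are illustrative assumptions."""

    @abstractmethod
    def create(self, image: str) -> str:
        """Start a sandbox from an image and return its id."""

    @abstractmethod
    def delete(self, sandbox_id: str) -> None:
        """Stop and remove a sandbox."""


class _InMemoryService(SandboxService):
    """Toy bookkeeping; a real backend would call Docker or the Kubernetes API."""

    prefix = "mem"

    def __init__(self):
        self.sandboxes = {}

    def create(self, image: str) -> str:
        sandbox_id = f"{self.prefix}-{len(self.sandboxes) + 1}"
        self.sandboxes[sandbox_id] = image
        return sandbox_id

    def delete(self, sandbox_id: str) -> None:
        self.sandboxes.pop(sandbox_id, None)


class DockerSandboxService(_InMemoryService):
    prefix = "docker"


class KubernetesSandboxService(_InMemoryService):
    prefix = "k8s"


BACKENDS = {"docker": DockerSandboxService, "kubernetes": KubernetesSandboxService}


def make_service(runtime: str) -> SandboxService:
    """Pick the backend once, at startup, from a configuration value."""
    backend = BACKENDS.get(runtime)
    if backend is None:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return backend()


service = make_service("docker")
sandbox_id = service.create("python:3.12")
print(sandbox_id)
```

Callers depend only on `SandboxService`, so switching the configured runtime does not change the code that creates or deletes sandboxes.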

Tech Stack

The lifecycle server is Python 3.10+ on FastAPI with Uvicorn, using Pydantic v2 for schema validation and a TOML-based configuration model. Redis is used for sandbox lease management and sandbox pool state. The egress component and Kubernetes operator are written in Go, using mitmproxy for HTTPS interception, nftables for traffic filtering, and a custom DNS proxy for outbound name resolution control. The execd daemon (also Go) handles command execution and file I/O inside sandboxes over WebSocket. The SDK layer ships first-class async clients in Python, TypeScript/JavaScript (npm), Java/Kotlin (Maven/Gradle), C#/.NET (NuGet), and Go, all implementing the same OpenAPI-specified Sandbox Protocol. Infrastructure deployment uses Docker Compose for local setups and Helm charts for Kubernetes, with OpenTelemetry OTLP export for observability.
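Lease management is easier to reason about with a small model. The sketch below is an in-memory stand-in written for this page, not the project's Redis code: `LeaseTable` and its methods are hypothetical names, and the clock is injectable so the example is deterministic.

```python
import time


class LeaseTable:
    """In-memory model of sandbox leases: each sandbox has an expiry deadline."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._deadlines = {}

    def renew(self, sandbox_id: str, ttl_seconds: float) -> None:
        # Renewing pushes the deadline out from the current time.
        self._deadlines[sandbox_id] = self._clock() + ttl_seconds

    def release(self, sandbox_id: str) -> None:
        self._deadlines.pop(sandbox_id, None)

    def expired(self) -> list:
        # Sandboxes whose lease has lapsed are candidates for cleanup.
        now = self._clock()
        return sorted(sid for sid, deadline in self._deadlines.items() if deadline <= now)


now = [0.0]  # fake clock so the example does not depend on wall time
leases = LeaseTable(clock=lambda: now[0])
leases.renew("sbx-1", 30)
leases.renew("sbx-2", 90)
now[0] = 60.0
due = leases.expired()
print(due)
```

In a shared deployment the same bookkeeping has to live in a store that every server replica can reach, which is the role the description above assigns to Redis.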

Code Quality

The server codebase has extensive unit and integration test coverage, with over 40 test modules in the server package alone covering auth middleware, Docker and Kubernetes service implementations, snapshot lifecycle, pool behavior, route contracts, and Redis integration. The project uses Ruff for linting and Pyright in standard mode for type checking, with full type annotations throughout the Python codebase. Go components have their own test suites, including table-driven unit tests and real E2E tests in CI (the real-e2e.yml and kubernetes-nightly-build.yml workflows). Error handling is explicit throughout: HTTP errors normalize to a structured {code, message} schema, service-layer errors use typed error codes, and startup failures result in hard exits with clear log messages rather than silent degradation. The OpenSSF Best Practices badge indicates the project meets community standards for documentation, testing, and vulnerability disclosure.
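As a rough illustration of normalizing errors to a {code, message} body, the following sketch maps exceptions to an HTTP status and a structured payload. `ServiceError`, `to_error_response`, and the specific code strings are hypothetical; only the two-field shape comes from the description above.

```python
class ServiceError(Exception):
    """Hypothetical service-layer error carrying a typed code."""

    def __init__(self, code: str, message: str, status: int = 400):
        super().__init__(message)
        self.code = code
        self.message = message
        self.status = status


def to_error_response(exc: Exception):
    """Map any exception to (http_status, {"code": ..., "message": ...})."""
    if isinstance(exc, ServiceError):
        return exc.status, {"code": exc.code, "message": exc.message}
    # Unknown failures get a generic body so internals are not leaked.
    return 500, {"code": "INTERNAL_ERROR", "message": "internal server error"}


status, body = to_error_response(
    ServiceError("SANDBOX_NOT_FOUND", "sandbox sbx-1 does not exist", status=404)
)
print(status, body)
```

Clients can then branch on `code` without parsing free-form messages, whichever backend raised the error.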

What Makes It Unique

OpenSandbox’s most distinctive technical choice is the combination of a language-neutral Sandbox Protocol with a security stack that operates below the application layer. The credential vault injects secrets at the egress proxy layer without the workload ever seeing them, and network policies are enforced outside the workload's control (nftables rules in the kernel plus DNS interception) rather than through application-level allow-lists that workloads could bypass. Unlike general container orchestration tools, OpenSandbox is designed specifically for the AI agent execution pattern: sandboxes are ephemeral and short-lived by design, the MCP server integration makes sandboxes directly consumable by LLM agents without custom tooling, and sandbox pool pre-warming addresses the cold-start problem, which matters acutely when an agent may spin up dozens of sandboxes per session. The breadth of supported agent frameworks (LangGraph, Google ADK, multiple coding CLIs) and environments (desktop VNC, browser, code interpreter) behind a single unified API is unusual in this space.
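Pool pre-warming can be sketched in a few lines: create sandboxes before they are requested, hand out an idle one on demand, and top the pool back up. `WarmPool` and its method names are invented for this example, and the refill is synchronous here, whereas a real pool would presumably refill in the background.

```python
from collections import deque


class WarmPool:
    """Keeps up to `size` sandboxes created ahead of demand."""

    def __init__(self, create, size: int):
        self._create = create
        self._size = size
        self._idle = deque()
        self.refill()

    def refill(self) -> None:
        while len(self._idle) < self._size:
            self._idle.append(self._create())

    def acquire(self):
        # Hand out a warm sandbox if one is idle; otherwise pay the cold start.
        if self._idle:
            return self._idle.popleft()
        return self._create()

    def idle_count(self) -> int:
        return len(self._idle)


counter = iter(range(1, 1000))
pool = WarmPool(create=lambda: f"sbx-{next(counter)}", size=2)
first = pool.acquire()   # served from the warm set
pool.refill()            # top the pool back up for the next request
print(first, pool.idle_count())
```

The trade-off is idle resource cost against latency: a larger pool hides more cold starts but keeps more unused sandboxes running.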

Self-Hosting

OpenSandbox is released under the Apache License 2.0, a permissive open-source license that allows commercial use, modification, distribution, and sublicensing, provided that the license text and attribution notices are preserved and modified files are marked as changed. There are no copyleft obligations, meaning you can embed OpenSandbox in proprietary products or services without being required to open-source your own code. The entire feature set (including the Kubernetes runtime, credential vault, secure container runtimes, and network policy) is available in the open-source repository with no feature gating or paid tiers.

Self-hosting OpenSandbox requires a working Docker installation for local development or a Kubernetes cluster for production deployments. The server component is a Python FastAPI application that needs Python 3.10+ and optionally Redis for sandbox lease management and pool coordination. For Kubernetes deployments, you install the controller and task executor via provided Helm charts and configure secure container runtime integrations (gVisor, Kata, Firecracker) at the node level — this is non-trivial and requires kernel-level setup and cluster administrator access. Your team is responsible for server uptime, certificate management, API key rotation, updating to new releases, and monitoring sandbox resource consumption.

Since OpenSandbox is a pure open-source project with no commercial cloud offering from Alibaba at the time of writing, there is no managed SaaS tier to compare against. Support is community-driven via GitHub issues and a DingTalk group. You gain full control over data residency, network topology, and runtime security policy, but you also absorb all operational overhead — there are no SLAs, no managed upgrades, no cloud backups, and no 24/7 support contracts unless you arrange them yourself through a third-party hosting provider.
