Beta9 is an open-source serverless platform for developers running AI workloads such as LLM inference, background processing, and interactive sandboxes. It removes the need to manage infrastructure by providing a Pythonic API for deploying scalable GPU-powered applications with automatic scaling, containerized execution, and built-in storage. Written in Go and released under the AGPL license, it can be self-hosted or used through the managed Beam cloud platform.
The platform uses a custom container runtime that launches containers in under a second, integrates with CUDA-enabled GPUs (such as the H100 and 4090), and supports distributed volumes, queue-based autoscaling, and webhooks. It’s designed for developers who need to run fine-tuned models, process large datasets, or deploy inference endpoints without managing Kubernetes or VMs.
What You Get
- Fast Image Builds - Launch GPU containers in under a second using a custom container runtime optimized for AI workloads
- Parallelization and Concurrency - Fan out AI tasks across hundreds of containers simultaneously for high-throughput inference or batch processing
- Serverless Scale-to-Zero - Workloads automatically scale to zero when idle, eliminating costs during inactivity
- GPU Support - Run on cloud GPUs (H100, A10G, 4090) or bring your own GPU hardware for self-hosted deployments
- Sandbox Environments - Spin up isolated, ephemeral containers to safely run LLM-generated code or experimental scripts
- Background Task Queues - Replace Celery with built-in, retryable task queues that auto-scale based on job volume
- Volume Storage - Mount distributed, persistent storage volumes to share data across containers or retain model outputs
- Hot-Reloading and Webhooks - Develop locally with live code reloads and trigger workloads via HTTP webhooks or scheduled events
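The scale-to-zero and queue-based autoscaling features above can be illustrated with a small calculation. This is a minimal sketch of one common policy (size the fleet to the backlog, capped at a limit), not Beta9's actual scheduler logic; the function name `desired_containers` and its parameters are assumptions made for this example.

```python
import math


def desired_containers(queue_depth: int, tasks_per_container: int, max_containers: int) -> int:
    """Return how many containers to run for the current backlog.

    Hypothetical helper for illustration only; not taken from the Beta9 codebase.
    """
    if tasks_per_container < 1 or max_containers < 0:
        raise ValueError("tasks_per_container must be >= 1 and max_containers >= 0")
    if queue_depth <= 0:
        return 0  # scale to zero when nothing is waiting
    # One container per `tasks_per_container` pending tasks, rounded up, capped at the limit.
    return min(max_containers, math.ceil(queue_depth / tasks_per_container))
```

Under this policy an empty queue yields zero containers, a backlog of 25 tasks at 10 tasks per container yields 3, and a very large backlog is capped at `max_containers`.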
Common Use Cases
- Running LLM inference at scale - A startup deploys a fine-tuned Llama 3 model as a serverless endpoint that auto-scales from 0 to 50 containers during peak API usage
- Processing user-uploaded media at scale - A photo app uses Beta9 to run background image processing tasks on H100 GPUs without managing a dedicated cluster
- Running experimental AI code safely - A researcher spins up a sandbox container to test untrusted LLM-generated Python code without risking their local machine
- Replacing Celery with serverless background jobs - A SaaS company migrates its task queue to Beta9 to reduce infrastructure costs and eliminate Redis/worker node management
Under The Hood
Architecture
- The repository follows a microservices architecture that separates a Next.js frontend from the backend components (gateway, worker, runner).
- Containerization is central to the design, with each service packaged as a Docker image.
- A Makefile streamlines build, test, and deployment processes, leveraging various tools for automation.
- Protocol Buffers are used for API definition and inter-service communication.
Tech Stack
- The backend is primarily built with Go, while the frontend utilizes Next.js.
- Docker is heavily used for containerization, with multi-stage builds and specific Python versions managed via Micromamba.
- Kubernetes orchestrates deployments, with k3d for local development and kustomize for configuration.
- gRPC facilitates communication between backend services.
Code Quality
- A comprehensive suite of unit and integration tests demonstrates a strong commitment to quality.
- Error handling is present, though the use of custom error classes is limited.
- Naming conventions are generally consistent, with some stylistic variations.
- Type hints are used to enhance code readability and maintainability, but aren’t universally applied.
- Serializing function results with cloudpickle is a notable pattern for remote execution.
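The result-serialization pattern mentioned above can be sketched as follows. This is an illustrative round trip, not the project's own code: the helper names `run_and_serialize` and `deserialize_result` are hypothetical, and the sketch falls back to the standard-library `pickle` module when cloudpickle is not installed, which is enough for plain return values.

```python
try:
    import cloudpickle as pickler  # third-party; can also serialize lambdas and closures
except ImportError:
    import pickle as pickler  # stdlib fallback, sufficient for plain result values


def run_and_serialize(func, *args, **kwargs) -> bytes:
    """Worker side: call the function and pack its return value into bytes."""
    result = func(*args, **kwargs)
    return pickler.dumps(result)


def deserialize_result(payload: bytes):
    """Caller side: unpack the bytes back into a Python object."""
    return pickler.loads(payload)


def square(i: int) -> int:
    return i ** 2


payload = run_and_serialize(square, 12)  # bytes that could travel over the network
value = deserialize_result(payload)      # the original return value, rebuilt locally
```

Unpickling executes code embedded in the payload, so this pattern is only safe when the bytes come from a trusted worker.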
What Makes It Unique
- The project features a custom RunnerAbstraction class for parsing and validating resource requests.
- Dynamic module loading via load_module_spec enables a flexible runtime environment.
- Integration with LocalStack, including automated S3 bucket creation, provides a self-contained development and testing environment.
- Support for GPU-accelerated workloads is indicated by the inclusion of an NVIDIA device plugin component.
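To show what dynamic module loading generally looks like in Python, here is a minimal sketch built on the standard `importlib` module. It does not reproduce load_module_spec, whose signature is not described here; the helper name `load_handler` and the "module:function" string format are assumptions chosen for this example.

```python
import importlib
from typing import Callable


def load_handler(spec: str) -> Callable:
    """Resolve a "module:function" string to a callable.

    Hypothetical helper; the spec format is an assumption for this sketch.
    """
    module_name, sep, func_name = spec.partition(":")
    if not sep or not module_name or not func_name:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)  # import the module by name at runtime
    handler = getattr(module, func_name, None)
    if not callable(handler):
        raise AttributeError(f"{func_name!r} is not a callable in module {module_name!r}")
    return handler


# Resolve a standard-library function by name and call it.
dumps = load_handler("json:dumps")
encoded = dumps({"gpu": "H100"})
```

Resolving handlers by name at runtime lets a generic worker process run user code it was not compiled or packaged with, which is the flexibility the bullet above refers to.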