Beta9 is an open-source serverless platform for developers running AI workloads such as LLM inference, background processing, and interactive sandboxes. It removes the need to manage infrastructure by providing a Pythonic API for deploying scalable, GPU-powered applications with automatic scaling, containerized execution, and built-in storage. Written in Go and released under the AGPL license, it can be self-hosted or used through Beam's managed cloud platform.

The platform leverages a custom container runtime for sub-second image builds, integrates with CUDA-enabled GPUs (H100, 4090), and supports distributed volumes, queue-based autoscaling, and webhooks. It’s designed for developers who need to run fine-tuned models, process large datasets, or deploy inference endpoints without managing Kubernetes or VMs.
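Queue-based autoscaling with scale-to-zero comes down to mapping the current backlog to a container count. The sketch below illustrates that idea under stated assumptions and is not Beta9's actual scaling logic. `ScalingPolicy`, `tasks_per_container`, and `desired_containers` are hypothetical names, and the sketch assumes each container can absorb a fixed number of queued tasks.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical knobs; Beta9's real autoscaler configuration may differ.
    tasks_per_container: int = 1   # backlog one container is expected to absorb
    min_containers: int = 0        # 0 allows scale-to-zero when the queue is empty
    max_containers: int = 50       # hard ceiling on fan-out


def desired_containers(queue_depth: int, policy: ScalingPolicy) -> int:
    """Return how many containers to run for the current backlog."""
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    if policy.tasks_per_container <= 0:
        raise ValueError("tasks_per_container must be positive")
    # Ceiling division: 1..tasks_per_container queued tasks need one container.
    needed = math.ceil(queue_depth / policy.tasks_per_container)
    # Clamp to the configured range; an empty queue yields min_containers.
    return max(policy.min_containers, min(needed, policy.max_containers))


if __name__ == "__main__":
    policy = ScalingPolicy(tasks_per_container=10, max_containers=50)
    for depth in (0, 1, 25, 10_000):
        print(depth, "->", desired_containers(depth, policy))
```

With this policy, an empty queue yields 0 containers (the idle, scale-to-zero case), 25 queued tasks yield 3, and a burst of 10,000 is capped at 50. A production autoscaler would also smooth these decisions over time to avoid thrashing.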

What You Get

Fast Image Builds - Launch GPU containers in under a second using a custom container runtime optimized for AI workloads
Parallelization and Concurrency - Fan out AI tasks across hundreds of containers simultaneously for high-throughput inference or batch processing
Serverless Scale-to-Zero - Workloads automatically scale to zero when idle, eliminating costs during inactivity
GPU Support - Run on cloud GPUs (H100, A10G, 4090) or bring your own GPU hardware for self-hosted deployments
Sandbox Environments - Spin up isolated, ephemeral containers to safely run LLM-generated code or experimental scripts
Background Task Queues - Replace Celery with built-in, retryable task queues that auto-scale based on job volume
Volume Storage - Mount distributed, persistent storage volumes to share data across containers or retain model outputs
Hot-Reloading and Webhooks - Develop locally with live code reloads and trigger workloads via HTTP webhooks or scheduled events
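The retry behavior behind a "retryable task queue" can be shown with a small in-process model. This is a hypothetical, single-threaded illustration and not Beta9's task queue API. `Task`, `RetryingQueue`, `max_retries`, and `dead_letter` are illustrative names. In Beta9, workers run in autoscaled containers rather than a local loop.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Task:
    func: Callable[..., Any]
    args: tuple = ()
    attempts: int = 0


@dataclass
class RetryingQueue:
    """Hypothetical in-process queue that re-enqueues failed tasks."""
    max_retries: int = 3
    pending: deque = field(default_factory=deque)
    results: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def put(self, func: Callable[..., Any], *args: Any) -> None:
        self.pending.append(Task(func, args))

    def drain(self) -> None:
        # Process until the queue is empty; failed tasks go to the back of the queue.
        while self.pending:
            task = self.pending.popleft()
            task.attempts += 1
            try:
                self.results.append(task.func(*task.args))
            except Exception:
                if task.attempts <= self.max_retries:
                    self.pending.append(task)      # retry on a later pass
                else:
                    self.dead_letter.append(task)  # give up after max_retries retries


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky(x):
        # Fails twice with a transient error, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient failure")
        return x * 2

    q = RetryingQueue(max_retries=3)
    q.put(flaky, 21)
    q.drain()
    print(q.results, len(q.dead_letter))  # [42] 0
```

Tasks that exhaust their retries land in a dead-letter list instead of being dropped silently, which is the usual way to keep permanently failing jobs inspectable.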

Common Use Cases

Running LLM inference at scale - A startup deploys a fine-tuned Llama 3 model as a serverless endpoint that auto-scales from 0 to 50 containers during peak API usage
Processing user-uploaded media at scale - A photo app uses Beta9 to run background image processing tasks on H100 GPUs without managing a dedicated cluster
Running experimental AI code safely - A researcher spins up a sandbox container to test untrusted LLM-generated Python code without risking their local machine
Replacing Celery with serverless background jobs - A SaaS company migrates its task queue to Beta9 to reduce infrastructure costs and eliminate Redis/worker node management

Under The Hood

Architecture

The repository follows a microservices architecture, with a Next.js frontend kept separate from the backend components (gateway, worker, runner).
Containerization is central to the design, with each service packaged as a Docker image.
A Makefile wraps the build, test, and deployment workflows in reusable targets.
Protocol Buffers are used for API definition and inter-service communication.

Tech Stack

The backend is primarily built with Go, while the frontend utilizes Next.js.
Docker is heavily used for containerization, with multi-stage builds and specific Python versions managed via Micromamba.
Kubernetes orchestrates deployments, with k3d for local development and kustomize for configuration.
gRPC facilitates communication between backend services.

Code Quality

A comprehensive suite of unit and integration tests demonstrates a strong commitment to quality.
Error handling is present, though the use of custom error classes is limited.
Naming conventions are generally consistent, with some stylistic variations.
Type hints are used to enhance code readability and maintainability, but aren’t universally applied.
Serializing function results with cloudpickle is a notable pattern for remote execution.
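The serialization round trip used for remote execution can be sketched as follows. This is an illustration, not Beta9's runner code. `encode`, `decode`, and `run_remote` are hypothetical helpers, and base64 text encoding is an assumed transport format. The sketch uses cloudpickle when it is installed (it can serialize lambdas and closures by value) and falls back to the standard `pickle` module, which only handles importable, module-level callables.

```python
import base64

try:
    import cloudpickle as serializer  # can pickle lambdas and closures by value
except ImportError:
    import pickle as serializer       # stdlib fallback: importable callables only


def encode(obj) -> str:
    """Serialize an object to a text-safe payload (e.g. for an HTTP or gRPC body)."""
    return base64.b64encode(serializer.dumps(obj)).decode("ascii")


def decode(payload: str):
    """Reverse of encode(): base64-decode, then unpickle."""
    return serializer.loads(base64.b64decode(payload))


def run_remote(payload: str) -> str:
    # Stand-in for the remote side: unpack the call, execute it, pack the result.
    func, args, kwargs = decode(payload)
    return encode(func(*args, **kwargs))


if __name__ == "__main__":
    request = encode((pow, (2, 10), {}))
    print(decode(run_remote(request)))  # 1024
```

The demo calls a builtin so that it also works with the stdlib fallback. Shipping a user-defined lambda or closure this way is the case where cloudpickle is actually required.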

What Makes It Unique

The project features a custom RunnerAbstraction class for parsing and validating resource requests.
Dynamic module loading via load_module_spec enables a flexible runtime environment.
Integration with LocalStack, including automated S3 bucket creation, provides a self-contained development and testing environment.
The deployment includes the NVIDIA device plugin, the Kubernetes component that exposes GPUs to pods, reflecting support for GPU-accelerated workloads.
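The signature of the project's own `load_module_spec` is not shown here. The sketch below demonstrates the standard `importlib` technique for loading user code from a file path at runtime. It is an assumption about how such a helper could work, not the project's implementation. `load_handler`, `module_name`, and the example `handler` function are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Any, Callable


def load_handler(path: str, attr: str, module_name: str = "user_module") -> Callable[..., Any]:
    """Load `attr` from the Python file at `path`, which need not be on sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build a module spec for {path}")
    module: ModuleType = importlib.util.module_from_spec(spec)
    # Register before executing so imports of module_name resolve to this module.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(module_name, None)  # don't leave a half-initialized module behind
        raise
    try:
        return getattr(module, attr)
    except AttributeError:
        raise AttributeError(f"{path} has no attribute {attr!r}") from None


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "app.py"
        src.write_text("def handler(x):\n    return x + 1\n")
        handler = load_handler(str(src), "handler")
        print(handler(41))  # 42
```

Loading by path rather than by import name lets a runtime execute arbitrary user code without installing it as a package first.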
