Beta9 is an open-source serverless platform for developers running AI workloads such as LLM inference, background tasks, and interactive sandboxes. It removes infrastructure management by providing a Pythonic API for deploying scalable GPU-powered applications with automatic scaling, sub-second container starts, and built-in volume storage. Built in Go around a custom container runtime, Beta9 supports both cloud-hosted and self-hosted deployments, integrates with existing ML pipelines, and offers direct access to CUDA-enabled GPUs.
Under the hood, the platform combines serverless containers, queue-based task processing, and real-time autoscaling to handle high-throughput AI workloads. It integrates with popular ML frameworks and exposes deployment through a simple Python decorator system, making it well suited to teams building generative AI applications without DevOps overhead.
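To make the decorator-driven model concrete, here is a minimal, stdlib-only sketch of the pattern: a decorator factory attaches resource requirements (GPU type, CPU, memory) to a plain function. The names `function`, `RemoteFunction`, and `.local()` are illustrative assumptions, not the actual beta9 SDK API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class RemoteFunction:
    """Wraps a plain Python function with deployment metadata,
    mimicking the decorator-driven style described above."""
    fn: Callable[..., Any]
    gpu: Optional[str] = None
    cpu: float = 1.0
    memory: str = "1Gi"

    def local(self, *args: Any, **kwargs: Any) -> Any:
        # Run in-process; a real platform would instead ship the
        # function to a remote container matching the metadata.
        return self.fn(*args, **kwargs)

def function(gpu: Optional[str] = None, cpu: float = 1.0, memory: str = "1Gi"):
    """Decorator factory: attaches resource requirements to a function."""
    def wrap(fn: Callable[..., Any]) -> RemoteFunction:
        return RemoteFunction(fn=fn, gpu=gpu, cpu=cpu, memory=memory)
    return wrap

@function(gpu="A10G", memory="8Gi")
def predict(prompt: str) -> str:
    return f"echo: {prompt}"
```

Calling `predict.local("hi")` executes the function locally while the attached metadata (`predict.gpu`, `predict.memory`) would drive remote scheduling.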
What You Get
- Fast Image Builds - Launch containers in under a second using a custom container runtime optimized for AI workloads
- Parallelization and Concurrency - Fan out AI inference tasks across hundreds of containers simultaneously for high-throughput processing
- Serverless GPU Inference - Deploy LLM endpoints with automatic scaling, supporting H100, A10G, and RTX 4090 GPUs without managing infrastructure
- Sandbox Environments - Spin up isolated, ephemeral containers to run and test LLM-generated code remotely with real-time output
- Background Task Queues - Replace Celery with a Python-decorated task system that supports retry policies, input schemas, and versioned deployment
- Volume Storage - Mount distributed storage volumes to persist model weights, datasets, or output files across container instances
- Hot-Reloading & Webhooks - Develop AI apps with live code updates and trigger external workflows via HTTP webhooks on task completion
- Self-Hosting Support - Deploy Beta9 on your own infrastructure using the AGPL-licensed open-source engine, with full parity with the managed Beam platform
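The task-queue-with-retry-policy idea from the list above can be sketched in a few lines of stdlib-only Python. This is a toy in-process queue, not the real beta9 SDK; the `TaskQueue`, `task`, and `submit` names are hypothetical.

```python
import time
from typing import Any, Callable

class TaskQueue:
    """Toy in-process task queue with a per-task retry policy,
    sketching the decorator-based model described above."""

    def __init__(self) -> None:
        self._tasks = []  # (fn, args, kwargs, retries, backoff)

    def task(self, retries: int = 3, backoff: float = 0.0):
        def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
            def submit(*args: Any, **kwargs: Any) -> None:
                self._tasks.append((fn, args, kwargs, retries, backoff))
            fn.submit = submit  # enqueue instead of calling directly
            return fn
        return wrap

    def run(self) -> list:
        """Drain the queue, retrying each task up to its retry budget."""
        results = []
        for fn, args, kwargs, retries, backoff in self._tasks:
            for attempt in range(retries + 1):
                try:
                    results.append(fn(*args, **kwargs))
                    break
                except Exception:
                    if attempt == retries:
                        raise
                    time.sleep(backoff)
        self._tasks.clear()
        return results

queue = TaskQueue()
attempts = {"n": 0}

@queue.task(retries=2)
def flaky(x: int) -> int:
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("transient failure")
    return x * 2

flaky.submit(21)
results = queue.run()
```

The first attempt fails, the retry succeeds, and `results` holds the task's output; a production queue would add persistence, workers, and input schemas on top of this shape.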
Common Use Cases
- Running LLM inference at scale - A startup deploys a custom LLM endpoint using Beta9’s GPU autoscaling to serve 10,000+ daily API requests without managing Kubernetes clusters
- Processing user-uploaded media in the background - A photo editing app uses Beta9 task queues to resize and enhance images asynchronously with retry logic and volume storage
- Testing LLM-generated code safely - A developer spins up a sandbox to execute and validate AI-generated Python scripts in an isolated environment before production
- Replacing Celery for ML pipelines - An AI research team migrates from Celery to Beta9’s task system to run distributed fine-tuning jobs with GPU support and built-in monitoring
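For the sandbox use case, the core workflow (run untrusted code elsewhere, capture its output) can be approximated with a separate interpreter process. This is only a toy stand-in, a real sandbox adds container-level isolation, not just a subprocess, and `run_sandboxed` is a hypothetical helper name.

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: float = 5.0):
    """Execute untrusted Python source in a separate interpreter
    process and capture its output. Illustrative only: Beta9's
    sandboxes run in isolated containers, not bare subprocesses."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr

rc, out, err = run_sandboxed("print(sum(range(10)))")
```

The `timeout` bounds runaway scripts, and the return code plus captured streams are what a validation step would inspect before promoting generated code to production.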
Under The Hood
Architecture
- Monolithic Go backend with tightly coupled services in pkg/abstractions/, lacking interface-based decoupling between Pod, Image, and Gateway components
- Protobuf-first API design enforces contract-first development but blurs boundaries between gateway routing and worker orchestration
- Docker-based microservice deployment shares a single codebase across Runner, Worker, and Gateway, with no modular separation or versioned modules
- Dependency injection is absent, with services hardcoded and instantiated directly, violating inversion of control
- Next.js frontend relies directly on backend protobuf contracts via gRPC-Web, creating an asymmetric architecture without a dedicated API gateway
- Infrastructure orchestration (k3d, Helm, Kustomize) is interwoven with application logic, leading to configuration sprawl across multiple file types
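The interface-based decoupling and dependency injection that the notes above describe as missing look roughly like this. The backend itself is Go; this Python sketch (with invented `ImageBuilder`, `ContainerRuntime`, and `Gateway` names) just illustrates the constructor-injection pattern, not Beta9's actual types.

```python
from typing import Protocol

class ImageBuilder(Protocol):
    def build(self, tag: str) -> str: ...

class ContainerRuntime(Protocol):
    def launch(self, image_ref: str) -> str: ...

class FakeBuilder:
    def build(self, tag: str) -> str:
        return f"registry.local/{tag}"

class FakeRuntime:
    def launch(self, image_ref: str) -> str:
        return f"container-for-{image_ref}"

class Gateway:
    """Depends on interfaces rather than concrete services, so each
    collaborator can be swapped or mocked independently -- the
    inversion of control the architecture notes call for."""

    def __init__(self, builder: ImageBuilder, runtime: ContainerRuntime) -> None:
        self.builder = builder
        self.runtime = runtime

    def deploy(self, tag: str) -> str:
        return self.runtime.launch(self.builder.build(tag))

gateway = Gateway(FakeBuilder(), FakeRuntime())
ref = gateway.deploy("app:v1")
```

With hardcoded instantiation, `Gateway` could only ever be tested against the real builder and runtime; injecting the dependencies makes the fakes above possible.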
Tech Stack
- Go backend with protobuf-driven gRPC services and HTTP mapping for API generation
- Multi-stage Docker builds for isolated Python runner, worker, and gateway containers supporting multiple Python versions and micromamba environments
- Kubernetes orchestration via k3d for local development and Kustomize/Helm for production-grade deployment
- OpenAPI documentation auto-generated from protobuf definitions to expose gRPC services as REST endpoints
- End-to-end and load testing powered by Python-based e2e suites and k6 for performance validation
- SDK development workflow managed through uv for dependency isolation and Makefile-driven automation
Code Quality
- Extensive test coverage with parameterized tests, mocking, and context managers validating core behaviors across environments and edge cases
- Strong type safety enforced through strict type hints, range-based validation, and explicit error raising for malformed inputs
- Clean abstractions (Function, Map, Runner) delegate to client stubs, enabling testable and loosely coupled components
- Robust error handling with domain-specific exceptions and systematic validation of remote call failures and invalid configurations
- Consistent naming and modular test organization align with source structure, improving maintainability and readability
- Comprehensive mocking patterns eliminate network dependencies, ensuring fast, reliable unit tests
What Makes It Unique
- Auto-generates REST APIs from gRPC services using Google’s HTTP rule annotations, eliminating manual API layer development
- Integrates Helm and Kustomize with inline configurations and startup scripts to automate cloud-native infrastructure provisioning
- Implements AWS S3 mock via LocalStack with pre-configured buckets, enabling true local-first development without cloud dependencies
- Unifies API contracts and deployment configurations through protobuf-based service definitions and file-based secret generation
- Combines declarative infrastructure-as-code with protocol-first development to create a cohesive, self-contained deployment pipeline
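The HTTP rule annotations mentioned above work by embedding a REST mapping directly in the protobuf service definition. The sketch below uses hypothetical message and service names, not Beta9's actual schema; only the `google.api.http` option mechanism itself is the point.

```proto
syntax = "proto3";

import "google/api/annotations.proto";

// The http option maps this gRPC method onto a REST route, letting a
// gateway auto-generate the REST API (and OpenAPI docs) from the proto.
service TaskService {
  rpc GetTask(GetTaskRequest) returns (Task) {
    option (google.api.http) = {
      get: "/v1/tasks/{task_id}"
    };
  }
}

message GetTaskRequest {
  string task_id = 1;
}

message Task {
  string task_id = 1;
  string status = 2;
}
```

With this annotation in place, `GET /v1/tasks/abc123` transcodes to a `GetTask` call with `task_id = "abc123"`, which is how a contract-first proto definition can stand in for a hand-written API layer.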