LLM Gateway is a middleware solution that abstracts the complexity of interacting with multiple LLM providers by offering a unified, OpenAI-compatible API. It’s designed for developers and enterprises who want to avoid vendor lock-in, reduce operational overhead from managing multiple API keys, and gain visibility into token usage, costs, and model performance. By acting as a proxy between applications and LLM providers, it enables seamless switching between models and providers without code changes.
Built with TypeScript and Hono for the API gateway, Next.js for dashboards, and Drizzle ORM with PostgreSQL and Redis for data storage, LLM Gateway supports both cloud-hosted and self-hosted deployments. The self-hosted version uses Docker with named volumes for PostgreSQL and Redis, ensuring data persistence and secure configuration. The platform is dual-licensed: AGPLv3 for open-source use, with enterprise features available under a commercial license.
## What You Get
- Unified API Interface - Drop-in replacement for OpenAI’s API; change only the base URL in your existing OpenAI SDK code (Python, TypeScript, Java, etc.) to route requests to any supported LLM provider.
- Multi-provider Support - Connect to 25+ LLM providers including OpenAI, Anthropic, Google Vertex AI, and others through a single integration point with no vendor lock-in.
- Usage Analytics - Track real-time metrics: requests, tokens consumed, response latency, cost per 1K tokens, and total spend across all models and providers.
- Secure Key Management - Centralized dashboard to store, manage, and rotate API keys for all LLM providers without exposing secrets in your application code.
- Per-model/provider Breakdown - Drill down into usage and spending by specific model (e.g., gpt-4o, claude-3-opus) or provider to identify cost outliers and optimize spending.
- Performance Monitoring - Compare latency, error rates, and cache hit rates across models to select the most cost-effective and reliable model for each use case.
- LLM Guardrails - Prevent prompt injection, detect PII, and block malicious or unsafe requests before they reach LLM providers.
- Enterprise Audit Logs - Full audit trails of user actions, API key usage, and configuration changes to meet compliance and security requirements.
- Self-hosted Deployment - Run the entire gateway on your own infrastructure using Docker with named volumes for PostgreSQL and Redis, ensuring data sovereignty and control.
- Project-level Usage Explorer - Isolate and analyze request history, model usage, errors, cache performance, and costs per project or team within the dashboard.
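The drop-in base-URL swap can be sketched with plain `fetch` in TypeScript; the gateway URL and model id below are placeholders, not confirmed values:

```typescript
// Minimal sketch: the same OpenAI-compatible request shape works against the
// gateway -- only the base URL changes. URL and model ids are hypothetical.
const GATEWAY_BASE_URL = "https://api.llmgateway.example/v1"; // placeholder

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the OpenAI-compatible request body. Switching providers means
// changing only the model string, not this shape.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages };
}

// The actual call (not executed here): a plain POST to the gateway, with the
// gateway-issued key in place of a provider key.
async function chat(apiKey: string, model: string, messages: ChatMessage[]) {
  const res = await fetch(`${GATEWAY_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildChatRequest(model, messages)),
  });
  return res.json();
}

const req = buildChatRequest("gpt-4o", [{ role: "user", content: "Hello" }]);
```

Because the request shape is stable, the same code path serves every provider the gateway supports; existing OpenAI SDK clients achieve the same thing by overriding their base URL option.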
## Common Use Cases
- Running multi-provider LLM applications - A startup uses LLM Gateway to route chatbot requests between OpenAI and Anthropic based on cost and latency, switching models dynamically without code changes.
- Cost optimization for AI-powered SaaS - A developer tracks real-time token usage and cost per 1K tokens across 10+ models to identify the cheapest high-performing model for their customer support bot.
- Enterprise compliance and security - A financial services company self-hosts LLM Gateway to keep API keys internal, audit all LLM access, and enforce guardrails to prevent PII leaks.
- Migrating from OpenAI to alternative providers - A team using OpenAI’s SDK migrates to LLM Gateway in minutes by changing one base URL, then tests Claude 3 and Gemini models side-by-side without rewriting any code.
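The cost-based routing in the first use case can be sketched as a tiny selection function; the model ids and per-1K-token prices below are illustrative assumptions, not real pricing:

```typescript
// Hypothetical cost-based router: because the gateway exposes one
// OpenAI-compatible API, switching providers is just a model-string change.
type ModelId = "gpt-4o" | "claude-3-opus" | "gemini-1.5-pro";

// Illustrative prices per 1K tokens -- not actual provider pricing.
const costPer1kTokens: Record<ModelId, number> = {
  "gpt-4o": 0.005,
  "claude-3-opus": 0.015,
  "gemini-1.5-pro": 0.004,
};

// Pick the cheapest model from a non-empty allow-list of candidates.
function pickCheapest(candidates: ModelId[]): ModelId {
  return candidates.reduce((best, m) =>
    costPer1kTokens[m] < costPer1kTokens[best] ? m : best
  );
}
```

In practice the same function could weigh latency or error rate from the gateway's analytics instead of a static price table.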
## Under The Hood
### Architecture
- Monorepo structure orchestrated by Turbo, enabling efficient incremental builds and shared tooling across applications, packages, and enterprise modules
- Clear separation of concerns with dedicated app modules (UI, admin, docs) and a centralized Drizzle data access layer that isolates database logic from frontend code
- Dependency injection via TypeScript interfaces and package exports, allowing testable service implementations without runtime containers
- Event-driven communication through well-defined interfaces for webhooks and internal APIs, ensuring loose coupling between components
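The interface-based dependency injection described above can be sketched as follows; the interface and class names are illustrative, not the project's actual identifiers:

```typescript
// Services depend on an interface, so implementations can be swapped without
// a runtime DI container -- tests inject a double, production wires the real one.
interface UsageStore {
  record(model: string, tokens: number): void;
  totalTokens(model: string): number;
}

// An in-memory implementation, usable as a test double.
class InMemoryUsageStore implements UsageStore {
  private totals = new Map<string, number>();
  record(model: string, tokens: number): void {
    this.totals.set(model, (this.totals.get(model) ?? 0) + tokens);
  }
  totalTokens(model: string): number {
    return this.totals.get(model) ?? 0;
  }
}

// The service sees only the interface; a PostgreSQL-backed store would be
// injected the same way in production.
class BillingService {
  constructor(private store: UsageStore) {}
  recordCompletion(model: string, tokens: number): void {
    this.store.record(model, tokens);
  }
}
```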
### Tech Stack
- TypeScript monorepo built on Next.js 14 and Hono for server APIs, with React 18 for the frontend
- Drizzle ORM with PostgreSQL and Redis, managed via Docker Compose for consistent local and production environments
- Turbo for distributed build orchestration with remote caching and parallel execution
- Vitest for comprehensive unit and E2E testing, supported by ESLint, Prettier, and automated CI/CD pipelines via Semantic Release and Husky
- Development tooling including tsc-watch, tsx, and esbuild plugins to enhance type safety and developer experience
### Code Quality
- Extensive test coverage spanning unit, integration, and end-to-end scenarios, including edge cases like streaming parsing and graceful shutdown
- Modular, domain-focused directory structure with strict dependency boundaries enforced by monorepo patterns
- Robust error handling with fail-open strategies for critical systems, paired with structured logging for observability
- Consistent naming conventions and comprehensive type safety through Drizzle's schema-inferred types and precise type guards
- Advanced observability via OpenTelemetry with dynamic sampling based on context, enabling production-grade tracing
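The streaming-parsing edge cases mentioned above typically involve splitting a server-sent-events body into `data:` payloads; a minimal sketch of that kind of parser (the function name and return shape are assumptions, not the project's code):

```typescript
// Parse one line of an SSE stream into either a JSON payload, a terminal
// [DONE] marker, or nothing (comments and blank lines are ignored).
function parseSseLine(line: string): { done: boolean; data?: string } {
  if (!line.startsWith("data:")) return { done: false };
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return { done: true };
  return { done: false, data: payload };
}
```

Edge cases like a `[DONE]` sentinel arriving mid-buffer or a chunk boundary splitting a line are exactly what the test suite's streaming scenarios would exercise.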
## What Makes It Unique
- Unified LLM orchestration layer that abstracts provider-specific complexities into a consistent API, enabling seamless model switching
- Dynamic white-labeling engine that rewrites UI components, assets, and metadata at runtime based on tenant configuration
- Integrated playground with real-time model comparison and side-by-side output visualization for user-driven benchmarking
- Auto-generated documentation from API schema annotations, eliminating manual documentation drift
- Context-aware API key management with per-model permissions and usage-based auto-rotation
- Dashboard analytics that automatically correlate cost, latency, and output quality to recommend optimal LLM routing
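The per-model key permissions idea can be sketched as a simple allow-list check at request time; the types and fields below are hypothetical, not the gateway's actual schema:

```typescript
// Hypothetical gateway-issued key with a per-model allow-list.
interface GatewayKey {
  key: string;
  allowedModels: Set<string>;
}

// Reject a request before it reaches any provider if the key is not
// permitted to use the requested model.
function isAllowed(k: GatewayKey, model: string): boolean {
  return k.allowedModels.has(model);
}
```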