LiteLLM

Open source AI gateway and Python SDK that gives you one OpenAI-compatible interface to call 100+ LLM providers, with built-in routing, cost tracking, guardrails, and virtual keys.

52.6Kstars
9.5Kforks
MIT License
Python

LiteLLM is an open source AI gateway and Python SDK from BerriAI (Y Combinator S23) that gives engineering teams a single, OpenAI-compatible interface for calling more than 100 LLM providers, including OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Cohere, Hugging Face, and self-hosted models served through Ollama or vLLM. Instead of writing provider-specific integration code for every model a team wants to use, developers call litellm.completion() in Python, or POST to a /chat/completions-style endpoint, and LiteLLM handles the translation, so switching providers becomes a config change instead of a rewrite.

The project ships two ways to use it. The Python SDK embeds directly in application code and adds a Router with retry and fallback logic, load balancing across multiple deployments of the same model (for example several Azure regions), and cost tracking. The AI Gateway is a standalone FastAPI proxy server meant to run as shared infrastructure for a whole organization: it adds authentication, per-project and per-user virtual keys, budgets and spend tracking backed by PostgreSQL and Redis, and an admin dashboard UI, so a platform team can hand out scoped LLM access to internal teams without every team managing its own provider credentials.

Beyond routing requests, LiteLLM bundles a guardrails framework for content moderation, PII detection, and prompt-injection checks; more than 80 logging and observability integrations covering tools like Langfuse, Datadog, Arize, and MLflow; semantic and exact-match response caching backed by Redis, Qdrant, or S3; and gateways for newer agent protocols including MCP (Model Context Protocol) and A2A (Agent-to-Agent). A growing Rust core (litellm-rust, exposed via a PyO3 bridge) is being phased in underneath the Python proxy for latency-sensitive request transforms, while auth, routing, and callbacks stay in Python.

Because it is self-hosted, LiteLLM sits opposite hosted LLM routing services like OpenRouter: teams get the same “one API for many models” convenience but keep credentials, logs, and spend data inside their own infrastructure, at the cost of running and operating the proxy themselves.

What You Get

  • A Python SDK (litellm.completion() / litellm.acompletion()) that translates calls into 100+ providers’ native formats and normalizes responses back into an OpenAI-compatible shape
  • A standalone FastAPI-based AI Gateway (Proxy Server) with authentication, virtual keys, and per-project/per-user spend tracking backed by PostgreSQL and Redis
  • A Router with automatic retry, fallback, and load balancing across multiple deployments or providers (e.g. failing over from one Azure region to another)
  • A React-based admin dashboard UI for managing keys, models, budgets, and viewing usage and spend logs
  • A guardrails framework with pluggable pre/post-call hooks for content moderation, PII detection, and prompt-injection checks
  • 80+ built-in logging and observability integrations, including Langfuse, Datadog, Arize, MLflow, and Athina
  • Semantic and exact-match response caching backed by Redis, Qdrant, S3, or in-memory stores
  • MCP (Model Context Protocol) gateway and A2A (Agent-to-Agent) protocol support for connecting external tools and agents through the same proxy
  • Published Terraform modules for one-command deployment of the gateway stack on AWS (ECS Fargate + Aurora + ElastiCache) or GCP (Cloud Run + Cloud SQL + Memorystore)

Common Use Cases

  • Centralizing LLM access for a platform team so every internal app calls one gateway instead of each app managing its own provider API keys
  • Adding automatic fallback between providers or models so an outage at one provider doesn’t take down production traffic
  • Tracking and capping per-team or per-project LLM spend with virtual keys and budgets enforced at the gateway
  • Swapping or A/B testing models, for example comparing GPT-4o against Claude or Gemini, by changing a model string instead of application code
  • Adding guardrails and audit logging in front of LLM calls for compliance-sensitive industries
  • Self-hosting a unified multi-provider API instead of relying on a hosted router, to keep prompts, logs, and spend data inside a private VPC

Under The Hood

Architecture LiteLLM is split into two layered components that share one codebase: the SDK (litellm/) that owns provider-specific request/response transformation, and the AI Gateway (litellm/proxy/) that wraps the SDK with authentication, rate limiting, and management features, as documented in the project’s own ARCHITECTURE.md sequence diagrams. A request entering the proxy flows through user_api_key_auth for authentication, budget and rate-limit hooks backed by Redis, the Router for model selection and fallback, main.py’s completion/acompletion entrypoints, a shared BaseLLMHTTPHandler, and finally a per-provider ProviderConfig.transform_request/transform_response pair before cost calculation and async logging write spend data to Postgres. This consistent transform-per-provider abstraction is what lets the project support well over a hundred providers under litellm/llms/ without each one touching the routing or auth code, and it’s also the seam the project is now peeling off into a separate Rust core for performance-critical paths while keeping orchestration in Python. Extensibility runs through hook and callback registries (guardrails, logging integrations, custom loggers) rather than modifying core request handling, which keeps the blast radius of a new integration small.

Tech Stack The core library targets Python 3.10 through 3.13 and is built around Pydantic v2 for request/response models, with the optional proxy extra adding FastAPI, Starlette, and a choice of Uvicorn, Gunicorn, or Granian as the ASGI server. The gateway persists configuration, keys, teams, and spend logs in PostgreSQL through a Prisma-generated schema, and leans on Redis (including cluster and semantic-cache variants) for caching, rate limiting, and pub/sub coordination across replicas. Packaging and dependency resolution run through uv with a committed lockfile, Docker images are the primary deployment artifact, and published Terraform modules stand up the full stack (gateway, backend, admin UI, managed Postgres, Redis, and object storage) on AWS or GCP. A newer litellm-rust workspace, bridged into Python via PyO3, is being introduced incrementally for pure request-transform logic without touching networking, auth, or routing.

Code Quality The repository carries an extensive test suite that mirrors the litellm/ source tree one-for-one under tests/test_litellm/, alongside dozens of specialized test directories for proxy behavior, SCIM, guardrails, and load testing, run through pytest with async, retry, and coverage plugins. Static analysis is layered and strict: ruff for linting and formatting, basedpyright for type checking, and repository-specific “budget” files that cap the number of allowed lint and type-safety exceptions, meant to be ratcheted down over time rather than left to grow. The project’s own contributor guidelines explicitly ban silent type: ignore suppressions in favor of named, justified rule exceptions, discourage mutable-then-mutated data structures in favor of comprehensions, and describe a goal of catching regressions through mutation-testing kill rates rather than raw coverage percentage. CI runs across CircleCI and roughly fifty GitHub Actions workflows covering linting, type checking, and multi-provider test matrices.

What Makes It Unique LiteLLM’s core bet is breadth plus a consistent interface: comprehensive coverage of LLM, embedding, image, audio, and rerank endpoints across a very long tail of providers, exposed through one OpenAI-shaped API whether you use the SDK or the hosted gateway, which is what lets teams treat model choice as a runtime configuration decision instead of a code dependency. Layered on top of that base abstraction, the project has expanded into adjacent gateway concerns without changing the core contract, including a guardrails hook system, an extensive catalog of observability integrations, and gateways for emerging agent-to-agent and tool-calling protocols (A2A and MCP), all routed through the same authentication and spend-tracking path used for ordinary chat completions. The decision to progressively move performance-sensitive transform logic into a separate Rust core, while deliberately keeping auth, routing, and callbacks in Python, is a pragmatic response to running a widely-adopted low-latency proxy rather than a wholesale rewrite, and reflects a maturing, production-focused engineering posture more than a single novel technical idea.

Self-Hosting

Licensing Model LiteLLM is dual-licensed: the vast majority of the codebase (the Python SDK, the AI Gateway/proxy core, and most of litellm/) is MIT licensed with no restrictions, while everything under the top-level enterprise/ directory is covered by a separate BerriAI Enterprise license that requires a paid subscription for production use.

Self-Hosting Restrictions

  • Code under enterprise/ (SCIM v2 provisioning endpoints, some management-endpoint metadata fields, and other litellm_enterprise modules) is gated behind a runtime premium_user check; calling these endpoints without a valid LITELLM_LICENSE returns an explicit “LiteLLM Enterprise user” error.
  • The Enterprise license permits modifying and testing this code for free, but using it in production, or distributing modifications, requires a paid BerriAI Enterprise subscription.
  • The core gateway (auth, virtual keys, routing, budgets, guardrails hooks, caching, and the 80+ logging integrations) is MIT licensed and fully usable in self-hosted production deployments without a license key.

Enterprise Features

  • SCIM v2 endpoints for syncing users/teams from an external identity provider (verified in litellm/proxy/management_endpoints/scim/scim_v2.py, explicitly gated on _premium_user_check)
  • Custom Admin UI branding and logo customization (enterprise/enterprise_ui)
  • Additional enterprise-only metadata fields on teams, keys, and organizations, gated per-field through _premium_user_check in the management endpoints
  • A commercially-hosted proxy and enterprise support contract, referenced in the README as “Hosted Proxy” and “Enterprise Tier”

Cloud vs Self-Hosted BerriAI also offers a hosted version of the proxy for teams that don’t want to operate the gateway themselves, but the README and Terraform modules are built around self-hosting on your own AWS or GCP infrastructure, and the self-hosted MIT-licensed core covers essentially all day-to-day gateway functionality (routing, budgets, guardrails, caching, observability).

License Key Required Yes, but only for the specific enterprise-gated features listed above. Setting the LITELLM_LICENSE environment variable to a valid BerriAI Enterprise license unlocks SCIM provisioning and enterprise-only metadata fields; everything else in the self-hosted gateway runs without any license key.

Join founders buildingwith open source

Opinionated takes, migration guides, cost-saving tips, and insights from the open source ecosystem.

Subscribe on Substack

No spam. Unsubscribe anytime.

Join 750+ subscribers
No spam. Unsubscribe anytime.

Search