Helicone

An open-source AI gateway and LLM observability platform that routes requests to 100+ models while logging cost, latency, and full traces for every call.

5.9K stars
618 forks
Apache License 2.0
TypeScript

Helicone is an open-source AI Gateway and LLM observability platform built by a Y Combinator-backed team for engineers shipping production AI applications. Instead of bolting logging onto an existing integration, Helicone puts a proxy directly in the request path: point your OpenAI-compatible client at Helicone’s gateway and every call to any of 100+ models across OpenAI, Anthropic, Gemini, Bedrock, Groq, and other providers is automatically routed, logged, and priced, with automatic fallback to another provider if one has an outage.
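To make the "point your client at the gateway" model concrete, here is a minimal sketch that assumes only the public OpenAI Chat Completions request shape (POST to `{baseURL}/chat/completions` with a Bearer token). `GATEWAY_BASE_URL` is a placeholder, not Helicone's real endpoint, and any gateway-specific auth headers the real service requires are deliberately left out. The point is that the request body is identical whether it goes to a provider or through a gateway; only the base URL changes.

```typescript
// Sketch: the same OpenAI-style chat request, aimed either at a provider
// directly or at a gateway, by changing only the base URL.
// GATEWAY_BASE_URL is a hypothetical placeholder, not Helicone's endpoint.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a request in the OpenAI Chat Completions shape.
function buildChatRequest(
  baseURL: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  // Strip trailing slashes so ".../v1/" and ".../v1" produce the same URL.
  const root = baseURL.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}

const OPENAI_BASE_URL = "https://api.openai.com/v1";
const GATEWAY_BASE_URL = "https://gateway.example.com/v1"; // hypothetical

const messages: ChatMessage[] = [{ role: "user", content: "Hello" }];
const direct = buildChatRequest(OPENAI_BASE_URL, "sk-provider", "gpt-4o-mini", messages);
const viaGateway = buildChatRequest(GATEWAY_BASE_URL, "sk-gateway", "gpt-4o-mini", messages);
```

Keeping the body unchanged is what lets application code switch to a gateway without touching request construction.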

The project is a TypeScript monorepo split into five cooperating services: a Next.js web dashboard, a Cloudflare Worker that does the actual request proxying and rate-limiting, an Express/Tsoa API server called Jawn that ingests and serves logs, a Supabase/Postgres database for organization and auth state, and a ClickHouse cluster (with Minio for object storage) for high-volume analytics. Because observability and routing share the same code path, dashboards for cost, latency, and quality are populated from the same request that already had to be proxied, rather than from a separate log-shipping pipeline.
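The claim that dashboards are fed by the proxied request itself can be illustrated with a small sketch. Everything here is an assumption for illustration: the `proxyAndLog` wrapper, the `LogRecord` fields, and the per-million-token prices are invented, and failed upstream calls are simply not logged in this version. It shows the essential idea: latency and cost are computed from the response the proxy already holds, in the same call that forwards it.

```typescript
// Hypothetical sketch: routing and logging share one code path. The proxy
// forwards the request and, in the same call, records latency and cost.

interface Usage {
  promptTokens: number;
  completionTokens: number;
}

interface UpstreamResponse {
  status: number;
  usage: Usage;
}

interface LogRecord {
  model: string;
  status: number;
  latencyMs: number;
  costUsd: number;
}

// USD per one million tokens; made-up values for the example.
const PRICING: Record<string, { input: number; output: number }> = {
  "example-model": { input: 0.5, output: 1.5 },
};

function costFor(model: string, usage: Usage): number {
  const price = PRICING[model];
  if (!price) return 0; // no price data for this model: record zero
  return (usage.promptTokens * price.input + usage.completionTokens * price.output) / 1_000_000;
}

async function proxyAndLog(
  model: string,
  forward: () => Promise<UpstreamResponse>,
  log: (record: LogRecord) => void,
  now: () => number = Date.now,
): Promise<UpstreamResponse> {
  const start = now();
  const response = await forward();
  // The record is built from the response already in hand, so no separate
  // log-shipping step is needed to populate cost and latency.
  log({
    model,
    status: response.status,
    latencyMs: now() - start,
    costUsd: costFor(model, response.usage),
  });
  return response;
}
```

Injecting the clock (`now`) keeps the latency measurement testable without real timers.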

Beyond raw request logs, Helicone groups related calls into sessions and traces for debugging multi-step agents and chatbots, offers a Playground for iterating on prompts, and supports versioned prompt management that can be deployed through the gateway without redeploying application code. The entire stack — web, worker, Jawn, and shared packages — is Apache-2.0 licensed and can be self-hosted with a single Docker Compose command, with Kubernetes/Helm available for larger deployments.
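As a rough illustration of how session and trace views help with multi-step agents, the sketch below groups logged calls by a session identifier, orders each session chronologically, and finds the first failing step. The field names (`sessionId`, `step`, `startedAt`, `ok`) are assumptions for the example, not Helicone's log schema.

```typescript
// Hypothetical sketch: grouping logged requests into sessions and locating
// the first failing step. Field names are illustrative assumptions.

interface LoggedRequest {
  sessionId: string;
  step: string;
  startedAt: number; // epoch milliseconds
  ok: boolean;
}

function groupBySession(requests: LoggedRequest[]): Record<string, LoggedRequest[]> {
  const sessions: Record<string, LoggedRequest[]> = {};
  for (const req of requests) {
    (sessions[req.sessionId] ??= []).push(req);
  }
  // Order each session's calls chronologically so it reads as a trace.
  for (const id of Object.keys(sessions)) {
    sessions[id].sort((a, b) => a.startedAt - b.startedAt);
  }
  return sessions;
}

// Returns the name of the earliest failed step, or undefined if all succeeded.
function firstFailingStep(trace: LoggedRequest[]): string | undefined {
  return trace.find((r) => !r.ok)?.step;
}
```

Sorting before searching matters: a later step often fails only because an earlier one produced bad output, so the earliest failure is usually the place to start debugging.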

What You Get

  • A single OpenAI-compatible baseURL that routes to 100+ models across OpenAI, Anthropic, Gemini, Bedrock, Groq, and more, with automatic fallbacks
  • Per-request cost, latency, and quality tracking sourced from the same proxy call that handles routing, not a separate async log shipper
  • Session and trace views purpose-built for debugging multi-step agent and chatbot pipelines
  • Prompt versioning and a built-in Playground for iterating on prompts and deploying changes through the gateway without a code redeploy
  • A self-hostable five-service stack (web dashboard, Cloudflare Worker gateway, Jawn API, Postgres, and ClickHouse with Minio object storage) under the Apache-2.0 license
  • An MCP server (helicone-mcp) so AI coding agents can query Helicone logs and make gateway requests directly
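Automatic fallback across providers can be sketched as an ordered retry loop. This is a minimal sketch under stated assumptions: the `withFallback` helper, the provider names, and the rule of falling back on any thrown error are all illustrative; a production gateway would likely distinguish retryable failures (outages, rate limits) from client errors.

```typescript
// Hypothetical sketch of ordered fallback: try each provider in turn and
// return the first success. Names and the call signature are assumptions.

type ProviderCall = (provider: string) => Promise<string>;

async function withFallback(
  providers: string[],
  call: ProviderCall,
): Promise<{ provider: string; result: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      // Awaiting inside try ensures a rejected call is caught here.
      return { provider, result: await call(provider) };
    } catch (err) {
      // Record the failure and move on to the next provider in the list.
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed (${errors.join("; ")})`);
}
```

Because the provider order is data, the fallback chain can change without modifying the calling application.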

Common Use Cases

  • Debugging a multi-step LangGraph or CrewAI agent by inspecting recorded session traces to find which step produced a bad output
  • Capping and rate-limiting LLM spend per API key using the gateway’s Durable Object-based wallets and rate limiters
  • Failing over automatically to a backup model when a primary provider has an outage, without changing application code
  • Iterating on and deploying new prompt versions from production data without shipping a new application build
  • Self-hosting the full stack in a private VPC to keep LLM request/response payloads within a compliance boundary
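Per-key spend caps and rate limits can be illustrated with a small in-memory limiter. In Helicone this state is described as living in Cloudflare Durable Objects; here a plain `Map` stands in, so the state is per-process only. The `KeyLimiter` class, the fixed-window algorithm, and the spend-carryover rule are assumptions for the sketch, not the gateway's actual implementation.

```typescript
// Hypothetical sketch: per-API-key fixed-window rate limit plus a spend cap.
// A Map stands in for the Durable Object state described in the text.

interface KeyState {
  windowStart: number;
  count: number;
  spentUsd: number;
}

class KeyLimiter {
  private readonly state = new Map<string, KeyState>();
  private readonly maxRequestsPerWindow: number;
  private readonly windowMs: number;
  private readonly spendCapUsd: number;

  constructor(maxRequestsPerWindow: number, windowMs: number, spendCapUsd: number) {
    this.maxRequestsPerWindow = maxRequestsPerWindow;
    this.windowMs = windowMs;
    this.spendCapUsd = spendCapUsd;
  }

  // Returns true if the request may proceed, and counts it if so.
  tryAcquire(apiKey: string, now: number): boolean {
    let s = this.state.get(apiKey);
    if (!s || now - s.windowStart >= this.windowMs) {
      // Start a fresh rate window, carrying over the running spend total.
      s = { windowStart: now, count: 0, spentUsd: s ? s.spentUsd : 0 };
      this.state.set(apiKey, s);
    }
    if (s.spentUsd >= this.spendCapUsd) return false; // wallet exhausted
    if (s.count >= this.maxRequestsPerWindow) return false; // rate limited
    s.count += 1;
    return true;
  }

  // Called after the response, once the request's cost is known.
  recordSpend(apiKey: string, costUsd: number): void {
    const s = this.state.get(apiKey);
    if (s) s.spentUsd += costUsd;
  }
}
```

Spend is recorded after the response because the cost is only known once token usage comes back; keeping both checks in the proxy means a capped key is rejected before any provider call is made.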

Under The Hood

Architecture

The system is decomposed into five independently deployable services orchestrated via Docker Compose: a Next.js web dashboard, a Cloudflare Worker that acts as the request-proxying AI gateway, an Express/Tsoa API server (Jawn) that ingests and serves logs, a Postgres database (via Supabase) for organization and auth state, and a ClickHouse cluster with object storage for high-volume analytics. Inside the worker, a Manager/Store separation of concerns is used throughout (AlertManager wraps AlertStore, ProviderKeysManager wraps ProviderKeysStore, APIKeysManager wraps APIKeysStore), and Durable Objects handle stateful concerns such as rate limiting and spend tracking directly in the proxy layer. Shared business logic (cost calculation, LLM response mapping, prompt utilities, and filters) lives in internal workspace packages consumed by both the worker and Jawn without duplication. In effect, the core abstraction is the proxy request pipeline itself: any change to how a provider request is built or logged ripples through every provider integration. This is a solid, layered design, though the multi-service topology adds real operational complexity for anyone self-hosting the full stack.
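The Manager/Store split can be illustrated with a sketch modeled loosely on the ProviderKeysManager/ProviderKeysStore pair named above. The method names (`getByOrg`, `getKeyFor`), the `ProviderKey` fields, and the in-memory store are invented for illustration and are not Helicone's actual API. The pattern is that the Store only reads and writes data, while the Manager holds the business rules.

```typescript
// Hypothetical sketch of a Manager/Store separation. Interface and method
// names are illustrative, not Helicone's actual classes.

interface ProviderKey {
  id: string;
  orgId: string;
  provider: string;
  secret: string;
}

// Store: data access only.
interface ProviderKeysStore {
  getByOrg(orgId: string): Promise<ProviderKey[]>;
}

// In-memory store for the example; a real one would query a database.
class InMemoryProviderKeysStore implements ProviderKeysStore {
  private keys: ProviderKey[];
  constructor(keys: ProviderKey[]) {
    this.keys = keys;
  }
  async getByOrg(orgId: string): Promise<ProviderKey[]> {
    return this.keys.filter((k) => k.orgId === orgId);
  }
}

// Manager: business rules layered over any Store implementation.
class ProviderKeysManager {
  private store: ProviderKeysStore;
  constructor(store: ProviderKeysStore) {
    this.store = store;
  }
  // Picks the organization's key for a provider, or null if none exists.
  async getKeyFor(orgId: string, provider: string): Promise<ProviderKey | null> {
    const keys = await this.store.getByOrg(orgId);
    return keys.find((k) => k.provider === provider) ?? null;
  }
}
```

One practical benefit of this split is that Manager logic can be unit-tested against an in-memory Store, as here, without a database.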

Tech Stack

The codebase is a TypeScript monorepo managed with Yarn workspaces spanning the web dashboard, worker, shared packages, and the Jawn API server. The web dashboard is built on Next.js with the App Router, Radix-UI-based components, and Supabase for auth, with product analytics wired in for the hosted product. The gateway runs as a Cloudflare Worker using Durable Objects for rate limiting and wallet balance tracking, and is deployed and tested via Wrangler and a Workers-specific test runner. Jawn is an Express server whose routes and OpenAPI spec are generated from TypeScript controllers by Tsoa; it uses AWS SDK clients for object storage and queueing alongside a dedicated ClickHouse client for the analytics database. Self-hosters deploy with Docker Compose, with a separate Kubernetes/Helm path offered for larger, managed deployments. CI is split into many small, service-scoped GitHub Actions workflows rather than one monolithic pipeline.

Code Quality

The repository contains a substantial number of test files, run by different test runners suited to each service (a Jest-based runner for the shared packages and dashboard, and a Workers-specific runner for the Cloudflare proxy code). CI enforces per-service typecheck, build, and end-to-end test workflows rather than a single combined check. TypeScript is used in strict mode throughout, with types generated from the API layer and the database schema rather than hand-maintained, and linting is enforced project-wide with ESLint and Prettier (with a newer package opting into Biome instead). Error handling favors typed Manager/Store layers over ad hoc try/catch blocks scattered through routes, though the proxy boundary code carries occasional lint suppressions for loosely typed request handling, a pragmatic trade-off at the edge, where request shapes vary by provider.

What Makes It Unique

Helicone’s core bet is that observability and the LLM gateway should be the same code path rather than separate concerns: because every request already flows through the proxy, cost accounting, caching, rate limiting, and fallback routing all happen inline with no extra network hop, and the very request that gets proxied is what populates the dashboards, prompt version history, and dataset exports. Per-provider, per-model pricing is tracked as structured data rather than estimated, so cost figures are deterministic, and prompts are treated as versioned, deployable objects that can change in production without an application redeploy. This gateway-as-observability-substrate approach is a genuinely useful architectural choice compared with tools that only tail logs asynchronously after the fact, even though the broader pattern of combining a gateway with observability is shared with a handful of adjacent projects in the space.
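Treating prompts as versioned, deployable objects can be sketched as a small registry: application code refers to a prompt ID, and which version is live is data that can change without a redeploy. The `PromptRegistry` class, its methods, and the `{{variable}}` template syntax are assumptions for this illustration, not Helicone's prompt API.

```typescript
// Hypothetical sketch: prompts as versioned objects resolved at request time.
// Names and the {{variable}} template syntax are illustrative assumptions.

interface PromptVersion {
  version: number;
  template: string;
}

class PromptRegistry {
  private versions = new Map<string, PromptVersion[]>();
  private production = new Map<string, number>();

  // Stores a new immutable version and returns its number (1, 2, ...).
  publish(promptId: string, template: string): number {
    const list = this.versions.get(promptId) ?? [];
    const version = list.length + 1;
    list.push({ version, template });
    this.versions.set(promptId, list);
    return version;
  }

  // Points production traffic at a specific version (also used to roll back).
  deploy(promptId: string, version: number): void {
    const list = this.versions.get(promptId) ?? [];
    if (!list.some((v) => v.version === version)) {
      throw new Error(`unknown version ${version} of ${promptId}`);
    }
    this.production.set(promptId, version);
  }

  // Renders the currently deployed version; missing variables become "".
  render(promptId: string, vars: Record<string, string>): string {
    const version = this.production.get(promptId);
    const entry = this.versions.get(promptId)?.find((v) => v.version === version);
    if (!entry) throw new Error(`no deployed version for ${promptId}`);
    return entry.template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => vars[name] ?? "");
  }
}
```

Keeping old versions immutable means a rollback is just another `deploy` call pointing at an earlier number.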

Self-Hosting

Licensing Model

Apache License 2.0 covers the entire monorepo (the web dashboard, the Cloudflare Worker gateway, the Jawn API server, and all shared packages), with no separate proprietary “enterprise edition” codebase found in the repository.

Self-Hosting Restrictions

  • The web dashboard’s UI still contains Pro-tier feature gates (Datasets, Alerts, custom rate limits, custom properties, prompt management, caching, evaluators, Playground, Sessions, Vault, Webhooks, and multi-user invites) driven by an organization.tier field read from the application database rather than an external license server.
  • Database migration history shows this tier column defaulting to enterprise for newly created organizations, which in a self-hosted install (where there is no Stripe subscription to check) means these UI gates read as satisfied by default rather than actively blocking — this is inferred from the migration files, not confirmed against a live self-hosted deployment.
  • The README describes manual (non-Docker, non-Helm) deployment as unsupported and explicitly discourages it.
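The gating behavior described above (a tier read from database state, with new organizations defaulting to enterprise and no license-server call) can be sketched as follows. The tier ordering, the feature-to-tier map, and the function names are assumptions for illustration; only the `tier` field and its `enterprise` default come from the description above.

```typescript
// Hypothetical sketch of a tier-based feature gate driven by database state.
// Tier ranks and the feature map are illustrative assumptions.

type Tier = "free" | "pro" | "team" | "enterprise";

const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, team: 2, enterprise: 3 };

// Minimum tier for a few of the gated features listed above (assumed values).
const FEATURE_MIN_TIER: Record<string, Tier> = {
  datasets: "pro",
  alerts: "pro",
  sessions: "pro",
  webhooks: "pro",
};

interface Organization {
  id: string;
  tier: Tier;
}

// New rows default to "enterprise", mirroring the reported migration default.
function newOrganization(id: string, tier: Tier = "enterprise"): Organization {
  return { id, tier };
}

// Pure check against stored state; no remote license lookup is involved.
function hasFeature(org: Organization, feature: string): boolean {
  const required = FEATURE_MIN_TIER[feature];
  if (required === undefined) return true; // ungated feature
  return TIER_RANK[org.tier] >= TIER_RANK[required];
}
```

Under this model, a self-hosted organization created with the default tier passes every gate, which is consistent with the inference above that the gates read as satisfied rather than blocking.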

Enterprise Features

  • SOC 2 and GDPR compliance claims in the README apply to the hosted Helicone Cloud offering.
  • A production-ready Helm chart for Kubernetes deployment is not published in the repository and instead requires contacting Helicone’s enterprise sales email.

Cloud vs Self-Hosted

Helicone Cloud layers a managed free tier (a monthly request allowance with no credit card required) and paid Pro/Team/Enterprise plans on top of the same open-source code, plus direct access to the Kubernetes Helm chart that self-hosters must request separately from the team.

License Key Required

No license-key mechanism was found in the code. Self-hosted feature availability is controlled by database state (organization.tier) rather than a remote license check, and no code path was found that calls out to a license server.
