Latitude is an open-source AI engineering platform designed for teams building production-grade LLM applications. It solves the critical problem of unpredictable AI behavior by capturing real traffic, analyzing failures, and automating prompt improvements. Engineers and AI product teams use it to move beyond trial-and-error prompt tuning toward data-driven reliability.
Built in TypeScript, Latitude offers both cloud-hosted and self-hosted deployment options. It ingests telemetry over OTLP, supports multiple LLM providers, and includes PromptL, a custom language for defining and versioning prompts. The platform’s architecture is designed around a reliability loop: observe → annotate → discover → evaluate → optimize.
What You Get
- Observability - Capture real prompts, inputs, outputs, tool calls, latency, token usage, and cost from live LLM traffic with full trace visibility.
- Prompt Playground - Reproduce runs with real inputs, iterate on prompts visually, version changes, and publish directly to the AI Gateway.
- Datasets & Testing - Curate real-world examples to create batch test suites and regression tests for prompt and model changes.
- Built-in Evaluations - Use pre-built evals, LLM-as-judge scoring, and human-in-the-loop annotations to measure quality and detect regressions.
- Issue Discovery - Automatically cluster production failures into recurring failure modes and surface root causes across users and use cases.
- Prompt Optimizer (GEPA) - Automatically test thousands of prompt variations against your eval suite using the GEPA algorithm (Agrawal et al., 2025) to reduce failures over time.
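The loop these features form — capture traffic, score it against evals, keep the best-performing prompt — can be sketched in plain TypeScript. Everything below (`EvalCase`, `PromptVariant`, `scoreVariant`, `selectBest`) is illustrative naming for the sketch, not Latitude's actual SDK:

```typescript
// Illustrative sketch of the observe → evaluate → optimize loop.
// All types and names are hypothetical, not Latitude's SDK.

interface EvalCase {
  input: string;
  expected: string;
}

interface PromptVariant {
  id: string;
  // Stand-in for an LLM call; deterministic so the sketch is runnable.
  run: (input: string) => string;
}

// Score a variant as the fraction of eval cases it answers correctly.
function scoreVariant(variant: PromptVariant, cases: EvalCase[]): number {
  const passed = cases.filter((c) => variant.run(c.input) === c.expected).length;
  return passed / cases.length;
}

// Pick the best-scoring variant — the core step of any optimizer loop.
function selectBest(variants: PromptVariant[], cases: EvalCase[]): PromptVariant {
  return variants.reduce((best, v) =>
    scoreVariant(v, cases) > scoreVariant(best, cases) ? v : best
  );
}

const cases: EvalCase[] = [
  { input: "2+2", expected: "4" },
  { input: "3+3", expected: "6" },
];

const variants: PromptVariant[] = [
  { id: "v1", run: () => "4" }, // always answers "4": passes one case
  {
    id: "v2",
    run: (q) => {
      const [a, b] = q.split("+").map(Number); // actually computes the sum
      return String(a + b);
    },
  },
];

console.log(selectBest(variants, cases).id); // prints "v2"
```

A real optimizer such as GEPA generates and mutates the variants itself; the selection-against-an-eval-suite step stays conceptually the same.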
Common Use Cases
- Running production LLM chatbots - A customer support team uses Latitude to capture failed responses, annotate them with human feedback, and automatically optimize prompts to reduce misclassifications.
- Managing enterprise AI APIs - A platform team instruments their LLM gateway with Latitude to monitor token costs, detect performance drift, and enforce quality gates before deployments.
- Developing RAG systems - A research team builds datasets from real user queries, creates evals for answer accuracy, and uses GEPA to optimize retrieval prompts without manual A/B testing.
- Scaling AI-powered content generation - A media company tracks hallucinations in automated article generation, converts failures into automated evals, and uses the prompt optimizer to improve tone and factual consistency.
Under The Hood
Architecture
- Monorepo structure enforces clear separation between applications (API, ingestion, workers, workflows, web) and shared platform packages, leveraging pnpm workspaces and Turbo for modular build orchestration
- Functional dependency management via Effect.js replaces traditional dependency-injection containers, enabling composable, testable service layers with explicit error handling
- Data access is abstracted through Drizzle ORM with isolated adapters for PostgreSQL, ClickHouse, and Weaviate, eliminating cross-cutting concerns and enabling polyglot persistence
- Build pipeline uses Docker multi-stage builds and Turbo to ensure clean, production-optimized artifacts with minimal runtime bloat
- Test code is strictly isolated from production code via Biome lint rules, preserving clean boundaries and reducing accidental dependencies
Tech Stack
- TypeScript-based monorepo with pnpm and Turbo for efficient package management and incremental builds
- Frontend built with React 19 and Vite, paired with Hono for lightweight, type-safe API routing and server-side rendering
- Primary data stores include PostgreSQL and ClickHouse, managed via Drizzle ORM and Drizzle Kit for schema migrations and type-safe queries
- Temporal orchestrates complex workflows, Weaviate handles vector embeddings, and Redis powers caching and job queues via BullMQ
- Full-stack orchestration via Docker Compose with persistent volumes and environment-aware configurations for Postgres, ClickHouse, Weaviate, Redis, and Temporal UI
- Biome enforces code quality, Mise manages tool versions, and OpenCode/Agentation supports AI-assisted development
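A Compose setup along these lines might look like the fragment below. The service names, image tags, and volume names are illustrative, not Latitude's actual `docker-compose.yml`:

```yaml
# Illustrative fragment only — not Latitude's real compose file.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  # environment-aware config
    volumes:
      - pgdata:/var/lib/postgresql/data        # persistent volume
  clickhouse:
    image: clickhouse/clickhouse-server:24.8
    volumes:
      - chdata:/var/lib/clickhouse
  redis:
    image: redis:7
volumes:
  pgdata:
  chdata:
```

Pinning image tags and declaring named volumes keeps local environments reproducible across the team, which is the point of driving the whole stack from one Compose file.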
Code Quality
- Comprehensive test coverage spans unit, integration, and end-to-end scenarios using Vitest and in-memory databases to validate business logic and data flow
- Domain-driven design with clear ports and adapters ensures loose coupling between business logic and infrastructure concerns
- Functional error handling with Effect.js provides robust, explicit management of edge cases and invalid states
- Consistent, domain-focused naming conventions enhance readability and maintainability across entities, repositories, and test files
- Strong type safety is enforced through TypeScript and custom value objects, validated by exhaustive test assertions
- Advanced OpenTelemetry integration enables context-rich tracing with custom span filtering and dual-export architectures
What Makes It Unique
- Dual-layer data backend (PostgreSQL + ClickHouse) enables high-performance tracing and real-time analytics without bolting on a separate analytics service
- Unified component library with dynamic variants and Radix UI primitives delivers consistent, theme-aware UI across web and API layers
- First-class support for over 20 LLM providers enables seamless model switching within a single interface
- End-to-end observability stack integrates Hono, OpenTelemetry, and interactive UI widgets to debug LLM workflows in real time
- Bounded contexts enforced via monorepo domain modules allow code reuse across applications without publishing internal packages to npm
- Custom JSON and trace annotation widgets provide human-in-the-loop refinement of LLM outputs; unlike generic logging tools, they enable direct behavioral iteration
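The span filtering and dual-export ideas mentioned above can be sketched with a minimal hand-rolled span shape and exporter type. These are not the OpenTelemetry JS SDK's actual interfaces, just an illustration of the composition pattern:

```typescript
// Minimal hand-rolled span and exporter shapes; a sketch of the filtering
// and dual-export pattern, not the OpenTelemetry SDK's real interfaces.

interface Span {
  name: string;
  attributes: Record<string, string | number>;
}

type Exporter = (spans: Span[]) => void;

// Dual-export: forward every batch to several backends at once,
// e.g. an OTLP trace endpoint plus an analytics store.
function dualExport(...exporters: Exporter[]): Exporter {
  return (spans) => exporters.forEach((e) => e(spans));
}

// Custom filtering: only ship spans matching a predicate, drop the rest.
function filtered(predicate: (s: Span) => boolean, next: Exporter): Exporter {
  return (spans) => next(spans.filter(predicate));
}

const shipped: Span[] = [];
const analytics: Span[] = [];

// Keep only LLM-related spans, then fan them out to both sinks.
const pipeline = filtered(
  (s) => s.name.startsWith("llm."),
  dualExport(
    (batch) => shipped.push(...batch),
    (batch) => analytics.push(...batch)
  )
);

pipeline([
  { name: "llm.chat", attributes: { "input_tokens": 42 } },
  { name: "http.request", attributes: {} },
]);

console.log(shipped.length, analytics.length); // prints "1 1"
```

Composing exporters as plain functions keeps the filtering policy testable in isolation, which mirrors the functional style used elsewhere in the codebase.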