Laminar is an open-source observability platform for monitoring, debugging, and evaluating AI agents. It targets developers building complex, long-running agents with frameworks and SDKs such as LangChain, OpenAI, Anthropic, and Browser Use, providing end-to-end visibility into agent behavior that traditional logging cannot capture. By combining real-time tracing, AI-powered diagnostics, and SQL-based analytics, Laminar addresses two chronic pain points of AI development: opaque agent failures and slow iteration cycles.
Built with Rust for high performance and powered by OpenTelemetry, Laminar supports both managed and self-hosted deployments via Docker Compose. It ships native TypeScript and Python SDKs, offers gRPC exporters, and includes a full-stack UI with a SQL editor, dashboard builder, and browser session replay, all designed to work with popular AI frameworks and browser automation tools like Playwright and Stagehand.
What You Get
- OpenTelemetry-native Tracing - Automatically trace LLM calls from OpenAI, Anthropic, Gemini, LangChain, Vercel AI SDK, and Browser Use with a single line of code using the Laminar SDK.
- AI-Powered Debugger - Debug agent failures with session replay, step-by-step trace visualization, and AI-generated root cause analysis of complex trace chains.
- Signals for AI Monitoring - Define custom events in natural language (e.g., ‘agent failed to extract pricing data’) and automatically detect them across millions of traces with structured output schemas.
- SQL Access to All Data - Query traces, metrics, and events using a built-in SQL editor; bulk-export datasets for evals or analysis via API.
- Evals SDK and CLI - Run unopinionated, extensible evaluations locally or in CI/CD pipelines with a unified UI to compare results across models and prompts.
- Browser Session Replay - Automatically sync browser screen recordings (from Browser Use, Stagehand, Playwright) with agent traces to visualize user-facing agent behavior.
- Dashboard Builder - Create custom dashboards combining traces, metrics, and SQL queries to monitor agent health, performance, and error rates over time.
- Data Annotation & Dataset Creation - Annotate trace data with custom labels and export labeled datasets directly for training or evaluating AI models.
Common Use Cases
- Debugging agent failures in production - A developer uses Laminar’s AI-powered debugger to analyze a trace where an agent failed to extract pricing data, seeing the exact step where the tool returned invalid results and the surrounding context.
- Monitoring agent performance at scale - A team running 10,000+ agent sessions daily defines a Signal to detect ‘tool_error’ events and uses SQL to aggregate failure rates by model and tool, identifying chronic issues.
- Running evals in CI/CD - An ML engineer integrates Laminar’s eval CLI into a GitHub Actions pipeline to automatically evaluate agent outputs against ground truth and block deployments if quality drops.
- Building labeled datasets for fine-tuning - A researcher uses Laminar’s data annotation UI to label 500 failed agent traces as ‘logic_error’ or ‘tool_failure’ to create a training dataset for a custom agent classifier.
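The failure-rate aggregation in the monitoring use case might look like this in the SQL editor. The table and column names here are illustrative placeholders, not Laminar's actual schema:

```sql
-- Count tool_error events per model and tool over the last 7 days
SELECT model, tool_name, count(*) AS failures
FROM events
WHERE name = 'tool_error'
  AND created_at >= now() - INTERVAL 7 DAY
GROUP BY model, tool_name
ORDER BY failures DESC;
```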
Under The Hood
Architecture
- Service boundaries are clearly defined through Docker Compose, with isolated containers for app-server, query-engine, and frontend, enforcing clean separation of concerns
- Layered backend design separates data storage (PostgreSQL), analytics (ClickHouse), search (Quickwit), and eventing (RabbitMQ), each with dedicated configuration and health monitoring
- gRPC and REST endpoints coexist in the app-server with distinct ports, enabling clean internal vs external communication boundaries
- Frontend and backend are fully decoupled via environment-driven configuration, supporting independent deployment and development cycles
- Modular Dockerfiles and environment variable injection ensure consistent, scalable deployment across local, lite, and full environments
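A minimal self-hosted bring-up matching this layout looks roughly as follows; the repo URL is the project's, but the exact compose file names and service set may differ across the local, lite, and full variants:

```shell
# Clone and start the stack with Docker Compose;
# images are pulled from GitHub Container Registry.
git clone https://github.com/lmnr-ai/lmnr
cd lmnr
docker compose up -d

# The frontend, app-server (HTTP + gRPC), and backing stores
# (PostgreSQL, ClickHouse, RabbitMQ) come up as separate containers.
```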
Tech Stack
- Rust powers the backend services with high-performance gRPC and HTTP APIs, while Next.js 14 drives the frontend with React, NextAuth, and Zod for type-safe authentication and validation
- PostgreSQL serves as the primary relational store with Drizzle ORM for type-safe migrations, while ClickHouse handles analytical workloads with optimized configurations
- Quickwit enables full-text search and observability indexing, integrated via both REST and gRPC interfaces
- Docker Compose orchestrates the entire system with multi-environment support and containerized image sourcing from GitHub Container Registry
Code Quality
- Extensive test coverage spans frontend and query engine layers, with precise assertions for edge cases, URL parsing, and UUID generation
- Strong type safety and input validation are enforced via Zod and schema-aware query rewriting, preventing malformed inputs and SQL injection
- Custom error classes and granular response formatting distinguish between client and server errors, improving debuggability
- Clean, domain-focused code organization separates API routes, utilities, and SQL transformation logic, enhancing maintainability
- Comprehensive linting and access control in the query engine ensure secure, schema-aware query execution
What Makes It Unique
- Native LLM trace debugging with checkpointing allows developers to pause, replay, and inspect generative AI workflows like traditional code breakpoints
- Automatic caching and replay of LLM responses during debugging eliminates manual mocking and accelerates iteration
- Unified API key system ties trace collection, evaluation metrics, and visualization into a single cohesive context
- Interactive chart builder auto-validates axis selections based on data semantics, empowering non-technical users to explore trace data
- Deep integration between debugger checkpoints and trace visualization enables interactive, code-free exploration of LLM execution flows
- End-to-end trace-to-insight pipeline transforms debugging from log parsing to interactive, graph-based exploration of LLM behavior