Uptrace is an open-source Application Performance Monitoring (APM) platform designed for developers and SREs managing distributed systems. It solves the fragmentation problem in observability by unifying traces, metrics, and logs into a single interface with no vendor lock-in. Built on OpenTelemetry for data collection and ClickHouse for high-performance storage, it enables real-time querying of billions of spans and metrics on a single server.
Uptrace supports ingestion from OTLP, Prometheus, Vector, FluentBit, and CloudWatch, and integrates with Grafana as a Tempo/Prometheus datasource. It runs as a self-hosted service using Docker, Kubernetes, or Ansible, with PostgreSQL for metadata and ClickHouse for time-series data. The UI is built with Vue.js and offers SQL-like trace queries and PromQL-compatible metric queries.
What You Get
- Unified UI for traces, metrics, and logs - Single interface to view and correlate distributed traces, time-series metrics, and structured logs without switching tools.
- SQL-like query language for traces - Aggregate and filter spans using SQL-like syntax to find slow requests, errors, or patterns across services.
- PromQL-compatible metrics queries - Use familiar PromQL syntax to query and visualize metrics collected via OpenTelemetry or Prometheus.
- 50+ pre-built dashboards - Automatically generated dashboards for common services and infrastructure metrics as soon as data arrives.
- Span and log correlation - Automatically link logs to traces using context propagation, enabling root cause analysis with stack traces and error details.
- ClickHouse-powered compression - Compress 1KB spans down to ~40 bytes, a roughly 25x reduction that cuts raw span storage by about 95%.
- Multi-channel alerting - Configure alerts based on trace errors, metric thresholds, or log patterns with notifications via Email, Slack, Webhook, and Alertmanager.
- Service dependency graph - Visualize inter-service relationships and identify bottlenecks or failure propagation paths in microservices architectures.
- SSO via OpenID Connect - Secure access with Keycloak, Google Cloud, or Cloudflare authentication for enterprise teams.
- Grafana integration - Add Uptrace to Grafana as a Tempo data source for traces and a Prometheus data source for metrics, extending existing observability stacks.
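To make the SQL-like trace queries concrete, the following Go sketch performs the same kind of operation in memory: filter spans, group them by service, and compute a count and a 99th-percentile duration per group. It is not UQL syntax and not Uptrace code; the `Span` type and the `aggregate` and `percentile` functions are hypothetical names, and Uptrace presumably pushes such queries down to ClickHouse rather than evaluating them in Go.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Span is a simplified, hypothetical span record; real OpenTelemetry spans carry many more fields.
type Span struct {
	Service  string
	Name     string
	Duration time.Duration
}

// GroupStats holds the per-group aggregates: a count and the 99th-percentile duration.
type GroupStats struct {
	Count int
	P99   time.Duration
}

// percentile returns the nearest-rank percentile p (0 < p <= 100) of an ascending slice.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted)))) // 1-based nearest rank
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// aggregate keeps spans accepted by where, groups them by key, and computes count and p99 per group.
func aggregate(spans []Span, where func(Span) bool, key func(Span) string) map[string]GroupStats {
	groups := make(map[string][]time.Duration)
	for _, s := range spans {
		if where(s) {
			k := key(s)
			groups[k] = append(groups[k], s.Duration)
		}
	}
	out := make(map[string]GroupStats, len(groups))
	for k, ds := range groups {
		sort.Slice(ds, func(i, j int) bool { return ds[i] < ds[j] })
		out[k] = GroupStats{Count: len(ds), P99: percentile(ds, 99)}
	}
	return out
}

func main() {
	spans := []Span{
		{"api", "GET /users", 120 * time.Millisecond},
		{"api", "GET /users", 950 * time.Millisecond},
		{"db", "SELECT users", 80 * time.Millisecond},
		{"db", "SELECT users", 700 * time.Millisecond},
		{"db", "SELECT orders", 40 * time.Millisecond},
	}
	// Conceptually: keep spans slower than 100ms, group by service, then count and p99(duration).
	stats := aggregate(spans,
		func(s Span) bool { return s.Duration > 100*time.Millisecond },
		func(s Span) string { return s.Service })

	services := make([]string, 0, len(stats))
	for svc := range stats {
		services = append(services, svc)
	}
	sort.Strings(services) // map iteration order is random; sort for stable output
	for _, svc := range services {
		fmt.Printf("%s count=%d p99=%v\n", svc, stats[svc].Count, stats[svc].P99)
	}
}
```

Running it prints `api count=2 p99=950ms` and `db count=1 p99=700ms`; the two short `db` spans fall below the 100ms filter. Nearest-rank is only one of several percentile definitions, and systems at scale usually approximate percentiles instead of sorting every value.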
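Log-to-trace correlation generally works by stamping each log record with the active trace and span IDs. The following is a minimal sketch, assuming the IDs arrive in a W3C Trace Context `traceparent` header and using Go's standard `log/slog` package. The `parseTraceparent` helper and the `trace_id`/`span_id` attribute names are assumptions for illustration, not a confirmed Uptrace convention; in practice the OpenTelemetry SDK handles propagation, so application code rarely parses the header by hand.

```go
package main

import (
	"encoding/hex"
	"errors"
	"log/slog"
	"os"
	"strings"
)

// isHex reports whether s is exactly n lowercase hexadecimal characters.
func isHex(s string, n int) bool {
	if len(s) != n || strings.ToLower(s) != s {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// parseTraceparent extracts the trace and parent span IDs from a version-00
// traceparent header of the form "00-<32 hex>-<16 hex>-<2 hex flags>".
func parseTraceparent(h string) (traceID, spanID string, err error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", errors.New("unsupported traceparent format")
	}
	traceID, spanID = parts[1], parts[2]
	if !isHex(traceID, 32) || !isHex(spanID, 16) || !isHex(parts[3], 2) {
		return "", "", errors.New("malformed traceparent field")
	}
	// All-zero IDs are invalid per the Trace Context specification.
	if traceID == strings.Repeat("0", 32) || spanID == strings.Repeat("0", 16) {
		return "", "", errors.New("all-zero trace or span ID")
	}
	return traceID, spanID, nil
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	traceID, spanID, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		logger.Error("bad traceparent", "error", err)
		return
	}
	// Attaching the IDs to the log line lets a backend join it to the span that emitted it.
	logger.Info("query failed", "trace_id", traceID, "span_id", spanID)
}
```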
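As a rough check on the compression figure, taking 1 KB as 1,000 bytes (an assumption; using 1,024 bytes gives about 25.6x and 96.1%):

```latex
\frac{1000\ \text{B}}{40\ \text{B}} = 25
\qquad\Rightarrow\qquad
1 - \frac{40}{1000} = 0.96
```

That is a roughly 25x per-span compression ratio, or about 96% less raw storage, consistent with the 95% figure. Actual savings will depend on attribute cardinality and on how repetitive the span data is.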
Common Use Cases
- Debugging microservices failures - A DevOps engineer uses Uptrace to trace a slow API call across 12 services, correlating logs and spans to find a misconfigured database query.
- Monitoring Kubernetes clusters at scale - A platform team deploys Uptrace to monitor 500+ pods, tracking latency spikes and resource usage with pre-built dashboards and alerts.
- Migrating from Datadog or New Relic - A company replaces expensive commercial APM tools with Uptrace’s self-hosted version, saving 95% on storage costs while retaining full trace and metric visibility.
- Tracking business KPIs alongside system metrics - A SaaS company monitors user signups (business metric) alongside API latency and error rates to detect performance degradation affecting conversions.
Under The Hood
Architecture
- Monolithic Go service organized into clear package layers (api, db, alert) with dependency injection via constructors and interface-based contracts
- Event-driven communication through a centralized EventBus to decouple tracing, metrics, and alerting pipelines
- External systems like ClickHouse and PostgreSQL are treated as configurable dependencies with health checks, not embedded components
- Frontend and backend are fully decoupled, communicating solely via REST/gRPC APIs with isolated build and deployment pipelines
- Centralized YAML-based configuration enables environment-aware deployment without code modifications
Tech Stack
- Go 1.22+ backend with static binaries and no CGO, using custom HTTP and gRPC handlers
- ClickHouse as the primary time-series store with distributed sharding and replication
- OpenTelemetry Collector integrated for trace ingestion via custom YAML configurations
- Vue 3 frontend with Vuetify and Vite bundling, deployed as static assets
- PostgreSQL for relational metadata with full migration support
- Integrated observability components including Prometheus, Grafana, Alertmanager, and Vector for a unified monitoring pipeline
Code Quality
- Limited test coverage with superficial assertions and minimal validation logic
- Error handling is generic and lacks structured context or custom error types
- Inconsistent architectural patterns across modules, suggesting fragmented development practices
- Naming conventions are non-uniform, with mixed casing and ambiguous identifiers
- Type safety is applied inconsistently across the codebase despite Go's static typing, which increases the risk of runtime errors
- Linting and code quality tooling are either missing or unenforced, leading to stylistic and structural inconsistencies
What Makes It Unique
- UQL (Uptrace Query Language) is backed by a visual query builder, letting less technical users assemble complex time-series queries from drag-and-drop components
- Real-time attribute filtering with live count badges allows dynamic exploration of high-cardinality telemetry data without pre-aggregation
- Unified UI that tightly integrates query building, visualization, and alert configuration in a single declarative workflow
- A custom query chip component with inline expression editing and drag-to-reorder turns query construction into a form of visual programming
- Native integration of OpenTelemetry attributes as first-class query dimensions, enabling deep trace metadata exploration directly in the UI
- Event-driven state synchronization across metrics, alerts, and traces without page reloads or external state management libraries
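Live count badges amount to recomputing, for an attribute, how many records matching the current filters carry each value. The following is a minimal in-memory sketch in Go with hypothetical `Record`, `matches`, and `facetCounts` names; a system storing telemetry in ClickHouse would presumably compute the same counts with a `GROUP BY` over raw rows instead of in application memory.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical telemetry item (span or log) reduced to its string attributes.
type Record map[string]string

// matches reports whether r has every filter key set to the filter's value.
func matches(r Record, filters map[string]string) bool {
	for k, want := range filters {
		if got, ok := r[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// facetCounts counts, for one attribute key, how many records that pass the
// current filters carry each value. It scans raw records on every call, with
// no pre-aggregated rollups.
func facetCounts(records []Record, filters map[string]string, key string) map[string]int {
	counts := make(map[string]int)
	for _, r := range records {
		if !matches(r, filters) {
			continue
		}
		if v, ok := r[key]; ok {
			counts[v]++
		}
	}
	return counts
}

func main() {
	records := []Record{
		{"service": "api", "http.status_code": "200"},
		{"service": "api", "http.status_code": "500"},
		{"service": "api", "http.status_code": "500"},
		{"service": "db"},
	}
	// Badge values for "http.status_code" while the filter service=api is active.
	counts := facetCounts(records, map[string]string{"service": "api"}, "http.status_code")

	values := make([]string, 0, len(counts))
	for v := range counts {
		values = append(values, v)
	}
	sort.Strings(values)
	for _, v := range values {
		fmt.Printf("%s (%d)\n", v, counts[v])
	}
}
```

With the `service=api` filter applied, it prints `200 (1)` and `500 (2)`; the `db` record is excluded by the filter. Adding or removing a filter simply reruns the count, which is what lets the badges update as the user explores.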