HyperDX is an open source observability platform designed for engineering teams that need to resolve production issues fast. It unifies session replays, logs, metrics, traces, and errors into a single, intuitive interface, powered by ClickHouse and OpenTelemetry. Unlike traditional tools that require expensive licensing or complex configurations, HyperDX offers affordable, high-performance telemetry analysis with no per-user or per-host fees.
Built on top of ClickHouse for blazing-fast querying and OpenTelemetry for vendor-agnostic instrumentation, HyperDX supports deployment via Docker, ClickHouse Cloud, or self-hosted environments. It integrates with Kubernetes, AWS EC2, Vercel, and more, and provides SDKs for Node.js, Python, Java, Go, and browser environments. The platform eliminates tool fragmentation by correlating frontend session replays with backend traces and logs in real time.
What You Get
- Session Replay Correlation - Automatically links user session replays with backend logs and traces to visualize the full end-to-end flow of an issue from browser to server.
- Blazing Fast Log & Trace Search - Queries terabytes of telemetry data in seconds using ClickHouse’s columnar storage, enabling real-time investigation of production incidents.
- Intuitive Full-Text Search Syntax - Search logs and traces with simple syntax like level:err or user_id:123. No SQL is required, but SQL is supported when needed.
- Native JSON Parsing - Directly search, filter, and visualize structured JSON logs without pre-defined schemas or additional configuration.
- Live Tail Logs & Traces - Stream and search live log and trace data in real time to monitor ongoing events as they happen.
- OpenTelemetry Native Support - Ingest telemetry data from any OpenTelemetry-compatible language (Node.js, Python, Java, Go, Rust, etc.) without proprietary agents.
- Alerting with Multiple Channels - Set up alerts in clicks and receive notifications via Slack, Email, or PagerDuty when anomalies or errors occur.
- Visual Chart Builder - Create custom dashboards to graph metrics like errors grouped by customer ID, response time trends, or error rates over time with drag-and-drop simplicity.
- Agent-Free Instrumentation - Deploy without sidecars or additional containers; use lightweight SDKs or direct OpenTelemetry collector ingestion.
- ClickHouse-Powered Storage - Leverage ClickHouse’s high-performance, cost-efficient storage to retain logs and traces for 30+ days at 1/10th the cost of Datadog.
- Intercom Integration - Jump directly from an Intercom user chat into their full session replay, logs, and traces for faster customer issue resolution.
- Event Delta Analysis - Identify and analyze trends in anomalies by comparing event volumes over time to detect regressions or spikes.
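To make the field:value search over schemaless JSON logs more concrete, here is a minimal sketch in Python. It is not HyperDX's parser or query engine. The function names (parse_query, search) and the sample log records are hypothetical, and treating space-separated terms as a logical AND is an assumption made for illustration.

```python
import json


def parse_query(query: str) -> dict[str, str]:
    """Split a query such as 'level:err user_id:123' into field/value pairs."""
    filters = {}
    for term in query.split():
        field, sep, value = term.partition(":")
        if not sep or not field or not value:
            raise ValueError(f"expected field:value, got {term!r}")
        filters[field] = value
    return filters


def search(lines, query):
    """Yield JSON log records that match every filter term (assumed AND)."""
    filters = parse_query(query)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # only JSON objects have named fields
        # Compare as strings so user_id:123 matches the JSON number 123.
        if all(f in record and str(record[f]) == v for f, v in filters.items()):
            yield record


# Hypothetical sample data; no schema is declared up front.
LOGS = [
    '{"level": "err", "user_id": 123, "message": "payment failed"}',
    '{"level": "info", "user_id": 123, "message": "login ok"}',
    '{"level": "err", "user_id": 456, "message": "timeout"}',
    "not json",
]

matches = list(search(LOGS, "level:err user_id:123"))
```

Only the first record satisfies both terms. A production engine would push these filters down into ClickHouse rather than scanning lines in application code.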
Common Use Cases
- Debugging a production outage in a microservice architecture - A DevOps engineer uses HyperDX to correlate a user session replay with backend traces and logs across 10+ services to pinpoint a failing API call in under 5 minutes.
- Monitoring a SaaS application with 50k+ daily users - A product team uses HyperDX to track error rates by customer ID, visualize performance degradation during peak hours, and alert on spikes without paying per-user fees.
- Reducing observability costs for a startup - A founder replaces Datadog with HyperDX to cut monthly observability bills from $192 to $20 while retaining full session replay and trace correlation capabilities.
- Instrumenting a multi-language tech stack - A platform engineer instruments Node.js, Python, and Go services with OpenTelemetry to send data to a single HyperDX instance, eliminating the need for multiple monitoring tools.
- Improving customer support response time - A support lead uses the Intercom integration to instantly view a user’s session replay and associated logs when a customer reports a bug, reducing resolution time by 70%.
- Analyzing performance regressions after a deployment - A frontend engineer uses HyperDX to compare session replay quality and API latency before and after a release, identifying a slow third-party script causing timeouts.
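The before-and-after comparison in the last use case comes down to comparing a latency percentile across two time windows. The sketch below shows that arithmetic in Python. It is an illustration only, not how HyperDX computes it: the function names, the 1.2x regression threshold, and the sample latencies are all assumptions.

```python
from statistics import quantiles


def p95(samples):
    """Return the 95th percentile of a list of latencies (inclusive method)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples, n=100, method="inclusive")[94]


def compare_release(before_ms, after_ms, threshold=1.2):
    """Flag a regression when p95 latency grows by more than the threshold factor."""
    before, after = p95(before_ms), p95(after_ms)
    return {
        "before_p95": before,
        "after_p95": after,
        "regressed": after > before * threshold,
    }


# Hypothetical API latencies in milliseconds for the two windows.
before_release = [100] * 95 + [120] * 5
after_release = [100] * 90 + [900] * 10  # a slow tail appears after the deploy

report = compare_release(before_release, after_release)
```

A tail percentile is used here rather than the mean because a slow third-party script tends to affect a minority of requests, which an average can hide.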
Under The Hood
Architecture
- Monorepo structure powered by Nx with clear separation of API, frontend, and infrastructure components into isolated projects
- Express-based API layer utilizing dependency injection and interface-driven contracts (IAlert, IWebhook, IConnection) to enable extensibility
- Microservice-style deployment via Docker Compose with decoupled services communicating over standardized protocols like OTLP and OpAMP
- Tight frontend-backend integration through shared TypeScript configuration while maintaining clear layer boundaries between Next.js UI and Express API
- OpAMP server integration serves as a centralized telemetry control plane, enabling dynamic agent configuration via open standards
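The interface-driven, dependency-injected design described above can be sketched as follows. The real codebase is TypeScript and its IAlert and IWebhook contracts are not reproduced here. The Python Protocol, the class names, and the method signatures below are hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Webhook(Protocol):
    """Hypothetical contract: anything that can deliver an alert message."""

    def send(self, message: str) -> None: ...


@dataclass
class InMemoryWebhook:
    """Test double that records messages instead of calling a real service."""

    sent: list[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        self.sent.append(message)


@dataclass
class AlertEvaluator:
    """Depends only on the Webhook contract, which is injected at construction."""

    webhook: Webhook
    threshold: int

    def evaluate(self, error_count: int) -> bool:
        """Send a notification and return True when the threshold is exceeded."""
        if error_count > self.threshold:
            self.webhook.send(
                f"error count {error_count} exceeded threshold {self.threshold}"
            )
            return True
        return False


hook = InMemoryWebhook()
evaluator = AlertEvaluator(webhook=hook, threshold=10)
quiet = evaluator.evaluate(5)    # below threshold, nothing is sent
fired = evaluator.evaluate(11)   # above threshold, one message is sent
```

Because the evaluator only knows the contract, a Slack, Email, or PagerDuty implementation could be swapped in without changing the alert logic, and tests can use the in-memory double.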
Tech Stack
- Node.js backend with Express and TypeScript, leveraging Zod for validation and Mongoose for data modeling
- Next.js frontend with React and TypeScript, enhanced by Storybook for component development and Nx for scalable project organization
- OpenTelemetry collector paired with ClickHouse as the primary observability backend, supporting both legacy and experimental JSON schemas
- Multi-environment Docker orchestration with isolated database instances for development, testing, and production workflows
- Comprehensive CI/CD pipeline using a Makefile for port isolation, Yarn 4 for package management, and Changesets for versioning
- Observability stack built on OTLP, Prometheus, and OpAMP, with extensive OpenAPI and E2E testing via Playwright
Code Quality
- Extensive test coverage across unit, integration, and E2E layers with effective mocking and dependency injection for component isolation
- Strong type safety and consistent naming conventions throughout, reinforced by well-defined interfaces and utility functions
- Robust error handling with structured logging, custom error classes, and comprehensive exception handling in critical data paths
- Modular design with clear package boundaries enabling reuse of utilities like ClickHouse clients and chart renderers
- Integrated linting and testing tooling in CI/CD, with E2E tests using page object models to validate complex user flows and UI state transitions
What Makes It Unique
- Native SQL charting with dynamic macro expansion allows complex analytical queries to be constructed and visualized directly in the UI
- High-performance heatmap visualizations using uPlot with optimized data arrays and quantized color gradients, eliminating server-side pre-aggregation
- Unified query layer that abstracts logs, traces, and metrics into a single SQL-compatible interface with source-aware schema inference and intelligent auto-completion
- AI-assisted alerting and query generation that leverages contextual metadata to suggest meaningful filters, not just generic NL-to-SQL translations
- Distributed table metadata awareness enables seamless querying across sharded ClickHouse clusters without manual configuration
- End-to-end telemetry injection of service version and query context into all traces, enabling precise cross-service root-cause analysis
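The heatmap item above refers to quantized color gradients over binned data. As a rough illustration of that idea, the Python sketch below bins raw (x, y) points into a uniform grid and maps each cell count to a small number of color levels. It does not use uPlot and is not HyperDX's rendering code. The function names, the bin counts, and the sample points are assumptions.

```python
def build_grid(points, x_bins, y_bins, x_range, y_range):
    """Count (x, y) points into a grid of y_bins rows by x_bins columns."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * x_bins for _ in range(y_bins)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted range
        # Clamp so a point exactly on the upper edge lands in the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * x_bins), x_bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * y_bins), y_bins - 1)
        grid[row][col] += 1
    return grid


def quantize(grid, levels):
    """Map each count to a color level: 0 for empty cells, up to levels - 1."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    # Integer ceiling of count / peak * (levels - 1), so any hit is visible.
    return [
        [(count * (levels - 1) + peak - 1) // peak for count in row]
        for row in grid
    ]


# Hypothetical samples: (time offset in seconds, latency in milliseconds).
points = [(0, 10), (0, 10), (0, 10), (0, 10), (5, 50), (9.9, 99)]
grid = build_grid(points, x_bins=2, y_bins=2, x_range=(0, 10), y_range=(0, 100))
colors = quantize(grid, levels=3)
```

With these inputs the grid is [[4, 0], [0, 2]] and the quantized levels are [[2, 0], [0, 1]]. Each level would then index into a fixed color palette, which keeps the number of distinct fill colors small.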