SigNoz is an open-source observability platform that consolidates application performance monitoring (APM), distributed tracing, log management, metrics, alerts, and LLM observability into a single, unified interface. It’s designed for DevOps teams, SREs, and full-stack developers who need end-to-end visibility into their microservices and cloud-native applications without vendor lock-in. Built on OpenTelemetry standards and powered by ClickHouse, SigNoz enables real-time correlation of logs, metrics, and traces to accelerate root-cause analysis.
Technically, SigNoz ingests telemetry data via OpenTelemetry SDKs across 10+ languages, stores it in ClickHouse, a high-performance columnar database, and provides a React frontend with a DIY query builder, PromQL, and ClickHouse SQL support. It is available as a cloud-hosted service or as a self-hosted deployment via Docker or Helm, which suits teams that want full data control or a hybrid cloud strategy.
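To make the ingestion path concrete, here is a minimal sketch of the kind of OTLP/HTTP JSON body an OpenTelemetry exporter produces for a single span. It uses only the Python standard library and builds the payload by hand. The service name, span name, scope name, and the `build_otlp_trace_payload` helper are hypothetical, chosen for illustration. In a real service the OpenTelemetry SDK would build and send this for you.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name, span_name, duration_ms):
    """Build an OTLP/HTTP JSON body containing a single server span."""
    end_ns = time.time_ns()
    start_ns = end_ns - int(duration_ms * 1_000_000)
    return {
        "resourceSpans": [{
            "resource": {
                # service.name is the OpenTelemetry semantic convention
                # that identifies which service emitted the telemetry.
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


# Hypothetical service and operation names.
payload = build_otlp_trace_payload("checkout", "GET /cart", 12.5)
body = json.dumps(payload)
```

An OTLP/HTTP receiver conventionally accepts this body as a POST to `/v1/traces` (port 4318 by default on an OpenTelemetry Collector). The actual host, port, and any authentication headers depend on how SigNoz is deployed.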
What You Get
- Unified APM & Traces - Monitor application performance with out-of-the-box charts for p99 latency, error rate, Apdex, and operations per second, with deep trace visualization using Flamegraphs and Gantt charts.
- ClickHouse-Powered Log Management - Ingest, search, and analyze high-volume logs with fast aggregations and a powerful query builder, backed by ClickHouse, the columnar database also used by Uber and Cloudflare.
- Distributed Tracing with Correlation - Track user requests across microservices and correlate traces with logs and metrics to identify bottlenecks, with span-based events and semantic convention support.
- Metrics & Custom Dashboards - Ingest infrastructure and application metrics, create customizable dashboards with pie charts, time-series, and bar charts, and combine queries using formulae for complex analytics.
- LLM/AI Observability - Monitor LLM calls, track token usage, analyze response latency, and measure cost per request to optimize AI applications in production.
- Advanced Alerts with Anomaly Detection - Set alerts on logs, metrics, or traces with threshold-based and anomaly-detection rules, and receive notifications via Slack, email, or webhooks.
- Exceptions Monitoring - Automatically capture and visualize stack traces for exceptions in Python, Java, Ruby, and JavaScript, with custom attributes to identify affected users or contexts.
- OpenTelemetry-Native Ingestion - Native support for OpenTelemetry semantic conventions, span-based events, and OTel-compatible data pipelines managed via OpAMP for consistent telemetry across services.
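The APM charts mentioned above (p99 latency, error rate, Apdex) are standard aggregates over request data. The following is a minimal sketch of how they can be computed from raw samples, assuming the standard Apdex definition (satisfied at or below a threshold T, tolerating up to 4T) and a nearest-rank percentile. It does not show how SigNoz computes them internally. The sample values and the 500 ms threshold are made up for illustration.

```python
import math


def apdex(latencies_ms, threshold_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total."""
    satisfied = sum(1 for v in latencies_ms if v <= threshold_ms)
    tolerating = sum(
        1 for v in latencies_ms if threshold_ms < v <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(latencies_ms)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(status_codes):
    """Fraction of requests that ended in a 5xx response."""
    return sum(1 for code in status_codes if code >= 500) / len(status_codes)


# Illustrative samples, not real measurements.
latencies = [120, 80, 300, 450, 95, 2100, 60, 510, 140, 75]
statuses = [200, 200, 500, 200, 404, 503, 200, 200, 200, 200]

print(apdex(latencies, threshold_ms=500))  # 0.85
print(percentile(latencies, 99))           # 2100
print(error_rate(statuses))                # 0.2
```

With only ten samples the p99 is simply the slowest request. On production traffic the same aggregates would normally be computed by the datastore over a time window instead of in application code.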
Common Use Cases
- Monitoring microservices in production - A DevOps team uses SigNoz to correlate trace delays with high CPU metrics and error logs in their Kubernetes cluster, reducing MTTR by 60%.
- Debugging LLM application failures - An AI engineering team tracks token usage and latency spikes in their chatbot’s LLM calls to optimize prompt engineering and reduce API costs.
- Replacing Datadog with open-source tools - A startup migrates from Datadog to SigNoz to cut licensing costs while maintaining full trace-to-log correlation and custom dashboard capabilities.
- Centralizing observability across polyglot services - A fintech company instruments Java, Python, and Go services with OpenTelemetry to unify monitoring across their tech stack without vendor-specific agents.
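Several of these use cases rest on trace-to-log correlation, which works because OpenTelemetry-instrumented services stamp log records with the ID of the active trace. The sketch below shows the idea with in-memory records: find slow spans, then pull the logs that share their trace ID. The field names (`trace_id`, `duration_ms`, `severity`, `body`) and the sample data are assumptions for illustration, not SigNoz's storage schema.

```python
from collections import defaultdict

# Illustrative records; a real backend would query these from its datastore.
spans = [
    {"trace_id": "a1", "service": "checkout", "duration_ms": 1840},
    {"trace_id": "b2", "service": "checkout", "duration_ms": 45},
]
logs = [
    {"trace_id": "a1", "severity": "ERROR", "body": "payment gateway timeout"},
    {"trace_id": "b2", "severity": "INFO", "body": "order placed"},
    {"trace_id": "a1", "severity": "WARN", "body": "retrying payment"},
]


def logs_for_slow_spans(spans, logs, min_duration_ms):
    """Map each slow span's trace ID to the log records emitted in that trace."""
    logs_by_trace = defaultdict(list)
    for record in logs:
        logs_by_trace[record["trace_id"]].append(record)
    return {
        span["trace_id"]: logs_by_trace.get(span["trace_id"], [])
        for span in spans
        if span["duration_ms"] >= min_duration_ms
    }


slow = logs_for_slow_spans(spans, logs, min_duration_ms=1000)
for trace_id, records in slow.items():
    for record in records:
        print(trace_id, record["severity"], record["body"])
```

Here only trace `a1` crosses the one-second cutoff, so the output is its two log lines, which is the jump from "this request was slow" to "here is what the service logged while handling it".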
Under The Hood
Architecture
- Clear separation of concerns through modular Go components, with distinct packages for application entry points, reusable libraries, and enterprise-specific features, enforced via package boundaries.
- Dependency injection implemented via pluggable configuration providers, enabling flexible sourcing without tight coupling.
- Structured logging with slog and strict linting rules ensure consistent, production-grade diagnostics across the codebase.
- Layered configuration resolution and testable config structs provide strong encapsulation and predictable behavior.
- OpenAPI documentation is auto-generated from code, integrated into the build pipeline to eliminate documentation drift.
- Clean CLI architecture using cobra decouples command logic from initialization, supporting extensible subcommands.
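Layered configuration resolution usually means that each source overrides the one below it. The sketch below illustrates one common precedence order (built-in defaults, then file values, then environment variables). It is written in Python for brevity and is not SigNoz's Go implementation. The `APP_` prefix, the key names, and the `resolve_config` helper are hypothetical.

```python
def resolve_config(defaults, file_values, environ, prefix="APP_"):
    """Merge config layers; later layers win: defaults < file < environment."""
    resolved = dict(defaults)
    resolved.update(file_values)
    # Copy the keys first so the loop never iterates a dict it is mutating.
    for key in list(resolved):
        env_key = prefix + key.upper()
        if env_key in environ:
            resolved[key] = environ[env_key]
    return resolved


# Hypothetical settings for illustration.
defaults = {
    "listen_port": "8080",
    "log_level": "info",
    "storage_dsn": "tcp://localhost:9000",
}
file_values = {"log_level": "debug"}
environ = {"APP_LISTEN_PORT": "9090"}

config = resolve_config(defaults, file_values, environ)
print(config["listen_port"])  # 9090, from the environment
print(config["log_level"])    # debug, from the file
print(config["storage_dsn"])  # tcp://localhost:9000, from the defaults
```

Passing the environment in as a plain mapping, instead of reading `os.environ` inside the function, is what keeps a resolver like this easy to unit test.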
Tech Stack
- Go 1.20+ backend with structured logging via slog, custom error handling, and OpenTelemetry Collector as the core telemetry ingestion engine.
- ClickHouse serves as the primary columnar datastore for traces, metrics, and logs, with Docker Compose setups for local development environments.
- React + TypeScript frontend with Vite, React Query, and Axios for a responsive, type-safe user interface, hosted in a separate module.
- Comprehensive Go testing with mockery for mocking interfaces and strict linters enforcing logging and error standards.
- Docker-based multi-arch builds with build tags differentiating enterprise and community editions, supported by Gitpod for instant development environments.
Code Quality
- Extensive test coverage across backend and frontend with unit, integration, and end-to-end tests using industry-standard frameworks and custom fixtures.
- Strong type safety in the frontend through TypeScript interfaces and strict prop typing, with clear separation of components, hooks, and utilities.
- Consistent, descriptive naming conventions and intent-driven test names enhance readability and maintainability.
- Robust error handling with explicit returns and defensive programming in both Go and TypeScript, though custom error types are limited.
- Well-structured frontend code with mocked dependencies and utility functions that handle edge cases like JSON parsing and object flattening.
- Automated linting and testing pipelines ensure code quality, predictability, and maintainability across distributed services and UI layers.
What Makes It Unique
- Native configuration resolver with layered file and environment precedence enables seamless multi-environment deployments without external scripting.
- Unified OpenAPI schema generation directly from code ensures API documentation remains synchronized with implementation by design.
- Deep SSO integration with granular role-based access control using attribute mapping and transitive group membership for enterprise-grade security.
- Frontend dynamically generates UI elements from API schemas, reducing boilerplate and ensuring consistency across the user interface.
- Lottie animations and interactive onboarding flows guide users through complex observability integrations, enhancing user experience.
- End-to-end observability platform built from the ground up, unifying traces, metrics, and logs in a single cohesive system.
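The SSO point above mentions transitive group membership: a user who belongs to a group nested inside another group inherits the outer group's access. A minimal sketch of that resolution follows, assuming a simple parent map and a viewer/editor/admin ranking. The group names, mappings, and both helper functions are hypothetical and do not describe SigNoz's actual implementation.

```python
def transitive_groups(direct_groups, parent_of):
    """Return the direct groups plus every ancestor group reachable from them."""
    seen = set()
    stack = list(direct_groups)
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # already visited; also guards against cyclic nesting
        seen.add(group)
        stack.extend(parent_of.get(group, []))
    return seen


def resolve_role(direct_groups, parent_of, role_for_group, default_role="viewer"):
    """Pick the highest-ranked role granted by any direct or inherited group."""
    ranking = {"viewer": 0, "editor": 1, "admin": 2}
    roles = [
        role_for_group[group]
        for group in transitive_groups(direct_groups, parent_of)
        if group in role_for_group
    ]
    return max(roles, key=ranking.__getitem__, default=default_role)


# Hypothetical identity-provider groups and role mapping.
parent_of = {
    "payments-oncall": ["payments-eng"],
    "payments-eng": ["engineering"],
}
role_for_group = {"engineering": "editor", "platform-admins": "admin"}

print(resolve_role(["payments-oncall"], parent_of, role_for_group))  # editor
print(resolve_role(["contractors"], parent_of, role_for_group))      # viewer
```

The first user is only a direct member of `payments-oncall`, but that group nests under `engineering`, so the editor role is inherited. The second user matches no mapped group and falls back to the default.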