DataLens is an open-source business intelligence and data visualization system originally built and used internally at Yandex. It provides a full-stack solution for creating interactive dashboards, reports, and data visualizations from SQL-based data sources like PostgreSQL, ClickHouse, and YTsaurus. Designed for both developers and business analysts, DataLens combines a user-friendly UI with a powerful backend data processing engine and centralized metadata management. Unlike traditional BI tools that require proprietary connectors or expensive licenses, DataLens offers a self-hosted alternative with transparent code and extensible architecture. It supports production-grade deployments via Docker Compose or Kubernetes Helm charts, making it suitable for teams seeking control over their analytics infrastructure without vendor lock-in.
What You Get
- Drag-and-drop dashboard builder - Create interactive visualizations using a no-code UI with support for charts, tables, and maps; data is pulled from SQL sources without requiring manual query writing.
- Multi-source SQL connectivity - Connect directly to PostgreSQL, ClickHouse, and ClickHouse over YTsaurus for real-time analytics; the backend generates optimized SQL queries from visual transformations.
- Role-based access control (RBAC) - Granular permissions with three roles: viewer (read-only), editor (create/edit/delete objects), and admin (system-wide management); roles are assignable via the web UI.
- Production-ready deployment options - Deploy using Docker Compose with auto-generated secrets or Helm charts on Kubernetes, with support for external PostgreSQL databases and HTTPS via custom domains.
- Yandex Maps integration - Native support for Yandex Maps visualizations using API keys, enabling geospatial data display without third-party dependencies.
- Metadata persistence via PostgreSQL - All user-created dashboards, connections, and configurations are stored in a dedicated PostgreSQL database volume that persists across updates.
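To make the idea of generating SQL from visual transformations more concrete, here is a minimal sketch in Python. It is not DataLens's actual query engine: the `ChartSpec` structure, the `build_query` function, and the `sales` table are hypothetical names chosen for illustration, and the `%s` placeholder style is an assumption borrowed from common PostgreSQL drivers.

```python
from dataclasses import dataclass, field


@dataclass
class ChartSpec:
    """Hypothetical description of what a user configured in the chart UI."""
    table: str
    dimensions: list[str]                 # columns to group by
    measures: dict[str, str]              # alias -> aggregate expression
    filters: list[tuple[str, str, object]] = field(default_factory=list)


ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_query(spec: ChartSpec) -> tuple[str, list[object]]:
    """Translate a chart spec into a parameterized SELECT and its bind values.

    Identifiers are interpolated as-is, so this sketch assumes they come from
    trusted metadata; filter values are always passed as bind parameters.
    """
    select_parts = list(spec.dimensions)
    select_parts += [f"{expr} AS {alias}" for alias, expr in spec.measures.items()]
    sql = f"SELECT {', '.join(select_parts)} FROM {spec.table}"

    params: list[object] = []
    conditions = []
    for column, operator, value in spec.filters:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        conditions.append(f"{column} {operator} %s")
        params.append(value)

    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    if spec.dimensions:
        sql += " GROUP BY " + ", ".join(spec.dimensions)
    return sql, params


spec = ChartSpec(
    table="sales",
    dimensions=["region"],
    measures={"revenue": "SUM(amount)"},
    filters=[("year", "=", 2024)],
)
sql, params = build_query(spec)
print(sql)
# SELECT region, SUM(amount) AS revenue FROM sales WHERE year = %s GROUP BY region
```

Keeping filter values as bind parameters rather than formatting them into the string is the main design choice here; a real engine would also need dialect handling for each supported source.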
Common Use Cases
- Building a multi-tenant SaaS analytics dashboard - Organizations use DataLens to offer embedded reporting features to customers, leveraging its RBAC system to enforce data access boundaries across tenants.
- Analyzing sales for an e-commerce catalog with 10k+ SKUs - Retail teams use DataLens to visualize sales trends, inventory turnover, and regional performance from ClickHouse data stores with real-time refreshes.
- Replacing expensive legacy BI tools locked to proprietary formats - Teams migrating from Tableau or Power BI replace them with a self-hosted, open-source alternative that supports SQL-native data sources and integrates into existing infrastructure without licensing fees.
- DevOps teams managing microservices across multiple cloud providers - Engineering teams deploy DataLens via Helm charts on Kubernetes to centralize monitoring and operational metrics dashboards across AWS, GCP, and on-prem clusters.
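The three roles listed under What You Get, combined with the tenant boundaries from the multi-tenant use case, can be sketched as a simple permission check. This is an illustration only and not DataLens's access-control code: the action names, the `User` type, the `tenant` field, and the assumption that editors and admins inherit the permissions of lower roles are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"


# Hypothetical action names. The split follows the role descriptions:
# viewer is read-only, editor can create/edit/delete objects, and admin
# adds system-wide management (assumed to include everything an editor can do).
PERMISSIONS = {
    Role.VIEWER: {"read"},
    Role.EDITOR: {"read", "create", "edit", "delete"},
    Role.ADMIN: {"read", "create", "edit", "delete", "manage_system"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: Role
    tenant: str  # hypothetical tenant identifier


def is_allowed(role: Role, action: str) -> bool:
    """Check whether a role grants the given action."""
    return action in PERMISSIONS[role]


def can_access(user: User, object_tenant: str, action: str) -> bool:
    """Allow an action only inside the user's own tenant and within their role."""
    return user.tenant == object_tenant and is_allowed(user.role, action)


analyst = User(name="analyst", role=Role.VIEWER, tenant="acme")
print(can_access(analyst, "acme", "read"))    # True
print(can_access(analyst, "acme", "edit"))    # False
print(can_access(analyst, "globex", "read"))  # False
```

Checking the tenant before the role keeps the two concerns separate: a role never grants access across tenant boundaries, whatever actions it allows.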
Under The Hood
The project is a data platform organized around database-centric development and integrated with cloud-native tooling. It emphasizes modular design, containerized deployment, and automation across development, testing, and release workflows.
Architecture
This system adopts a multi-layered architecture tailored for containerized and cloud-native environments, supporting scalable and maintainable deployments.
- Service-oriented structure with well-defined dependencies and reusable configuration templates
- Clear separation between backend services, UI components, and infrastructure configurations
- Modular organization that supports extensibility and consistent deployment patterns
Tech Stack
The codebase is written predominantly in PLpgSQL, with PostgreSQL as its core technology, and it integrates modern DevOps practices.
- Uses shell scripting and HCL for infrastructure automation
- Relies heavily on Docker, Kubernetes, Helm, and Terraform for orchestration and deployment
- Integrates CI/CD pipelines with end-to-end testing and changelog automation through Python scripts
- Emphasis on infrastructure-as-code practices with Terraform modules and Helm charts
Code Quality
While the codebase shows some structured practices, it exhibits inconsistencies and gaps in testing and standardization.
- Error handling is present but not uniformly applied across all components
- Component structure is organized but lacks comprehensive test coverage
- Type annotations and linting are present, indicating some attention to code consistency
- Limited test files suggest a gap in validation and regression testing practices
What Makes It Unique
The project blends database-first development with Kubernetes-native infrastructure and automation.
- Combines Helm and Terraform for managing multi-service data platforms with consistent deployment workflows
- Provides extensive tooling for local development using Docker Compose and custom entrypoints to simplify contributor setups
- Automates changelog generation and release processes via Python scripts within GitHub Actions
- Integrates PostgreSQL, Temporal, and custom UI services under a Kubernetes orchestration framework with detailed monitoring
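As a rough illustration of what script-driven changelog automation can look like, the sketch below groups commit subjects into changelog sections. It does not reproduce the project's own scripts: the `<type>: <summary>` commit format, the section titles, and the `build_changelog` function are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed commit subject format: "<type>(<optional scope>): <summary>",
# for example "fix(ui): correct legend order".
COMMIT_PATTERN = re.compile(
    r"^(?P<type>feat|fix|docs|chore)(\([^)]*\))?:\s*(?P<summary>.+)$"
)

# Hypothetical mapping from commit type to changelog section title.
SECTION_TITLES = {
    "feat": "Features",
    "fix": "Bug Fixes",
    "docs": "Documentation",
    "chore": "Maintenance",
}


def build_changelog(version: str, commit_subjects: list[str]) -> str:
    """Group commit subjects by type and render a plain-text changelog entry."""
    grouped = defaultdict(list)
    for subject in commit_subjects:
        match = COMMIT_PATTERN.match(subject.strip())
        if match:  # subjects that do not follow the assumed format are skipped
            grouped[match.group("type")].append(match.group("summary"))

    lines = [f"## {version}"]
    for commit_type, title in SECTION_TITLES.items():
        if grouped[commit_type]:
            lines.append("")
            lines.append(f"### {title}")
            lines.extend(f"- {summary}" for summary in grouped[commit_type])
    return "\n".join(lines)


changelog = build_changelog(
    "v1.2.0",
    ["feat: add map layer", "fix(ui): correct legend order", "Merge branch 'main'"],
)
print(changelog)
```

In a CI workflow, the commit subjects would typically come from the output of `git log` between two release tags, and the rendered text would be written to the release notes.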