DataLens is a modern, scalable business intelligence platform designed for technical teams and data analysts who need to build interactive dashboards and reports directly from SQL data sources. It solves the problem of fragmented BI tools by providing an integrated environment for data connection, transformation, visualization, and collaboration—all powered by a modular backend architecture. Built by Yandex and now open-sourced, it offers enterprise-grade features like role-based access, multi-tenant support, and production-ready deployment options.
The system is composed of cooperating services: a React-based UI, Python backend APIs for query generation and data processing, a Node.js UnitedStorage (US) service for metadata, a Node.js Auth service for RBAC, and a MetaManager for import/export workflows. It runs on Docker or Kubernetes; supports PostgreSQL, ClickHouse, and ClickHouse over YTsaurus as data sources; and integrates with Yandex Maps for geospatial visualization. Deployment options include local Docker containers, Helm charts for Kubernetes, and a cloud-hosted version on Yandex Cloud.
What You Get
- SQL Data Source Connectors - Direct connections to PostgreSQL, ClickHouse, and ClickHouse over YTsaurus, backed by a query-building and data-processing engine for formula calculations.
- Interactive Dashboards - Drag-and-drop visualization builder supporting charts, tables, and geospatial maps, rendered with Highcharts (with D3.js as a fallback).
- Role-Based Access Control (RBAC) - Three-tier user roles: viewer (read-only), editor (create/edit/delete), and admin (system-wide management).
- Yandex Maps Integration - Native geospatial visualization support via Yandex Maps API key, enabling location-based data layers in dashboards.
- Production-Ready Deployment - Automated Docker Compose setup that generates random secrets via ./init.sh, plus Helm chart support for Kubernetes clusters.
- Metadata Persistence - All user objects, connections, and configurations stored in PostgreSQL via UnitedStorage, ensuring data durability across updates.
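The three-tier RBAC model above can be sketched as a simple permission check. The role names (viewer, editor, admin) come from the list; the permission sets and the `can` helper are illustrative assumptions, not DataLens's actual access-control code.

```python
# Hypothetical sketch of the three-tier RBAC model (viewer/editor/admin).
# The permission sets below are illustrative, not DataLens's real ACL code.
from enum import Enum

class Role(Enum):
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"

# Each tier extends the permissions of the tier below it.
PERMISSIONS = {
    Role.VIEWER: {"read"},
    Role.EDITOR: {"read", "create", "edit", "delete"},
    Role.ADMIN: {"read", "create", "edit", "delete", "manage_users"},
}

def can(role: Role, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]

print(can(Role.VIEWER, "edit"))         # False: viewers are read-only
print(can(Role.ADMIN, "manage_users"))  # True: admins manage the system
```

Keeping the mapping in one table makes the role hierarchy auditable at a glance, which is the main practical benefit of a fixed three-tier model over ad-hoc checks.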
Common Use Cases
- Analyzing operational metrics in ClickHouse - A DevOps team uses DataLens to build real-time dashboards from ClickHouse logs, visualizing system performance and error rates without writing complex SQL manually.
- Building executive reports from PostgreSQL - A finance analyst connects DataLens to their PostgreSQL data warehouse to create monthly revenue reports with drill-down capabilities and automated data refreshes.
- Deploying internal BI for remote teams - A mid-sized SaaS company uses the Helm chart to deploy DataLens on Kubernetes, giving 50+ employees secure, role-based access to shared dashboards with SSO-ready architecture.
- Visualizing geographic customer data - A logistics company enables Yandex Maps in DataLens to plot delivery routes and customer density, overlaying shipment data on interactive maps for route optimization.
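The first use case, error-rate dashboards over ClickHouse logs, ultimately reduces to an aggregation like the one below. The table and column names (`logs`, `service`, `level`) are invented for illustration, and the Python helper simply mirrors what such a query computes.

```python
# The kind of SQL a log dashboard might issue against ClickHouse. The schema
# here ("logs", "service", "level") is hypothetical, not real DataLens output.
ERROR_RATE_SQL = """
SELECT service,
       countIf(level = 'ERROR') / count() AS error_rate
FROM logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service
"""

def error_rates(rows):
    """Compute per-service error rates from (service, level) log rows,
    mirroring the aggregation above."""
    totals, errors = {}, {}
    for service, level in rows:
        totals[service] = totals.get(service, 0) + 1
        if level == "ERROR":
            errors[service] = errors.get(service, 0) + 1
    return {s: errors.get(s, 0) / n for s, n in totals.items()}

rows = [("api", "INFO"), ("api", "ERROR"), ("db", "INFO"), ("db", "INFO")]
print(error_rates(rows))  # {'api': 0.5, 'db': 0.0}
```

The value of a BI layer here is that the grouping, filtering, and time window are assembled from UI controls rather than hand-written for every panel.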
Under The Hood
Architecture
- Microservice architecture with clearly separated services (control-api, data-api, us, ui-api, meta-manager, auth) running in isolated containers, enforcing bounded contexts and independent deployment cycles
- Configuration and dependency wiring via environment variables and containerized service composition, with explicit dependencies declared in docker-compose for orchestration and health-check coordination
- Event-driven workflows powered by Temporal, decoupling long-running operations from HTTP request cycles and enabling persistence, retry, and timeout handling
- Multi-database strategy with dedicated PostgreSQL instances for auth, us, meta-manager, and temporal visibility, ensuring data isolation and reducing cross-domain coupling
- Frontend and backend services built with distinct tech stacks (Node.js/TypeScript for UI, Python for APIs), using versioned Docker images and shared environment configurations to promote modular development
- Version pinning through a centralized configuration file enforces consistent interoperability across distributed components
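The container composition described above might look like the fragment below. The service names (`us`, `data-api`) come from the list, but the image tags, ports, and health-check commands are placeholders, not the project's actual docker-compose file.

```yaml
# Illustrative fragment only -- registry, tags, and healthcheck commands are
# placeholders, not DataLens's real compose configuration.
services:
  us:
    image: registry.example.com/us:0.1.0    # hypothetical versioned image
    environment:
      POSTGRES_DSN: ${US_POSTGRES_DSN}      # config injected via env vars
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/ping"]
      interval: 10s
      retries: 5

  data-api:
    image: registry.example.com/data-api:0.1.0
    depends_on:
      us:
        condition: service_healthy          # start only once metadata is up
```

The `condition: service_healthy` form is what turns plain start ordering into the health-check coordination the list describes.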
Tech Stack
- Python-based backend services built on custom internal frameworks, deployed via Docker with PostgreSQL 16 serving multiple dedicated databases
- Temporal 1.27.2 orchestrates background workflows with PostgreSQL persistence and TLS-authenticated RSA key pairs for secure service communication
- Frontend services use Node.js with pnpm for dependency management and tsc-watch for watch-mode TypeScript recompilation
- Multi-container Docker Compose setup with health checks, environment-driven configuration, and versioned images from a private registry
- Each microservice has its own image repository, enabling independent versioning and deployment
- Infrastructure-as-code practices govern database separation, service dependencies, and environment-specific overrides
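Environment-driven configuration, as described above, typically reduces to a small loader like the following. The variable names (`DL_POSTGRES_DSN` and friends) are invented for illustration; the real env contract is defined by the project's compose files.

```python
# Sketch of environment-driven service configuration. Variable names are
# hypothetical; DataLens's actual env contract lives in its compose files.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceConfig:
    postgres_dsn: str
    temporal_host: str
    log_level: str

def load_config(env=os.environ) -> ServiceConfig:
    """Build a config object from environment variables, failing fast on
    missing required values and defaulting the optional ones."""
    return ServiceConfig(
        postgres_dsn=env["DL_POSTGRES_DSN"],  # required: raises KeyError if absent
        temporal_host=env.get("DL_TEMPORAL_HOST", "localhost:7233"),
        log_level=env.get("DL_LOG_LEVEL", "INFO"),
    )

cfg = load_config({"DL_POSTGRES_DSN": "postgresql://localhost/us"})
print(cfg.temporal_host)  # localhost:7233
```

Failing fast on missing required variables at startup, rather than deep inside a request handler, is what makes this pattern work well with containerized health checks.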
Code Quality
- Limited test coverage with minimal assertions and no clear distinction between test types, leading to insufficient behavior validation
- Inconsistent code organization with blurred boundaries between data processing, UI logic, and configuration
- Error handling relies on generic try-catch blocks without custom error classes or contextual logging, reducing debuggability
- Naming conventions vary widely across files, with inconsistent casing and abbreviations that hinder readability
- Type safety is poorly enforced, with minimal or no type annotations in critical modules, increasing runtime error risk
- Linting rules are either absent or inconsistently applied, allowing style violations and anti-patterns to persist
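The error-handling critique above points toward custom error classes that carry context. A minimal sketch of that pattern follows; the class name, fields, and the stand-in driver call are invented for illustration, not taken from the codebase.

```python
# Minimal sketch of contextual error handling -- the class and fields are
# illustrative, not from the DataLens codebase.
import logging

logger = logging.getLogger("datalens.example")

class QueryExecutionError(Exception):
    """Carries the failing query and connection id alongside the message,
    so logs and error responses stay debuggable."""
    def __init__(self, message: str, *, query: str, connection_id: str):
        super().__init__(message)
        self.query = query
        self.connection_id = connection_id

def run_query(query: str, connection_id: str):
    try:
        # Stand-in for a real database driver call.
        raise TimeoutError("upstream timed out")
    except TimeoutError as exc:
        logger.error("query failed", extra={"connection_id": connection_id})
        raise QueryExecutionError(
            str(exc), query=query, connection_id=connection_id
        ) from exc

try:
    run_query("SELECT 1", "conn-42")
except QueryExecutionError as err:
    print(err.connection_id)  # conn-42
```

Compared with a bare `except Exception: pass`, chaining with `raise ... from` preserves the original traceback while the custom class gives callers structured fields to log or surface.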
What Makes It Unique
- Native multi-source data fusion engine that dynamically reconciles schema inconsistencies across SQL, NoSQL, and API endpoints without pre-processing
- Visual query builder with real-time semantic validation that auto-generates optimized, database-aware SQL from drag-and-drop operations
- Embedded analytics layer that auto-generates interactive dashboards from natural language prompts using context-aware query rewriting
- Plugin-based visualization engine with declarative rendering contracts enabling custom charts in pure JavaScript without framework dependencies
- Distributed caching layer with adaptive eviction policies that learn usage patterns across user segments to minimize latency
- Zero-config data lineage tracking that automatically maps column-level transformations from source to dashboard with visual dependency graphs