Elementary

The dbt-native data observability CLI that turns your existing dbt tests and metadata into anomaly detection, lineage graphs, and Slack/Teams alerts — no separate platform required.

2.4K stars
221 forks
Apache License 2.0
HTML

Elementary is the open-source CLI half of a dbt-native data observability stack: it reads the metadata, test results, and run artifacts that the companion Elementary dbt package writes into your warehouse, then turns that raw telemetry into anomaly alerts, a data quality dashboard, and lineage graphs — without shipping your data to a third-party service. Built by Elementary, a company backed by Y Combinator (W23), the project has grown into one of the most widely adopted tools in the dbt ecosystem, with over 2,300 GitHub stars and dozens of releases shipped every year.

Under the hood, the edr CLI (a Click-based entry point defined in elementary/cli/cli.py) wraps a dbt runner abstraction that can invoke dbt via subprocess, the dbt Fusion engine, or a programmatic API, letting it slot into almost any existing dbt project without changing how teams already run dbt. Anomaly detection, freshness checks, and volume/schema monitors are implemented as native dbt tests and macros in the sibling dbt-data-reliability package, so quality rules live in version-controlled dbt code instead of a separate rules engine; the CLI then fetches those results, builds a static HTML report, and routes failures to Slack or Microsoft Teams through dedicated alert integrations.
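The runner abstraction described above can be pictured with a minimal sketch. The class and function names here (`DbtRunner`, `SubprocessDbtRunner`, `RecordingDbtRunner`, `fetch_results`) and the macro name `hypothetical_macro` are illustrative assumptions, not Elementary's actual classes; the point is only that calling code depends on one interface while the backend that actually invokes dbt can be swapped.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import List, Sequence


class DbtRunner(ABC):
    """Hypothetical interface: one way to ask for a dbt command, many backends."""

    @abstractmethod
    def run(self, args: Sequence[str]) -> bool:
        """Run a dbt command and report whether it succeeded."""


class SubprocessDbtRunner(DbtRunner):
    """Shells out to an executable on PATH (assumed here to be `dbt`)."""

    def __init__(self, executable: str = "dbt") -> None:
        self.executable = executable

    def build_command(self, args: Sequence[str]) -> List[str]:
        # Kept separate so the command line can be inspected without running it.
        return [self.executable, *args]

    def run(self, args: Sequence[str]) -> bool:
        result = subprocess.run(
            self.build_command(args), capture_output=True, text=True
        )
        return result.returncode == 0


class RecordingDbtRunner(DbtRunner):
    """Stand-in for an in-process backend: records calls instead of running dbt."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def run(self, args: Sequence[str]) -> bool:
        self.calls.append(list(args))
        return True


def fetch_results(runner: DbtRunner) -> bool:
    # The caller only knows the interface, never the concrete backend.
    # "hypothetical_macro" is a placeholder, not a real Elementary macro.
    return runner.run(["run-operation", "hypothetical_macro"])
```

Swapping `RecordingDbtRunner` for `SubprocessDbtRunner` changes how dbt is invoked without touching `fetch_results`, which is the property the paragraph above attributes to the CLI's design.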

Elementary intentionally keeps the CLI and dbt package fully open under Apache-2.0 — there is no license check, feature flag, or paywall anywhere in this repository — while offering a separate hosted product, Elementary Cloud, for teams that want managed ML-based anomaly detection, column-level lineage from ingestion to BI, and AI agents layered on the same open data model. That makes it a rare data-observability tool where the self-hosted path is a genuine first-class product rather than a crippled trial of the commercial one.

What You Get

  • The edr CLI (installable via pip install elementary-data) with monitor, report, send-report, and run-operation commands
  • Native dbt anomaly-detection tests and freshness/volume/schema monitors defined as macros in the companion dbt-data-reliability package
  • A self-contained static HTML data quality report and dashboard generated straight from your warehouse’s elementary schema
  • Built-in Slack and Microsoft Teams alert integrations with owner tagging and configurable routing
  • End-to-end lineage graphs enriched with the latest test-result status for root-cause and impact analysis
  • Support for a dozen-plus warehouse/query-engine targets out of the box (Snowflake, BigQuery, Redshift, Databricks, Postgres, Spark, Trino, ClickHouse, DuckDB, and more) via optional pip extras

Common Use Cases

  • Catching silent data quality regressions (row-count drops, schema drift, stale tables) before they reach dashboards or ML models
  • Adding anomaly-detection tests directly into an existing dbt project’s CI/CD without adopting a separate SaaS platform
  • Alerting on-call data engineers in Slack or Teams the moment a dbt test fails or a source stops refreshing
  • Generating a shareable data quality report for stakeholders after each dbt run
  • Tracing the blast radius of a broken upstream model through lineage before it reaches downstream BI tools
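As a rough illustration of the last use case, tracing a blast radius amounts to walking the lineage graph downstream from the broken model. The sketch below assumes lineage is available as a plain mapping from each node to its direct children; the model names and the `downstream_of` helper are hypothetical and do not reflect how Elementary stores or queries lineage.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_of(lineage: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable downstream of `start` (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            # Skip nodes already visited so cycles or diamonds do not loop.
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: parent -> direct children.
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_orders", "dim_customers"],
    "fct_orders": ["revenue_dashboard"],
    "dim_customers": [],
}

affected = downstream_of(lineage, "stg_orders")
```

Here `affected` contains `fct_orders`, `dim_customers`, and `revenue_dashboard`, which is the set of assets to inspect before the breakage reaches BI consumers.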

Under The Hood

Architecture: The edr CLI, built with Click, dispatches into command groups for monitoring, reporting, and running dbt operations. All of them are wired through a central DataMonitoring orchestrator, which spins up an internal dbt runner, fetches the latest dbt invocation metadata, and checks compatibility against the companion dbt package version before doing any work. From there it delegates to a layered pipeline: dedicated fetcher classes pull test results, model runs, source freshness, and lineage data that the companion dbt package has already written into the warehouse’s elementary schema via its own macros. That data then flows into alert builders with pluggable Slack and Teams integrations and into a static HTML report renderer. The result is a clean separation between data collection (owned by the dbt package) and observability logic (owned by this CLI), connected entirely through warehouse tables rather than a live service.
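The fetch, build, and route stages described above might be sketched as follows. Everything here is an assumption made for illustration: `ResultRow`, `Alert`, `Integration`, `InMemoryIntegration`, `build_alerts`, and `route` are hypothetical names, and the rows are hard-coded rather than read from a warehouse.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class ResultRow:
    """One test result, as a fetcher might return it from a warehouse table."""
    model: str
    test_name: str
    status: str  # "pass" or "fail" in this sketch


@dataclass(frozen=True)
class Alert:
    title: str
    owner: str


class Integration(Protocol):
    """Anything that can deliver an alert (Slack, Teams, ...)."""
    def send(self, alert: Alert) -> None: ...


class InMemoryIntegration:
    """Test double that collects alerts instead of calling a chat API."""
    def __init__(self) -> None:
        self.sent: List[Alert] = []

    def send(self, alert: Alert) -> None:
        self.sent.append(alert)


def build_alerts(rows: Iterable[ResultRow], owners: Dict[str, str]) -> List[Alert]:
    # Only failing results become alerts; owners are looked up per model.
    return [
        Alert(
            title=f"{row.test_name} failed on {row.model}",
            owner=owners.get(row.model, "unassigned"),
        )
        for row in rows
        if row.status == "fail"
    ]


def route(alerts: List[Alert], integration: Integration) -> int:
    for alert in alerts:
        integration.send(alert)
    return len(alerts)
```

Because `route` accepts any object with a `send` method, a new delivery channel can be added without changing how alerts are built, which mirrors the "pluggable integrations" idea in the text.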

Tech Stack: This is a Python package targeting current interpreter versions, built and published with Poetry, and designed to sit on top of the dbt-core ecosystem rather than replace any part of it. It ships as a thin, warehouse-agnostic layer with optional extras for each supported adapter, so users install only the warehouse driver they need. Rather than talking to warehouses directly, it drives dbt itself through an abstraction that can execute commands via subprocess, an in-process API, or the newer dbt Fusion engine, which insulates it from changes in how dbt is invoked. Alerting is handled through purpose-built SDKs for Slack and Microsoft Teams, artifact persistence is supported against major cloud object stores, and anonymized usage telemetry is collected to inform the roadmap. The project is packaged for both PyPI distribution and containerized deployment, with continuous integration exercising it against every supported warehouse.

Code Quality: The project maintains a multi-layered test suite spanning unit, integration, and full end-to-end dbt-project tests, and continuous integration re-runs the suite against each supported warehouse rather than relying on a single reference database. Error handling is deliberate: a custom exception hierarchy distinguishes configuration errors from dbt command failures, and exceptions carry structured context for telemetry instead of being silently swallowed. Type hints are used consistently throughout the codebase and enforced through static type checking and linting in continuous integration. Inline comments and docstrings are comparatively sparse; the code leans on descriptive naming and type annotations to stay self-explanatory.
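An exception hierarchy of the kind described above could look roughly like this. The class names (`ObservabilityError`, `ConfigError`, `DbtCommandError`) and the `to_telemetry` method are hypothetical; they illustrate the pattern of separating configuration errors from dbt command failures while attaching structured context, and are not taken from the Elementary codebase.

```python
from typing import Any, Dict, Optional, Sequence


class ObservabilityError(Exception):
    """Hypothetical base class: every error carries structured context."""

    def __init__(self, message: str, context: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.context: Dict[str, Any] = dict(context or {})

    def to_telemetry(self) -> Dict[str, Any]:
        # Flatten the error into a dict that a telemetry client could ship.
        payload: Dict[str, Any] = {
            "error_type": type(self).__name__,
            "message": str(self),
        }
        payload.update(self.context)
        return payload


class ConfigError(ObservabilityError):
    """Raised when the user's configuration is missing or invalid."""


class DbtCommandError(ObservabilityError):
    """Raised when an invoked dbt command exits unsuccessfully."""

    def __init__(self, command: Sequence[str], return_code: int) -> None:
        joined = " ".join(command)
        super().__init__(
            f"dbt command failed: {joined}",
            {"command": list(command), "return_code": return_code},
        )
```

Callers can then catch `ConfigError` to prompt the user for a fix, catch `DbtCommandError` to surface the failing command, or catch the base class to report anything unexpected with its context intact.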

What Makes It Unique: Elementary’s core idea is putting data quality checks where the data pipeline already lives, as native dbt tests and macros, rather than as rules configured in a separate observability UI. Quality logic therefore gets code review, version control, and CI treatment for free. It pairs that with lineage enriched by the latest test-result status, so impact analysis and root-cause tracing use the same signal as alerting. It also draws an unusually clean line between a genuinely complete open-source core and a commercial cloud tier that adds managed machine-learning monitoring and enterprise-scale lineage on top, rather than holding back core functionality. The underlying anomaly-detection techniques are fairly standard statistical approaches rather than groundbreaking research, so the innovation is more about packaging and openness than about novel algorithms.
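To make "fairly standard statistical approaches" concrete, here is a minimal z-score check on a volume metric. This is a generic textbook technique chosen for illustration; the source does not say which statistic Elementary uses, and the `is_anomalous` helper, the threshold of 3.0, and the sample row counts are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of `history` (a plain z-score test)."""
    if len(history) < 2:
        # Not enough data to estimate spread; do not raise an alert.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for a table.
daily_row_counts = [1000, 1020, 980, 1010, 990]

drop_detected = is_anomalous(daily_row_counts, 400)
normal_day = is_anomalous(daily_row_counts, 1005)
```

With this history the mean is 1000 and the sample standard deviation is about 15.8, so a count of 400 is flagged while 1005 is not. In a dbt-native setup the equivalent logic would live in SQL macros rather than Python, but the arithmetic is the same.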

Self-Hosting

Licensing Model: The CLI and its companion dbt package are both licensed under Apache-2.0, a permissive open-source license with no field-of-use restrictions. There is no dual-licensing, no source-available clause, and no license file carve-out limiting commercial use.

Self-Hosting Restrictions: None found. A repository-wide search turned up no license-check code, feature flags, or gated modules (no ee/, enterprise/, or pro/ directories, and no license_key, isPro, or isEnterprise logic anywhere in the codebase). Every command available in the CLI, including anomaly detection, lineage, reporting, and alerting, runs fully self-hosted with no license key or phone-home requirement to function.

Enterprise Features: The self-hosted OSS package does not carve out a separate “enterprise” tier of its own; all CLI functionality is available to every self-hosting user regardless of scale.

Cloud vs Self-Hosted: Elementary also sells a separate hosted product, Elementary Cloud, positioned as a “Data & AI Control Plane” that adds managed ML-based anomaly detection, column-level lineage from ingestion through BI tools, a built-in data catalog, and AI agents for scaling reliability workflows. This is architected as an upsell layered on top of the same open data model rather than a requirement to use the OSS CLI; teams can run the open-source package indefinitely without ever adopting the cloud offering.

License Key Required: No. The edr CLI and dbt package operate with no license key, activation step, or usage cap of any kind.
