PeerDB

Postgres-native ETL that streams change data capture in real time to Snowflake, BigQuery, ClickHouse, S3, and Kafka — up to 10x faster than general-purpose pipelines, managed through a familiar Postgres SQL interface.

3.2K stars
198 forks
GNU AGPLv3
Go

PeerDB is an open-source ETL/ELT tool purpose-built for streaming data out of PostgreSQL rather than treating it as one generic connector among hundreds. Instead of a YAML config file or a SaaS console, PeerDB fronts its pipelines with nexus, a Rust-built server that speaks the Postgres wire protocol on port 9900 — so a pipeline (a “mirror”) is created, monitored, and torn down with plain SQL statements like CREATE MIRROR, run from psql, pgAdmin, a BI tool, or a migration framework like Flyway.
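As a rough illustration of what a SQL control plane means in practice, the sketch below composes a CREATE MIRROR statement as a plain string that any Postgres client could send to nexus on port 9900. The helper name `build_create_mirror`, the peer and table names, and the exact clause layout (`WITH TABLE MAPPING`, `do_initial_copy`) are assumptions made for illustration and should be checked against the PeerDB documentation before use.

```python
def build_create_mirror(name, source_peer, target_peer, table_mapping, do_initial_copy=True):
    """Compose a CREATE MIRROR statement.

    Hypothetical helper: the clause layout is an assumption, not a verified grammar.
    Identifiers are interpolated as-is, so callers should pass trusted names only.
    """
    if not table_mapping:
        raise ValueError("at least one table mapping is required")
    # Each mapping is rendered as source_table:destination_table.
    pairs = ", ".join(f"{src}:{dst}" for src, dst in table_mapping.items())
    flag = "true" if do_initial_copy else "false"
    return (
        f"CREATE MIRROR {name} FROM {source_peer} TO {target_peer} "
        f"WITH TABLE MAPPING ({pairs}) "
        f"WITH (do_initial_copy = {flag});"
    )


statement = build_create_mirror(
    "orders_to_clickhouse",          # mirror name (illustrative)
    "pg_source",                     # source peer (illustrative)
    "ch_target",                     # destination peer (illustrative)
    {"public.orders": "orders"},
)
print(statement)
```

Because the statement is ordinary text, it can live in a migration file or be pasted into psql like any other SQL.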

Under the hood, a Go service (flow) defines Temporal workflows — CDCFlowWorkflow, QRepFlowWorkflow, SnapshotFlowWorkflow, SetupFlowWorkflow, DropFlowWorkflow, and MaintenanceFlowWorkflow — that orchestrate the actual data movement. Sources are captured via PostgreSQL logical replication (WAL), MySQL binlog (GTID or file-position), or MongoDB change streams; a dedicated snapshot worker parallelizes the initial full-table backfill by partition before CDC takes over for ongoing changes. Destinations include Snowflake, BigQuery, ClickHouse, PostgreSQL, S3, Kafka, Event Hubs, Pub/Sub, and Elasticsearch, with a catalog Postgres database tracking peers, flows, table-schema mappings, and per-batch state, and MinIO staging AVRO files between pull and load.
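The partitioned backfill idea can be sketched generically: split a key range into contiguous chunks and copy the chunks concurrently. This is a minimal sketch assuming an integer primary key and a fixed partition count; the function names are hypothetical, `copy_partition` is a stand-in that only counts rows, and none of this is PeerDB's actual snapshot code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_key_range(min_id, max_id, num_partitions):
    """Split the inclusive range [min_id, max_id] into contiguous, non-overlapping partitions."""
    total = max_id - min_id + 1
    if total <= 0 or num_partitions <= 0:
        return []
    size = -(-total // num_partitions)  # ceiling division
    return [
        (start, min(start + size - 1, max_id))
        for start in range(min_id, max_id + 1, size)
    ]


def copy_partition(bounds):
    """Stand-in for copying one partition; here it only counts the rows in the range."""
    low, high = bounds
    return high - low + 1


partitions = split_key_range(1, 1000, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    # Each partition is processed independently, so the work can run in parallel.
    rows_copied = sum(pool.map(copy_partition, partitions))
print(partitions, rows_copied)
```

Since the partitions do not overlap and together cover the whole range, every row is copied exactly once regardless of the order in which workers finish.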

The project puts real engineering into Postgres-specific problems that generic ETL tools often gloss over: efficient streaming of large TOASTed columns without re-fetching whole rows, tuned logical-replication configs and parallel WAL-slot reads for up to 10x faster CDC, and automatic detection and replay of source schema changes (tracked in a schema_delta_audit table). Fault tolerance comes from Temporal’s built-in state persistence, retries, and crash recovery, and Slack and email alerts fire on mirror failures.
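Schema-change replay can be pictured as a diff followed by generated DDL. The sketch below handles only added columns and emits generic ALTER TABLE statements; the function names and the dictionary representation of a schema are assumptions for illustration, not the logic behind PeerDB's schema_delta_audit table.

```python
def schema_delta(old_columns, new_columns):
    """Return columns present in the new source schema but not the old one, as (name, type) pairs."""
    return [
        (name, col_type)
        for name, col_type in new_columns.items()
        if name not in old_columns
    ]


def replay_delta(dest_table, added_columns):
    """Render generic ALTER TABLE statements that would add the new columns on the destination."""
    return [
        f"ALTER TABLE {dest_table} ADD COLUMN {name} {col_type};"
        for name, col_type in added_columns
    ]


old = {"id": "bigint", "email": "text"}
new = {"id": "bigint", "email": "text", "signup_source": "text"}

delta = schema_delta(old, new)
statements = replay_delta("users", delta)
print(statements)
```

A real implementation would also need to map source types to destination types and decide how to treat dropped or retyped columns, which this sketch leaves out.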

PeerDB is licensed under AGPL-3.0, and the self-hosted stack (Docker Compose or Tilt for local dev) ships with the full connector set and no license-gated features. PeerDB Cloud (peerdb.io) is a separately hosted managed offering built on the same open-source core, and the project is also available natively inside ClickHouse Cloud’s ClickPipes.

What You Get

  • A Postgres wire-protocol SQL server (nexus) for creating and managing pipelines with CREATE MIRROR and standard SQL rather than YAML or a console
  • CDC engines for PostgreSQL (logical replication/WAL), MySQL (binlog), and MongoDB (change streams), plus cursor-based and XMIN-based sync modes
  • Nine ready-made destination connectors: Snowflake, BigQuery, ClickHouse, PostgreSQL, S3, Kafka, Event Hubs, Pub/Sub, and Elasticsearch
  • A Next.js web UI for managing peers (connections), mirrors (pipelines), mirror logs, and alert configuration
  • Temporal-backed workflow orchestration giving automatic retries, state persistence, and crash recovery for long-running syncs
  • Built-in Slack and email alerting on mirror errors, plus a maintenance workflow for safe version upgrades

Common Use Cases

  • Replicating operational Postgres tables into Snowflake, BigQuery, or ClickHouse in near real time to keep BI dashboards current
  • Migrating or resharding data between Postgres instances with minimal cutover downtime
  • Streaming Postgres row-level changes into Kafka or Event Hubs to drive downstream event-driven services
  • Landing CDC streams into S3 for lakehouse-style analytics, staged through MinIO before final load
  • Consolidating CDC from mixed Postgres, MySQL, and MongoDB sources into a single ClickHouse or Snowflake destination

Under The Hood

Architecture: PeerDB is a modular, multi-service system. The Rust-built nexus server exposes a Postgres wire-protocol SQL front end that routes commands to a Temporal-orchestrated backend, while the Go flow service defines the actual workflows: CDCFlowWorkflow, QRepFlowWorkflow, SnapshotFlowWorkflow, SetupFlowWorkflow, DropFlowWorkflow, and MaintenanceFlowWorkflow. Flow workers execute these as discrete activities (pulling records, syncing, normalizing, replaying schema deltas), and a separate snapshot worker handles parallelized partition backfills. Every source and destination implements a shared Connector interface family in flow/connectors/core.go (with narrower ValidationConnector and MirrorSourceValidationConnector interfaces layered on top), cleanly separating per-system logic across a dozen connector packages. Catalog state (peers, flows, table-schema mappings, batch tracking, schema-delta audit) lives in a dedicated Postgres database, and MinIO stages AVRO files in transit. This gives clear separation of concerns, though a change to the core connector interface would still ripple across every connector package given how many systems are wired into it.
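The layered-interface pattern described here can be sketched with structural typing: a small base interface that every connector satisfies, and a narrower one that only some connectors implement and that callers detect at runtime. The real interfaces are Go and live in flow/connectors/core.go; the Python rendering, the method names (`close`, `connection_active`, `validate_check`), and the example classes below are hypothetical and chosen only to show the shape of the idea.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Connector(Protocol):
    """Base capability every connector provides (method names are illustrative)."""

    def close(self) -> None: ...
    def connection_active(self) -> bool: ...


@runtime_checkable
class ValidationConnector(Connector, Protocol):
    """Narrower interface layered on top of Connector for pre-flight checks."""

    def validate_check(self) -> None: ...


class InMemoryConnector:
    """Implements only the base interface."""

    def __init__(self):
        self._open = True

    def close(self):
        self._open = False

    def connection_active(self):
        return self._open


class CheckedConnector(InMemoryConnector):
    """Adds the optional validation capability."""

    def validate_check(self):
        if not self._open:
            raise RuntimeError("connection closed")


def supports_validation(connector) -> bool:
    """Detect the narrower interface at runtime, similar in spirit to a Go type assertion."""
    return isinstance(connector, ValidationConnector)


print(supports_validation(InMemoryConnector()), supports_validation(CheckedConnector()))
```

Keeping the base interface small is what limits the blast radius of a change: only additions to the shared base force every connector package to be touched.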

Tech Stack: The project spans three languages with a clear division of labor. Go handles orchestration and connectors (the flow service, using the Temporal Go SDK, pgx for Postgres access, and native SDKs for BigQuery, ClickHouse, Azure Event Hubs, and GCP Pub/Sub). Rust, organized as a multi-crate workspace, powers the Postgres-wire SQL server and the per-destination query planning/routing crates. TypeScript/Next.js with Radix UI and Tailwind drives the web console. Orchestration runs on a self-hosted Temporal Server, protobuf/buf generates schemas shared across the Go and Rust codebases, and the whole stack is composed via Docker Compose for local development and Tilt for iterative testing against real source/destination containers.

Code Quality: The repository carries an extensive automated test suite, with well over a hundred Go test files covering CDC, snapshot partitioning, and schema-delta logic, complemented by Rust unit tests and a substantial end-to-end suite that spins up real Postgres, MySQL, MariaDB, MongoDB, and ClickHouse instances via Tilt. Error handling favors explicit, typed errors with a dedicated error-classification system for alerting rather than swallowed errors, and CI enforces linting and static analysis separately for the Go and Rust codebases alongside dedicated security scanning. The main gap is contributor-facing documentation: there is no CONTRIBUTING guide, and the docs/ directory holds only internal architecture write-ups rather than onboarding material for new contributors.

What Makes It Unique: PeerDB’s defining choice is fronting an ETL pipeline with a genuine Postgres SQL interface. Pipelines are created and inspected with ordinary CREATE MIRROR statements from any Postgres-speaking client, rather than through a proprietary console or YAML file, which lets teams reuse their existing Postgres tooling (psql, BI tools, migration frameworks) to manage ETL. Layered on top of that is a set of Postgres-specific performance optimizations (parallel logical-replication slot reads, tuned WAL configuration, and dedicated handling of large TOASTed columns) that general-purpose CDC tools treating Postgres as just another connector typically don’t invest in. The overall CDC-plus-workflow-orchestration pattern is well established elsewhere, but the SQL-native control plane combined with Postgres-first tuning is a distinctive product bet.

Self-Hosting

Licensing Model: AGPL-3.0 licensed. All core functionality (CDC engines, connectors, the SQL interface, and the web UI) is available in self-hosted deployments with no license keys required. AGPL’s network-copyleft terms mean that anyone running a modified version as a network service must make the modified source available to that service’s users under the same license.

Self-Hosting Restrictions: None found. There are no ee/, enterprise/, pro/, or cloud/ directories, and no license-check or feature-flag gating code in the repository.

Enterprise Features: Not applicable to the open-source project itself; PeerDB Cloud (peerdb.io) is a separate managed offering built on the same open-source core, and PeerDB is also available natively inside ClickHouse Cloud’s ClickPipes.

Cloud vs Self-Hosted: PeerDB Cloud removes the operational burden of running Temporal, the catalog Postgres, and MinIO yourself, but does not appear to add capabilities beyond what the self-hosted mirrors/peers model already provides.

License Key Required: No.
