openduck

Name: openduck
Rating: 5 (557 reviews)

OpenDuck brings MotherDuck-style cloud capabilities to self-hosted DuckDB — attach remote databases, run hybrid queries across local and remote nodes, and own your data with an open gRPC and Arrow IPC protocol.

557 stars

26 forks

MIT License

C++

OpenDuck is an open-source implementation of the distributed DuckDB architecture pioneered by MotherDuck. It lets you attach a remote database in a single line — ATTACH 'openduck:mydb' — and query remote tables as if they were local, with no changes to your SQL and no separate client driver. The extension integrates directly into DuckDB’s StorageExtension and Catalog interfaces, so remote tables are first-class catalog entries that the optimizer and query planner see exactly like local tables.
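As a rough illustration of the single-line attach, the sketch below builds the ATTACH statement in Python. The helper name attach_statement and its alias handling are hypothetical and not part of OpenDuck; only the openduck: and od: scheme prefixes and the ATTACH 'openduck:mydb' form come from the project description.

```python
import re
from typing import Optional

# Attach schemes the extension is described as registering.
SCHEMES = ("openduck", "od")

# Accept only simple identifiers so nothing needs quoting or escaping.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def attach_statement(database: str, scheme: str = "openduck",
                     alias: Optional[str] = None) -> str:
    """Build an ATTACH statement for a remote database (hypothetical helper)."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if not _IDENTIFIER.fullmatch(database):
        raise ValueError(f"invalid database name: {database!r}")
    statement = f"ATTACH '{scheme}:{database}'"
    if alias is not None:
        if not _IDENTIFIER.fullmatch(alias):
            raise ValueError(f"invalid alias: {alias!r}")
        statement += f" AS {alias}"
    return statement
```

With the extension loaded, the resulting string would be passed to a DuckDB connection's execute method, after which the attached tables can be queried with ordinary SQL.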

The project reimplements two key ideas from MotherDuck as open technology: differential storage and dual (hybrid) execution. Differential storage uses append-only sealed layers tracked in Postgres metadata and backed by local filesystem or S3-compatible object storage, giving you snapshot isolation and concurrent readers without sacrificing DuckDB’s embedded-DB feel. Hybrid execution lets a single query plan split across your local machine and a remote worker — the gateway labels each operator LOCAL or REMOTE, inserts bridge operators at boundaries, and streams only intermediate results over the wire.
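The differential storage idea can be sketched as a toy in-memory model. This is not OpenDuck's implementation: the LayeredStore class, its method names, and the block-number keys are assumptions made for illustration, and a plain dictionary stands in for the Postgres metadata. It only shows how sealed, immutable layers plus snapshot identifiers give readers a stable view while writes continue.

```python
import uuid
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


@dataclass
class LayeredStore:
    """Toy append-only store: sealed layers never change, snapshots pin a layer prefix."""
    sealed: List[Mapping[int, bytes]] = field(default_factory=list)
    active: Dict[int, bytes] = field(default_factory=dict)
    # snapshot id -> number of sealed layers visible to that snapshot
    snapshots: Dict[str, int] = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> None:
        # Writes only ever touch the unsealed active layer.
        self.active[block] = data

    def seal(self) -> str:
        # Freeze the active layer, append it, and hand out a snapshot UUID
        # that covers every layer sealed so far.
        self.sealed.append(MappingProxyType(dict(self.active)))
        self.active = {}
        snapshot_id = str(uuid.uuid4())
        self.snapshots[snapshot_id] = len(self.sealed)
        return snapshot_id

    def read(self, snapshot_id: str, block: int) -> Optional[bytes]:
        # Look through the layers visible to this snapshot, newest first.
        visible = self.sealed[: self.snapshots[snapshot_id]]
        for layer in reversed(visible):
            if block in layer:
                return layer[block]
        return None
```

A reader holding an older snapshot id keeps seeing the older block contents even after later layers are sealed, which is the property that allows concurrent readers without locking.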

The protocol is intentionally minimal and fully open: eight gRPC RPCs defined in a single .proto file cover the data plane, transaction management, and worker lifecycle. Any service that speaks this protocol and streams Arrow IPC batches can serve as an OpenDuck backend. You can self-host the included Rust gateway, swap in your own backend, or plug in an entirely different execution engine — the DuckDB extension doesn’t care what’s on the other side.

OpenDuck also integrates with DuckLake, the Parquet-based lakehouse catalog. They operate at different layers: DuckLake handles where data lives and how tables are organized; OpenDuck handles the transport and storage I/O. A DuckLake-backed worker on a remote server becomes transparently accessible via ATTACH 'openduck:...', combining DuckLake’s table format with OpenDuck’s hybrid execution.

What You Get

A DuckDB C++ extension that registers openduck: and od: attach schemes, presenting remote tables as native catalog entries that participate in joins, CTEs, and the optimizer
A Rust gateway that authenticates clients, routes queries by database affinity and compute context, splits hybrid plans with LOCAL/REMOTE placement labels, and manages backpressure
Embedded DuckDB workers that execute query fragments and stream results as Arrow IPC batches over gRPC, with per-transaction connection registries and idle-connection reaping
Differential storage with append-only sealed layers tracked in Postgres, backed by local filesystem or S3-compatible object storage, with snapshot UUIDs for consistent reads
A Python client package (pip install -e clients/python) and a Rust client API for attaching and querying without manually loading extension binaries
A unified CLI (openduck) covering gateway, worker, query, cancel, status, snapshot, and GC operations
OpenTelemetry OTLP metrics export and a benchmark harness with configurable regression thresholds
DuckLake interoperability — OpenDuck handles the transport while DuckLake handles the Parquet table format on the same remote worker
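To make the gateway's routing role more concrete, here is a minimal sketch of affinity-based routing with a simple capacity limit standing in for backpressure. The Worker and Router classes, the max_in_flight limit, and the least-loaded fallback are all assumptions for illustration; the actual gateway is written in Rust and its routing policy is not described in detail here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Worker:
    name: str
    in_flight: int = 0
    max_in_flight: int = 4  # hypothetical per-worker concurrency limit

    def has_capacity(self) -> bool:
        return self.in_flight < self.max_in_flight


@dataclass
class Router:
    workers: Dict[str, Worker]
    affinity: Dict[str, str] = field(default_factory=dict)  # database -> worker name

    def route(self, database: str) -> Optional[Worker]:
        # Prefer the worker that already serves this database.
        preferred = self.workers.get(self.affinity.get(database, ""))
        if preferred is not None and preferred.has_capacity():
            chosen = preferred
        else:
            candidates = [w for w in self.workers.values() if w.has_capacity()]
            if not candidates:
                # Backpressure: every worker is saturated, so the caller
                # has to queue or reject the query.
                return None
            # Fall back to the least-loaded worker (ties broken by name).
            chosen = min(candidates, key=lambda w: (w.in_flight, w.name))
            # Pin the database only the first time it is seen.
            self.affinity.setdefault(database, chosen.name)
        chosen.in_flight += 1
        return chosen
```

Returning None when no worker has capacity keeps the decision about queueing or rejecting with the caller, which is one simple way to surface backpressure.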

Common Use Cases

Data analysts who want to JOIN local DataFrames against large remote tables without copying the remote data to their machine
Platform engineers building multi-tenant DuckDB services who need snapshot isolation, concurrent readers, and auditable access logs
Teams that want the ergonomics of MotherDuck’s ATTACH and hybrid queries but prefer self-hosting on their own infrastructure with no vendor lock-in
Projects that need an open, replaceable backend protocol — organizations can implement ExecutionService against their own storage engine and serve DuckDB clients transparently
Workloads that combine DuckLake’s Parquet lakehouse catalog with DuckDB-native storage: OpenDuck transports queries to a DuckLake-backed worker and streams results back
Embedded analytics applications where the worker runs close to the data (object storage, Postgres) and the client runs on a lightweight edge machine

Under The Hood

Architecture

OpenDuck follows a layered, modular architecture split across independently deployable components: a DuckDB C++ extension handling catalog integration and the client-side attach scheme; a Rust gateway responsible for authentication, worker registry, affinity-based routing, hybrid plan splitting, and backpressure; and Rust workers that embed DuckDB and stream Arrow IPC results. Storage is further decomposed into discrete crates (core type traits, Postgres-backed metadata, append-only on-disk segment files, S3 sealed-layer upload, a FUSE adapter, and a C ABI bridge for the extension), each with a single responsibility and a clean trait boundary (StorageBackend). The gateway’s hybrid planner keeps its concerns separate: placement decisions, bridge insertion, and plan annotation are modeled as distinct phases operating on an explicit PlanNode tree, not as ad-hoc SQL rewriting. The overall design is asynchronous where concurrency matters (Tokio throughout the Rust components) and synchronous where DuckDB’s own threading model requires it.
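The phased planner described above can be illustrated with a small sketch. The PlanNode name comes from the description; everything else, including the fields, the function names, and the placement rule (an operator runs remotely only when all of its inputs are remote), is an assumption chosen to keep the example short and is not OpenDuck's actual heuristic.

```python
from dataclasses import dataclass, field
from typing import List

LOCAL, REMOTE = "LOCAL", "REMOTE"


@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)
    placement: str = ""
    remote_source: bool = False  # True for scans of tables in an attached remote database


def annotate(node: PlanNode) -> None:
    """Phase 1: label every operator LOCAL or REMOTE, bottom-up."""
    for child in node.children:
        annotate(child)
    if not node.children:
        node.placement = REMOTE if node.remote_source else LOCAL
    else:
        # Assumed rule: stay remote only while every input is remote.
        all_remote = all(c.placement == REMOTE for c in node.children)
        node.placement = REMOTE if all_remote else LOCAL


def insert_bridges(node: PlanNode) -> PlanNode:
    """Phase 2: wrap each child whose placement differs from its parent in a bridge."""
    new_children = []
    for child in node.children:
        child = insert_bridges(child)
        if child.placement != node.placement:
            # The bridge runs on the parent's side and pulls results across the wire.
            child = PlanNode("Bridge", [child], placement=node.placement)
        new_children.append(child)
    node.children = new_children
    return node
```

For a join between a local scan and a filtered remote scan, the filter stays REMOTE, the join is LOCAL, and a single bridge sits between them, so only the filtered rows would cross the network.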

Tech Stack

The Rust gateway and workers are built on Tokio for async I/O, Tonic for gRPC server and client, and Prost for protobuf serialization; Arrow IPC batches are streamed using the arrow crate’s StreamWriter. Persistent metadata lives in PostgreSQL accessed via SQLx with compile-time-checked queries; sealed layers are uploaded to any S3-compatible object store via the object_store crate. The DuckDB extension is written in C++ using DuckDB’s extension SDK, with vcpkg managing native dependencies and a CMake build. The Rust components share a workspace with resolver 2 and a unified [workspace.dependencies] table. Observability is handled by OpenTelemetry with an optional OTLP exporter via opentelemetry-otlp. The Python client wraps the extension via duckdb’s Python bindings and auto-discovers the local build tree or OPENDUCK_EXTENSION_PATH. Docker Compose configurations cover a minimal two-service stack and a full stack with Postgres and object storage.
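The extension auto-discovery mentioned for the Python client might look roughly like the sketch below. Only the OPENDUCK_EXTENSION_PATH variable comes from the description; the function name find_extension, the search order, and the file name openduck.duckdb_extension (which follows DuckDB's usual extension naming) are assumptions, and the real client may resolve paths differently.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_extension(search_roots: Iterable[Path],
                   filename: str = "openduck.duckdb_extension") -> Optional[Path]:
    """Locate the extension binary (hypothetical helper, not the client's real logic)."""
    # 1. An explicit override through the environment variable wins.
    override = os.environ.get("OPENDUCK_EXTENSION_PATH")
    if override:
        path = Path(override)
        return path if path.is_file() else None
    # 2. Otherwise return the first match found under the given build-tree roots.
    for root in search_roots:
        if not root.is_dir():
            continue
        for candidate in sorted(root.rglob(filename)):
            return candidate
    return None
```

Checking the environment variable first gives deployments a way to pin an exact binary, while the directory search covers the development case of a freshly built tree.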

Code Quality

The test suite is extensive and structured around integration scenarios rather than unit tests alone: end-to-end tests cover gateway-worker round trips, gRPC smoke paths, hybrid join baseline parity, ingest pipelines, idle reaper behavior, worker death and reconnect, typed error classification, and transaction lifecycle. The diff-bridge crate has a separate bridge smoke test; the diff-metadata crate includes a Postgres backend integration test. Error handling is explicit and typed throughout: the errors module in the worker classifies every DuckDB error variant into a structured protobuf kind, and the gateway propagates errors on both the legacy string field and the new typed_error field. Inline documentation is comprehensive: every public module has a module-level doc comment explaining its role, and public types and trait methods are documented. No linter configuration is visible in the repo root, so static checking appears to rest on the Rust compiler’s type system, with the structured error classification covering runtime failures.
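The dual error reporting described above, a legacy string plus a structured kind, can be sketched as follows. The ErrorKind members and the ErrorReply type are hypothetical, and matching on message prefixes is a stand-in chosen for brevity: the actual worker classifies DuckDB error variants in Rust. The prefixes themselves are the ones DuckDB normally prints.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    # Hypothetical kinds; the project's protobuf enum may differ.
    PARSER = "PARSER"
    BINDER = "BINDER"
    CATALOG = "CATALOG"
    CONVERSION = "CONVERSION"
    IO = "IO"
    UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class ErrorReply:
    message: str            # legacy free-form string, kept for older clients
    typed_error: ErrorKind  # structured kind for clients that can branch on it


_PREFIXES = {
    "Parser Error": ErrorKind.PARSER,
    "Binder Error": ErrorKind.BINDER,
    "Catalog Error": ErrorKind.CATALOG,
    "Conversion Error": ErrorKind.CONVERSION,
    "IO Error": ErrorKind.IO,
}


def classify(message: str) -> ErrorReply:
    """Map an error message to a structured kind, defaulting to UNKNOWN."""
    prefix = message.split(":", 1)[0].strip()
    return ErrorReply(message, _PREFIXES.get(prefix, ErrorKind.UNKNOWN))
```

Carrying both fields lets newer clients react to the kind (for example, retrying on IO errors) while older clients keep reading the string unchanged.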

What Makes It Unique

OpenDuck’s primary contribution is taking MotherDuck’s proprietary architectural ideas, differential storage and dual execution, and reimplementing them as a fully open, replaceable stack. The hybrid execution planner stands out among open-source DuckDB tooling: rather than routing whole queries to one engine or the other, it models the query plan as an explicit tree with placement annotations, inserts bridge operators at LOCAL/REMOTE boundaries, and verifies parity against a single-process baseline. The StorageBackend trait abstraction means the same DuckDB extension can work transparently across in-memory, in-process, FUSE-mounted, and S3-tiered storage modes. The deliberately minimal gRPC protocol (eight RPCs, an open .proto, Arrow IPC on the wire) makes the backend replaceable: any conforming service becomes a valid DuckDB cloud backend without changes to the extension or client code.

Self-Hosting

OpenDuck is released under the MIT License, one of the most permissive open-source licenses available. You can use it commercially, modify it, redistribute it, and embed it in proprietary products. There are no copyleft obligations and no non-commercial-use restrictions: you are not required to publish your changes or open-source your own code. The only requirement is that the copyright notice and license text be included in copies or substantial portions of the software.

Running OpenDuck yourself means owning the full operational stack. You are responsible for provisioning and maintaining the Rust gateway and one or more DuckDB workers, keeping a Postgres instance available for differential storage metadata, and optionally configuring an S3-compatible object store for sealed layer uploads. The gateway handles auth, routing, and backpressure, but you set the token secrets, manage certificates for gRPC, and handle scaling workers up or down as query load changes. Snapshot GC and layer tiering are available via the CLI but must be scheduled by you. There is no managed upgrade path — you pull new releases and redeploy.
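Since snapshot GC has to be scheduled by the operator, it helps to see what a GC pass conceptually decides. The sketch below is an assumption about the general technique, not OpenDuck's GC: it treats each snapshot as a set of layer ids and reports the layers that no retained snapshot references. The function name and data shapes are hypothetical, and no CLI invocation is shown because the exact subcommands are not documented here.

```python
from typing import Dict, Iterable, Set


def reclaimable_layers(all_layers: Iterable[str],
                       snapshots: Dict[str, Set[str]],
                       retained: Iterable[str]) -> Set[str]:
    """Return layer ids that no retained snapshot references (illustrative only)."""
    live: Set[str] = set()
    for snapshot_id in retained:
        # Every layer visible to a retained snapshot must be kept.
        live |= snapshots[snapshot_id]
    return set(all_layers) - live
```

Whatever retention policy you pick (keep the last N snapshots, keep everything newer than a cutoff) only changes which snapshot ids go into the retained argument.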

OpenDuck has no hosted or SaaS tier, so everything a managed service like MotherDuck would handle for you is yours to decide and to operate. MotherDuck provides managed upgrades, high availability, cloud backups, SLAs, and a support team; OpenDuck provides none of those out of the box. What you gain in exchange is operational ownership and data sovereignty: your credentials never leave your network, you control the Postgres metadata store, and you can inspect or replace any layer of the stack. For teams with existing infrastructure and a preference for self-hosting, this is the intended operating model.
