helix-db

Name: helix-db
Rating: 5 (5557 reviews)

A graph-vector database built from scratch in Rust that unifies graph traversal, vector search, key-value, and relational storage into a single platform for AI applications.

5.6K stars

308 forks

Apache License 2.0

Rust



HelixDB is an OLTP graph-vector database written from scratch in Rust, designed to give AI agents and RAG pipelines a single place to store and query all their data. Instead of stitching together a separate vector store, graph database, relational database, and application cache, HelixDB exposes them as one coherent data model accessed through a fluent query DSL available in Rust, TypeScript, Go, and Python.

The project started in November 2024 and has already shipped over 100 releases, reaching v3 in mid-2025. Its primary data model is graph-plus-vector: nodes and edges carry typed properties, and any property can be indexed as a vector. Key-value and document access patterns are also supported within the same transaction boundary. Queries are sent as dynamic JSON ASTs to a single HTTP endpoint, so no query compilation step is needed during development.
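To make the "queries as JSON ASTs" idea concrete, here is a minimal, dependency-free sketch of how a chain of traversal steps could be serialized into a JSON body for a POST request. The step names (nodes_by_label, out, limit) and the JSON field names (steps, op, label, edge, n) are hypothetical assumptions, not HelixDB's actual wire format; a real client would use a serde-based serializer (the SDK reportedly uses sonic-rs) rather than hand-built strings.

```rust
/// Hypothetical AST node: one step in a traversal chain.
/// These variants and their JSON shapes are illustrative, not HelixDB's format.
#[derive(Debug, Clone)]
enum Step {
    NodesByLabel(String),
    Out(String),
    Limit(usize),
}

/// Escape a string for inclusion inside a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            // Other control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize a single step as a JSON object.
fn step_to_json(step: &Step) -> String {
    match step {
        Step::NodesByLabel(l) => format!("{{\"op\":\"nodes_by_label\",\"label\":\"{}\"}}", escape(l)),
        Step::Out(e) => format!("{{\"op\":\"out\",\"edge\":\"{}\"}}", escape(e)),
        Step::Limit(n) => format!("{{\"op\":\"limit\",\"n\":{}}}", n),
    }
}

/// Serialize the whole chain; this string would be the POST body.
fn query_to_json(steps: &[Step]) -> String {
    let parts: Vec<String> = steps.iter().map(step_to_json).collect();
    format!("{{\"steps\":[{}]}}", parts.join(","))
}

fn main() {
    let q = vec![
        Step::NodesByLabel("User".into()),
        Step::Out("FOLLOWS".into()),
        Step::Limit(10),
    ];
    println!("{}", query_to_json(&q));
}
```

Because the query is plain data, changing it means changing the body of the next request rather than recompiling or redeploying anything, which is the property the paragraph above describes.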

HelixDB includes a CLI called helix that manages local Docker-based instances, connects to HelixDB Cloud clusters, and even runs an interactive bootstrapper (helix chef) that scaffolds a full project including schema, seed data, and a Next.js frontend by handing off to an AI coding agent. The CLI handles the full local development lifecycle: starting, stopping, pruning containers, syncing project configurations, and managing cloud authentication.

The managed HelixDB Cloud offering adds object-storage-backed persistence, ACID transactions, single-writer with auto-scaling reader nodes, and multi-gateway high availability. Local development uses Docker or Podman with optional MinIO-backed disk persistence, making the self-hosted workflow realistic for evaluation but operationally non-trivial for production use.

What You Get

A unified graph-vector engine that stores nodes, edges, and high-dimensional vectors in a single system with consistent ACID transaction boundaries
A fluent query DSL available in Rust, TypeScript, Go, and Python that compiles to a portable JSON AST and runs as dynamic POST requests — no build step required
A helix CLI that starts local Docker or Podman instances, manages disk or in-memory storage, syncs cloud cluster configurations, and runs an AI-assisted project bootstrapper
Multi-model data access — graph traversal, approximate nearest-neighbor vector search, key-value lookups, and document patterns all available within the same write or read batch
Parameterized, type-safe queries through #[register]-annotated Rust functions or TypeScript builder functions that can be used as stored queries or sent dynamically
Vector index management on both node and edge properties, with distance metadata available in projections and value maps after search

Common Use Cases

AI agent memory — storing and retrieving an agent’s interaction history, user preferences, and contextual knowledge as a graph with semantic vector recall
RAG knowledge graphs — indexing document chunks as nodes with embedding vectors and connecting them through relationship edges for hybrid graph-plus-vector retrieval
Company knowledge bases — federating structured company data, documents, and team relationships into a single queryable graph that agents can traverse
Recommendation systems — combining graph relationships between users, items, and categories with vector similarity to surface personalized results without separate infrastructure
Multi-modal data applications — building apps that need graph traversal for social or organizational relationships alongside vector search for semantic similarity in one transaction
Local-first AI tooling — prototyping AI-powered applications on a local Docker instance with the same SDK and query model used in production cloud deployments

Under The Hood

Architecture HelixDB’s public repository exposes a layered, modular architecture with clear separation between the CLI management layer, the multi-language SDK layer, and the server runtime (distributed as a container image). The CLI is organized as a flat command-module registry built on clap, with each subcommand in its own file and shared infrastructure (project context, config management, local runtime orchestration, the cloud API client, and an SSE streaming client) extracted into dedicated modules. The SDK follows a builder pattern centered on two orthogonal abstractions: the batch type (read vs. write) and the traversal (the chain of graph steps). These compose cleanly, and the DSL module is extensively documented with doctests that double as usage examples. The primary extension mechanism, the #[register] proc macro, adds a thin transformation layer that converts typed Rust functions into serializable DynamicQueryRequest payloads, so the same function signature works both as a dynamic inline query and as a deployable stored query. The overall design favors explicit composition over magic: data flows from the batch builder through named variable slots to a returning clause, with no hidden state.
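The batch-builder shape described above (a batch kind, named variable slots bound to traversals, and a returning clause) can be sketched as follows. This is a hypothetical, std-only illustration: Batch, BatchKind, var, and returning are invented names, and steps are plain strings rather than typed AST nodes. It is not the helix-db SDK API.

```rust
use std::collections::BTreeMap;

/// Whether the batch may mutate data (hypothetical, not the SDK's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BatchKind {
    Read,
    Write,
}

/// A batch: named variable slots, each bound to a traversal (a list of steps),
/// plus the subset of slots the caller wants returned.
#[derive(Debug)]
struct Batch {
    kind: BatchKind,
    vars: BTreeMap<String, Vec<String>>,
    returning: Vec<String>,
}

impl Batch {
    fn new(kind: BatchKind) -> Self {
        Batch { kind, vars: BTreeMap::new(), returning: Vec::new() }
    }

    fn read() -> Self {
        Batch::new(BatchKind::Read)
    }

    fn write() -> Self {
        Batch::new(BatchKind::Write)
    }

    /// Bind a traversal to a named slot; rebinding a name replaces the old traversal.
    fn var(mut self, name: &str, steps: &[&str]) -> Self {
        self.vars.insert(name.to_string(), steps.iter().map(|s| s.to_string()).collect());
        self
    }

    /// Finish the batch. Every returned name must refer to a bound slot,
    /// so all data flow is visible in the batch itself (no hidden state).
    fn returning(mut self, names: &[&str]) -> Result<Self, String> {
        for name in names {
            if !self.vars.contains_key(*name) {
                return Err(format!("unknown variable: {name}"));
            }
        }
        self.returning = names.iter().map(|s| s.to_string()).collect();
        Ok(self)
    }
}

fn main() {
    let batch = Batch::read()
        .var("user", &["nodes_by_label:User", "limit:1"])
        .var("friends", &["var:user", "out:FOLLOWS"])
        .returning(&["friends"])
        .expect("all returned names are bound");
    println!("{:?} batch returns {:?}", batch.kind, batch.returning);
}
```

Keeping the batch kind and the traversal as separate pieces is what lets them compose independently: the same traversal can be placed in a read or a write batch, and validation of the returning clause does not depend on which.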

Tech Stack The CLI and Rust SDK are built on stable Rust (2024 edition for the CLI, 2021 for the SDK) with tokio for async I/O and reqwest for HTTP transport. The CLI uses clap with derive macros for argument parsing, cliclack for interactive terminal prompts, indicatif for progress bars, and self_update for in-place binary updates. Local runtime management uses standard process spawning to drive Docker or Podman. In the SDK, serialization uses serde with sonic-rs (a SIMD-accelerated JSON library) instead of serde_json, a deliberate performance choice. The TypeScript SDK is a Node.js package built on the Fetch API. The Python SDK is a pip-installable package that offers the same builders with snake_case names. The server runtime is distributed as a container image and includes MinIO as optional S3-compatible disk storage; no database source code is in the open repository.

Code Quality The Rust SDK has comprehensive inline documentation, with doc-comments on every public type, method, and module, and a substantial integration test suite covering the DSL AST serialization contract, predicate variants, parameter type coercion, and HTTP routing, including a real one-shot TCP server that verifies actual request paths without mocking. The CLI has integration tests that run assert_cmd against the compiled binary. Error handling is explicit throughout: the SDK's error enum has typed variants for transport, remote, serialization, and URL-parsing errors, derived with thiserror macros. The CLI uses color-eyre for rich error display. The Rust workspace includes a clippy_check.sh script, and the release profile uses LTO and codegen-units=1 for optimized binaries. No obvious coverage gaps stand out in the SDK tests; the CLI tests focus on command-level behavior.
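The error-handling style described above (one enum with a typed variant per failure class) can be sketched without external crates. This is a hypothetical illustration: ClientError, its variant names, check_status, and parse_count are invented for the example, and the Display impl is written by hand where thiserror would normally generate it from #[error("...")] attributes.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical client error enum: one typed variant per failure class.
#[derive(Debug)]
enum ClientError {
    Transport(String),
    Remote { status: u16, message: String },
    Serialization(String),
    Url(String),
}

// thiserror would derive this from #[error("...")] attributes; written by hand here.
impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClientError::Transport(msg) => write!(f, "transport error: {msg}"),
            ClientError::Remote { status, message } => write!(f, "remote error ({status}): {message}"),
            ClientError::Serialization(msg) => write!(f, "serialization error: {msg}"),
            ClientError::Url(msg) => write!(f, "invalid URL: {msg}"),
        }
    }
}

impl Error for ClientError {}

// A From impl lets `?` convert a lower-level error into a typed variant.
impl From<std::num::ParseIntError> for ClientError {
    fn from(e: std::num::ParseIntError) -> Self {
        ClientError::Serialization(e.to_string())
    }
}

/// Hypothetical helper: treat 2xx as success, anything else as a Remote error.
fn check_status(status: u16, body: &str) -> Result<&str, ClientError> {
    if (200..300).contains(&status) {
        Ok(body)
    } else {
        Err(ClientError::Remote { status, message: body.to_string() })
    }
}

/// Hypothetical helper: parse a numeric response body, mapping failures via `?`.
fn parse_count(raw: &str) -> Result<u64, ClientError> {
    Ok(raw.trim().parse::<u64>()?)
}

fn main() {
    for (status, body) in [(200, "{\"ok\":true}"), (503, "writer unavailable")] {
        match check_status(status, body) {
            Ok(b) => println!("ok: {b}"),
            Err(e) => println!("error: {e}"),
        }
    }
    if let Err(e) = parse_count("not-a-number") {
        println!("error: {e}");
    }
}
```

Callers can match on the variant to decide, for example, whether a failure is worth retrying (transport) or should be surfaced as-is (remote), instead of inspecting error strings.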

What Makes It Unique HelixDB’s most technically distinctive decision is the unified graph-vector-KV-relational storage model exposed through a single query batch abstraction, rather than requiring users to context-switch between query languages or databases. Most vector databases treat graph relationships as a secondary concern (metadata filtering), and most graph databases treat vector search as an optional plugin. HelixDB treats both as first-class primitives within the same traversal chain: a query can traverse graph edges and then run a nearest-neighbor vector search in the same batch, with distance metadata flowing through subsequent projection steps. The dynamic query protocol, in which queries are serialized as a JSON AST and POSTed to a single endpoint, means there is no compile step and no schema migration for query changes, and the same DSL works identically against local and cloud instances. The helix chef command, which hands a complete project scaffold and installed skills off to an AI coding agent, is a genuinely novel distribution mechanism for developer tooling.
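The "traverse, then vector-search" pattern can be illustrated with a small conceptual sketch: follow outgoing edges from a start node, rank the reached nodes by cosine distance to a query vector, and keep the distance next to each id so a later projection step could read it. Everything here is a hypothetical assumption for illustration (Graph, neighbors_by_similarity, a single edge type, cosine distance). It uses a brute-force scan, whereas a real engine, HelixDB included, would use an approximate nearest-neighbor index whose details this page does not describe.

```rust
use std::collections::HashMap;

/// Toy in-memory graph: adjacency lists plus one embedding per node.
struct Graph {
    out: HashMap<u32, Vec<u32>>,     // node id -> outgoing neighbor ids
    vectors: HashMap<u32, Vec<f32>>, // node id -> embedding vector
}

/// Cosine distance (1 - cosine similarity); zero-length vectors get distance 1.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return 1.0;
    }
    1.0 - dot / (na * nb)
}

/// One hop along outgoing edges from `start`, then keep the `k` reached nodes
/// closest to `query`. Each hit carries its distance for later projection.
/// Nodes without an embedding are skipped.
fn neighbors_by_similarity(g: &Graph, start: u32, query: &[f32], k: usize) -> Vec<(u32, f32)> {
    let mut hits: Vec<(u32, f32)> = g
        .out
        .get(&start)
        .into_iter()
        .flatten()
        .filter_map(|id| g.vectors.get(id).map(|v| (*id, cosine_distance(query, v))))
        .collect();
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits.truncate(k);
    hits
}

fn main() {
    let mut out = HashMap::new();
    out.insert(1, vec![2, 3, 4]);
    let mut vectors = HashMap::new();
    vectors.insert(2, vec![1.0, 0.0]);
    vectors.insert(3, vec![0.0, 1.0]);
    vectors.insert(4, vec![0.9, 0.1]);
    let g = Graph { out, vectors };
    for (id, dist) in neighbors_by_similarity(&g, 1, &[1.0, 0.0], 2) {
        println!("node {id} distance {dist:.3}");
    }
}
```

The point of the sketch is the data flow, not the search algorithm: the graph step narrows the candidate set, and the vector step orders it while attaching a distance that downstream steps can use.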

Self-Hosting

HelixDB is released under the Apache License 2.0, a permissive open-source license that permits commercial use, modification, distribution, and private use without copyleft obligations. You can embed it in commercial products, modify the source, and distribute it without needing to open-source your own application code. There are no field-of-use restrictions and no dual-licensing surprises.

Running HelixDB locally requires Docker or Podman — the helix start dev command pulls a container image and manages its lifecycle for you. By default, local instances use in-memory storage, meaning data is wiped when the container stops; adding the --disk flag enables MinIO-backed persistence, which spins up a second container for object storage. This makes the local development experience smooth, but production self-hosting would require you to provision and maintain Docker infrastructure, manage MinIO or compatible S3 storage, handle upgrades by pulling new images, and implement your own monitoring, alerting, and backup strategy. The database engine itself is not open-sourced — what is available in this repository is the CLI tooling and SDKs; the server runs as a container image pulled from a registry.

HelixDB Cloud, the managed offering, adds several capabilities that are difficult to replicate self-hosted: object-storage-backed durability, full ACID transactions, a single-writer node with auto-scaling read replicas, multi-gateway high availability (3+ nodes), managed cluster upgrades, and authenticated API key management. Self-hosters trade these operational guarantees for full data control and no per-query fees. The cloud tier is positioned as the production path, with local instances intended primarily for development and evaluation rather than serving production traffic.


Repository Health

Pre-computed score based on development activity, maintenance, community, maturity, and trend momentum.

Overall: 83/100 (Excellent)

Development Activity: 96

Maintenance: 100

Community: 56

Maturity: 40

Momentum: 40

Very active development. Well-maintained with consistent updates. Rapidly growing project.

Technical Analysis

Overall: 80/100 (Excellent)

Architecture: 80

Code Quality: 82

Innovation: 88

Learning Curve: 70

Repository Stats

Total Commits: 2,747

Repo Age: 1.6 years

Last Commit: 4 days ago

Built With

Rust: 66.8%

TypeScript: 14.2%

Python: 10.2%

Recent Releases

100 total (~5.2 releases/month)

Alternative To

TigerGraph, Pinecone

Topics

ai cli database databases graph-database helix helixdb rag rust rust-crate rust-lang vector

Related Apps

Supabase

Developer Tools · Databases · Search

TypeScript 71% · Apache 2.0 · 105,714

The open-source Postgres development platform that replaces Firebase with authentication, real-time APIs, edge functions, storage, and vector embeddings, all built on PostgreSQL.
