Volga

A Rust-based real-time data processing engine for AI/ML feature computation, built on Apache DataFusion and Arrow — positioned as an alternative to Flink, Spark, Chronon, and OpenMLDB with unified streaming, batch, and request-time execution.

154 stars
10 forks
Apache License 2.0
Rust

Volga targets a specific pain point in modern AI/ML systems such as recommendation engines, fraud detection, personalization, search, and RAG: features must be computed consistently across streaming, batch, and request-time contexts. Doing so typically means stitching together separate systems, such as Flink for streaming and Spark for batch, plus AI/ML-specific tools like Airbnb’s Chronon or OpenMLDB for feature serving.

Volga instead aims to unify these execution modes in a single engine, written in Rust on top of Apache DataFusion (query execution) and Apache Arrow (columnar data format). It specializes in continuous window aggregations, a pattern that is common in feature engineering pipelines and often awkward to implement.
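To make “continuous window aggregation” concrete, here is a minimal Rust sketch of the general technique. It is not Volga’s implementation or API. The sketch is a per-key sliding window that emits an updated sum and count for every incoming event, covering the events whose timestamps fall in (ts - window, ts]. The type name SlidingWindowSum and its push method are hypothetical, and the sketch assumes that events for each key arrive in timestamp order.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-key sliding-window aggregator (illustrative only, not Volga's API).
/// For every event it returns the sum and count of values whose timestamps fall in
/// (ts - window_ms, ts], so the window moves with each event instead of firing
/// only at fixed boundaries.
struct SlidingWindowSum {
    window_ms: u64,
    // Per key: events still inside the window, plus their running sum.
    state: HashMap<String, (VecDeque<(u64, f64)>, f64)>,
}

impl SlidingWindowSum {
    fn new(window_ms: u64) -> Self {
        Self { window_ms, state: HashMap::new() }
    }

    /// Ingest one event and return (sum, count) for the window ending at `ts`.
    /// Assumes timestamps for a given key arrive in non-decreasing order.
    fn push(&mut self, key: &str, ts: u64, value: f64) -> (f64, usize) {
        let window = self.window_ms;
        let (events, sum) = self
            .state
            .entry(key.to_string())
            .or_insert_with(|| (VecDeque::new(), 0.0));
        events.push_back((ts, value));
        *sum += value;
        // Evict events that have slid out of (ts - window, ts].
        // Written as `old_ts + window <= ts` to avoid u64 underflow.
        while let Some(&(old_ts, old_val)) = events.front() {
            if old_ts + window <= ts {
                events.pop_front();
                *sum -= old_val;
            } else {
                break;
            }
        }
        (*sum, events.len())
    }
}

fn main() {
    let mut win = SlidingWindowSum::new(60_000); // 60-second window
    for (key, ts, value) in [("u1", 0, 5.0), ("u1", 30_000, 3.0), ("u1", 60_000, 2.0), ("u2", 60_000, 7.0)] {
        let (sum, count) = win.push(key, ts, value);
        println!("{key} @ {ts} ms -> sum={sum}, count={count}");
    }
}
```

Each event is added once and evicted once, so the amortized cost per event is constant. This is why incremental eviction is usually preferred over recomputing the window from scratch on every event.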

Volga is Apache-2.0 licensed and, judging by its GitHub activity metrics, still an early-stage project. Its design is documented in depth on the project’s Substack blog, which presents Volga as a considered rethink of existing streaming and feature-engineering infrastructure rather than an incremental tweak.

What You Get

  • A single engine for streaming, batch, and request-time feature computation instead of stitching together separate systems
  • Specialized support for continuous window aggregations, a common but awkward feature-engineering pattern
  • Built on Apache DataFusion for query execution and Apache Arrow for columnar data representation
  • SQL as a query interface for defining feature computation logic

Common Use Cases

  • Computing ML features consistently across streaming, batch, and request-time contexts for recommendation or fraud-detection systems
  • Replacing a Flink+Spark+feature-store stack with one unified engine for AI/ML data pipelines
  • Running continuous window aggregations for real-time personalization or search ranking features
  • Building RAG or search systems that need features computed consistently between offline training and online serving

Under The Hood

Architecture: Volga does not implement a custom execution engine from scratch. It builds on Apache DataFusion for query execution and Apache Arrow for in-memory columnar data representation, using DataFusion’s SQL query planning and Arrow’s efficient columnar operations as its foundation. The core architectural bet is the unified streaming/batch/request-time execution model: instead of three separate systems each computing features differently, Volga aims to apply one execution semantics consistently across all three contexts.
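The “one execution semantics” idea can be illustrated with a small, hypothetical Rust sketch. The Aggregate trait, the Mean feature, and the batch, streaming, and at_request drivers are illustrative names, not Volga’s API. The feature logic is written once and three drivers reuse it, so batch backfills, streaming updates, and request-time lookups cannot silently diverge. The request-time driver applies a point-in-time rule: it uses only events at or before the requested timestamp, which keeps offline training features consistent with online serving.

```rust
/// Hypothetical sketch: a feature is defined once as an `Aggregate` impl,
/// and every execution context reuses that single definition.
trait Aggregate: Default {
    fn update(&mut self, value: f64);
    fn result(&self) -> f64;
}

/// Example feature: mean of all values seen so far (0.0 when empty).
#[derive(Default)]
struct Mean {
    sum: f64,
    n: u64,
}

impl Aggregate for Mean {
    fn update(&mut self, value: f64) {
        self.sum += value;
        self.n += 1;
    }
    fn result(&self) -> f64 {
        if self.n == 0 { 0.0 } else { self.sum / self.n as f64 }
    }
}

/// Events as (timestamp_ms, value), assumed sorted by timestamp.
type Event = (u64, f64);

/// Batch context: aggregate a bounded history in one pass.
fn batch<A: Aggregate>(events: &[Event]) -> f64 {
    let mut agg = A::default();
    for &(_, v) in events {
        agg.update(v);
    }
    agg.result()
}

/// Streaming context: emit an updated feature value after every event.
fn streaming<A: Aggregate>(events: &[Event]) -> Vec<f64> {
    let mut agg = A::default();
    events
        .iter()
        .map(|&(_, v)| {
            agg.update(v);
            agg.result()
        })
        .collect()
}

/// Request-time context: the feature value "as of" a timestamp, using only
/// events with ts <= as_of (point-in-time correctness, no future data).
fn at_request<A: Aggregate>(events: &[Event], as_of: u64) -> f64 {
    let upto = events.partition_point(|&(ts, _)| ts <= as_of);
    batch::<A>(&events[..upto])
}

fn main() {
    let events: Vec<Event> = vec![(10, 2.0), (20, 4.0), (30, 9.0)];
    println!("batch      = {}", batch::<Mean>(&events));
    println!("streaming  = {:?}", streaming::<Mean>(&events));
    println!("as of t=25 = {}", at_request::<Mean>(&events, 25));
}
```

In this toy setup, the last streaming value equals the batch result. A request-time lookup at an event’s timestamp also matches the streaming value emitted for that event, assuming timestamps are unique. Keeping all three paths on one definition is exactly the consistency property the unified-engine design is after.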

Tech Stack: Rust for the core engine, Apache DataFusion for SQL query execution, and Apache Arrow for columnar data handling. These are the same foundational technologies used by several modern data engines, applied here specifically to AI/ML feature computation.
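Arrow’s columnar layout stores each column as a contiguous buffer, with a separate validity bitmap marking which slots are non-null. The Rust sketch below mimics that idea with plain Vecs. Float64Column, Batch, and total_for_user are hypothetical names, and this is not the arrow crate’s API. The point is that an aggregation over one column scans a single tight buffer instead of hopping across row structs.

```rust
/// Simplified, hypothetical column loosely modelled on Arrow's layout:
/// one contiguous value buffer plus a validity bitmap where bit i set means
/// row i is non-null (LSB-first bit order, as in the Arrow format).
/// This is NOT the `arrow` crate's API; it only illustrates the idea.
struct Float64Column {
    values: Vec<f64>,
    validity: Vec<u8>,
}

impl Float64Column {
    fn from_options(data: &[Option<f64>]) -> Self {
        let mut values = Vec::with_capacity(data.len());
        let mut validity = vec![0u8; (data.len() + 7) / 8];
        for (i, item) in data.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1 << (i % 8);
                }
                // Null slots still occupy space; the bitmap masks them out.
                None => values.push(0.0),
            }
        }
        Self { values, validity }
    }

    fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1 << (i % 8)) != 0
    }

    /// Sum of non-null values: one pass over one contiguous buffer.
    fn sum(&self) -> f64 {
        self.values
            .iter()
            .enumerate()
            .filter(|&(i, _)| self.is_valid(i))
            .map(|(_, v)| v)
            .sum()
    }
}

/// Hypothetical record batch: equal-length columns, one per field.
struct Batch {
    user_id: Vec<u64>, // non-null column, so no bitmap in this sketch
    amount: Float64Column,
}

/// Filter-and-sum that touches only the two columns it needs.
fn total_for_user(batch: &Batch, user: u64) -> f64 {
    batch
        .user_id
        .iter()
        .enumerate()
        .filter(|&(i, &u)| u == user && batch.amount.is_valid(i))
        .map(|(i, _)| batch.amount.values[i])
        .sum()
}

fn main() {
    let batch = Batch {
        user_id: vec![1, 1, 2, 1],
        amount: Float64Column::from_options(&[Some(10.0), None, Some(2.5), Some(4.0)]),
    };
    println!("total amount     = {}", batch.amount.sum());
    println!("total for user 1 = {}", total_for_user(&batch, 1));
}
```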

Code Quality: Rather than relying solely on README claims, the project documents its design rationale extensively on a dedicated Substack blog, which gives more context for evaluating its architectural choices. GitHub activity metrics show the project is still early-stage, with the somewhat inconsistent maintenance cadence typical of a young infrastructure project.

What Makes It Unique: Most teams solve the feature-consistency problem across streaming, batch, and request time by combining multiple specialized systems (Flink, Spark, Chronon, OpenMLDB). Volga’s bet is that a single Rust engine built on DataFusion and Arrow can unify all three execution modes with consistent semantics, avoiding the operational and consistency overhead of running several different systems for the same underlying feature-computation problem.

Self-Hosting

Licensing Model: Apache-2.0 licensed; fully open source with no license key.

Self-Hosting Restrictions: Not applicable; it’s a self-hosted data processing engine you deploy within your own infrastructure.

License Key Required: No.
