Timeplus Proton is a single-binary SQL engine written in C++ for high-throughput, low-latency stream processing and real-time analytics. It avoids the operational complexity of JVM-based systems such as Apache Flink or ksqlDB by offering native SQL access to Kafka, Redpanda, ClickHouse, and other data sources with no external dependencies. Aimed at developers and data engineers who need real-time pipelines without infrastructure overhead, it addresses slow, fragmented streaming ETL workflows by unifying ingestion, transformation, and materialized view storage in one lightweight process.
Technically, Proton builds on ClickHouse’s vectorized query engine and SIMD optimizations, with reported figures of 90M events per second (EPS) and 4ms end-to-end latency. It supports external streams for Kafka/Redpanda, external tables for ClickHouse/Postgres, incremental materialized views, windowed aggregations, and UDFs in Python/JS. It can be deployed as a standalone binary, as a Docker container, or via Homebrew, runs on minimal resources (down to a t2.nano instance), and integrates with REST APIs, Python/Go/Java SDKs, and BI tools like Grafana.
What You Get
- Single C++ Binary - A standalone <500MB executable with no JVM, ZooKeeper, or external dependencies, enabling deployment on edge devices and minimal cloud instances like t2.nano.
- Native Kafka/Redpanda Integration - Direct SQL access to live Kafka and Redpanda streams via CREATE EXTERNAL STREAM, with SASL_SSL, IAM, and TLS support without custom connectors.
- Incremental Materialized Views - Real-time aggregations (e.g., tumbling windows) that automatically update and persist query results to downstream systems like ClickHouse or Kafka topics.
- External Tables for ClickHouse, Postgres, MySQL, MongoDB, S3/Iceberg - Federated queries across live streams and historical data stores using standard SQL without data movement.
- Streaming SQL with Windowing and Watermarks - Support for tumble, hop, and session windows, watermarking, and ASOF JOINs to correlate events across time-bound streams.
- Python and JavaScript UDFs - Extend query logic with custom stateless and stateful functions written in Python or JS for complex transformations and alerting rules.
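To make the windowing and incremental-aggregation items above concrete, here is a minimal Python sketch of what a tumbling-window aggregation computes when its state is updated one event at a time. It is an illustration of the concept only, not Proton's implementation or API: the names `tumble_start` and `TumblingAggregate`, the millisecond timestamps, and the sample events are all assumptions made for this example.

```python
from collections import defaultdict


def tumble_start(ts_ms: int, size_ms: int) -> int:
    """Return the start of the fixed-size, non-overlapping window containing ts_ms."""
    return ts_ms - (ts_ms % size_ms)


class TumblingAggregate:
    """Hypothetical helper: keeps a running (count, sum) per (window_start, key).

    Each event touches only its own window's state, which is the idea behind
    an incrementally maintained aggregate: no past events are re-scanned.
    """

    def __init__(self, size_ms: int) -> None:
        self.size_ms = size_ms
        self.state = defaultdict(lambda: [0, 0.0])  # (window_start, key) -> [count, sum]

    def add(self, ts_ms: int, key: str, value: float) -> None:
        slot = self.state[(tumble_start(ts_ms, self.size_ms), key)]
        slot[0] += 1
        slot[1] += value

    def result(self) -> dict:
        """Return {(window_start, key): (count, average)} in sorted key order."""
        return {k: (c, s / c) for k, (c, s) in sorted(self.state.items())}


# Made-up sample events: (timestamp in ms, key, value).
events = [
    (500, "a", 10.0),
    (1200, "a", 20.0),
    (4900, "b", 5.0),
    (5000, "a", 30.0),
    (9900, "a", 50.0),
]

agg = TumblingAggregate(size_ms=5000)
for ts_ms, key, value in events:
    agg.add(ts_ms, key, value)

for (window_start, key), (count, avg) in agg.result().items():
    print(window_start, key, count, avg)
```

The sketch ignores late or out-of-order events; handling those is the job of the watermarking mentioned in the list, which is not modeled here.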
Common Use Cases
- Real-time Telemetry Pipeline with Alerting - A DevOps team uses Timeplus Proton to ingest logs and metrics from Kafka, filter noise in real-time, compute anomaly thresholds, and trigger alerts to Slack or S3 before forwarding to Splunk or Elastic.
- AI/ML Feature Engineering from Live Data - A machine learning engineer builds real-time features (e.g., rolling averages, session durations) from Kafka event streams using SQL window functions and materialized views, feeding directly into model inference pipelines.
- CDC and Denormalization for Operational Systems - A fintech company replaces complex Kafka Connect pipelines with Timeplus Proton to capture PostgreSQL changes, denormalize customer data, and stream enriched records to ClickHouse for analytics with 67% lower overhead.
- Trading Surveillance with Sub-Second Latency - A hedge fund processes 700k EPS of trade data from Redpanda, applies complex rule-based alerts using SQL UDFs, and visualizes risk metrics in Grafana with 4ms end-to-end latency.
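The rolling-average feature mentioned in the feature-engineering use case can be sketched in plain Python as follows. This is a hedged illustration of the computation, not how Proton evaluates window functions: the `RollingAverage` class, the 3-second span, and the sample ticks are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class RollingAverage:
    """Hypothetical helper: average of values seen in the last span_ms milliseconds.

    Assumes update() is called with non-decreasing timestamps.
    """

    def __init__(self, span_ms: int) -> None:
        self.span_ms = span_ms
        self.window = deque()  # (ts_ms, value) pairs currently inside the span
        self.total = 0.0

    def update(self, ts_ms: int, value: float) -> float:
        # Add the new event, then evict everything that has aged out of the span.
        self.window.append((ts_ms, value))
        self.total += value
        while self.window[0][0] <= ts_ms - self.span_ms:
            _, old_value = self.window.popleft()
            self.total -= old_value
        return self.total / len(self.window)


# Made-up sample ticks: (timestamp in ms, value).
ticks = [(0, 10.0), (1000, 20.0), (2000, 30.0), (3000, 40.0), (10000, 100.0)]

ra = RollingAverage(span_ms=3000)
features = [ra.update(ts_ms, value) for ts_ms, value in ticks]
print(features)
```

Keeping a running total and evicting from the left of a deque makes each update cheap on average, which is the property that matters when the feature feeds an inference path on every event.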
Under The Hood
Architecture
- Lacks clear separation of concerns with no discernible layers for services, repositories, or controllers
- No dependency injection or inversion of control patterns; direct class instantiation dominates
- Directory structure is flat and unorganized, resembling a collection of scripts rather than a system
- Configuration files dominate the project structure, suggesting tooling concerns outweigh architectural design
- Absence of established design patterns, with ad-hoc test stubs replacing extensible abstractions
Tech Stack
- Python 3.x forms the core of the supporting tooling around the C++ engine, with custom utilities for interacting with ClickHouse and S3
- Deep integration with ClickHouse as the primary analytical engine, using tailored helpers for query execution
- GitHub Actions and internal Yandex tools automate PR-to-issue synchronization and metadata mapping
- Uncrustify enforces strict C++ formatting rules, consistent with the engine's native C++ codebase
- Lightweight pipelines rely on subprocess and threading rather than heavy frameworks
Code Quality
- Extensive test suite built on YAML-based declarative definitions for complex stream processing scenarios
- Strong focus on integration and end-to-end testing with precise state control via SQL and process manipulation
- Minimal custom error handling, relying on underlying system diagnostics
- Configuration uses structured YAML and XML to separate test logic from deployment settings
- Naming conventions align with stream processing domain terminology, improving readability for experts
What Makes It Unique
- Native use of AggregatingMergeTree enables real-time analytics directly from transactional data
- Event-driven state machine auto-synchronizes user actions with materialized views, eliminating manual ETL
- Dynamic SQL optimizer rewrites queries based on user role and data access patterns
- Permission-aware data masking enforced at the storage layer for compliance
- Extensible Rust-based plugin system integrated directly into the query engine without external dependencies