Jitsu

Open-source, fully-scriptable data ingestion engine that streams events from web, apps, and APIs to any data warehouse in real time.

4.7K stars
344 forks
MIT License
TypeScript

Jitsu is a self-hosted, open-source alternative to Segment built for engineering teams who need full ownership of their data pipelines. It captures behavioral events from websites, mobile apps, and backend APIs and streams them in real time to data warehouses including ClickHouse, BigQuery, Snowflake, Redshift, and PostgreSQL — without vendor lock-in or per-event pricing surprises.

At the core of Jitsu is a JavaScript-based transformation runtime called Jitsu Functions, which lets you filter, enrich, route, and reshape events using custom code before they land in storage. Functions run in an isolated Deno sandbox, can import npm packages, and have access to key-value storage and HTTP fetch — making them powerful enough to handle real-world enrichment pipelines without external tooling.
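As a rough illustration of what such a function can look like, the sketch below drops bot traffic and tags each remaining event with a per-user sequence number kept in the key-value store. The `AnalyticsEvent` and `FunctionContext` types, the `enrichEvent` and `createMemoryContext` names, and the `"drop"` return convention are simplified assumptions made for this example, not Jitsu's actual type definitions; docs.jitsu.com describes the real function signature.

```typescript
// Hypothetical, simplified shapes: the real Jitsu event and context types are richer.
type AnalyticsEvent = {
  type: string;
  event?: string;
  userId?: string;
  anonymousId?: string;
  properties?: Record<string, unknown>;
  context?: { userAgent?: string };
};

// Assumed subset of the function context: a key-value store and a logger.
type FunctionContext = {
  store: {
    get(key: string): Promise<unknown>;
    set(key: string, value: unknown): Promise<void>;
  };
  log: { info(message: string): void };
};

// Assumption: returning "drop" discards the event.
type FunctionResult = AnalyticsEvent | "drop";

const BOT_PATTERN = /bot|crawler|spider/i;

// Filters obvious bot traffic, then adds a per-user event counter.
async function enrichEvent(
  event: AnalyticsEvent,
  ctx: FunctionContext
): Promise<FunctionResult> {
  const userAgent = event.context?.userAgent ?? "";
  if (BOT_PATTERN.test(userAgent)) {
    ctx.log.info(`dropping bot event from ${userAgent}`);
    return "drop";
  }
  const id = event.userId ?? event.anonymousId ?? "unknown";
  const key = `event-count:${id}`;
  const previous = Number((await ctx.store.get(key)) ?? 0);
  const count = previous + 1;
  await ctx.store.set(key, count);
  return { ...event, properties: { ...event.properties, eventSequence: count } };
}

// In-memory stand-in for the store, useful for trying the function locally.
function createMemoryContext(): FunctionContext {
  const data = new Map<string, unknown>();
  return {
    store: {
      get: async (key) => data.get(key),
      set: async (key, value) => {
        data.set(key, value);
      },
    },
    log: { info: () => {} },
  };
}
```

Keeping the store behind a small interface means the same function body can be exercised against the in-memory stand-in before it is deployed against the real runtime.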

Jitsu ships as a Docker Compose stack with a management console, an event router (Rotor), and optionally a bundled ClickHouse instance. It is natively compatible with the Segment analytics.js API and HTTP tracking API, so teams can migrate from Segment by changing a single URL. Automatic user identity stitching builds real-time identity graphs by merging anonymous and identified sessions without requiring SQL joins.
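To make the "single URL" idea concrete, here is a minimal sketch that builds a Segment-style `track` request against a configurable base URL. The `/v1/track` path and the Basic-auth header with the write key as username follow Segment's HTTP Tracking API; that a given Jitsu deployment accepts the identical path and authentication scheme is an assumption here, and `buildTrackRequest`, `TrackPayload`, and `data.example.com` are hypothetical names used only for illustration.

```typescript
// Simplified Segment-style track payload (only a few common fields).
type TrackPayload = {
  type: "track";
  event: string;
  anonymousId: string;
  userId?: string;
  properties: Record<string, unknown>;
  timestamp: string;
};

type PreparedRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Builds the request without sending it, so the base URL is the only thing
// that differs between the two targets.
function buildTrackRequest(
  ingestBaseUrl: string,
  writeKey: string,
  payload: TrackPayload
): PreparedRequest {
  return {
    url: new URL("/v1/track", ingestBaseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Segment convention: write key as username, empty password.
        Authorization: `Basic ${btoa(`${writeKey}:`)}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

const signup: TrackPayload = {
  type: "track",
  event: "Signed Up",
  anonymousId: "anon-123",
  properties: { plan: "free" },
  timestamp: "2024-01-01T00:00:00.000Z",
};

// Same payload and headers; only the host changes.
const toSegment = buildTrackRequest("https://api.segment.io", "key", signup);
const toJitsu = buildTrackRequest("https://data.example.com", "key", signup);
```

Either prepared request can then be sent with `fetch(request.url, request.init)`.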

The project is actively maintained with frequent releases, a Slack community, and comprehensive documentation at docs.jitsu.com. It is built on top of Bulker, an open-source data warehouse ingestion engine that can also be used standalone for batch and streaming loads.

What You Get

  • Jitsu Functions runtime - Write JavaScript or TypeScript functions that run in isolated Deno sandboxes to transform, filter, enrich, or fan-out events before they reach any destination — with full access to npm packages, HTTP fetch, and a built-in key-value store.
  • Real-time event streaming to warehouses - Stream behavioral events from websites, mobile apps, and APIs directly to ClickHouse, BigQuery, Snowflake, Redshift, PostgreSQL, and more with sub-second latency via the Bulker ingestion engine.
  • Segment API compatibility - Drop-in replacement for Segment’s analytics.js snippet and HTTP Tracking API, enabling zero-code migration by switching a single endpoint URL in your existing instrumentation.
  • Automatic user identity stitching - Real-time identity graph construction that merges anonymous pre-login sessions with identified post-login events, eliminating the need for complex SQL joins to reconstruct user journeys.
  • Bundled ClickHouse instance - Ships with a pre-configured ClickHouse database as the default storage backend, giving self-hosted deployments a fast, cost-effective analytics store with no additional infrastructure setup.
  • Custom domain support - Deploy the event collection endpoint on your own subdomain (e.g., data.yourcompany.com) to bypass browser ad-blockers and tracker blockers, maximizing event capture completeness.
  • Multi-workspace management console - Next.js-based management UI for creating workspaces, configuring sources and destinations, monitoring event streams, and managing API keys — all from a single interface.

Common Use Cases

  • Replacing Segment without rewriting instrumentation - A SaaS company points its existing Segment analytics.js integration at the Jitsu ingest URL and immediately gains self-hosted control without touching a single tracking call.
  • Building a real-time product analytics warehouse - A product team streams page views, feature interaction events, and conversion signals from their web app into ClickHouse, then queries them directly in Metabase for daily active user dashboards.
  • Event enrichment before warehouse storage - A data engineering team writes Jitsu Functions to append geolocation data from MaxMind, normalize UTM parameters, and drop bot traffic before any event hits BigQuery.
  • Compliance-first data collection for regulated industries - A healthcare company self-hosts Jitsu on private infrastructure to ensure no behavioral data ever touches third-party SaaS servers, satisfying HIPAA data residency requirements.
  • Bypassing ad-blockers for accurate attribution - An e-commerce brand deploys Jitsu on a first-party subdomain so that events from privacy-conscious users who block Segment and Google Analytics are still captured reliably.
  • Consolidating data from multiple SDKs into one pipeline - A mobile-first company uses Jitsu’s HTTP API to unify events from its iOS SDK, Android SDK, and web frontend into a single Snowflake schema for cross-platform analysis.
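
The UTM normalization mentioned in the enrichment use case could look roughly like the sketch below. It is a plain helper that a transformation function might call, not part of Jitsu itself; `normalizeUtm` and `Campaign` are hypothetical names, and the target field names follow the `campaign` object from the Segment spec.

```typescript
// Campaign fields as named in the Segment spec's context.campaign object.
type Campaign = {
  source?: string;
  medium?: string;
  name?: string;
  term?: string;
  content?: string;
};

const UTM_TO_FIELD: Record<string, keyof Campaign> = {
  utm_source: "source",
  utm_medium: "medium",
  utm_campaign: "name",
  utm_term: "term",
  utm_content: "content",
};

// Extracts UTM parameters from a page URL, matching keys case-insensitively
// and trimming and lowercasing values so "Google" and "google" compare equal.
function normalizeUtm(pageUrl: string): Campaign {
  const campaign: Campaign = {};
  new URL(pageUrl).searchParams.forEach((rawValue, rawKey) => {
    const field = UTM_TO_FIELD[rawKey.toLowerCase()];
    const value = rawValue.trim().toLowerCase();
    if (field && value) {
      campaign[field] = value;
    }
  });
  return campaign;
}

const campaign = normalizeUtm(
  "https://example.com/?UTM_Source=Google&utm_medium=%20CPC%20&utm_campaign=Spring&ref=x"
);
// campaign is { source: "google", medium: "cpc", name: "spring" }
```

Lowercasing values is a deliberate trade-off: it makes grouping in the warehouse reliable but loses the original casing, so teams that need the raw value may want to store both.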

Under The Hood

Architecture

Jitsu is structured as a pnpm monorepo orchestrated by Turbo, with clear domain separation across services, libraries, and web applications. The critical data path runs from SDK clients through an ingest layer to Rotor, a Node.js event routing service, which executes user-defined function chains before dispatching events to Bulker for warehouse-optimized batching and delivery. The management console (Next.js) is architecturally decoupled from the event path, communicating with Rotor and Bulker through well-defined HTTP and Kafka interfaces. A shared types package enforces data contracts across service boundaries without coupling implementations. Function execution is sandboxed in Deno Web Workers with locked-down permissions, so user code runs in a separate process context and cannot affect the routing service’s own state. This layered design means any single layer (connector, transformer, or destination adapter) can be replaced or extended without touching the rest of the pipeline.
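
The idea of a function chain can be sketched as a small runner in which each step may pass an event through, replace it, fan it out into several events, or drop it. This is an illustrative model only, not Rotor's implementation: the `Step` type, the `runChain` name, the synchronous signature, and the exact meaning of each return value are assumptions made for the example.

```typescript
type PipelineEvent = {
  type: string;
  event?: string;
  properties?: Record<string, unknown>;
};

// Assumed result convention: undefined keeps the event unchanged,
// "drop" discards it, an array fans it out, an object replaces it.
type StepResult = PipelineEvent | PipelineEvent[] | "drop" | undefined;
type Step = (event: PipelineEvent) => StepResult;

// Runs every step in order; events produced by one step feed the next.
function runChain(steps: Step[], input: PipelineEvent): PipelineEvent[] {
  let current: PipelineEvent[] = [input];
  for (const step of steps) {
    const next: PipelineEvent[] = [];
    for (const event of current) {
      const result = step(event);
      if (result === "drop") continue;
      if (result === undefined) next.push(event);
      else if (Array.isArray(result)) next.push(...result);
      else next.push(result);
    }
    current = next;
  }
  return current;
}

// Example steps: drop internal traffic, then emit a copy of each event.
const dropInternal: Step = (e) => (e.properties?.internal ? "drop" : undefined);
const duplicate: Step = (e) => [e, { ...e, event: `${e.event}_copy` }];

const delivered = runChain([dropInternal, duplicate], { type: "track", event: "signup" });
// delivered holds two events: "signup" and "signup_copy"
```

Whatever the chain returns is what would be handed to the delivery stage; an empty array means nothing is dispatched for that input.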

Tech Stack

The monorepo spans TypeScript (55%) and Go (43%), using each language where it excels: TypeScript for the management console (Next.js 15, Tailwind CSS, Ant Design), the Rotor event router, and SDK libraries; Go for Bulker’s high-throughput warehouse ingestion. Prisma with PostgreSQL manages relational state for workspaces, users, connections, and configuration. Event streaming between services uses Kafka via @confluentinc/kafka-javascript. The function runtime compiles user TypeScript with esbuild, bundles it with Node.js built-in polyfills for Deno compatibility, and executes it inside permission-locked Deno Web Workers. ClickHouse serves as the default analytics storage backend. Redis is used for key-value storage accessible from within Jitsu Functions. Authentication supports Firebase, GitHub OAuth, OIDC, and credentials login. The build pipeline uses Turbo for dependency-aware parallel builds with persistent caching.

Code Quality

Test coverage spans unit, integration, and end-to-end layers: Vitest for unit and integration tests (function chain execution, Kafka consumer logic, in-memory store behavior, destination-specific transform tests for Mixpanel, PostHog, HubSpot, Intercom, and Facebook Conversions), and Playwright for browser-based end-to-end scenarios. Error handling is explicit throughout the Rotor service, with structured logging via the internal juava library, named retry error types (NoRetryErrorName, DropRetryErrorName) for precise dead-letter queue semantics, and Prometheus metrics at every processing stage. TypeScript strict mode is enforced across all packages, with Zod used for API route input validation and a zod-prisma generator that derives Zod schemas directly from the Prisma data model. Prettier and ESLint run as pre-commit hooks via Husky, maintaining consistent formatting across both TypeScript and Go packages.

What Makes It Unique

Jitsu’s most distinctive technical capability is its user-defined function sandbox: user JavaScript is bundled by esbuild with polyfills for Node.js built-ins (crypto, events, buffer, stream), then deployed as an isolated Deno Web Worker with permissions: "none", meaning the sandbox has no filesystem, network, or environment access except through explicitly proxied Jitsu APIs. This lets teams run arbitrary npm packages as event transforms without the security risks of eval-based approaches. A second distinctive capability is real-time identity stitching through a dedicated Profile Builder service that maintains user identity graphs across anonymous and authenticated sessions using a combination of Redis and MongoDB, enabling warehouse-ready user-centric schemas without downstream SQL complexity.
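
The core idea behind identity stitching can be illustrated with a toy, in-memory version: learn which anonymous ID belongs to which user as soon as an event carries both, then attach that user to the anonymous events. This is not the Profile Builder's algorithm and uses a `Map` instead of Redis or MongoDB; `IdentityMap`, `stitch`, and `resolvedUserId` are hypothetical names for this sketch.

```typescript
type StitchEvent = {
  type: "identify" | "track" | "page";
  anonymousId: string;
  userId?: string;
  event?: string;
};

type StitchedEvent = StitchEvent & { resolvedUserId?: string };

class IdentityMap {
  private readonly userByAnonymousId = new Map<string, string>();

  // Records a link whenever an event carries both identifiers.
  observe(event: StitchEvent): void {
    if (event.userId) {
      this.userByAnonymousId.set(event.anonymousId, event.userId);
    }
  }

  // Prefers the event's own userId, falling back to a learned link.
  resolve(event: StitchEvent): StitchedEvent {
    const resolvedUserId =
      event.userId ?? this.userByAnonymousId.get(event.anonymousId);
    return { ...event, resolvedUserId };
  }
}

// Two passes: first learn all links, then backfill earlier anonymous events.
function stitch(events: StitchEvent[]): StitchedEvent[] {
  const identities = new IdentityMap();
  events.forEach((e) => identities.observe(e));
  return events.map((e) => identities.resolve(e));
}

const stitched = stitch([
  { type: "page", anonymousId: "a1" },
  { type: "track", anonymousId: "a1", event: "Add To Cart" },
  { type: "identify", anonymousId: "a1", userId: "u42" },
  { type: "track", anonymousId: "a2", event: "Viewed Pricing" },
]);
// The first three events resolve to "u42"; the last one stays anonymous.
```

The two-pass batch shape keeps the example short. A streaming system cannot see future events, so it would need to update already-stored anonymous records once the link is learned.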

Self-Hosting

Jitsu is released under the MIT License, which is one of the most permissive open-source licenses available. You can use it commercially, modify it, redistribute it, and build proprietary products on top of it without any obligation to open-source your own code. There are no copyleft clauses, no contributor license agreement requirements for end users, and no legal restrictions on self-hosting in any industry or geography. The MIT license covers all components in the monorepo — the console, Rotor, SDK libraries, and utility packages.

Running Jitsu yourself requires a meaningful infrastructure footprint. The full stack depends on PostgreSQL (configuration store), Redis (function key-value store, caching), Kafka (event streaming between ingest and Rotor), ClickHouse (default analytics warehouse), and optionally MongoDB (profile builder). A Docker Compose file is provided for local and small-scale deployments, but production deployments call for separate managed instances of each dependency, appropriate disk provisioning for ClickHouse write volumes, and monitoring for Kafka consumer lag and Rotor processing throughput. The team is responsible for all upgrades, database migrations (Prisma schema changes), backups, and high-availability configuration. The Jitsu codebase is actively developed at a high release cadence — over 100 tagged releases in the observed window — which means self-hosted deployments need a disciplined upgrade process to stay current.

Jitsu Cloud (use.jitsu.com) offers a hosted tier that is free up to 200,000 events per month and includes a managed ClickHouse instance, so small teams can start without any infrastructure. Beyond the free tier, the cloud product adds managed upgrades, SLA-backed uptime, support channels, and enterprise billing. Self-hosters give up these operational guarantees in exchange for data residency control, unlimited event volume (bounded only by their own hardware), and zero per-event costs. For teams whose primary concern is compliance or cost at scale, self-hosting is genuinely viable — but it requires treating Jitsu as a production service with all the associated on-call and maintenance obligations.
