Sonic
Fast, lightweight, schema-less search backend in Rust — microsecond queries, 30MB RAM, no document storage required.
Sonic is a high-performance, schema-less search engine designed as a lightweight alternative to Elasticsearch. It runs on as little as 30MB of RAM and responds to queries in microseconds, making it ideal for resource-constrained environments. Built in Rust and powered by RocksDB, Sonic indexes text identifiers rather than full documents, enabling fast search and auto-complete while keeping external databases as the source of truth.
Sonic supports 80+ languages with built-in stopword removal and fuzzy matching, and communicates via a simple TCP-based protocol called Sonic Channel. As of v1.6.0, the project has been restructured into a Cargo workspace separating the core search library from the server binary, and now ships with both Rust integration tests and end-to-end test suites. It offers official client libraries for Node.js, PHP, and Rust, with community libraries for Python, Ruby, and more.
Deployment options include Debian/Ubuntu APT packages, Red Hat Enterprise Linux (RHEL) packages, Docker images, Homebrew (macOS), and direct Cargo installation. The v1.5.0 release added environment-variable-based configuration and new language stopwords, making Sonic an increasingly production-ready choice for teams who need search without Elasticsearch-scale infrastructure.
What You Get
- Microsecond-latency search - Sonic resolves search queries against a compact RocksDB key-value store in microseconds, with benchmarks from Crisp showing half a billion indexed objects served from a $5/month VPS.
- Schema-less identifier indexing - Sonic stores only OID-to-IID mappings and term-to-IID associations — never full document content — so your index stays tiny while your database stays authoritative.
- FST-powered typo correction - A memory-mapped Finite-State Transducer enables Sonic to suggest alternate words when exact matches are insufficient, forgiving user misspellings without a round-trip to an external service.
- Real-time autocomplete - The SUGGEST Sonic Channel command returns live word completions for partial queries, enabling tab-to-expand interfaces with no additional infrastructure.
- 80+ language tokenizer - Built-in stopword lists for over 80 spoken languages, plus optional jieba-rs and Lindera/UniDic tokenizers for Chinese and Japanese, ensure clean multilingual indexing without external NLP services.
- Sonic Channel TCP protocol - A lightweight, mode-based TCP protocol (Search, Ingest, Control) eliminates HTTP framing overhead and lets clients pipeline commands at maximum throughput.
- Background FST consolidation tasker - In-memory FST changes are batched and written to disk asynchronously by a supervised background thread, so insertions are immediately searchable without blocking the search path.
- Environment-variable configuration - As of v1.5.0, all TOML config keys can be overridden via environment variables, making containerized and twelve-factor deployments straightforward.
- Cargo workspace architecture - v1.6.0 separated core search logic into a reusable sonic library crate, enabling downstream Rust projects to embed Sonic’s lexer, query builder, and store primitives directly.
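To make the Sonic Channel bullets above more concrete, here is a minimal Rust sketch that only formats the text lines a client might send in each mode. The command shapes (START, PUSH, QUERY, SUGGEST) reflect our reading of the project's protocol document and should be checked against PROTOCOL.md; the helper names, the example values, and the simple quote escaping are assumptions made for illustration. No network I/O is performed.

```rust
/// Escapes a text argument for a quoted Sonic Channel field.
/// Assumption: backslashes and double quotes are backslash-escaped,
/// and line breaks are flattened to spaces.
fn escape_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' | '\r' => out.push(' '),
            other => out.push(other),
        }
    }
    out
}

/// Opens a channel in one of the three modes: "search", "ingest" or "control".
fn start_command(mode: &str, password: &str) -> String {
    format!("START {mode} {password}")
}

/// Ingest mode: associates some text with an object ID inside a bucket.
fn push_command(collection: &str, bucket: &str, object: &str, text: &str) -> String {
    format!("PUSH {collection} {bucket} {object} \"{}\"", escape_text(text))
}

/// Search mode: asks for up to `limit` object IDs matching the terms.
fn query_command(collection: &str, bucket: &str, terms: &str, limit: u32) -> String {
    format!("QUERY {collection} {bucket} \"{}\" LIMIT({limit})", escape_text(terms))
}

/// Search mode: asks for up to `limit` completions of a partial word.
fn suggest_command(collection: &str, bucket: &str, word: &str, limit: u32) -> String {
    format!("SUGGEST {collection} {bucket} \"{}\" LIMIT({limit})", escape_text(word))
}
```

A client would write each of these lines, followed by a newline, to the TCP socket after the server's greeting; in practice one of the existing client libraries is likely the better choice.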
Common Use Cases
- Helpdesk article search at scale - Crisp uses Sonic to power full-text search across half a billion helpdesk articles, messages, and contacts, running entirely on a single $5/month cloud server with sub-millisecond response times.
- SaaS product or content search - Teams building multi-tenant SaaS applications use Sonic’s bucket-per-user isolation to give each customer a scoped search index without standing up separate Elasticsearch clusters.
- Edge and embedded search - IoT gateways, edge proxies, and single-board computers use Sonic where Elasticsearch is prohibitively heavy, leveraging its 30MB RAM baseline and single-binary static deployment.
- User-generated content indexing - Platforms with millions of comments, forum posts, or reviews use Sonic’s push/pop API to keep a real-time search index synchronized with their primary database; searches return object IDs, which the application then resolves against that database.
- Autocomplete-as-a-service - Applications that need snappy type-ahead suggestions wire the Sonic Channel SUGGEST command into their frontend, getting language-aware word completions without building a separate suggestion service.
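The use cases above all follow the same pattern: Sonic returns object identifiers and the application fetches the actual records from its own database. The sketch below illustrates that hand-off in Rust. It assumes search results arrive as a line shaped like `EVENT QUERY <marker> <id> <id> ...`, which is our understanding of the protocol rather than something stated on this page, and it uses a `HashMap` as a stand-in for the primary database. The function names are hypothetical.

```rust
use std::collections::HashMap;

/// Extracts object IDs from a search result line shaped like
/// `EVENT QUERY <marker> <id> <id> ...` (assumed response shape).
/// Returns None for any other line.
fn parse_query_event(line: &str) -> Option<Vec<String>> {
    let mut parts = line.split_whitespace();
    if parts.next()? != "EVENT" || parts.next()? != "QUERY" {
        return None;
    }
    parts.next()?; // skip the marker that pairs the event with the earlier reply
    Some(parts.map(str::to_owned).collect())
}

/// Resolves IDs against the primary store, skipping IDs that no longer
/// exist there (for example, records deleted since they were indexed).
fn hydrate<'a>(ids: &[String], primary: &'a HashMap<String, String>) -> Vec<&'a str> {
    ids.iter()
        .filter_map(|id| primary.get(id).map(String::as_str))
        .collect()
}
```

Skipping missing IDs rather than failing keeps search usable when the index briefly lags behind the database, which is the trade-off an identifier-only index implies.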
Under The Hood
Architecture
Sonic is organized as a Cargo workspace with two crates: a core library exposing search primitives — executor, lexer, query builder, and dual-store (KV + FST) — and a server binary that adds the Sonic Channel TCP layer and background tasker. Data flows strictly from TCP socket through command parsing into the executor, which coordinates reads and writes across RocksDB and the FST word graph. Writes to the KV store are immediate while FST mutations are buffered in-memory and flushed asynchronously by a supervised background thread that restarts itself on crash. Thread management uses a spawn_managed_thread pattern that detects panics and re-spawns workers with a brief backoff, ensuring the server self-heals without operator intervention. Configuration is distributed as an immutable Arc<Config> threaded through all layers, eliminating global mutable state while enabling runtime configuration access from every subsystem.
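The self-healing thread pattern described above can be sketched with the standard library alone. This is an illustration of the general technique (join the worker, treat a panic as a signal to re-spawn after a short pause), not Sonic's actual spawn_managed_thread; the function name `run_supervised`, the restart cap, and the return value are assumptions added so the example terminates and can be tested.

```rust
use std::thread;
use std::time::Duration;

/// Runs `worker` on its own thread and re-spawns it after a panic, waiting
/// `backoff` between attempts. Returns how many restarts were needed, or
/// None if the worker still panicked after `max_restarts` restarts.
/// Hypothetical helper for illustration only.
fn run_supervised<F>(max_restarts: usize, backoff: Duration, worker: F) -> Option<usize>
where
    F: Fn() + Send + Clone + 'static,
{
    for restarts in 0..=max_restarts {
        let job = worker.clone();
        // join() returns Err when the spawned thread panicked
        let outcome = thread::Builder::new()
            .name("managed-worker".to_string())
            .spawn(move || job())
            .expect("failed to spawn worker thread")
            .join();
        if outcome.is_ok() {
            return Some(restarts);
        }
        // brief pause before the next attempt
        thread::sleep(backoff);
    }
    None
}
```

A long-running server would presumably loop without a restart cap, since its workers are not expected to return; the cap here only exists to keep the sketch finite.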
Tech Stack
The server is written in Rust 2024 edition and uses RocksDB (via the rocksdb crate) with Zstd compression for its persistent key-value store, and the fst crate for memory-mapped Finite-State Transducer operations. Text tokenization is handled by unicode-segmentation for UAX-29 word boundaries, whatlang for language detection, and optional jieba-rs and Lindera+UniDic for Chinese and Japanese segmentation. Object and term identifiers are stored as compact 32-bit hashes computed by twox-hash. The CLI uses clap v4, logging via tracing, and jemalloc as an optional high-performance allocator. Deployment targets include Docker (distroless base images), Debian/Ubuntu APT packages, RHEL packages, Homebrew, and Cargo install — all producing a single self-contained binary.
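As a rough illustration of the tokenize-then-hash pipeline described above, the following std-only Rust sketch lowercases text, drops stopwords, and reduces each remaining term to a 32-bit identifier. It is deliberately simplified: splitting on non-alphanumeric characters stands in for UAX-29 segmentation, the stopword list is passed in by the caller instead of being chosen by language detection, and 32-bit FNV-1a stands in for the XxHash function provided by twox-hash. All function names are hypothetical.

```rust
/// 32-bit FNV-1a, used here only as a std-only stand-in for the XxHash
/// function that Sonic takes from the twox-hash crate.
fn hash32(term: &str) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for byte in term.bytes() {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Lowercases the text, splits on non-alphanumeric characters and drops
/// stopwords. Simplification of real word segmentation.
fn tokenize(text: &str, stopwords: &[&str]) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty() && !stopwords.contains(word))
        .map(str::to_owned)
        .collect()
}

/// Maps each surviving term to its compact 32-bit identifier.
fn term_ids(text: &str, stopwords: &[&str]) -> Vec<u32> {
    tokenize(text, stopwords).iter().map(|term| hash32(term)).collect()
}
```

Because terms are normalized before hashing, "Fox" and "fox" end up with the same identifier, which is what lets a fixed-width key stand in for the word in the index.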
Code Quality
With v1.6.0, Sonic introduced Rust integration tests alongside the pre-existing Node.js end-to-end suite, giving the project both Rust-level and black-box coverage. Both crates enforce strict Clippy lints via #![deny(clippy::all)] with documented style-preference exceptions. Error handling in the channel layer uses exhaustive typed enums (ChannelHandleError) matched at every call site. Inline comments throughout the codebase explain non-obvious rationale: for example, why the FST suggestion loop fetches one extra result beyond the configured limit, and why buffer overflow in the TCP handler deliberately panics to trigger the managed-thread restart. The codebase is documented in a dedicated INNER_WORKINGS.md, a formal PROTOCOL.md, and a CONFIGURATION.md alongside the main README.
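The value of exhaustive typed error enums is easiest to see in a small example. The sketch below borrows the name ChannelHandleError from the text, but its variants and the `describe` helper are hypothetical; the real enum in Sonic may look quite different.

```rust
/// Hypothetical variants for illustration; the real ChannelHandleError
/// in Sonic may differ.
#[derive(Debug, PartialEq)]
enum ChannelHandleError {
    Closed,
    InvalidMode,
    AuthenticationFailed,
    NotRecognized(String),
}

/// Turns an error into a human-readable reason.
/// There is no wildcard arm, so adding a variant stops this function from
/// compiling until the new case is handled explicitly.
fn describe(error: &ChannelHandleError) -> String {
    match error {
        ChannelHandleError::Closed => "connection closed".to_string(),
        ChannelHandleError::InvalidMode => "invalid mode".to_string(),
        ChannelHandleError::AuthenticationFailed => "authentication failed".to_string(),
        ChannelHandleError::NotRecognized(command) => {
            format!("command not recognized: {command}")
        }
    }
}
```

Avoiding a catch-all `_` arm is the design choice that matters here: the compiler, rather than a reviewer, flags every call site that needs updating when the error set grows.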
What Makes It Unique
Sonic’s defining technical choice is operating as an identifier index rather than a document store. Every indexed object is compressed from a user-provided OID string to a 32-bit IID via XxHash, and the search result payload is simply a list of IIDs resolved back to OIDs; the actual document content lives only in the caller’s database. This design keeps storage very compact and leaves Sonic agnostic to document schema. The pairing of a mutable RocksDB KV store with an immutable-but-rebuildable FST per bucket is a pragmatic solution to the FST immutability problem: writes land in KV immediately and are eventually consolidated into the FST by the background tasker, giving users sub-millisecond autocomplete without sacrificing write throughput. The language-detection hybrid (stopword counting for long texts, n-gram for short texts) enables accurate multilingual tokenization with no external calls, and the optional CJK tokenizers are compiled in as feature flags rather than runtime dependencies.
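The split between an immediately writable store and a periodically rebuilt word graph can be modelled in a few lines of Rust. In this toy sketch a `HashMap` plays the role of the RocksDB key-value store and a `BTreeSet` stands in for the FST; neither is how Sonic stores data on disk, and the `Bucket` type and its methods are hypothetical. The point is only to show why search sees a write at once while suggestions pick it up after consolidation.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy bucket: `postings` plays the role of the mutable KV store and `words`
/// the role of the word graph rebuilt on consolidation.
#[derive(Default)]
struct Bucket {
    postings: HashMap<String, Vec<u32>>, // term -> internal object IDs (IIDs)
    words: BTreeSet<String>,             // consolidated words used for suggestions
    pending: Vec<String>,                // words waiting for the next consolidation
}

impl Bucket {
    /// Indexes a term for an object: the posting is visible to `query` at
    /// once, while the word only reaches `words` on the next consolidation.
    fn push(&mut self, iid: u32, term: &str) {
        let iids = self.postings.entry(term.to_owned()).or_default();
        if !iids.contains(&iid) {
            iids.push(iid);
        }
        self.pending.push(term.to_owned());
    }

    /// Exact-term lookup against the immediately updated store.
    fn query(&self, term: &str) -> &[u32] {
        self.postings.get(term).map(Vec::as_slice).unwrap_or(&[])
    }

    /// What a background tasker would do periodically: fold the pending
    /// words into the consolidated word set.
    fn consolidate(&mut self) {
        self.words.extend(self.pending.drain(..));
    }

    /// Prefix completion over consolidated words only.
    fn suggest(&self, prefix: &str) -> Vec<&str> {
        self.words
            .range(prefix.to_owned()..)
            .take_while(|word| word.starts_with(prefix))
            .map(String::as_str)
            .collect()
    }
}
```

Batching word-graph updates this way keeps the write path cheap, at the cost of suggestions trailing slightly behind the most recent pushes.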
Self-Hosting
Sonic is released under the Mozilla Public License 2.0 (MPL-2.0), a weak copyleft license that operates at the file level rather than the project level. In practice this means you can freely use, deploy, and integrate Sonic into proprietary applications without any obligation to open-source your own code. The only requirement is that if you modify Sonic’s own source files — the .rs files in this repository — those modifications must be made available under the MPL-2.0 when you distribute the modified binary. Commercial use is fully permitted with no royalties or license fees, and there is no separate commercial or enterprise license tier.
Running Sonic yourself means owning the full operational stack: provisioning the server or container, managing RocksDB data directory backups, monitoring the TCP port, and applying binary updates manually when new versions are released. Sonic has no built-in replication, high-availability clustering, or automatic failover — it is a single-process server, so redundancy requires you to implement it at the infrastructure level (e.g., active-passive failover with shared or replicated storage). Disk space grows proportionally with index size, and the FST consolidation background task performs periodic heavy I/O that should be accounted for in capacity planning. The maintainer recommends SSD storage to minimize consolidation latency.
There is no hosted or managed version of Sonic, no SaaS tier, and no paid support offering from the core maintainers. The project is maintained primarily by a single developer with occasional community contributions. This means you are responsible for staying current with security patches, tracking GitHub releases, and diagnosing any issues yourself via the GitHub issue tracker or community channels. Teams requiring SLAs, managed upgrades, HA out of the box, or enterprise support contracts should evaluate Elasticsearch Service (Elastic Cloud), OpenSearch Service (AWS), or Typesense Cloud instead, all of which offer hosted tiers with guaranteed availability and professional support.