Drop your files and search them instantly — no vector DB, no indexing pipeline, just raw data queried by a self-evolving intelligence layer.
Sirchmunk is an embedding-free, indexless retrieval system that transforms how AI agents and developers search across large, heterogeneous document collections. Rather than pre-processing documents into fixed-dimensional vector representations, Sirchmunk queries raw files directly using ripgrep-all under the hood and then applies Monte Carlo evidence sampling to identify the most relevant regions of interest within each file. This eliminates hours-long indexing pipelines and removes the need for a vector database entirely.
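The sampling step can be illustrated with a minimal sketch. It assumes a plain keyword counter as a stand-in for the LLM relevance judge. The names `Window`, `keyword_score`, and `sample_regions` are hypothetical and are not taken from Sirchmunk's code. The first round draws windows uniformly across the document, and later rounds resample near the best windows found so far.

```python
import random
from dataclasses import dataclass


@dataclass
class Window:
    start: int
    end: int
    score: float


def keyword_score(text, keywords):
    """Stand-in for the LLM relevance judge: counts keyword occurrences."""
    lowered = text.lower()
    return float(sum(lowered.count(k.lower()) for k in keywords))


def sample_regions(document, keywords, window=200, samples=20, rounds=3, seed=0):
    """Return the best-scoring window found after several sampling rounds."""
    rng = random.Random(seed)
    max_start = max(len(document) - window, 0)
    half = window // 2
    # Round 1 draws window starts uniformly over the whole document.
    starts = [rng.randint(0, max_start) for _ in range(samples)]
    best = None
    for _ in range(rounds):
        scored = [
            Window(s, min(s + window, len(document)),
                   keyword_score(document[s:s + window], keywords))
            for s in starts
        ]
        scored.sort(key=lambda w: w.score, reverse=True)
        if best is None or scored[0].score > best.score:
            best = scored[0]
        # Later rounds resample near the current top quarter of windows.
        top = scored[: max(1, samples // 4)]
        starts = [
            min(max(rng.choice(top).start + rng.randint(-half, half), 0), max_start)
            for _ in range(samples)
        ]
    return best
```

In a real deployment the scorer would be an LLM call, so the number of samples and rounds directly controls API cost. That trade-off is the main reason to sample rather than score every region.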
At the core of Sirchmunk is a self-evolving knowledge base. As the system processes queries, it builds and refines knowledge clusters: structured groupings of evidence units organized by abstraction level (for example technique, principle, paradigm) and lifecycle state (for example emerging, stable, contested). These clusters persist and grow across sessions, so retrieval quality improves over time without requiring a full re-index when data changes.
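A knowledge cluster of this kind might be modeled roughly as follows. This is a sketch under stated assumptions: the class names echo those mentioned in the project description, but every field, the `Lifecycle` enum name, and the state-transition rule (stable after three supporting units, contested once any contradicting unit arrives) are hypothetical illustrations rather than Sirchmunk's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class AbstractionLevel(Enum):
    TECHNIQUE = "technique"
    PRINCIPLE = "principle"
    PARADIGM = "paradigm"


class Lifecycle(Enum):
    EMERGING = "emerging"
    STABLE = "stable"
    CONTESTED = "contested"


@dataclass
class EvidenceUnit:
    source_path: str
    snippet: str
    supports: bool = True  # False marks evidence that contradicts the cluster


@dataclass
class KnowledgeCluster:
    topic: str
    level: AbstractionLevel
    state: Lifecycle = Lifecycle.EMERGING
    evidence: list = field(default_factory=list)

    def add_evidence(self, unit, stable_after=3):
        """Attach evidence and update the lifecycle state (assumed rule)."""
        self.evidence.append(unit)
        if any(not e.supports for e in self.evidence):
            self.state = Lifecycle.CONTESTED
        elif len(self.evidence) >= stable_after:
            self.state = Lifecycle.STABLE
```

Keeping the lifecycle state on the cluster itself means a search result can report how settled a piece of accumulated knowledge is, not only what it says.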
Sirchmunk supports two main search modes — FAST for rapid file-level retrieval and DEEP for multi-pass evidence synthesis — along with an offline compile command introduced in v0.0.8 that pre-builds hierarchical tree indices and knowledge clusters for even greater precision. The system exposes a FastAPI backend, a Next.js web UI, a CLI, and an MCP server, making it composable within multi-agent workflows and existing developer toolchains.
The project targets developers, researchers, and AI engineers who need immediate, high-fidelity access to large local document repositories or mixed-format corpora without committing to a heavyweight vector infrastructure. It supports Python 3.10+ and ships as a pip package with Docker images for amd64 and arm64 architectures.
CLI (sirchmunk init, search, serve, compile, web serve, mcp serve) for zero-config local deployment or headless server operation
Offline compile command (sirchmunk compile) that builds tree indices and knowledge clusters from documents ahead of time, with --lint health checks and --fix auto-repair
allowed_paths enforcement, per-IP rate limiting, and audit logging for secure team-wide document search
Architecture
Sirchmunk is organized as a layered system with clearly separated concerns: a retrieval layer (GrepRetriever wrapping ripgrep-all), an evidence layer (MonteCarloEvidenceSampling and EvidenceUnit), a knowledge layer (KnowledgeBase managing KnowledgeCluster and AbstractionLevel hierarchies), and a presentation layer (FastAPI routes, Next.js UI, CLI, MCP server). Execution flows from user query through intent detection, to multi-granularity keyword extraction, to parallel ripgrep-all invocations, to LLM-scored Monte Carlo sampling windows, to knowledge cluster synthesis and persistence in DuckDB. The agentic module adds a ReAct-loop agent that can invoke directory scanning tools autonomously. The compile pipeline is a separate orchestration path that fuses tree indexing and knowledge compilation offline, with artifacts detected and used transparently by the search pipeline at runtime. Separation of concerns is solid across the storage, retrieval, learnings, and API packages, though the central search.py carries a high concentration of orchestration logic spanning both FAST and DEEP modes.
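The parallel ripgrep-all invocations in this flow could be bounded with an asyncio semaphore, as in the sketch below. The function name `run_limited` and the concurrency default are assumptions. The commands are passed in as argument lists, so the example does not depend on ripgrep-all being installed. With it installed, a command might look like `["rga", keyword, root]`.

```python
import asyncio
import sys


async def run_limited(commands, max_concurrency=4):
    """Run each command as a subprocess, at most max_concurrency at a time."""
    # Create the semaphore inside the running loop so it binds correctly.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(cmd):
        async with semaphore:
            proc = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            stdout, _ = await proc.communicate()
            return proc.returncode, stdout.decode(errors="replace")

    # gather() returns results in the same order as the input commands.
    return await asyncio.gather(*(run_one(c) for c in commands))


if __name__ == "__main__":
    demo = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
    print(asyncio.run(run_limited(demo, max_concurrency=2)))
```

Capping concurrency keeps a burst of keyword searches from spawning one process per keyword at once, which matters on large corpora where each search is I/O heavy.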
Tech Stack
The backend is Python 3.10+ built on FastAPI with uvicorn, exposing REST and SSE streaming endpoints. Document extraction is handled by Kreuzberg (a multi-format extraction library) augmented with system-level tools — poppler for PDFs, pandoc for Office formats, tesseract for OCR, and ffmpeg for audio. Search is powered by ripgrep-all, invoked via subprocess with an asyncio semaphore for concurrency control. Knowledge persistence uses DuckDB for OLAP-style storage and msgpack for cache serialization. Embedding support is optional via sentence-transformers (ModelScope-hosted or local). The LLM integration is provider-agnostic through an OpenAI-compatible client with a _ProviderProfile abstraction that auto-detects provider from base URL and supports streaming and thinking_content. The frontend is Next.js 14 with Tailwind CSS 3.4, bundled as static assets embedded in the Python package for single-process deployment. The MCP server is a separate sirchmunk_mcp package using the MCP Python SDK.
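Provider auto-detection from a base URL can be as simple as a hostname lookup. The sketch below is an illustration only: `ProviderProfile`, `detect_provider`, and the host table are hypothetical and do not reproduce the project's `_ProviderProfile` class, whose fields are not described here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ProviderProfile:
    name: str
    known: bool  # False when the host is not in the lookup table


# Hypothetical lookup table keyed by API hostname.
_KNOWN_HOSTS = {
    "api.openai.com": ProviderProfile("openai", True),
    "api.deepseek.com": ProviderProfile("deepseek", True),
    "api.groq.com": ProviderProfile("groq", True),
}
_GENERIC = ProviderProfile("generic", False)


def detect_provider(base_url):
    """Map a base URL to a profile, falling back to a generic one."""
    host = (urlparse(base_url).hostname or "").lower()
    return _KNOWN_HOSTS.get(host, _GENERIC)
```

Falling back to a generic profile keeps self-hosted or unlisted OpenAI-compatible endpoints usable without any extra configuration.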
Code Quality
The codebase applies loguru for structured logging with an async callback system, and uses Python dataclasses extensively for schema definitions (EvidenceUnit, KnowledgeCluster, SampleWindow, RoiResult). Type annotations are present but inconsistent: public APIs use Union and Optional annotations while internal helpers sometimes omit them. Error handling is explicit in the API layer with FastAPI HTTPException patterns and hmac.compare_digest for secure token comparison. A pytest + pytest-asyncio test infrastructure is configured in requirements, though no test files were found in the cloned repository, indicating testing may be handled in a separate location or is limited. CI runs Docker multi-arch builds via GitHub Actions workflows for publish and image building. Code comments and docstrings are present across core modules and retrieval classes, providing moderate inline documentation density.
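For readers unfamiliar with the token-comparison pattern mentioned above, a minimal example follows. The helper name `token_matches` is hypothetical. Only `hmac.compare_digest` comes from the Python standard library.

```python
import hmac


def token_matches(presented, expected):
    """Compare tokens without leaking the position of the first mismatch."""
    return hmac.compare_digest(
        presented.encode("utf-8"), expected.encode("utf-8")
    )
```

A plain `==` comparison can return as soon as the first differing character is found, which may let an attacker recover a token through timing measurements. `compare_digest` is designed to avoid that.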
What Makes It Unique
Sirchmunk’s central innovation is the combination of embedding-free retrieval with adaptive Monte Carlo evidence sampling. Rather than approximating similarity in a high-dimensional vector space, it uses ripgrep-all to enumerate candidate files deterministically and then applies a multi-round sampling strategy to score regions within large documents using an LLM as a relevance judge. The self-evolving knowledge cluster store is architecturally distinctive: clusters carry lifecycle states (emerging, stable, contested, deprecated) and abstraction levels (technique through philosophy), enabling the system to reason about the epistemic status of accumulated evidence rather than treating all retrieved text as equally reliable. The offline compile path, which produces tree indices compatible with runtime retrieval (with graceful fallback when absent), is a practical design that separates the expensive pre-processing from the interactive search path without requiring users to choose one mode permanently.
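The graceful-fallback behavior can be sketched as a simple presence check at search time. Everything named here is an assumption made for illustration: the artifact file name `tree_index.json`, its JSON layout with a `sections` list, and the functions `load_tree_index` and `choose_search_plan` are not taken from the project.

```python
import json
from pathlib import Path

# Hypothetical artifact name; the real on-disk layout is not described here.
INDEX_NAME = "tree_index.json"


def load_tree_index(work_dir):
    """Return the compiled index as a dict, or None if absent or unreadable."""
    path = Path(work_dir) / INDEX_NAME
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


def choose_search_plan(work_dir, query):
    """Use the compiled index when present, else fall back to a raw scan."""
    index = load_tree_index(work_dir)
    if index is None:
        return {"strategy": "raw-scan", "query": query}
    return {
        "strategy": "indexed",
        "query": query,
        "sections": len(index.get("sections", [])),
    }
```

Treating a missing or corrupt artifact the same way keeps search available even if a compile run was interrupted.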
Sirchmunk is released under the Apache License 2.0, a permissive open-source license. You can use it commercially, modify the source, distribute modified versions, and sublicense it without any copyleft obligations. The main requirements are preserving copyright and attribution notices, including the license text in distributions, and marking modified files as changed. There are no open-core restrictions, no feature gating tied to a commercial tier, and no usage telemetry baked into the codebase.
Running Sirchmunk yourself requires a Python 3.10+ environment and an LLM API key from an OpenAI-compatible provider (OpenAI, DeepSeek, MiniMax, Groq, etc.) — Sirchmunk itself is stateless with respect to the LLM; it calls your chosen provider’s API for keyword extraction, evidence scoring, and synthesis. For full document support (PDF, PPTX, DOCX, images, audio), the Dockerfile installs system-level dependencies including poppler-utils, pandoc, tesseract-ocr, and ffmpeg. You are responsible for provisioning and maintaining the host environment, keeping system packages up to date, managing your LLM API costs, and deciding how to back up the .sirchmunk/ working directory where knowledge clusters and cache are persisted.
There is no hosted or managed cloud version of Sirchmunk. You give up managed uptime, automatic upgrades, and any support SLA that a commercial product might offer. On the other hand, all data stays entirely on your infrastructure — documents never leave your environment, which is a significant advantage for sensitive or proprietary corpora. The project is actively maintained with a release cadence of roughly every two to four weeks and a community WeChat/DingTalk group, but production-grade support, high-availability configurations, and guaranteed response times are self-managed responsibilities.