Transform messy, unstructured documents into persistent, navigable memory that AI agents can actually use.
Knowhere is an open-source document memory infrastructure stack that sits between raw files and AI agents. Instead of dumping flat text at an LLM, Knowhere ingests PDFs, Office files, images, and markdown, then reconstructs their full hierarchical structure using its own tree-building algorithm, preserving headings, sections, tables, and cross-document relationships as a navigable memory graph.
The platform runs in two stages: build and retrieve. During the build phase, documents are routed to specialized parsers (defaulting to MinerU for PDFs), then a multi-pass document agent using a ReAct-style loop analyzes page anatomy, detects table-of-contents structures, assigns heading levels, and organizes chunks with full section-path context. During retrieval, a hybrid engine fuses keyword (BM25), path, semantic, and vector channels using Reciprocal Rank Fusion, then an LLM-driven navigation agent walks the section tree to drill into the most relevant regions.
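To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over per-channel rankings. The function name, the channel names used as dictionary keys, the chunk IDs, and the constant k = 60 (the value commonly used in the RRF literature) are illustrative assumptions, not Knowhere's actual code or configuration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one scored list.

    rankings: dict mapping a channel name to its ranked list of chunk IDs
              (best first). k dampens the influence of top ranks; 60 is an
              assumed default, not a value taken from Knowhere.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            # Each channel contributes 1 / (k + rank) for every chunk it returns.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from the four channels named above.
channels = {
    "bm25": ["c1", "c2", "c3"],
    "path": ["c2", "c1"],
    "semantic": ["c3", "c2"],
    "vector": ["c2", "c4"],
}
fused = reciprocal_rank_fusion(channels)
fused_ids = [chunk_id for chunk_id, _ in fused]
print(fused_ids)
```

Because RRF only looks at ranks, channels with incomparable score scales (BM25 scores versus cosine similarities, for example) can be combined without normalization.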
Knowhere exposes its retrieval engine as an MCP (Model Context Protocol) server, making it natively compatible with Claude, Cursor, and other agentic tool frameworks. Every result carries traceable source paths — document, section, chunk, and linked assets — so downstream agents can cite evidence rather than hallucinate. Internal benchmarks show +36% first-try accuracy and +11% recall over feeding raw documents directly to agents.
The full stack is self-hostable via Docker Compose, with official Python and Node.js SDKs available for cloud API access. A companion dashboard, worker service, and shared infrastructure package ship as separate repositories in the Ontos-AI ecosystem, all orchestrated by a uv workspace.
Architecture
Knowhere follows a distributed, service-oriented architecture split across three deployable units sharing a common Python package. The apps/api FastAPI service handles all HTTP routing, authentication, rate limiting, billing, and webhooks, and exposes the MCP retrieval endpoint; it runs Alembic migrations on startup and warms a PostgreSQL async connection pool via asyncpg. The apps/worker Celery/gevent service owns all CPU-bound document processing (ingestion orchestration, parsing, structural analysis, and the document profile agent) and is monkey-patched with gevent for cooperative scheduling. The packages/shared-python package contains all shared models, database sessions, Redis clients, retrieval services, storage adapters, and chunk structures. The retrieval path lives entirely in this shared package and is called from both the API and the MCP server. Separation of concerns is enforced structurally: routes are thin adapters over typed workflow outcomes (documented in ADR-0001 and ADR-0002), the worker uses a per-job state gate and billing guard before any compute runs, and the agentic retrieval orchestrator is policy-explicit with configurable budget envelopes per stage (ADR-0003).
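As a rough illustration of a policy-driven, budget-limited navigation stage, the sketch below walks a section tree using the EXPAND, BACK, and FINISH actions that the navigation agent is described as using. Everything else is hypothetical: the Node class, the navigate function, the max_steps budget, and the keyword-matching policy that stands in for the LLM call are assumptions made for this example, not Knowhere's orchestrator.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A section in a (hypothetical) document tree."""
    title: str
    children: List["Node"] = field(default_factory=list)


# A policy observes the current node and the path so far, then returns
# ("EXPAND", child_index), ("BACK", None), or ("FINISH", None).
Policy = Callable[[Node, List[str]], Tuple[str, Optional[int]]]


def navigate(root: Node, policy: Policy, max_steps: int = 8) -> List[str]:
    """Walk the tree until the policy finishes or the step budget runs out."""
    trail = [root]
    for _ in range(max_steps):
        current = trail[-1]
        action, arg = policy(current, [n.title for n in trail])
        if action == "EXPAND" and arg is not None and 0 <= arg < len(current.children):
            trail.append(current.children[arg])  # drill into a child section
        elif action == "BACK" and len(trail) > 1:
            trail.pop()  # return to the parent section
        else:
            break  # FINISH, or an action that cannot be applied
    return [n.title for n in trail]


def keyword_policy(keyword: str) -> Policy:
    """Deterministic stand-in for the LLM: expand the first matching child."""
    def policy(node: Node, path: List[str]) -> Tuple[str, Optional[int]]:
        for index, child in enumerate(node.children):
            if keyword.lower() in child.title.lower():
                return ("EXPAND", index)
        return ("FINISH", None)
    return policy


tree = Node("Report", [
    Node("Methods"),
    Node("Results", [Node("Accuracy results table")]),
])
full_path = navigate(tree, keyword_policy("results"))
capped_path = navigate(tree, keyword_policy("results"), max_steps=1)
print(full_path, capped_path)
```

Passing the policy in as a plain callable keeps the loop testable without a model, and the step cap plays the role of a per-stage budget in this toy version.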
Tech Stack
The project uses Python 3.11+ throughout, managed as a uv workspace with three packages. The API is built on FastAPI 0.135 + Uvicorn 0.34 + SQLAlchemy 2.0 (async), backed by PostgreSQL with pgvector for vector storage and Redis 5 for caching, rate limiting, and Celery task queuing, with celery-redbeat as the Redis-backed Celery Beat scheduler. Document parsing leans on MinerU as the default PDF backend, with python-docx, python-pptx, pypdf, pymupdf, pptx2md, openpyxl, and markitdown for other formats; pandas handles the intermediate dataframe representation of chunk/heading data. The document profile agent uses the OpenAI SDK and is model-agnostic via env vars, supporting DeepSeek, Qwen-VL, GPT, Zhipu, and Volcengine. The MCP server is built on FastMCP from the official mcp package (1.27+). Observability is wired through Logfire with optional PostHog telemetry for self-hosted deployments, and Stripe handles cloud billing. Type checking uses Pyright in basic mode; linting uses Ruff.
Code Quality
The codebase has extensive contract test coverage: 43 test files split across apps/api/tests/contract/ and apps/worker/tests/contract/, covering agentic discovery, retrieval, billing, API key auth, document lifecycle, worker bootstrap, parse task execution, and more. Tests use pytest with pytest-asyncio, pytest-alembic, fakeredis, and pytest-postgresql. Three Architecture Decision Records document structural invariants. The codebase uses Pydantic v2 for all data validation, typed workflow outcome enums throughout the worker pipeline, and Pyright in basic mode with Ruff for linting. Gevent monkey-patching at the worker entry point is explicitly documented and isolated. Error handling is explicit, with loguru structured logging at every stage gate. Some areas lack unit tests (retrieval channel scoring, heading tree logic), but the contract test surface is broad.
What Makes It Unique
Knowhere’s primary technical differentiator is that it treats document hierarchy as a first-class data structure rather than an afterthought. Its tree-building algorithm reconstructs heading levels and section paths from raw parser output using a stack-based parent-child traversal, then propagates these paths into every chunk’s metadata, so a chunk in “Chapter 3 / Section 2.1 / Subsection a” carries its full lineage. The agentic retrieval orchestrator then navigates this tree with an LLM observe-act loop (EXPAND/BACK/FINISH actions), simulating how a human reader drills into relevant sections rather than trusting flat cosine similarity. The MCP server integration is first-class and uses stateless HTTP rather than being a bolt-on, making Knowhere natively consumable by Claude Code, Cursor, and any MCP-capable agent without a custom integration layer.
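The stack-based traversal can be sketched as follows. This is a simplified illustration, not Knowhere's algorithm: the Block dataclass, the assign_section_paths function, and the assumption that the parser already supplies a numeric heading level are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One parser output item (hypothetical shape)."""
    kind: str       # "heading" or "text"
    text: str
    level: int = 0  # heading depth, 1 = top level; ignored for text blocks


def assign_section_paths(blocks):
    """Attach the full heading lineage to every text chunk."""
    stack = []   # (level, title) pairs for the headings currently open
    chunks = []
    for block in blocks:
        if block.kind == "heading":
            # Close every open heading at the same or a deeper level,
            # so the new heading becomes a child of what remains.
            while stack and stack[-1][0] >= block.level:
                stack.pop()
            stack.append((block.level, block.text))
        else:
            path = " / ".join(title for _, title in stack)
            chunks.append({"section_path": path, "text": block.text})
    return chunks


blocks = [
    Block("heading", "Chapter 3", 1),
    Block("heading", "Section 2.1", 2),
    Block("heading", "Subsection a", 3),
    Block("text", "Details of the method."),
    Block("heading", "Chapter 4", 1),
    Block("text", "A new chapter begins."),
]
chunks = assign_section_paths(blocks)
for chunk in chunks:
    print(chunk["section_path"], "->", chunk["text"])
```

Storing the path as a string on each chunk means a retrieved chunk can be traced back to its section without re-reading the document, which is also what would allow a path-based retrieval channel to match on it.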
Licensing Model
Apache 2.0 licensed: all features are available in self-hosted deployments with no restrictions or license keys required.
Self-Hosting
The full stack (API, worker, dashboard) ships as a Docker Compose configuration in the separate knowhere-self-hosted repository. No feature is gated behind a cloud plan in the self-hosted path.
Cloud vs Self-Hosted
Knowhere Cloud at knowhereto.ai offers a managed API with $5 free credits on registration. The cloud offering provides the same capabilities as self-hosted; the difference is operational (managed infrastructure vs. bring your own).