Local-first AI memory with verbatim storage, pluggable backends, and 96.6% retrieval recall on LongMemEval — no API key required.
MemPalace is a local-first AI memory system that stores conversation history and project content as verbatim text and retrieves it with semantic search. Unlike summarization-based memory tools, MemPalace never paraphrases or extracts — every drawer holds the original content exactly as written, which is the key to its benchmark-leading retrieval accuracy.
The palace uses a spatial metaphor to organize knowledge: people and projects become wings, topics become rooms, and the original verbatim content lives in drawers. Searches can be scoped to a specific wing or room instead of running against a flat corpus, which shrinks the candidate set and improves precision. The hybrid search pipeline combines BM25 keyword matching with vector semantic similarity, and closet pointers provide an additional ranking signal.
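Scoped hybrid search can be sketched in a few lines of Python. This is a minimal illustration, not MemPalace's implementation. The `Drawer` record, the hashed bag-of-words `embed` function (a stand-in for the real ONNX embedder) and the weighted-sum fusion controlled by `alpha` are all assumptions, and the closet-pointer signal is left out. BM25 uses the standard Okapi formula with common default parameters.

```python
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drawer:
    # Hypothetical record: verbatim content plus its location in the palace.
    wing: str
    room: str
    content: str


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def embed(text: str, dim: int = 64) -> List[float]:
    # Stand-in for the ONNX embedder: a deterministic hashed bag-of-words vector.
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query: str, drawers: List[Drawer], wing: Optional[str] = None,
                  room: Optional[str] = None, alpha: float = 0.5, k: int = 5) -> List[Drawer]:
    # 1. Scope: only drawers in the requested wing/room become candidates.
    scoped = [d for d in drawers
              if (wing is None or d.wing == wing) and (room is None or d.room == room)]
    if not scoped:
        return []
    texts = [d.content for d in scoped]
    # 2. Lexical signal, scaled into [0, 1] by the best BM25 score.
    lexical = bm25_scores(query, texts)
    best = max(lexical) or 1.0
    lexical = [s / best for s in lexical]
    # 3. Semantic signal: cosine similarity of the (toy) embeddings.
    q = embed(query)
    semantic = [cosine(q, embed(t)) for t in texts]
    # 4. Fuse with a fixed weight (an assumption) and keep the top k.
    fused = [alpha * lex + (1 - alpha) * sem for lex, sem in zip(lexical, semantic)]
    order = sorted(range(len(scoped)), key=lambda i: fused[i], reverse=True)
    return [scoped[i] for i in order[:k]]
```

Filtering by wing or room before scoring means a drawer from an unrelated wing can never outrank a relevant one, which is the intuition behind the precision gain.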
The retrieval layer is pluggable through a well-defined backend contract (RFC 001). ChromaDB is the default, but alternative backends — SQLite exact-vector, Qdrant (REST), and pgvector (Postgres) — can be swapped in without touching the rest of the system. Embeddings are generated locally using an ONNX-based model (embeddinggemma-300m for multilingual support, or MiniLM for English-only), with hardware acceleration available via CUDA, CoreML, and DirectML.
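To make the idea of a backend contract concrete, here is a minimal sketch built on `typing.Protocol`. The method names `upsert` and `query`, the `open_backend` registry and the brute-force `ExactVectorBackend` are hypothetical. RFC 001 defines the real interface, which this sketch does not attempt to reproduce.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple


class VectorBackend(Protocol):
    """Hypothetical minimal contract; the real RFC 001 interface differs."""

    def upsert(self, ids: Sequence[str], embeddings: Sequence[Sequence[float]],
               documents: Sequence[str]) -> None: ...

    def query(self, embedding: Sequence[float], n_results: int) -> List[Tuple[str, float]]: ...


class ExactVectorBackend:
    """Brute-force cosine search over every stored vector (no index)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str]] = {}

    def upsert(self, ids, embeddings, documents) -> None:
        for id_, emb, doc in zip(ids, embeddings, documents):
            self._rows[id_] = (list(emb), doc)

    def query(self, embedding, n_results):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(id_, cos(embedding, emb)) for id_, (emb, _) in self._rows.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:n_results]


# Hypothetical registry: callers depend only on the protocol, so swapping
# backends means changing the name passed to open_backend().
BACKENDS = {"exact": ExactVectorBackend}


def open_backend(name: str) -> VectorBackend:
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return BACKENDS[name]()
```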
Beyond file mining, MemPalace includes a temporal entity-relationship knowledge graph backed by SQLite, 33 MCP server tools for integration with Claude Code and other AI tools, auto-save hooks for Claude Code/Codex CLI/Cursor IDE, and multi-agent support where each specialist agent gets its own wing and diary in the palace.
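A temporal entity-relationship graph can be stored in SQLite as triples with validity intervals. The `triples` schema and the helper functions below are hypothetical, not MemPalace's actual tables. The sketch also assumes that each subject/predicate pair holds a single current value, so adding a new fact closes the previous one.

```python
import sqlite3

# Hypothetical schema: one row per fact, with the interval during which it held.
SCHEMA = """
CREATE TABLE IF NOT EXISTS triples (
    subject    TEXT NOT NULL,
    predicate  TEXT NOT NULL,
    object     TEXT NOT NULL,
    valid_from TEXT NOT NULL,  -- ISO-8601 date the fact became true
    valid_to   TEXT            -- NULL while the fact is still current
)
"""


def add_fact(conn: sqlite3.Connection, subject: str, predicate: str,
             obj: str, valid_from: str) -> None:
    # Close the currently open fact for this subject/predicate, then record the new one.
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (valid_from, subject, predicate),
    )
    conn.execute(
        "INSERT INTO triples (subject, predicate, object, valid_from) VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, valid_from),
    )


def facts_as_of(conn: sqlite3.Connection, subject: str, date: str):
    # A fact is visible if it started on or before `date` and had not ended yet.
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY predicate",
        (subject, date, date),
    )
    return rows.fetchall()
```

ISO-8601 date strings compare correctly as plain text, which keeps the as-of query a simple `WHERE` clause.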
Architecture
MemPalace follows a layered, modular architecture with clear separation between the CLI entry point (cli.py), domain logic, and storage backends. The core flow starts at cli.py which routes to miner.py (project files), convo_miner.py (conversation exports), or format_miner.py (binary documents). All three miners converge on palace.py for shared palace operations — collection access, embedder identity enforcement, closet upserts, and FTS5 validation — which in turn delegates to the pluggable backend registry in backends/. The knowledge graph (knowledge_graph.py) is a self-contained SQLite module that sits orthogonal to the vector store, joined only at the MCP server layer (mcp_server.py). Thread safety is handled by explicit per-file and per-palace mining locks in palace.py, and the MCP server protects stdout at the file-descriptor level before importing chromadb to prevent banner output from corrupting JSON-RPC streams.
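Protecting stdout at the file-descriptor level matters because `contextlib.redirect_stdout` only swaps the Python-level `sys.stdout` object. Native extensions and child processes write straight to fd 1 and would still corrupt a JSON-RPC stream. One way to do this with `os.dup` and `os.dup2` is sketched below. The helper name is hypothetical, and MemPalace's server may structure it differently.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def stdout_to_stderr_fd():
    """Point file descriptor 1 at stderr for the duration of the block.

    Unlike contextlib.redirect_stdout, this also catches output written
    directly to fd 1 by C extensions or inherited by child processes.
    """
    sys.stdout.flush()        # push out anything already buffered for the real stdout
    saved_fd = os.dup(1)      # keep a handle on the real stdout
    try:
        os.dup2(2, 1)         # fd 1 now refers to the same file as stderr
        yield
    finally:
        sys.stdout.flush()    # buffered prints made inside the block land on stderr
        os.dup2(saved_fd, 1)  # restore the real stdout for JSON-RPC traffic
        os.close(saved_fd)


# Hypothetical usage in an MCP server entry point:
# with stdout_to_stderr_fd():
#     import chromadb  # any banner goes to stderr, not the JSON-RPC stream
```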
Tech Stack
MemPalace is written in Python 3.9+ and distributed as a PyPI package built with Hatchling. The default vector store is ChromaDB 1.5.x; alternative backends (Qdrant via REST, pgvector via psycopg3, SQLite exact-vector) are registered through Python entry points under mempalace.backends. Embeddings are generated locally using ONNX Runtime with two model options: all-MiniLM-L6-v2 (~30 MB, English-only) or onnx-community/embeddinggemma-300m-ONNX (~300 MB, 100+ languages), lazy-downloaded from HuggingFace Hub on first use. Binary document extraction uses MarkItDown with per-format sub-extras for PDF, DOCX, PPTX, and XLSX. Development tooling is ruff 0.15.15 for linting and formatting, pytest with pytest-cov enforcing 85% coverage minimum, hypothesis for property-based testing, mypy for static type checking, and pre-commit for local gate enforcement.
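Backend discovery through entry points needs only the standard library's `importlib.metadata`. The sketch below relies only on the `mempalace.backends` group name from the text; the helper names are hypothetical. A third-party plugin would advertise itself under `[project.entry-points."mempalace.backends"]` in its own `pyproject.toml`. The version check is there because `entry_points(group=...)` only exists from Python 3.10, while the project supports 3.9.

```python
import sys
from importlib.metadata import entry_points


def discover_backends(group: str = "mempalace.backends"):
    """Map entry-point names to (not yet loaded) EntryPoint objects."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)
    else:  # Python 3.9: entry_points() returns a dict keyed by group name
        eps = entry_points().get(group, [])
    return {ep.name: ep for ep in eps}


def load_backend(name: str, group: str = "mempalace.backends"):
    backends = discover_backends(group)
    if name not in backends:
        raise LookupError(f"no backend named {name!r}; installed: {sorted(backends)}")
    return backends[name].load()  # the plugin module is imported only at this point
```

Returning unloaded `EntryPoint` objects keeps startup cheap: a backend's heavy dependencies (psycopg3 for pgvector, for example) are imported only when that backend is actually selected.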
Code Quality
The test suite is comprehensive, with over 100 test files covering individual modules (test_searcher.py, test_miner.py, test_palace.py), backend conformance (_backend_conformance.py run against all four backends), MCP server behavior, hook integration, and even benchmark claim verification (test_readme_claims.py). Error handling is explicit and typed throughout — the backends module defines a rich error hierarchy (BackendError, PalaceNotFoundError, CollectionNotInitializedError, DimensionMismatchError, UnsupportedCapabilityError) and callers distinguish between them rather than catching broadly. Inline comment density is very high, with detailed rationale comments in pyproject.toml, palace.py, and miner.py explaining non-obvious decisions. The codebase enforces a coverage floor of 85% in [tool.coverage.report] and CI gates on ruff.
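The value of a typed error hierarchy shows up at the call site. The sketch below reuses the five exception names from the text, but the inheritance layout, the `DimensionMismatchError` constructor and the `search_or_explain` caller are assumptions made for illustration. Anything not handled explicitly propagates instead of being swallowed by a broad `except`.

```python
from typing import Callable


class BackendError(Exception):
    """Base class for storage-backend failures (assumed layout)."""


class PalaceNotFoundError(BackendError):
    pass


class CollectionNotInitializedError(BackendError):
    pass


class DimensionMismatchError(BackendError):
    def __init__(self, expected: int, got: int) -> None:
        super().__init__(f"embedding has {got} dimensions, collection expects {expected}")
        self.expected = expected
        self.got = got


class UnsupportedCapabilityError(BackendError):
    pass


def search_or_explain(run_query: Callable[[], object]) -> object:
    """Hypothetical caller: each failure mode gets its own recovery message."""
    try:
        return run_query()
    except PalaceNotFoundError:
        return "No palace found at this path; create one before searching."
    except CollectionNotInitializedError:
        return "The palace exists but nothing has been mined into it yet."
    except DimensionMismatchError as exc:
        return (f"Query embedding has {exc.got} dimensions but the collection "
                f"expects {exc.expected}; use the embedder the palace was built with.")
    # UnsupportedCapabilityError and any other BackendError propagate unchanged.
```

A mismatch like this would arise, for example, when a palace built with 384-dimensional all-MiniLM-L6-v2 embeddings is queried with 768-dimensional EmbeddingGemma vectors.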
What Makes It Unique
Most AI memory systems summarize or extract facts before storage, which introduces irreversible information loss and makes recall accuracy depend on the quality of that transformation. MemPalace inverts this: verbatim storage is the invariant, and the retrieval pipeline (hybrid BM25 + vector + closet-pointer ranking) is optimized to find the right original text. This design choice, combined with a spatial hierarchy that enables scoped searches instead of flat-corpus queries, produces the benchmark-leading 96.6% R@5 on LongMemEval with zero API calls. The pluggable backend system formalized in RFC 001 is also distinctive: it comes with a published conformance test suite that any third-party backend must pass, which makes the storage substrate genuinely swappable without coupling the entire system to ChromaDB's API surface.
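For readers unfamiliar with the metric, R@5 (recall at 5) asks whether the evidence for a question appears among the top five retrieved items. The function below computes the common "any relevant item in the top k" variant. LongMemEval's official scoring may define recall differently (for example, by requiring all evidence sessions), so treat this as an illustration of the metric rather than a reproduction of the benchmark.

```python
from typing import Sequence, Set


def recall_any_at_k(ranked: Sequence[Sequence[str]],
                    relevant: Sequence[Set[str]], k: int = 5) -> float:
    """Share of queries whose top-k results contain at least one relevant id."""
    if len(ranked) != len(relevant):
        raise ValueError("need exactly one relevance set per query")
    if not ranked:
        return 0.0
    hits = sum(1 for results, gold in zip(ranked, relevant)
               if gold & set(results[:k]))
    return hits / len(ranked)


# Two queries: the first finds its evidence at rank 3, the second only at rank 6.
ranked = [["a", "b", "c", "d", "e", "f"], ["x", "y", "z", "w", "v", "u"]]
relevant = [{"c"}, {"u"}]
print(recall_any_at_k(ranked, relevant, k=5))  # 0.5
```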
Licensing Model
MIT licensed: all features are available in self-hosted deployments, with no restrictions or license keys required.