A local-first AI memory sidecar that extracts entities, builds a persistent knowledge graph, and injects ranked context into any LLM pipeline — no cloud required.
mnemo is a local-first memory layer for developers building custom LLM pipelines. It runs as a standalone sidecar service: you POST raw text to it, it extracts named entities and relationships using a configurable LLM backend, persists everything to SQLite, and on demand returns a scored, graph-expanded context string you can inject directly into your next prompt. The entire round-trip takes under 50ms.
At its core, mnemo addresses the statelessness of LLM conversations. Each session normally starts fresh, with no awareness of previous interactions, known facts, or established relationships. mnemo processes every conversation you feed it, deduplicates entities across sessions, weights relationships by how often they co-occur, and traverses the knowledge graph at retrieval time to surface inferred connections in addition to direct matches.
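As a rough illustration of the deduplication and co-occurrence weighting described above, the following Python sketch keeps one node per normalized entity name and increments an edge weight each time two entities appear in the same chunk. The class name `CooccurrenceGraph`, the normalization rule (lowercase, collapsed whitespace), and the +1 increment are assumptions made for illustration; they are not taken from mnemo's source.

```python
from collections import defaultdict
from itertools import combinations


def normalize(name: str) -> str:
    """Assumed dedup key: case-insensitive, whitespace-collapsed name."""
    return " ".join(name.lower().split())


class CooccurrenceGraph:
    """Hypothetical in-memory store; not mnemo's actual data model."""

    def __init__(self) -> None:
        self.entities: dict[str, str] = {}  # dedup key -> first-seen display name
        self.weights: dict[tuple[str, str], int] = defaultdict(int)

    def ingest(self, entity_names: list[str]) -> None:
        """Record the entities extracted from one chunk of text."""
        keys = set()
        for name in entity_names:
            key = normalize(name)
            self.entities.setdefault(key, name.strip())
            keys.add(key)
        # Every unordered pair of entities in the chunk co-occurs once.
        for a, b in combinations(sorted(keys), 2):
            self.weights[(a, b)] += 1

    def weight(self, a: str, b: str) -> int:
        ka, kb = sorted((normalize(a), normalize(b)))
        return self.weights.get((ka, kb), 0)


graph = CooccurrenceGraph()
graph.ingest(["Alice", "Acme Corp"])            # session 1
graph.ingest(["alice ", "ACME  corp", "Bob"])   # session 2, same entities respelled
```

After the two calls the graph holds three entities rather than five, and the Alice/Acme edge has weight 2 because the pair was seen in both sessions. A real extractor would likely need a more careful identity rule than string normalization.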
The project is written entirely in Rust across four crates: a core library handling all business logic (entity extraction, graph operations, retrieval engine, DB layer), a thin Axum REST API, a CLI tool for interactive use, and a benchmarking harness. A Python SDK wraps the REST API for teams using Python-based agent frameworks. A pre-built Docker image with an Ollama sidecar enables fully offline operation with no external dependencies.
mnemo is designed for developers who want full control over their LLM memory layer — where data lives, what the schema looks like, and how retrieval is scored — without being locked into a managed service or a Python runtime.
Architecture
mnemo uses a clean layered architecture with strict one-way dependency flow: a core library crate encapsulates all business logic (entity extraction, graph operations, retrieval scoring, and the SQLite persistence layer), while thin binary crates for the API, CLI, and benchmarks depend on core but not on each other. Shared state is managed through Arc-wrapped components (Database, KnowledgeGraph behind a Tokio RwLock, RetrievalEngine, Extractor) injected into Axum handlers via a cloned AppState struct. The knowledge graph is held entirely in memory and synchronized with SQLite on every write, with the RwLock ensuring concurrent read access during retrieval while serializing writes. The separation of concerns is well-executed: persistence, LLM provider abstraction, graph traversal, and retrieval pipeline are each isolated modules with clear boundaries and no circular dependencies.
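The write path described above (graph held in memory, persisted on every write, writers serialized) can be sketched in a few lines of Python. This is an illustration of the pattern only, not mnemo's Rust implementation: the `edges` table schema and the class name are hypothetical, and because Python's standard library has no read-write lock, a plain `threading.Lock` stands in for the Tokio RwLock, so reads are serialized here as well.

```python
import sqlite3
import threading


class WriteThroughGraph:
    """Illustrative only: in-memory adjacency map mirrored to SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.lock = threading.Lock()  # stand-in for the RwLock in the text
        self.adj: dict[str, dict[str, float]] = {}
        conn.execute(
            "CREATE TABLE IF NOT EXISTS edges ("
            "src TEXT NOT NULL, dst TEXT NOT NULL, weight REAL NOT NULL, "
            "PRIMARY KEY (src, dst))"
        )
        # Startup: rebuild the in-memory graph from whatever is persisted.
        for src, dst, weight in conn.execute("SELECT src, dst, weight FROM edges"):
            self.adj.setdefault(src, {})[dst] = weight

    def add_edge(self, src: str, dst: str, weight: float) -> None:
        with self.lock:  # writes are serialized
            # Persist first so memory never holds an edge the database lacks.
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    "INSERT OR REPLACE INTO edges (src, dst, weight) VALUES (?, ?, ?)",
                    (src, dst, weight),
                )
            self.adj.setdefault(src, {})[dst] = weight

    def neighbors(self, src: str) -> dict[str, float]:
        with self.lock:
            return dict(self.adj.get(src, {}))  # copy, so callers cannot mutate state


conn = sqlite3.connect(":memory:")
graph = WriteThroughGraph(conn)
graph.add_edge("alice", "acme", 2.0)
graph.add_edge("alice", "acme", 3.0)   # same edge again: weight is overwritten
reloaded = WriteThroughGraph(conn)      # simulates a restart on the same database
```

Constructing a second `WriteThroughGraph` on the same connection mimics a restart: the in-memory map is rebuilt from the table, which is why startup cost grows with graph size.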
Tech Stack
mnemo is written in Rust 1.78 using the Tokio async runtime throughout. Axum 0.7 provides the REST API layer with Tower middleware for CORS and structured request tracing. SQLite is the sole persistence layer, accessed via sqlx 0.7 with the runtime-tokio driver and WAL mode for concurrent reads; all schema migrations are embedded in the binary. petgraph 0.6 powers the in-memory directed graph (DiGraph with typed node and edge weights). reqwest 0.12 handles outbound HTTP to LLM provider endpoints. The CLI uses Clap 4.5 with derive macros. The binary is compiled against musl libc in a multi-stage Dockerfile, producing a static executable that runs from a scratch container image. A Python SDK wraps the REST API with both synchronous and async clients.
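For readers unfamiliar with WAL mode, the sketch below shows the underlying SQLite setting using Python's built-in `sqlite3` module. mnemo itself configures this through sqlx in Rust; the file name and the `notes` table here are placeholders invented for the example.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; in-memory databases do not support it.
db_path = os.path.join(tempfile.mkdtemp(), "mnemo-example.db")  # hypothetical file name

conn = sqlite3.connect(db_path)
# The pragma returns the journal mode that is actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
with conn:  # commit the insert
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

# A second connection can read while the first stays open for writing.
reader = sqlite3.connect(db_path)
rows = reader.execute("SELECT body FROM notes").fetchall()
```

In WAL mode readers do not block the writer and the writer does not block readers, which suits a workload of frequent retrieval queries alongside occasional ingestion.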
Code Quality
The project ships with comprehensive test coverage: 122 Rust tests spanning data model round-trips, API endpoint behavior, and graph operations, plus 21 Python SDK tests and 12 performance benchmark suites. Async tests use tokio-test, and wiremock mocks the LLM HTTP calls for deterministic integration testing. Error handling is explicit throughout: thiserror defines a typed MnemoError enum that maps cleanly to HTTP status codes at the API boundary, and extraction failures use a graceful fallback (returning an empty ExtractionResult rather than surfacing errors to callers). Naming conventions are consistent and idiomatic Rust throughout. CI is active with a GitHub Actions workflow covering build and test passes.
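The error-handling pattern described above can be approximated in Python as follows. Only the names `MnemoError` and `ExtractionResult` come from the text; the specific variants, their status codes, and the helper functions are assumptions chosen for illustration, and exception subclasses stand in for Rust enum variants.

```python
from dataclasses import dataclass, field


class MnemoError(Exception):
    """Base class standing in for the typed error enum described above."""
    status = 500


class NotFound(MnemoError):          # hypothetical variant
    status = 404


class InvalidInput(MnemoError):      # hypothetical variant
    status = 400


class ExtractionFailed(MnemoError):  # hypothetical variant
    status = 502


@dataclass
class ExtractionResult:
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def to_status(err: Exception) -> int:
    """Map an error to an HTTP status code at the API boundary."""
    return err.status if isinstance(err, MnemoError) else 500


def extract_or_empty(extract, text: str) -> ExtractionResult:
    """Call the extractor; on failure return an empty result instead of raising."""
    try:
        return extract(text)
    except ExtractionFailed:
        return ExtractionResult()


def failing_backend(text: str) -> ExtractionResult:
    """Simulates an LLM backend that cannot be reached."""
    raise ExtractionFailed("LLM backend unreachable")


result = extract_or_empty(failing_backend, "Alice works at Acme.")
```

The trade-off of the fallback is that a caller cannot distinguish "nothing to extract" from "extraction failed" without consulting logs, so a design like this usually pairs the empty result with a logged warning.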
What Makes It Unique
The real differentiator is combining a persistent in-memory petgraph with BFS multi-hop traversal applied at retrieval time, alongside a score penalty system (0.5× weight) for graph-inferred versus directly matched entities. Most AI memory tools store chunks in a vector database and do nearest-neighbor lookup on embeddings; mnemo instead builds a structured entity-relationship layer that surfaces inferred connections between entities never directly co-mentioned in the same chunk. The 6-stage retrieval pipeline (full-text search, entity name matching, graph BFS expansion, relation filtering, confidence scoring, context assembly) is meaningfully more sophisticated than naive context injection. Combined with the zero-dependency static binary enabling fully offline operation with Ollama, mnemo occupies a genuine niche for developers who want structured, traversable, local memory they control entirely.
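To make the BFS expansion and the 0.5× penalty concrete, here is a minimal Python sketch. It assumes the penalty is a single flat multiplier applied to any entity reached only through the graph, regardless of hop count; the text does not say whether mnemo compounds the penalty per hop or factors in edge weights, so treat the scoring rule, the function name `expand`, and the `max_hops` parameter as assumptions.

```python
from collections import deque


def expand(graph: dict[str, list[str]], seeds: dict[str, float],
           max_hops: int = 2, penalty: float = 0.5) -> dict[str, float]:
    """Breadth-first expansion from directly matched entities.

    `seeds` maps each directly matched entity to its score. Entities reached
    only through the graph inherit the seed's score times `penalty`
    (assumption: one flat penalty, not compounded per hop).
    """
    scores = dict(seeds)
    for seed, seed_score in seeds.items():
        visited = {seed}
        queue = deque([(seed, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand past the hop limit
            for neighbor in graph.get(node, []):
                if neighbor in visited:
                    continue
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
                if neighbor not in seeds:
                    inferred = seed_score * penalty
                    # Keep the best score if several seeds reach the same entity.
                    scores[neighbor] = max(scores.get(neighbor, 0.0), inferred)
    return scores


# Alice and Berlin are never mentioned together, but both connect to Acme.
graph = {
    "alice": ["acme"],
    "acme": ["alice", "berlin"],
    "berlin": ["acme"],
    "carol": [],
}
ranked = expand(graph, {"alice": 1.0})
```

With a two-hop limit, a query that matches only "alice" also surfaces "berlin" at half the score, which is the kind of inferred connection a pure nearest-neighbor lookup over chunks would not return.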
mnemo is released under the MIT License, one of the most permissive open-source licenses available. You may use it commercially, modify it freely, distribute modified versions, and embed it in proprietary products without any copyleft obligations. The only requirement is retaining the copyright notice. There are no open-core restrictions, no license checks, no enterprise tiers, and no feature flags gating capabilities behind a paid plan.
Running mnemo yourself means operating a Rust API server alongside a SQLite database file. The operational footprint is small: the server is a single static binary with no external runtime dependencies, and SQLite requires no separate database process. You are responsible for the machine’s uptime, backing up the SQLite file, and upgrading the binary when new releases are published. The knowledge graph is loaded from SQLite into memory on startup, so restart time scales with graph size but is typically fast. There is no built-in high-availability, clustering, or backup tooling; these are left to the operator.
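Since backups are left to the operator, one option is SQLite's online backup API, which Python exposes as `Connection.backup`. The sketch below is a generic SQLite technique rather than anything mnemo provides; the file paths and the `entities` table are placeholders, and the first few lines only create a stand-in database so the example runs on its own.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "mnemo.db")            # hypothetical path
backup_path = os.path.join(workdir, "mnemo-backup.db")   # hypothetical path

# Stand-in for the live database the server would be writing to.
live = sqlite3.connect(live_path)
live.execute("CREATE TABLE entities (name TEXT PRIMARY KEY)")
with live:
    live.execute("INSERT INTO entities (name) VALUES ('alice')")

# Online backup: copies a consistent snapshot while the database stays open.
source = sqlite3.connect(live_path)
target = sqlite3.connect(backup_path)
source.backup(target)
source.close()
target.close()

# Verify the copy by reading it back.
restored = sqlite3.connect(backup_path)
names = [row[0] for row in restored.execute("SELECT name FROM entities")]
```

Copying the database file directly while the server is running risks capturing a half-written state, especially in WAL mode where recent writes may live in a separate `-wal` file; the backup API avoids that.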
There is currently no hosted or managed version of mnemo, so there is no cloud tier to compare against. What this means practically is that you own all your data and all operational responsibilities with no fallback support tier. The project is early-stage (created June 2025, two months of commits) with no formal releases on GitHub yet, so you should expect API changes and limited long-term stability guarantees. Community support is available via GitHub issues, but there are no SLAs, no paid support contracts, and no enterprise agreements.