mnemo

Name: mnemo
Rating: 5 (226 reviews)

A local-first AI memory sidecar that extracts entities, builds a persistent knowledge graph, and injects ranked context into any LLM pipeline — no cloud required.

226 stars

9 forks

MIT License

Rust

mnemo is a local-first memory layer for developers building custom LLM pipelines. It runs as a standalone sidecar service: you POST raw text to it, it extracts named entities and relationships using a configurable LLM backend, persists everything to SQLite, and on demand returns a scored, graph-expanded context string you can inject directly into your next prompt. The entire round-trip takes under 50 ms.

At its core, mnemo solves the stateless problem of LLM conversations. Each session normally starts fresh, with no awareness of previous interactions, known facts, or established relationships. mnemo watches every conversation you feed it, deduplicates entities across sessions, weights relationships by how often they co-occur, and traverses the knowledge graph at retrieval time to surface inferred connections — not just direct matches.
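The deduplication and co-occurrence weighting described above can be sketched in a few lines of Rust. This is a minimal illustration using only the standard library, not mnemo's actual code: the `Memory` struct, the `key` helper, and the rule "one unit of weight per chunk in which two entities appear together" are assumptions made for the example, and alias merging is left out.

```rust
use std::collections::HashMap;

/// Hypothetical entity key: normalized name plus a type label.
type EntityKey = (String, String);

/// Normalize a raw mention so that "Alice" and "alice " map to one entity.
fn key(name: &str, kind: &str) -> EntityKey {
    (name.trim().to_lowercase(), kind.to_string())
}

#[derive(Default)]
struct Memory {
    /// Sighting count per deduplicated entity.
    entities: HashMap<EntityKey, u32>,
    /// Co-occurrence weight per unordered entity pair (smaller key first).
    relations: HashMap<(EntityKey, EntityKey), u32>,
}

impl Memory {
    /// Ingest the (name, type) pairs extracted from one chunk of text.
    fn ingest(&mut self, extracted: &[(&str, &str)]) {
        let mut keys: Vec<EntityKey> =
            extracted.iter().map(|(name, kind)| key(name, kind)).collect();
        keys.sort();
        keys.dedup();

        for k in &keys {
            *self.entities.entry(k.clone()).or_insert(0) += 1;
        }
        // Every pair seen in the same chunk gains one unit of weight.
        for i in 0..keys.len() {
            for j in (i + 1)..keys.len() {
                let pair = (keys[i].clone(), keys[j].clone());
                *self.relations.entry(pair).or_insert(0) += 1;
            }
        }
    }

    /// Look up the accumulated weight between two entities, in either order.
    fn weight(&self, a: (&str, &str), b: (&str, &str)) -> u32 {
        let (ka, kb) = (key(a.0, a.1), key(b.0, b.1));
        let pair = if ka <= kb { (ka, kb) } else { (kb, ka) };
        self.relations.get(&pair).copied().unwrap_or(0)
    }
}

fn main() {
    let mut mem = Memory::default();
    // Two "sessions" that mention the same person and organization.
    mem.ingest(&[("Alice", "person"), ("Acme", "org")]);
    mem.ingest(&[("alice ", "person"), ("Acme", "org"), ("Berlin", "place")]);
    println!("distinct entities: {}", mem.entities.len());
    println!(
        "alice/acme weight: {}",
        mem.weight(("Alice", "person"), ("Acme", "org"))
    );
}
```

Keying on a normalized (name, type) pair keeps repeated sightings from creating duplicate nodes, while the pair counter lets frequently co-mentioned entities outrank one-off mentions at retrieval time.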

The project is written entirely in Rust across four crates: a core library handling all business logic (entity extraction, graph operations, retrieval engine, DB layer), a thin Axum REST API, a CLI tool for interactive use, and a benchmarking harness. A Python SDK wraps the REST API for teams using Python-based agent frameworks. A pre-built Docker image with an Ollama sidecar enables fully offline operation with no external dependencies.

mnemo is designed for developers who want full control over their LLM memory layer — where data lives, what the schema looks like, and how retrieval is scored — without being locked into a managed service or a Python runtime.

What You Get

A REST API with endpoints for ingest, retrieve, entity management, graph neighbor traversal, full-text search, and health/stats
A knowledge graph persisted in SQLite and held in memory with petgraph, with BFS traversal up to a configurable depth for multi-hop relationship discovery
Entity deduplication across sessions by name and type, with alias merging and relationship weight accumulation on repeated sightings
A 6-stage retrieval pipeline combining full-text chunk search, entity name matching, graph expansion, relation filtering, confidence scoring, and context assembly
A single static Rust binary that runs from a scratch Docker image with zero runtime dependencies, or via cargo install
A Python SDK (both sync and async clients) for teams building agent frameworks in Python
A CLI tool for interactive memory management — ingest, search, entity listing, graph neighbor inspection, and wipe
A Docker Compose setup with an Ollama sidecar for fully offline, free operation
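The depth-limited BFS traversal mentioned in the list above might look roughly like the following. This sketch swaps petgraph for a plain standard-library adjacency map so it stays self-contained; the `expand` function and its signature are hypothetical and are not taken from mnemo's API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first expansion from `start`, visiting nodes at most `max_depth`
/// hops away. Returns (node, hop distance) pairs in visit order, excluding
/// the start node itself.
fn expand(
    graph: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    let mut found = Vec::new();

    seen.insert(start);
    queue.push_back((start, 0));

    while let Some((node, depth)) = queue.pop_front() {
        // Nodes at the depth limit are reported but not expanded further.
        if depth == max_depth {
            continue;
        }
        if let Some(neighbors) = graph.get(node) {
            for &next in neighbors {
                // `insert` returns false for nodes already visited,
                // which also guards against cycles.
                if seen.insert(next) {
                    found.push((next.to_string(), depth + 1));
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    found
}

fn main() {
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    graph.insert("alice", vec!["acme"]);
    graph.insert("acme", vec!["berlin", "alice"]);
    graph.insert("berlin", vec!["germany"]);

    // With a depth of 2, "germany" (3 hops from "alice") is not reached.
    for (node, hops) in expand(&graph, "alice", 2) {
        println!("{node} at {hops} hop(s)");
    }
}
```

Recording the hop distance alongside each node is what allows a later scoring stage to treat direct matches and multi-hop discoveries differently.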

Common Use Cases

LLM pipeline memory — persisting conversation history across sessions for chatbots, coding assistants, or research agents that need to remember prior interactions
Personal knowledge sidecar — storing notes, documents, and facts and querying them semantically when building prompts for personal AI assistants
Agent memory backend — providing structured, retrievable memory for multi-step AI agents that must track entities, tools, and relationships over time
Local offline LLM applications — building fully air-gapped LLM systems using Ollama and mnemo where no data ever leaves the user’s machine
Developer tooling memory — giving LLM-powered dev tools (code review, documentation assistants) context about a project’s entities, relationships, and history

Under The Hood

Architecture

mnemo uses a clean layered architecture with strict one-way dependency flow: a core library crate encapsulates all business logic (entity extraction, graph operations, retrieval scoring, and the SQLite persistence layer), while thin binary crates for the API, CLI, and benchmarks depend on core but not on each other. Shared state is managed through Arc-wrapped components (Database, KnowledgeGraph behind a Tokio RwLock, RetrievalEngine, Extractor) injected into Axum handlers via a cloned AppState struct. The knowledge graph is held entirely in memory and synchronized with SQLite on every write, with the RwLock ensuring concurrent read access during retrieval while serializing writes. The separation of concerns is well-executed: persistence, LLM provider abstraction, graph traversal, and retrieval pipeline are each isolated modules with clear boundaries and no circular dependencies.
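The shared-state pattern can be illustrated with a small sketch. It uses `std::sync::RwLock` and OS threads in place of the Tokio RwLock and Axum handlers named above, so that it runs without external crates; the fields and methods shown on `AppState` and `KnowledgeGraph` are assumptions for illustration, not the project's real definitions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the in-memory knowledge graph.
#[derive(Default)]
struct KnowledgeGraph {
    edges: HashMap<String, Vec<String>>,
}

/// Hypothetical application state. Cloning it only bumps the Arc
/// reference count, so every clone sees the same graph.
#[derive(Clone, Default)]
struct AppState {
    graph: Arc<RwLock<KnowledgeGraph>>,
}

impl AppState {
    /// Writes take the exclusive lock, so they are serialized.
    fn add_edge(&self, from: &str, to: &str) {
        let mut g = self.graph.write().unwrap();
        g.edges
            .entry(from.to_string())
            .or_default()
            .push(to.to_string());
    }

    /// Reads take the shared lock, so many can run at the same time.
    fn neighbors(&self, node: &str) -> Vec<String> {
        let g = self.graph.read().unwrap();
        g.edges.get(node).cloned().unwrap_or_default()
    }
}

fn main() {
    let state = AppState::default();
    state.add_edge("alice", "acme");

    // Each "handler" gets its own cheap clone of the state.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let s = state.clone();
            thread::spawn(move || s.neighbors("alice").len())
        })
        .collect();

    for r in readers {
        println!("neighbors seen by reader: {}", r.join().unwrap());
    }
}
```

In an async server the lock would typically be the Tokio variant so that waiting for it does not block a worker thread, but the ownership structure is the same.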

Tech Stack

mnemo is written in Rust 1.78 using the Tokio async runtime throughout. Axum 0.7 provides the REST API layer with Tower middleware for CORS and structured request tracing. SQLite is the sole persistence layer, accessed via sqlx 0.7 with the runtime-tokio driver and WAL mode for concurrent reads; all schema migrations are embedded in the binary. petgraph 0.6 powers the in-memory directed graph (DiGraph with typed node and edge weights). reqwest 0.12 handles outbound HTTP to LLM provider endpoints. The CLI uses Clap 4.5 with derive macros. The binary is compiled against musl libc in a multi-stage Dockerfile, producing a static executable that runs from a scratch container image. A Python SDK wraps the REST API with both synchronous and async clients.

Code Quality

The project ships with comprehensive test coverage: 122 Rust tests spanning data model round-trips, API endpoint behavior, and graph operations, plus 21 Python SDK tests and 12 performance benchmark suites. Async tests use tokio-test, and wiremock mocks the LLM HTTP calls for deterministic integration testing. Error handling is explicit throughout: thiserror defines a typed MnemoError enum that maps cleanly to HTTP status codes at the API boundary, and extraction failures fall back gracefully, returning an empty ExtractionResult rather than surfacing errors to callers. Naming conventions are consistent and idiomatic Rust throughout. CI is active, with a GitHub Actions workflow that runs the build and the tests.
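As a rough illustration of that error-handling style, the sketch below maps a typed error enum to HTTP status codes and falls back to an empty result when extraction fails. Only the names `MnemoError` and `ExtractionResult` come from the description above; the variants, the `entities` field, the specific status codes, and the `extract_or_empty` helper are hypothetical, and the `Display` impl is written by hand where the project would use thiserror.

```rust
use std::fmt;

/// Typed error enum; the variants here are illustrative assumptions.
#[derive(Debug)]
enum MnemoError {
    NotFound(String),
    InvalidInput(String),
    Database(String),
}

impl fmt::Display for MnemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MnemoError::NotFound(what) => write!(f, "not found: {what}"),
            MnemoError::InvalidInput(why) => write!(f, "invalid input: {why}"),
            MnemoError::Database(why) => write!(f, "database error: {why}"),
        }
    }
}

impl std::error::Error for MnemoError {}

impl MnemoError {
    /// Map each variant to an HTTP status code at the API boundary.
    fn status_code(&self) -> u16 {
        match self {
            MnemoError::NotFound(_) => 404,
            MnemoError::InvalidInput(_) => 400,
            MnemoError::Database(_) => 500,
        }
    }
}

/// Hypothetical shape of an extraction result.
#[derive(Debug, Default, PartialEq)]
struct ExtractionResult {
    entities: Vec<String>,
}

/// Graceful fallback: a failed extraction becomes an empty result
/// instead of an error the caller has to handle.
fn extract_or_empty(raw: Result<ExtractionResult, MnemoError>) -> ExtractionResult {
    raw.unwrap_or_default()
}

fn main() {
    let errors = [
        MnemoError::NotFound("entity 42".into()),
        MnemoError::InvalidInput("empty text".into()),
        MnemoError::Database("disk full".into()),
    ];
    for e in &errors {
        println!("{} -> HTTP {}", e, e.status_code());
    }

    let failed = extract_or_empty(Err(MnemoError::Database("timeout".into())));
    println!("entities after failed extraction: {}", failed.entities.len());
}
```

The trade-off of the fallback is that a caller cannot distinguish "nothing to extract" from "the extractor failed" unless the failure is also logged or counted somewhere.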

What Makes It Unique

The real differentiator is combining an in-memory petgraph graph, persisted to SQLite, with BFS multi-hop traversal applied at retrieval time, alongside a score penalty (0.5× weight) for graph-inferred entities relative to directly matched ones. Most AI memory tools store chunks in a vector database and do nearest-neighbor lookup on embeddings; mnemo instead builds a structured entity-relationship layer that surfaces inferred connections between entities never directly co-mentioned in the same chunk. The 6-stage retrieval pipeline (full-text search, entity name matching, graph BFS expansion, relation filtering, confidence scoring, context assembly) is meaningfully more sophisticated than naive context injection. Combined with the zero-dependency static binary enabling fully offline operation with Ollama, mnemo occupies a genuine niche for developers who want structured, traversable, local memory they control entirely.
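One plausible reading of the 0.5× penalty is a multiplier applied to the score of any entity reached through graph expansion rather than matched directly. The sketch below shows that reading only; the `Candidate` struct, the `Source` enum, and the way base scores are obtained are assumptions, since the description does not spell out the scoring formula.

```rust
/// How a candidate entity entered the result set.
#[derive(Debug, Clone, PartialEq)]
enum Source {
    /// Matched directly by text search or entity name.
    Direct,
    /// Reached only through graph expansion.
    Inferred,
}

/// Hypothetical retrieval candidate with a pre-computed base score.
struct Candidate {
    name: String,
    base_score: f64,
    source: Source,
}

/// Multiplier for graph-inferred candidates (the 0.5x weight from the text).
const INFERRED_PENALTY: f64 = 0.5;

fn final_score(c: &Candidate) -> f64 {
    match c.source {
        Source::Direct => c.base_score,
        Source::Inferred => c.base_score * INFERRED_PENALTY,
    }
}

/// Rank candidates from highest to lowest final score.
fn rank(mut candidates: Vec<Candidate>) -> Vec<(String, f64)> {
    candidates.sort_by(|a, b| final_score(b).total_cmp(&final_score(a)));
    candidates
        .into_iter()
        .map(|c| {
            let score = final_score(&c);
            (c.name, score)
        })
        .collect()
}

fn main() {
    let ranked = rank(vec![
        Candidate { name: "acme".into(), base_score: 0.8, source: Source::Direct },
        Candidate { name: "berlin".into(), base_score: 0.9, source: Source::Inferred },
        Candidate { name: "alice".into(), base_score: 0.4, source: Source::Direct },
    ]);
    for (name, score) in ranked {
        println!("{name}: {score:.2}");
    }
}
```

Under this scheme an inferred entity needs twice the base score of a direct match to rank alongside it, which keeps multi-hop discoveries in the context without letting them crowd out what the query actually mentioned.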

Self-Hosting

mnemo is released under the MIT License, one of the most permissive open-source licenses available. You may use it commercially, modify it freely, distribute modified versions, and embed it in proprietary products without any copyleft obligations. The only requirement is retaining the copyright notice. There are no open-core restrictions, no license checks, no enterprise tiers, and no feature flags gating capabilities behind a paid plan.

Running mnemo yourself means operating a Rust API server alongside a SQLite database file. The operational footprint is small: the server is a single static binary with no external runtime dependencies, and SQLite requires no separate database process. You are responsible for the machine’s uptime, backing up the SQLite file, and upgrading the binary when new releases are published. The knowledge graph is loaded from SQLite into memory on startup, so restart time scales with graph size but is typically fast. There is no built-in high availability, clustering, or backup tooling; these are left to the operator.

There is currently no hosted or managed version of mnemo, so there is no cloud tier to compare against. What this means practically is that you own all your data and all operational responsibilities with no fallback support tier. The project is early-stage (created June 2025, two months of commits) with no formal releases on GitHub yet, so you should expect API changes and limited long-term stability guarantees. Community support is available via GitHub issues, but there are no SLAs, no paid support contracts, and no enterprise agreements.
