OpenViking

An open-source context database that gives AI agents a unified filesystem for memory, resources, and skills with hierarchical tiered retrieval.

26.3K stars

2K forks

GNU AGPLv3

Python

OpenViking is a purpose-built context database for AI agents, developed by ByteDance’s Volcengine team. It replaces the fragmented approach of storing memories in code, resources in vector databases, and skills in scattered files with a single unified filesystem paradigm. By organizing context as a hierarchy of directories and files with URI-addressable nodes, agents can manage their knowledge base the same way a developer manages a local filesystem.

The core innovation is OpenViking’s three-tier context loading system (L0/L1/L2), where abstract summaries (L0), overviews (L1), and full content (L2) are stored at different levels and loaded on demand. This reduces token consumption compared to naive RAG approaches that retrieve full documents indiscriminately, because only the nodes that prove relevant need to be expanded beyond their short abstract or overview. Retrieval traverses the directory tree recursively, combining directory-level positioning with dense and sparse semantic vector search and reranking for high-precision context acquisition.
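To make the tiered loading idea concrete, here is a minimal sketch of loading context under a token budget. It is not OpenViking's implementation: the ContextNode class, the load_within_budget function, and the word-count token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextNode:
    """Hypothetical node holding the three tiers described above."""
    uri: str
    abstract: str   # L0: shortest summary
    overview: str   # L1: medium-length overview
    content: str    # L2: full content

    def tier(self, level: int) -> str:
        return (self.abstract, self.overview, self.content)[level]


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def load_within_budget(nodes: list[ContextNode], budget: int) -> dict[str, tuple[int, str]]:
    """Load every node at L0, then upgrade nodes in ranked order while the budget allows.

    `nodes` is assumed to be sorted by relevance, best first.
    Returns {uri: (tier level loaded, text of that tier)}.
    """
    chosen = {n.uri: 0 for n in nodes}
    used = sum(estimate_tokens(n.abstract) for n in nodes)
    for level in (1, 2):
        for n in nodes:
            if chosen[n.uri] != level - 1:
                continue  # node was not upgraded in the previous pass
            extra = estimate_tokens(n.tier(level)) - estimate_tokens(n.tier(level - 1))
            if used + extra <= budget:
                chosen[n.uri] = level
                used += extra
    return {n.uri: (chosen[n.uri], n.tier(chosen[n.uri])) for n in nodes}
```

In this sketch the most relevant nodes reach full content first, while lower-ranked nodes stay at their abstract or overview once the budget runs out.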

OpenViking ships with automatic session management that compresses conversation history, extracts long-term memories from agent interactions, and maintains a hotness-weighted scoring system so frequently accessed context surfaces faster over time. The system integrates natively with MCP (Model Context Protocol), LangChain, and LangGraph, and supports a wide range of VLM and embedding providers including OpenAI, Volcengine Doubao, Kimi, GLM, and local Ollama models.
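One plausible way to picture hotness-weighted scoring is a frequency term multiplied by a recency decay, blended into the semantic score. The formula, the one-week half-life, and the 0.2 blend weight below are assumptions chosen for illustration, not values taken from OpenViking.

```python
import math

WEEK_SECONDS = 7 * 24 * 3600  # assumed half-life for the recency decay


def hotness(access_count: int, seconds_since_access: float,
            half_life: float = WEEK_SECONDS) -> float:
    """Frequency squashed into [0, 1) multiplied by an exponential recency decay."""
    frequency = 1.0 - 1.0 / (1.0 + math.log1p(access_count))
    recency = 0.5 ** (seconds_since_access / half_life)
    return frequency * recency


def blended_score(semantic: float, hot: float, weight: float = 0.2) -> float:
    """Linear blend: `weight` controls how much hotness can shift the ranking."""
    return (1.0 - weight) * semantic + weight * hot
```

With this shape, a node that is never accessed gets no boost, and a node's boost halves for every half-life that passes without an access.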

Beyond core context storage, OpenViking includes a Rust-based CLI tool and companion filesystem layer (RAGFS) compiled as a Python extension, a FastAPI-based REST server with OAuth and API key authentication, OpenTelemetry tracing, a visual web studio for browsing the context tree, and a vikingbot subsystem that connects agents to messaging platforms like Feishu, Telegram, Slack, DingTalk, and WeChat.

What You Get

Filesystem paradigm for context — Organize memories, resources, and skills as directories and files under a unified viking:// URI namespace, enabling intuitive path-based addressing
L0/L1/L2 tiered context loading — Three-level structure loads abstract summaries, overviews, or full content on demand, cutting token costs compared to full-document RAG
Hierarchical recursive retrieval — Directory-scoped semantic search with dense + sparse hybrid vectors and reranking traverses the tree recursively for high-precision context acquisition
Automatic session compression — Session manager compresses conversation history, externalizes large tool outputs, and extracts long-term memories so agents become smarter over time
Visualized retrieval trajectory — Web Studio shows the exact directory path the retriever traversed, making it possible to debug why an agent got the wrong context
Multi-provider VLM and embedding support — Works with OpenAI, Volcengine Doubao, Kimi, GLM, Gemini, and local Ollama models via a unified litellm dispatch layer
Native MCP integration — Exposes context read/write operations as MCP tools, letting any MCP-compatible agent framework connect without custom adapters
Multiple vector backend adapters — Pluggable storage supporting VikingDB, Qdrant, local disk, OpenGauss, and Volcengine-hosted backends
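Path-based addressing under viking:// can be illustrated with the Python standard library alone. The sketch below is not part of OpenViking's API: the helper names and the example URIs are hypothetical, and it assumes the first URI component names a top-level namespace such as resources or skills.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def parse_viking_uri(uri: str) -> tuple[str, PurePosixPath]:
    """Split a viking:// URI into (top-level namespace, absolute path inside it)."""
    parts = urlsplit(uri)
    if parts.scheme != "viking" or not parts.netloc:
        raise ValueError(f"not a viking:// URI: {uri!r}")
    return parts.netloc, PurePosixPath("/" + parts.path.lstrip("/"))


def is_under(uri: str, directory_uri: str) -> bool:
    """True when `uri` lies inside `directory_uri` (or is the directory itself)."""
    root, path = parse_viking_uri(uri)
    dir_root, dir_path = parse_viking_uri(directory_uri)
    return root == dir_root and path.is_relative_to(dir_path)
```

A directory-scoped query can then be expressed as a filter, for example keeping only candidates for which is_under(candidate, "viking://resources/docs") holds.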

Common Use Cases

Coding agent memory — Persist project-specific knowledge, file summaries, and past decisions so an AI coding assistant retains context across sessions without re-reading entire codebases
Multi-step agentic RAG — Store and retrieve domain knowledge with directory-scoped queries, enabling agents to navigate large knowledge bases like a filesystem rather than flat vector search
Long-running task agents — Use automatic session compression to prevent context windows from overflowing when agents execute multi-hour workflows that generate thousands of tool call results
Team knowledge bases — Build shared resource libraries where multiple agents read common skills and documentation from a centralized context database with namespace isolation per user
Debugging retrieval pipelines — Use the visualized retrieval trajectory to pinpoint exactly which directory branch an agent searched and why certain memories were or were not surfaced
Bot integrations — Connect the vikingbot layer to Feishu, Telegram, Slack, DingTalk, or WeChat so end-users interact with context-aware agents through familiar messaging apps
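The session compression behaviour mentioned in these use cases can be sketched in a few lines. This is a simplified illustration under stated assumptions: the Message class, the compress_session function, the viking://session/tool-outputs/ path, and the placeholder summary string are all hypothetical, and a real system would call a model to write the summary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str   # e.g. "user", "assistant", "tool", "system"
    text: str


def compress_session(messages: list[Message], keep_recent: int = 4,
                     max_tool_chars: int = 200) -> tuple[list[Message], dict[str, str]]:
    """Externalize large tool outputs, then collapse older messages into one summary.

    Returns (compressed history, store mapping hypothetical URIs to full tool outputs).
    """
    store: dict[str, str] = {}
    compact: list[Message] = []
    for i, m in enumerate(messages):
        if m.role == "tool" and len(m.text) > max_tool_chars:
            uri = f"viking://session/tool-outputs/{i}"  # hypothetical location
            store[uri] = m.text
            compact.append(Message("tool", f"[output stored at {uri}, {len(m.text)} chars]"))
        else:
            compact.append(m)

    cut = len(compact) - keep_recent
    if cut <= 0:
        return compact, store  # history is already short enough

    older, recent = compact[:cut], compact[cut:]
    # Placeholder: a real implementation would summarize `older` with a model.
    summary = Message("system", f"[summary of {len(older)} earlier messages]")
    return [summary] + recent, store
```

The history stays bounded by keep_recent plus one summary message, while the full tool outputs remain retrievable from the store by URI.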

Under The Hood

Architecture

OpenViking is organized as a layered, modular system built around a virtual filesystem abstraction. The central VikingFS layer translates viking:// URIs into storage operations, routing reads and writes through an async AGFS (Agent Filesystem) client compiled from Rust into a Python extension. Above VikingFS sits the retrieval layer, whose HierarchicalRetriever fans out vector searches across directories, narrows the candidates over multiple rounds until they converge, and applies rerank scoring with hotness weighting. The session layer wraps both: it intercepts incoming messages, externalizes tool outputs, triggers background compression jobs via APScheduler, and emits extracted memories back into the filesystem. This separation of concerns means context storage, retrieval strategy, and session lifecycle are independently composable: an operator can swap vector backends or tune retrieval parameters without touching session logic.
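The hierarchical traversal described above can be approximated with a toy example. The sketch below swaps in a word-overlap score where the real system uses vector search and reranking, and the Node class, retrieve function, beam width, and threshold are assumptions rather than OpenViking's HierarchicalRetriever.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    abstract: str
    children: list["Node"] = field(default_factory=list)  # empty list marks a file


def score(query: str, text: str) -> float:
    # Toy relevance: fraction of query words found in the text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, root: Node, root_uri: str = "viking://resources",
             beam: int = 2, min_score: float = 0.1):
    """Descend only into the best-scoring directories; collect matching files.

    Returns (hits sorted best first, list of directories visited in order).
    """
    hits: list[tuple[float, str]] = []
    trajectory: list[str] = []

    def visit(node: Node, path: str) -> None:
        trajectory.append(path)
        dirs = []
        for child in node.children:
            s = score(query, child.abstract)
            child_path = f"{path}/{child.name}"
            if child.children:
                dirs.append((s, child_path, child))
            elif s >= min_score:
                hits.append((s, child_path))
        dirs.sort(key=lambda d: d[0], reverse=True)
        for s, child_path, child in dirs[:beam]:
            if s >= min_score:
                visit(child, child_path)

    visit(root, root_uri)
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits, trajectory
```

The returned trajectory is the kind of information a retrieval visualizer could display: it records which directory branches were searched and, by omission, which were pruned.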

Tech Stack

The project is a Python 3.10+ package built with setuptools and maturin (for Rust extensions), using FastAPI and Uvicorn as the REST server framework and Pydantic v2 for all data models. Vector storage is abstracted behind pluggable adapter classes supporting VikingDB, Qdrant, local disk, and OpenGauss backends. Embeddings and VLM calls are dispatched through litellm, enabling transparent support for OpenAI, Volcengine Doubao, Kimi, GLM, Gemini, and Ollama. The CLI is a Rust binary wrapped in a thin Python entry point. OpenTelemetry (with OTLP gRPC and HTTP exporters) provides distributed tracing throughout. Tree-sitter parsers for ten languages handle code-aware chunking, and the web studio frontend is bundled as static assets served by the FastAPI app.

Code Quality

The codebase demonstrates strong engineering discipline with a comprehensive test suite organized by subsystem under a top-level tests/ directory, covering retrievers, session compression, storage adapters, observability, and CLI behavior. Tests use pytest with pytest-asyncio for async coverage and pytest-xdist for parallel execution. Ruff enforces code style with isort, pyflakes, and bugbear rules; mypy type-checking is configured but set to permissive mode, meaning type annotations are present but not exhaustively enforced. Loguru provides structured logging throughout. Error handling is explicit, with typed exception classes (NotFoundError, PermissionDeniedError, FailedPreconditionError) mapped to HTTP status codes at the server boundary rather than swallowed silently.
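The pattern of mapping typed exceptions to HTTP status codes at the server boundary might look like the sketch below. The three exception names come from the text above; the ContextError base class, the specific status codes, and the to_http_response helper are assumptions, and the real project wires this into FastAPI rather than a plain function.

```python
class ContextError(Exception):
    """Hypothetical base class; the default status covers unclassified failures."""
    status_code = 500


class NotFoundError(ContextError):
    status_code = 404  # assumed mapping


class PermissionDeniedError(ContextError):
    status_code = 403  # assumed mapping


class FailedPreconditionError(ContextError):
    status_code = 400  # assumed mapping; some services use 409 or 412 instead


def to_http_response(exc: Exception) -> tuple[int, dict]:
    """Translate an exception into (status code, JSON-serializable body)."""
    if isinstance(exc, ContextError):
        return exc.status_code, {"error": type(exc).__name__, "detail": str(exc)}
    # Unknown errors are reported generically so internals are not leaked.
    return 500, {"error": "InternalError", "detail": "unexpected server error"}
```

Keeping the mapping in one place means application code raises domain errors and never needs to know about HTTP.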

What Makes It Unique

OpenViking’s distinguishing contribution is applying the filesystem metaphor to agent context rather than treating it as a flat bag of vectors. The L0/L1/L2 tiered abstraction, in which each context node keeps an automatically generated abstract and overview alongside its full content, lets a retriever decide how much detail to load based on relevance rather than always fetching everything. The hierarchical recursive retriever uses directory dominance scoring: a directory is promoted only when its own relevance exceeds a set ratio of its best child’s score. This is a novel approach to multi-granularity retrieval not found in standard RAG frameworks. Combined with hotness-weighted memory lifecycle management that boosts recently and frequently accessed contexts, the system is designed to improve as agents use it rather than remaining static.
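The directory dominance rule can be written as a single comparison. The following sketch assumes a ratio of 0.8 and hypothetical function names; the actual threshold and the way OpenViking handles empty directories are not stated in the text.

```python
def directory_dominates(dir_score: float, child_scores: list[float],
                        ratio: float = 0.8) -> bool:
    """A directory is promoted when its score exceeds `ratio` times its best child's score."""
    if not child_scores:
        return True  # assumption: with no scored children, the directory stands alone
    return dir_score > ratio * max(child_scores)


def pick_result(dir_uri: str, dir_score: float, children: dict[str, float],
                ratio: float = 0.8) -> str:
    """Return the directory itself if it dominates, otherwise its best-scoring child."""
    if directory_dominates(dir_score, list(children.values()), ratio):
        return dir_uri
    return max(children, key=children.get)
```

For example, with a ratio of 0.8 and a best child scoring 0.8, a directory scoring 0.7 is returned as a whole, while one scoring 0.5 yields to its best child.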

Self-Hosting

OpenViking is released under the GNU Affero General Public License v3.0 (AGPL-3.0). This means you can freely use, modify, and distribute the software, including for commercial purposes, but if you run a modified version as a network service, you must offer the modified source code to that service's users under AGPL-3.0. Organizations that need to keep their modifications proprietary (common when embedding OpenViking into a commercial product or SaaS platform) must therefore either open-source those changes or negotiate a separate commercial license with ByteDance/Volcengine. Purely internal use (not exposed as a public service) carries fewer obligations, but legal counsel is advisable for commercial deployments.

Running OpenViking yourself requires Python 3.10+, a Rust toolchain for building the RAGFS and CLI components from source, and a C++ compiler (GCC 9+ or Clang 11+). In production you will also need a vector database backend (VikingDB, Qdrant, or OpenGauss), embedding model endpoints, and optionally a VLM provider for document parsing and image understanding. The project ships a Docker Compose file and Kubernetes Helm chart examples in the repository, but operating the full stack — managing database uptime, coordinating background compression jobs, handling model API rate limits, and keeping the Rust extension compatible across Python versions — represents a meaningful operational investment for small teams.

Volcengine, the ByteDance cloud platform whose team develops OpenViking, provides a hosted version with enterprise SLAs, managed upgrades, and native integration with Doubao models and VikingDB. Self-hosters forgo this managed upgrade path, SLA guarantees, cloud-native autoscaling, and first-party support channels. The hosted tier also adds features like cross-tenant isolation enforcement and enterprise SSO that would require additional engineering effort to replicate on a self-managed deployment. For teams already in the Volcengine ecosystem, the managed path will carry significantly lower operational overhead than self-hosting.
