OpenKB

Name: OpenKB
Rating: 5 (2868 reviews)

An open-source CLI that compiles raw documents into a structured, interlinked wiki-style knowledge base using LLMs — powered by vectorless, reasoning-based retrieval (PageIndex) instead of a vector database.

2.9K stars

317 forks

Apache License 2.0

Python



OpenKB implements an idea Andrej Karpathy described publicly: rather than chunking documents into a vector store for similarity search, an LLM reads long documents and generates summaries, concept pages, and entity pages, producing a genuine wiki-style knowledge base with real hyperlinks between related concepts. Retrieval happens through reasoning over document structure (via the companion PageIndex project) instead of nearest-neighbor vector search.
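The interlinking step described above can be illustrated with a minimal Python sketch. It is not OpenKB's code. The `interlink` function and the `[[Target|text]]` link syntax are hypothetical, and the sketch assumes the LLM has already produced one page body per concept or entity title. It links only the first mention of each other page's title.

```python
import re


def interlink(pages: dict[str, str]) -> dict[str, str]:
    """Hypothetical sketch (not OpenKB's implementation): turn the first mention
    of every other page title into a [[Target|text]] wiki link."""
    linked = {}
    for title, body in pages.items():
        # Longest titles first, so "Vector database" wins over "Vector" at the same position.
        others = sorted((t for t in pages if t != title), key=len, reverse=True)
        if not others:
            linked[title] = body
            continue
        # One combined pattern applied in a single pass, so links are never nested.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, others)) + r")\b", re.IGNORECASE)
        canonical = {t.lower(): t for t in others}
        seen: set[str] = set()

        def link(m: re.Match) -> str:
            target = canonical[m.group(1).lower()]
            if target in seen:
                return m.group(0)  # later mentions stay plain text
            seen.add(target)
            return f"[[{target}|{m.group(0)}]]"

        linked[title] = pattern.sub(link, body)
    return linked


pages = {
    "PageIndex": "PageIndex replaces the vector database with tree search.",
    "Vector database": "A vector database stores embeddings.",
    "Vector": "An array of numbers.",
}
print(interlink(pages)["PageIndex"])
# PageIndex replaces the [[Vector database|vector database]] with tree search.
```

Matching all titles in one longest-first pattern keeps a short title such as "Vector" from being linked inside a longer one such as "Vector database".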

Output follows Google’s Open Knowledge Format (OKF) specification for knowledge sharing. OpenKB also automatically extracts entity pages for people, organizations, places, and products, and keeps them in sync as source documents change. The approach scales to long documents, which vector-chunking approaches often fragment awkwardly, and it handles multi-modal content natively rather than treating it as a special case.
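Keeping entity pages in sync implies some form of change detection. The listing does not describe OpenKB's actual mechanism, so this is a minimal sketch under stated assumptions: a stored manifest of SHA-256 content hashes and a hypothetical `mentions` map from each entity page to the source documents it was built from. Every source is hashed and diffed against the manifest, and only entity pages backed by an added, edited, or deleted document are regenerated.

```python
import hashlib


def changed_sources(docs: dict[str, str], manifest: dict[str, str]) -> tuple[set[str], dict[str, str]]:
    """Hypothetical sketch: compare content hashes with a stored manifest (name -> hex digest)."""
    new_manifest = {name: hashlib.sha256(text.encode("utf-8")).hexdigest() for name, text in docs.items()}
    # Added or edited documents: digest missing from, or different to, the old manifest.
    changed = {name for name, digest in new_manifest.items() if manifest.get(name) != digest}
    # Deleted documents also invalidate any page built from them.
    changed |= set(manifest) - set(docs)
    return changed, new_manifest


def pages_to_refresh(changed: set[str], mentions: dict[str, set[str]]) -> set[str]:
    """Entity pages whose supporting source documents changed (mentions: entity -> source names)."""
    return {entity for entity, sources in mentions.items() if sources & changed}


_, manifest = changed_sources({"a.md": "Ada Lovelace wrote notes.", "b.md": "Acme Corp ships widgets."}, {})
changed, manifest = changed_sources(
    {"a.md": "Ada Lovelace wrote the first program.", "b.md": "Acme Corp ships widgets."}, manifest
)
print(pages_to_refresh(changed, {"Ada Lovelace": {"a.md"}, "Acme Corp": {"b.md"}}))  # {'Ada Lovelace'}
```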

Apache-2.0 licensed and built by Vectify AI (also behind PageIndex), OpenKB has grown rapidly since launch and has been featured on Trendshift, reflecting interest in retrieval approaches that don’t require standing up and maintaining a vector database.

What You Get

Automatic conversion of raw documents into a wiki-style knowledge base with real interlinking between concepts
Vectorless, reasoning-based retrieval via PageIndex instead of nearest-neighbor vector search
Auto-extracted entity pages for people, organizations, places, and products, kept in sync with source documents
Output following Google’s Open Knowledge Format (OKF) specification for interoperable knowledge sharing

Common Use Cases

Building a browsable, wiki-style knowledge base from a large document collection without maintaining a vector database
Retrieving information from very long documents where vector-chunking approaches tend to lose context
Automatically generating and maintaining entity reference pages (people, orgs, products) from a document corpus
Producing OKF-compliant knowledge base output for interoperability with other knowledge-sharing tools

Under The Hood

Architecture: OpenKB’s core departure from typical RAG systems is retrieval by reasoning over document structure rather than by vector similarity search. It uses the companion PageIndex project’s “vectorless” retrieval approach, so there is no embedding index to build or maintain, and long documents don’t need to be chunked in ways that can fragment meaning. The wiki output structure (summaries, concept pages, entity pages, cross-links) follows Google’s Open Knowledge Format, giving the generated knowledge base a standardized, interoperable shape rather than a bespoke format.
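To make the contrast with nearest-neighbor search concrete, here is a minimal sketch of tree-structured, reasoning-style retrieval. It is not PageIndex's API. `Node`, `tree_search`, and `keyword_chooser` are hypothetical, and the word-overlap chooser only stands in for the LLM call that would read each section's title and summary and decide which branch to open. No embeddings are computed; the document's own section hierarchy serves as the index.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One section of a document's table of contents (hypothetical structure)."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


# A chooser receives the query and candidate sections and returns the index to descend into.
Chooser = Callable[[str, list[Node]], int]


def keyword_chooser(query: str, children: list[Node]) -> int:
    """Stand-in for an LLM call: pick the child whose title and summary share the most words with the query."""
    q = set(query.lower().split())
    scores = [len(q & set(f"{c.title} {c.summary}".lower().split())) for c in children]
    return max(range(len(children)), key=scores.__getitem__)


def tree_search(root: Node, query: str, choose: Chooser = keyword_chooser) -> list[str]:
    """Walk from the root to a leaf, asking `choose` which section to open at each level."""
    path, node = [root.title], root
    while node.children:
        node = node.children[choose(query, node.children)]
        path.append(node.title)
    return path


report = Node("Annual report", "company report", [
    Node("Financials", "revenue and costs by quarter", [
        Node("Revenue", "quarterly revenue growth"),
        Node("Costs", "operating costs and headcount"),
    ]),
    Node("Risks", "market and regulatory risks"),
])
print(tree_search(report, "what were the operating costs"))  # ['Annual report', 'Financials', 'Costs']
```

Replacing `keyword_chooser` with a function that prompts an LLM with the query and the candidate summaries gives the reasoning-based variant; the traversal itself stays the same.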

Tech Stack: Python, distributed as a CLI and built on top of PageIndex (a separate Vectify AI project) for the underlying reasoning-based retrieval, with output conforming to Google’s Open Knowledge Format specification.

Code Quality: The commit history is very active and consistently maintained, and Trendshift-featured growth reflects meaningful early adoption. Because OpenKB is built on a named, documented retrieval technique (PageIndex) rather than an ad hoc approach, it has more conceptual grounding than typical RAG wrapper projects.

What Makes It Unique: Most knowledge-base and RAG tools default to vector embeddings and similarity search. OpenKB instead implements a vectorless, reasoning-based retrieval approach inspired by a concept Andrej Karpathy described publicly, and it produces an actual browsable wiki rather than an opaque vector index. That is a structurally different bet on how LLMs should retrieve from long documents.

Self-Hosting

Licensing Model: Apache-2.0 licensed; fully open source with no license key.

Self-Hosting Restrictions: None found; the CLI runs locally against your own documents and LLM provider credentials.

License Key Required: No.


Repository Health

Pre-computed score based on development activity, maintenance, community, maturity, and trend momentum.

77/100 (Good)

Development Activity: 96

Maintenance: 100

Community: 56

Maturity: 16

Momentum: 40

Very active development · Well-maintained with consistent updates · Rapidly growing project · New project

Technical Analysis

72/100 (Good)

Architecture: 78

Code Quality: 70

Innovation: 82

Learning Curve: 58
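The four Technical Analysis subscores are consistent with an unweighted mean, although the page does not state how the overall score is computed:

```latex
\[
\frac{78 + 70 + 82 + 58}{4} = \frac{288}{4} = 72
\]
```

The Repository Health score does not follow the same pattern: the plain mean of its five subscores is (96 + 100 + 56 + 16 + 40) / 5 = 308 / 5 = 61.6, not 77, so that score appears to use weighting or other inputs the page does not disclose.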

Repository Stats

Contributors

Total Commits: 162

Monthly Commits

Watchers

Repo Age: 3 months

Last Commit: 3 days ago

Built With

Python: 96.0%

Recent Releases

11 total

~3.7 releases/month

Topics

agents ai knowledge-base llm rag retrieval

Related Apps

Logseq

Note Taking · Knowledge Management · Clojure · AGPL 3.0 · 43,684 · 70%

A privacy-first, open-source knowledge graph platform combining Markdown, Org-mode, bidirectional linking, and local-first storage for building your second brain.
