OpenKB

An open-source CLI that compiles raw documents into a structured, interlinked wiki-style knowledge base using LLMs — powered by vectorless, reasoning-based retrieval (PageIndex) instead of a vector database.

2.9K stars
317 forks
Apache License 2.0
Python

OpenKB implements an idea Andrej Karpathy described publicly: rather than chunking documents into a vector store for similarity search, an LLM reads long documents and generates summaries, concept pages, and entity pages, producing a genuine wiki-style knowledge base with real hyperlinks between related concepts. Retrieval happens through reasoning over document structure (via the companion PageIndex project) instead of nearest-neighbor vector search.

OpenKB’s output follows Google’s Open Knowledge Format (OKF) specification for knowledge sharing. The tool automatically extracts entity pages for people, organizations, places, and products, and keeps them in sync as source documents change. The approach is aimed at long documents that vector-chunking approaches often fragment awkwardly, and it handles multi-modal content natively rather than treating it as a special case.
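The listing does not say how OpenKB keeps entity pages in sync, so the following is only a minimal sketch of one common way to do it: record a content hash per source document and regenerate the entity pages mentioned in any document whose hash has changed. The names fingerprint, stale_entities, and the sample file names are hypothetical and are not part of OpenKB.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect that a source document changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_entities(sources, seen_hashes, mentions):
    """Return the entity names whose pages should be regenerated.

    sources:     {doc_name: current text}
    seen_hashes: {doc_name: hash recorded at the last build}
    mentions:    {doc_name: set of entity names found in that doc}
    """
    stale = set()
    for name, text in sources.items():
        # A new or edited document invalidates every entity it mentions.
        if seen_hashes.get(name) != fingerprint(text):
            stale |= mentions.get(name, set())
    return stale


# Hypothetical corpus: b.md was edited since the last build, a.md was not.
sources = {
    "a.md": "Ada Lovelace joined Acme.",
    "b.md": "Acme opened a Paris office.",
}
seen_hashes = {
    "a.md": fingerprint("Ada Lovelace joined Acme."),
    "b.md": fingerprint("Acme opened an office."),
}
mentions = {
    "a.md": {"Ada Lovelace", "Acme"},
    "b.md": {"Acme", "Paris"},
}

stale = stale_entities(sources, seen_hashes, mentions)
```

Here only the entities mentioned in the edited document are flagged, so unchanged pages would not need another LLM call.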

Apache-2.0 licensed and built by Vectify AI (also behind PageIndex), OpenKB has grown rapidly (Trendshift-featured) since launch, reflecting interest in retrieval approaches that don’t require standing up and maintaining a vector database.

What You Get

  • Automatic conversion of raw documents into a wiki-style knowledge base with real interlinking between concepts
  • Vectorless, reasoning-based retrieval via PageIndex instead of nearest-neighbor vector search
  • Auto-extracted entity pages for people, organizations, places, and products, kept in sync with source documents
  • Output following Google’s Open Knowledge Format (OKF) specification for interoperable knowledge sharing

Common Use Cases

  • Building a browsable, wiki-style knowledge base from a large document collection without maintaining a vector database
  • Retrieving information from very long documents where vector-chunking approaches tend to lose context
  • Automatically generating and maintaining entity reference pages (people, orgs, products) from a document corpus
  • Producing OKF-compliant knowledge base output for interoperability with other knowledge-sharing tools

Under The Hood

Architecture: OpenKB’s core departure from typical RAG systems is that it retrieves by reasoning over document structure rather than by vector similarity search. It uses the companion PageIndex project’s “vectorless” retrieval approach, so there is no embedding index to build or maintain, and long documents don’t need to be chunked in ways that can fragment meaning. The wiki output structure (summaries, concept pages, entity pages, cross-links) follows Google’s Open Knowledge Format, giving the generated knowledge base a standardized, interoperable shape rather than a bespoke format.
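To make "reasoning over document structure" concrete, here is a minimal sketch of the general idea: the document is held as an outline tree, and retrieval walks from the root to a leaf by choosing the most relevant section at each level. This is not PageIndex’s or OpenKB’s actual API. The Node class, the retrieve function, and the sample document are hypothetical, and a simple word-overlap score stands in for the LLM judgment a real system would make at each step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a document's outline (a table-of-contents tree)."""
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def relevance(question: str, node: Node) -> int:
    """Stand-in for an LLM judgment: count question words in title + summary."""
    words = {w.strip("?.,").lower() for w in question.split()}
    haystack = f"{node.title} {node.summary}".lower().split()
    return sum(1 for w in haystack if w.strip("?.,") in words)


def retrieve(question: str, root: Node) -> Node:
    """Walk from the root to a leaf, picking the most relevant child each step."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: relevance(question, c))
    return node


# Hypothetical document outline; no embeddings or chunking are involved.
doc = Node("Annual report", "Company overview", children=[
    Node("Financials", "Revenue, costs and margins", children=[
        Node("Revenue", "Revenue by region and product",
             text="Revenue grew in EMEA."),
        Node("Costs", "Operating costs and headcount",
             text="Costs were flat."),
    ]),
    Node("Risks", "Legal and market risks",
         text="Key risks include regulation."),
])

hit = retrieve("How did revenue by region change?", doc)
```

The walk visits only the sections on one path through the outline, so the whole section is returned intact instead of a set of fixed-size chunks.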

Tech Stack: Python, distributed as a CLI and built on top of PageIndex (a separate VectifyAI project) for the underlying reasoning-based retrieval mechanism, with output conforming to Google’s Open Knowledge Format specification.

Code Quality: The commit history is very active and consistently maintained, and the Trendshift-featured growth points to meaningful early adoption. Building on a named, documented retrieval technique (PageIndex) rather than an ad hoc approach gives it more conceptual grounding than typical RAG wrapper projects.

What Makes It Unique: Most knowledge-base and RAG tools default to vector embeddings and similarity search. OpenKB instead implements a vectorless, reasoning-based retrieval approach inspired by a concept Andrej Karpathy publicly described, and it produces an actual browsable wiki rather than an opaque vector index. That makes it a structurally different bet on how LLMs should retrieve from long documents.

Self-Hosting

Licensing Model: Apache-2.0, fully open source with no license key.

Self-Hosting Restrictions: None found. The CLI runs locally against your own documents and LLM provider credentials.

License Key Required: No.
