Karakeep is a self-hosted application designed for data hoarders who want to save, organize, and retrieve everything they encounter online—links, notes, images, PDFs, and videos—without relying on third-party services. It solves the problem of fragmented digital clutter by combining traditional bookmarking with AI-driven metadata extraction and full-content archiving. Built for developers and power users who value privacy and control, it integrates with browser extensions, mobile apps, and RSS feeds to create a unified knowledge base.
Technically, Karakeep is built with Next.js for the web interface, Drizzle for database management, Meilisearch for full-text search, and Puppeteer for content crawling. It supports AI tagging via OpenAI or local Ollama models, uses tRPC for API communication, and includes monolith for full-page archiving and yt-dlp for video capture. Deployment is Docker-first, with SSO, REST API, and multi-language support for enterprise and personal use.
What You Get
- AI-based automatic tagging and summarization - Uses OpenAI or local Ollama models to generate tags and summaries for saved links, notes, and archived content, reducing manual categorization.
- Full text search across all content - Powered by Meilisearch to instantly find text within bookmarks, notes, archived pages, and extracted image OCR text.
- Full page archival with monolith - Saves complete HTML snapshots of web pages to preserve content against link rot, including images and styles.
- OCR for text extraction from images - Extracts readable text from saved images using optical character recognition, making visual content searchable.
- Auto video archiving with yt-dlp - Automatically downloads and archives videos from YouTube and other platforms when bookmarked.
- Browser extensions for Chrome and Firefox - One-click bookmarking directly from any webpage with automatic metadata fetching and image previews.
- Bookmark importers from Pocket, Chrome, Omnivore, and Linkwarden - Migrates existing collections without manual re-entry, preserving structure and metadata.
- Collaborative bookmark lists - Multiple users can share and edit the same bookmark lists with real-time updates and permissions.
- Full-page screenshot and highlight storage - Captures and stores visual snapshots of pages along with user highlights for later reference.
- SSO support (OAuth, SAML) - Enables enterprise-grade authentication via Google, GitHub, or other identity providers.
- RSS feed auto-hoarding - Automatically saves articles from subscribed RSS feeds into your library with metadata and archiving.
- Dark mode and multi-language support - Built-in theming and translations via Weblate for global accessibility.
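The search feature above draws on several text sources: page content, notes, and OCR output. Here is a minimal sketch of how they might be flattened into one searchable document before indexing. Every type and field name here (`SavedItem`, `SearchDocument`, `archivedText`, `ocrText`) is a hypothetical stand-in for illustration, not Karakeep's actual schema.

```typescript
// Hypothetical content variants; field names are assumptions for illustration.
type SavedContent =
  | { kind: "link"; url: string; title: string; archivedText?: string }
  | { kind: "note"; text: string }
  | { kind: "image"; fileName: string; ocrText?: string };

interface SavedItem {
  id: string;
  tags: string[];
  content: SavedContent;
}

// The flat shape a full-text search engine would index.
interface SearchDocument {
  id: string;
  tags: string[];
  body: string;
}

// Collect every piece of text that should be searchable for one item.
function toSearchDocument(item: SavedItem): SearchDocument {
  const parts: string[] = [];
  const content = item.content;
  switch (content.kind) {
    case "link":
      parts.push(content.title, content.url);
      if (content.archivedText) parts.push(content.archivedText);
      break;
    case "note":
      parts.push(content.text);
      break;
    case "image":
      parts.push(content.fileName);
      if (content.ocrText) parts.push(content.ocrText);
      break;
  }
  return { id: item.id, tags: item.tags, body: parts.join("\n") };
}

const receipt: SavedItem = {
  id: "item_1",
  tags: ["finance"],
  content: { kind: "image", fileName: "receipt.png", ocrText: "invoice total 42" },
};
const receiptDoc = toSearchDocument(receipt);
```

Under these assumptions, text extracted from an image ends up in the same `body` field as article text, which is what makes it findable through a single query.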
Common Use Cases
- Managing a research library - A graduate student saves academic papers, PDFs, and web articles from journals, using AI tagging to auto-categorize by topic and full-text search to find quotes across 500+ saved items.
- Running a personal knowledge base - A software engineer archives blog posts, GitHub repos, and tutorials, using full-page archiving to ensure references remain accessible even if original links die.
- Curating content for a newsletter - A tech blogger collects links from Twitter, Hacker News, and RSS feeds, using AI summaries to quickly draft newsletter blurbs without re-reading each article.
- Team knowledge sharing in a remote company - A product team shares product research, competitor analysis, and design resources in shared Karakeep lists with SSO and collaborative tagging.
Under The Hood
Architecture
- Monorepo structure organized into distinct workspaces for applications, shared packages, and tooling, enabling independent development and efficient caching
- Domain-driven modular design with clear boundaries between UI, API, and data layers via dedicated packages for types, tRPC endpoints, and database schemas
- Shared code wired together through workspace-relative package links and centralized build configurations, ensuring consistency across services
- Clean separation of concerns with isolated modules for UI components, API clients, database schemas, and infrastructure tooling
Tech Stack
- Full-stack TypeScript with React 19, Next.js 14, and Expo for unified web and mobile development
- tRPC for end-to-end type-safe API communication, integrated with TanStack Query for data fetching and server-state caching
- Drizzle as the ORM with migration and studio tooling for robust database schema management
- Tailwind CSS with class-variance-authority and tailwind-merge for component-based, utility-first styling
- Oxlint and Oxfmt as modern Rust-based replacements for ESLint and Prettier
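The end-to-end type safety mentioned above can be illustrated without any library. The following is a minimal sketch in plain TypeScript, not tRPC's actual API. `appRouter`, `createClient`, and both procedures are hypothetical names, and the call happens in-process rather than over HTTP. It only shows how a client's input and output types can be derived from the router's type.

```typescript
// A procedure is a function from typed input to typed output.
type Procedure<I, O> = (input: I) => O;

// Hypothetical server-side router; procedure names are illustrative only.
const appRouter = {
  createBookmark: (input: { url: string; title?: string }) => ({
    id: "bk_1",
    url: input.url,
    title: input.title ?? input.url, // fall back to the URL when no title is given
  }),
  countTags: (input: { tags: string[] }) => input.tags.length,
};

// The client derives every input and output type from the router's type,
// so no request or response types are declared by hand.
function createClient<R extends Record<string, Procedure<any, any>>>(router: R) {
  return {
    call<K extends keyof R>(name: K, input: Parameters<R[K]>[0]): ReturnType<R[K]> {
      return router[name](input);
    },
  };
}

const client = createClient(appRouter);

// `created` is inferred as { id: string; url: string; title: string }.
const created = client.call("createBookmark", { url: "https://example.com" });

// `tagCount` is inferred as number.
const tagCount = client.call("countTags", { tags: ["ai", "search"] });
```

With this shape, a misspelled procedure name or a wrongly typed input is rejected by the compiler rather than discovered at runtime, which is the property the tech stack relies on.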
Code Quality
- Extensive test coverage spanning unit, integration, and end-to-end scenarios with Vitest and realistic test fixtures
- Strong type safety enforced through Zod schemas and comprehensive type definitions across shared packages
- Consistent naming conventions and well-structured test environments with containerized dependencies for reproducibility
- Robust error handling with descriptive assertions, though custom error classes are not utilized
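As a rough illustration of schema validation at a package boundary, the sketch below hand-rolls a validator instead of using Zod, so that it stays dependency-free. The payload shape `NewLinkBookmark` and the function `parseNewLinkBookmark` are hypothetical. Errors are returned as plain strings in a result object, with no custom error classes.

```typescript
// Hypothetical payload for creating a link bookmark.
interface NewLinkBookmark {
  type: "link";
  url: string;
  tags: string[];
}

// Result object: either validated data or a descriptive error message.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Validate untrusted input and narrow it to NewLinkBookmark.
function parseNewLinkBookmark(input: unknown): ParseResult<NewLinkBookmark> {
  if (typeof input !== "object" || input === null) {
    return { success: false, error: "expected an object" };
  }
  const raw = input as Record<string, unknown>;
  if (raw["type"] !== "link") {
    return { success: false, error: 'type must be "link"' };
  }
  const url = raw["url"];
  if (typeof url !== "string" || !/^https?:\/\//.test(url)) {
    return { success: false, error: "url must be an http(s) URL" };
  }
  // Tags are optional and default to an empty list.
  const tags = raw["tags"] === undefined ? [] : raw["tags"];
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    return { success: false, error: "tags must be an array of strings" };
  }
  return { success: true, data: { type: "link", url, tags: tags as string[] } };
}

const accepted = parseNewLinkBookmark({ type: "link", url: "https://example.com" });
const rejected = parseNewLinkBookmark({ type: "link", url: "ftp://example.com" });
```

Returning a result object rather than throwing keeps the failure path visible in the type signature, so callers have to check `success` before reading `data`.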
What Makes It Unique
- Native integration between browser extension and web app enables seamless, context-aware bookmark synchronization
- Hierarchical list paths with icon-based navigation create an intuitive, tree-like knowledge organization system
- Unified API endpoints for summarization and asset attachment transform bookmarks into rich, annotated knowledge nodes
- Smart, role-aware list filtering with subtree exclusion enables dynamic views without external authorization services
- Consistent theming and design language extended across UI, Markdown editor, and documentation for a unified developer and user experience
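The hierarchical list paths and subtree exclusion described above can be sketched with a simple parent-pointer model. This is an illustration under assumptions, not Karakeep's implementation: `BookmarkList`, `listPath`, `excludeSubtree`, and `formatPath` are hypothetical names, and role checks are left out.

```typescript
// Hypothetical shape: each list optionally points at a parent list.
interface BookmarkList {
  id: string;
  name: string;
  icon: string;
  parentId: string | null;
}

// Walk parent pointers to build the root-to-leaf path for one list.
function listPath(lists: BookmarkList[], id: string): BookmarkList[] {
  const byId: Record<string, BookmarkList> = {};
  for (const list of lists) byId[list.id] = list;

  const path: BookmarkList[] = [];
  const seen: Record<string, boolean> = {};
  let current: BookmarkList | undefined = byId[id];
  while (current && !seen[current.id]) {
    seen[current.id] = true; // guard against accidental cycles
    path.unshift(current);
    current = current.parentId === null ? undefined : byId[current.parentId];
  }
  return path;
}

// Return every list except `excludedId` and all of its descendants.
function excludeSubtree(lists: BookmarkList[], excludedId: string): BookmarkList[] {
  return lists.filter((list) => {
    const ids = listPath(lists, list.id).map((l) => l.id);
    return ids.indexOf(excludedId) === -1;
  });
}

// Render a path as "icon name / icon name / ...".
function formatPath(path: BookmarkList[]): string {
  return path.map((l) => `${l.icon} ${l.name}`).join(" / ");
}

const lists: BookmarkList[] = [
  { id: "1", name: "Research", icon: "📚", parentId: null },
  { id: "2", name: "Papers", icon: "📄", parentId: "1" },
  { id: "3", name: "Drafts", icon: "📝", parentId: "2" },
  { id: "4", name: "Recipes", icon: "🍲", parentId: null },
];

const draftsPath = formatPath(listPath(lists, "3"));
// "📚 Research / 📄 Papers / 📝 Drafts"

const withoutPapers = excludeSubtree(lists, "2").map((l) => l.name);
// ["Research", "Recipes"]
```

Recomputing the path for every list is quadratic in the worst case. That is acceptable for a sketch, but a real implementation would more likely precompute children or use a recursive query.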