Glean is a self-hosted platform designed for users overwhelmed by information fragmentation, combining an RSS reader with personal knowledge management capabilities. It aggregates content from RSS/Atom feeds, allows saving articles and external URLs, and uses vector embeddings to learn reading preferences for smart recommendations. Built with FastAPI, React, and PostgreSQL, it supports Docker-based deployment with optional Milvus for AI features.
The system uses a modern tech stack including TypeScript, Vite, Tailwind CSS, and Redis for task queuing, with background workers that auto-update feeds every 15 minutes. Users can deploy a full version with Milvus for AI recommendations or a lite version without it, so the deployment can be matched to the resources available.
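As a rough illustration of the 15-minute refresh cycle, the sketch below picks the feeds a worker run would update. The `Feed` class, its fields, and `feeds_due` are hypothetical names for this example, not Glean's actual schema or scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

REFRESH_INTERVAL = timedelta(minutes=15)  # interval stated in the overview


@dataclass
class Feed:  # hypothetical model, not Glean's real schema
    url: str
    last_fetched_at: Optional[datetime]  # None means the feed was never fetched


def feeds_due(feeds: list[Feed], now: datetime) -> list[Feed]:
    """Return feeds never fetched, or last fetched at least one interval ago."""
    return [
        f for f in feeds
        if f.last_fetched_at is None or now - f.last_fetched_at >= REFRESH_INTERVAL
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
feeds = [
    Feed("https://example.com/a.xml", now - timedelta(minutes=20)),
    Feed("https://example.com/b.xml", now - timedelta(minutes=5)),
    Feed("https://example.com/c.xml", None),
]
print([f.url for f in feeds_due(feeds, now)])
```

In a real deployment the task queue's own scheduling would trigger this kind of check; the selection logic is shown in isolation here.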
What You Get
- RSS Subscription - Subscribe to RSS 2.0, RSS 1.0, and Atom feeds with OPML import/export for seamless migration between readers
- Smart Reading - Clean, distraction-free reading interface with content filtering and a timeline view for chronological browsing
- Read Later - Save articles to a read-later queue with automatic cleanup of old items to reduce clutter
- Bookmarks - Save articles from feeds or any external URL with folder and tag-based organization for personal knowledge curation
- Background Sync - Automatic feed updates every 15 minutes using a background worker powered by arq and Redis task queue
- Self-hosted Deployment - Full data ownership via Docker containers with options for full (Milvus) or lite (no Milvus) deployments
- Admin Dashboard - Dedicated interface for user management, system monitoring, and configuration, served on port 3001
- AI-Powered Recommendations - Smart content scoring and tiered recommendations (Recommended/Normal/Not Interested) using vector embeddings and preference learning (Phase 3 WIP)
- Chrome Extension (Planned) - One-click bookmarking from the browser; not yet implemented
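The tiered recommendations in the list above can be pictured with a small sketch: build a preference vector from the embeddings of liked articles, score a candidate by cosine similarity, and map the score to a tier. The mean-vector approach, the function names, and the thresholds 0.7 and 0.2 are assumptions for illustration only; the feature is described as work in progress and its actual scoring is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def preference_vector(liked):
    """Component-wise mean of the embeddings the user liked (assumed approach)."""
    n = len(liked)
    return [sum(column) / n for column in zip(*liked)]


def tier(score, high=0.7, low=0.2):
    """Map a similarity score to a tier; both thresholds are hypothetical."""
    if score >= high:
        return "Recommended"
    if score < low:
        return "Not Interested"
    return "Normal"


liked = [[1.0, 0.0], [0.8, 0.2]]  # toy 2-dimensional embeddings
prefs = preference_vector(liked)
for candidate in ([1.0, 0.0], [0.5, 1.0], [0.0, 1.0]):
    print(candidate, tier(cosine_similarity(prefs, candidate)))
```

Real embeddings have hundreds of dimensions and would be searched in a vector database such as Milvus rather than compared one by one in Python.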
Common Use Cases
- Managing a research workflow - A graduate student uses Glean to subscribe to academic journals, save relevant papers, and tag them by topic for later review
- Running a content curation newsletter - A journalist aggregates industry news from 50+ RSS feeds, saves key articles, and uses smart recommendations to prioritize high-relevance content
- Organizing personal knowledge - A developer saves technical blog posts and tutorials from multiple sources, organizes them with folders and tags, and retrieves them later for reference
- Reducing information overload - A product manager unsubscribes from email newsletters and uses Glean to consolidate all RSS sources into one intelligent feed with preference-based sorting
Under The Hood
Architecture
- Clear monorepo structure with distinct backend and frontend directories, enforcing separation of concerns between API, worker, and UI layers
- Service-layer design with dependency injection and async database clients isolating data access from business logic
- Event-driven processing via Redis and arq, decoupling feed ingestion and embedding generation from HTTP request cycles
- Microservice-like deployment using Docker Compose with explicit boundaries for PostgreSQL, Redis, Milvus, and MinIO
- RESTful API routes and React components follow consistent conventions with stateless JWT authentication and centralized request tracing
- Modular build system using Makefiles and concurrent task runners enabling independent development of services
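The decoupling of feed ingestion from HTTP request cycles can be sketched with an in-process queue. Here `asyncio.Queue` stands in for the Redis-backed queue, and `handle_subscribe` and `worker` are hypothetical names; this shows the pattern, not Glean's actual handlers or arq task definitions.

```python
import asyncio


async def handle_subscribe(queue, feed_url):
    """Stand-in for an HTTP handler: enqueue the job and return immediately."""
    await queue.put({"task": "fetch_feed", "url": feed_url})
    return {"status": "accepted", "url": feed_url}


async def worker(queue, processed):
    """Stand-in for a background worker: process jobs until cancelled."""
    while True:
        job = await queue.get()
        try:
            # Fetching, parsing, and embedding generation would happen here.
            processed.append(job["url"])
        finally:
            queue.task_done()


async def main():
    queue = asyncio.Queue()
    processed = []
    worker_task = asyncio.create_task(worker(queue, processed))
    response = await handle_subscribe(queue, "https://example.com/feed.xml")
    await queue.join()  # block until the worker has finished every queued job
    worker_task.cancel()
    await asyncio.gather(worker_task, return_exceptions=True)
    return response, processed


print(asyncio.run(main()))
```

The handler's response does not depend on how long the fetch takes, which is the property the architecture bullet describes.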
Tech Stack
- Python backend powered by FastAPI with asyncpg for PostgreSQL, Alembic for migrations, and Pydantic for data validation
- Milvus as the vector database for semantic search, integrated with MinIO for storage and etcd for coordination
- Redis for caching and task queuing, with background workers run by arq (Uvicorn serves the FastAPI application)
- Docker Compose with environment-aware configurations and health checks for all services
- React frontend managed via pnpm with concurrent development workflows
- Comprehensive tooling including Ruff, Pyright, ESLint, Prettier, and pre-commit hooks for code quality
Code Quality
- Extensive test suite covering unit, integration, and end-to-end scenarios with robust mocking of external dependencies
- Strong type safety and clear separation between frontend hooks, stores, and API clients with consistent naming conventions
- Robust error handling with custom exceptions and graceful frontend degradation, including fallbacks for corrupted state
- Comprehensive authentication testing covering JWT validation, OAuth integration, and token lifecycle management
- Consistent use of async/await and dependency injection in backend services with test fixtures simulating edge cases
- Automated testing infrastructure with environment-aware configurations and conditional test skipping for reliability
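Conditional test skipping, as mentioned in the list above, might look like the following sketch, which uses the standard library's `unittest` so it runs anywhere. The helper `requires_env`, the variable name `GLEAN_TEST_MILVUS_URI`, and the test class are hypothetical; the project's real test setup may use different tools and names.

```python
import os
import unittest


def requires_env(name, env=None):
    """Skip the decorated test unless `name` is present in the environment."""
    env = os.environ if env is None else env
    return unittest.skipUnless(name in env, f"{name} is not set")


class RecommendationTests(unittest.TestCase):
    def test_tier_labels_are_distinct(self):
        tiers = ["Recommended", "Normal", "Not Interested"]
        self.assertEqual(len(set(tiers)), 3)

    @requires_env("GLEAN_TEST_MILVUS_URI")  # hypothetical variable name
    def test_vector_search(self):
        # A real test would query Milvus here; this only checks the setting.
        self.assertIn("GLEAN_TEST_MILVUS_URI", os.environ)


def run_case(case):
    """Run one TestCase class and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_case(RecommendationTests)
print(f"ran={result.testsRun} skipped={len(result.skipped)}")
```

With this pattern the same suite can run against the lite deployment, where tests that need Milvus are reported as skipped rather than failed.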
What Makes It Unique
- Modular monorepo with independently versioned packages enabling isolated development and cross-service reuse
- Native RSS ingestion with built-in semantic tagging and automated categorization, eliminating external dependencies
- Context-aware tag system that computes usage counts at query time, without caching or background jobs
- Custom UI component library that treats accessibility and visual hierarchy as first-class design principles
- Unified authentication and user context propagation across frontend and backend via shared type definitions
- Extensible MCP-based plugin architecture for feed parsers and data processors without modifying core code
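To make the on-demand tag usage counts concrete, the sketch below derives them with a join and `GROUP BY` instead of a stored counter. It uses the standard library's `sqlite3` as a stand-in for PostgreSQL, and the table and column names are hypothetical rather than taken from Glean's schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical tag schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE bookmark_tags (
            bookmark_id INTEGER NOT NULL,
            tag_id INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (bookmark_id, tag_id)
        );
    """)
    return conn


def tag_usage(conn):
    """Count bookmarks per tag at query time; unused tags report 0."""
    rows = conn.execute("""
        SELECT t.name, COUNT(bt.bookmark_id)
        FROM tags AS t
        LEFT JOIN bookmark_tags AS bt ON bt.tag_id = t.id
        GROUP BY t.id, t.name
        ORDER BY t.name
    """).fetchall()
    return dict(rows)


conn = make_db()
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 [(1, "python"), (2, "rust"), (3, "unused")])
conn.executemany("INSERT INTO bookmark_tags VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2)])
print(tag_usage(conn))
```

Because nothing is cached, a newly tagged bookmark shows up in the next call to `tag_usage` with no invalidation step; the trade-off is that every read pays for the aggregation.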