auto-news

An AI-powered personal news aggregator that filters multi-source feeds through LLMs and delivers curated, noise-free summaries to your Notion workspace.

893 stars

111 forks

MIT License

Python

Auto-News is an open-source personal content aggregator designed to combat information overload in the AI era. It pulls from Twitter, RSS feeds, YouTube, Reddit, web articles, and personal journal notes, then routes each item through a configurable LLM backend (OpenAI ChatGPT, Google Gemini, or locally run Ollama models via LangChain) to rank, categorize, and summarize content based on your personal interests.

The pipeline runs on Apache Airflow, making it fully schedulable and observable. Each source type has its own dedicated operator that handles pulling, deduplication against Redis, LLM summarization, and publishing results into Notion databases. The result is a unified RSS-reader-style inbox where noisy content is filtered out before it reaches you, with only the highest-ranked insights surfacing for human review.

Auto-News also includes a Weekly Top-k Recap feature, which automatically generates periodic digests, as well as an experimental multi-agent Deepdive mode powered by AutoGen that lets you explore any topic across the web through an autonomous search agent. A hosted, managed version (Dots Agent) is available as a commercial offering on iOS, Android, and the web for those who prefer not to self-host.

What You Get

Multi-source ingestion from RSS, Twitter/X, Reddit, YouTube, web articles, and personal journal notes in a single unified pipeline
LLM-powered noise filtering and ranking that removes 80%+ of irrelevant content based on configurable personal interests
AI-generated summaries and key takeaways for each content item, delivered to a Notion database as a clean reading inbox
Weekly Top-k Recap automatically assembled from the highest-ranked items across the week
Journal note organization with daily insights extraction and automated TODO list generation from takeaways
Experimental Deepdive multi-agent mode using AutoGen to autonomously web-search and synthesize reports on any topic
Flexible LLM backend support for OpenAI ChatGPT, Google Gemini, and self-hosted Ollama models
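
The filtering and Weekly Top-k Recap items above can be pictured with a small sketch. It assumes each item already carries a relevance score from the LLM ranking step; the ScoredItem type, the 0.0 to 1.0 score scale, the threshold, and the sample data are all hypothetical and not taken from the project.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredItem:
    """A content item plus a relevance score (hypothetical shape)."""
    title: str
    source: str
    score: float  # assumed scale: 0.0 (irrelevant) to 1.0 (highly relevant)


def filter_noise(items, min_score):
    """Keep only items whose score meets the threshold, preserving input order."""
    return [item for item in items if item.score >= min_score]


def top_k(items, k):
    """Return the k highest-scored items, best first."""
    return heapq.nlargest(k, items, key=lambda item: item.score)


# Made-up sample week, for illustration only.
week = [
    ScoredItem("New LLM eval benchmark", "rss", 0.92),
    ScoredItem("Giveaway thread", "reddit", 0.05),
    ScoredItem("Scheduler tuning notes", "rss", 0.71),
    ScoredItem("Celebrity gossip", "twitter", 0.10),
    ScoredItem("Vector database comparison", "youtube", 0.64),
]

inbox = filter_noise(week, min_score=0.5)  # what would reach the reading inbox
recap = top_k(inbox, k=2)                  # what a weekly recap might surface
print([item.title for item in recap])
```

In this sketch the threshold controls how aggressive the noise filter is, while k only bounds the size of the recap.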

Common Use Cases

Daily tech briefing: a developer subscribes to 20+ RSS feeds and several subreddits; Auto-News filters and summarizes the top 10% into a morning Notion digest
AI researcher trend tracking: a researcher feeds arXiv RSS, Twitter, and YouTube channels into the pipeline to catch new papers and discussions without manual scanning
Personal knowledge management: a knowledge worker combines journal notes with curated web articles to auto-generate a TODO list and daily insights in Notion
YouTube channel monitoring: a content creator tracks competitor channels by ingesting YouTube transcripts and generating AI summaries of each new video
Reddit signal extraction: a product manager monitors niche subreddits and filters discussions by relevance score to track user pain points and feature requests

Under The Hood

Architecture: Auto-News follows a pipeline-operator pattern organized around Apache Airflow DAGs as the top-level orchestration layer. Each DAG represents a workflow (news pulling, journal processing, weekly recap, deepdive) and is composed of sequential BashOperator tasks that invoke dedicated Python scripts. Below the DAG layer sits a family of Operator classes, one per source type (RSS, YouTube, Twitter, Reddit, articles, journal), each implementing a consistent pull/dedup/summarize/publish contract inherited from a shared OperatorBase. State for deduplication and caching is stored in Redis using template-based key naming, while Notion serves as the output and reading layer. This design cleanly separates scheduling concerns from business logic and makes adding new source types a matter of adding a new Operator without touching the DAG structure.
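
A minimal sketch of the operator contract described above, under stated assumptions: the method names, the key template string, and the RssOperator subclass are illustrative guesses rather than the project's actual code, and a dict-backed store stands in for Redis so the example runs without a server. The LLM and Notion steps are replaced by placeholders.

```python
from abc import ABC, abstractmethod

# Hypothetical key template; the project's real template strings are not shown here.
DEDUP_KEY_TEMPLATE = "dedup:{source}:{item_id}"


class InMemoryStore:
    """Dict-backed stand-in for Redis (assumption, for illustration only)."""

    def __init__(self):
        self._data = {}

    def set_if_absent(self, key, value):
        """Store value and return True only if key was not already present."""
        if key in self._data:
            return False
        self._data[key] = value
        return True


class OperatorBase(ABC):
    """Shared pull/dedup/summarize/publish contract (method names are assumptions)."""

    source = "base"

    def __init__(self, store):
        self.store = store

    @abstractmethod
    def pull(self):
        """Return a list of raw items, each a dict with 'id' and 'text'."""

    def dedup(self, items):
        """Drop items whose templated key has been seen before."""
        fresh = []
        for item in items:
            key = DEDUP_KEY_TEMPLATE.format(source=self.source, item_id=item["id"])
            if self.store.set_if_absent(key, "1"):
                fresh.append(item)
        return fresh

    def summarize(self, items):
        """Placeholder for the LLM call: truncate the text instead of summarizing."""
        return [{**item, "summary": item["text"][:40]} for item in items]

    def publish(self, items):
        """Placeholder for the Notion write: return what would be published."""
        return items

    def run(self):
        """Chain the four stages in order."""
        return self.publish(self.summarize(self.dedup(self.pull())))


class RssOperator(OperatorBase):
    source = "rss"

    def pull(self):
        # Canned entries in place of a real feed fetch.
        return [
            {"id": "a1", "text": "First example entry"},
            {"id": "a2", "text": "Second example entry"},
        ]


store = InMemoryStore()
first_run = RssOperator(store).run()
second_run = RssOperator(store).run()  # same ids again, so nothing is fresh
print(len(first_run), len(second_run))
```

Under this shape, supporting another source would mean writing one more subclass with its own pull method, which matches the extensibility claim in the paragraph above.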

Tech Stack: The backend is written in Python 3.9+ and orchestrated by Apache Airflow deployed via Docker Compose or Helm on Kubernetes. LLM integration is handled through LangChain 0.3, with pluggable backends supporting OpenAI (via the openai SDK), Google Gemini (langchain-google-genai), and local inference via Ollama. Content ingestion uses feedparser for RSS, tweepy for Twitter, the YouTube Transcript API and yt-dlp for video, and WebBaseLoader/BeautifulSoup for web articles. Redis acts as the deduplication and caching layer, MySQL stores structured state, and Notion serves as the human-facing reading interface via the notion-client SDK. Vector storage is supported through ChromaDB, Milvus, and Pinecone for the experimental embedding and semantic search features.

Code Quality: The codebase has no automated test suite. There are no test files, no testing framework configuration, and no CI test step beyond a basic build badge. Error handling is generally present via try/except blocks with traceback printing, but exceptions are frequently swallowed or logged without propagating failures to the Airflow task level. The operator pattern provides reasonable structural consistency, and the LLM prompt library is centralized in a single module. Type annotations are absent throughout the source. The Airflow DAG definitions and operator implementations are well separated, but the overall quality reflects a personal productivity tool that has grown organically rather than a production-grade platform with enforced quality gates.

What Makes It Unique: Auto-News stands out by combining multi-source heterogeneous feed aggregation with personalized LLM-based noise filtering in a single self-hostable pipeline, a combination most RSS readers and read-later apps do not attempt. The interest-based ranking that filters over 80% of content before it reaches the user is a meaningful differentiator from simple feed aggregators. The experimental multi-agent Deepdive mode, which uses AutoGen to autonomously search and synthesize reports, goes beyond passive aggregation into active research assistance. The choice to use Notion as the reading front-end rather than a custom web UI is pragmatic and lowers the barrier to use for Notion-first knowledge workers.

Self-Hosting

Auto-News is released under the MIT License, one of the most permissive open-source licenses available. This means you can use, modify, distribute, and commercially deploy the software, provided you include the original copyright and permission notice. There are no copyleft obligations: you are not required to open-source any modifications or applications built on top of it. This makes it a straightforward choice for both personal and commercial self-hosting scenarios.

Running Auto-News yourself requires a meaningful infrastructure footprint. The recommended setup calls for 8 CPU cores, 16 GB of RAM, and 100 GB of disk space, with a minimum of 2 cores and 6 GB of RAM to function. The stack involves Apache Airflow (with its own scheduler, webserver, and worker processes), Redis, MySQL, and optionally Milvus or another vector database. You are responsible for container orchestration (Docker Compose or Kubernetes via Helm), secret management for API keys (Notion, OpenAI, Twitter), upgrades, and ensuring the Airflow DAGs continue to run on schedule. External API credentials for Twitter, Reddit, and your chosen LLM provider must be obtained and rotated independently.
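
The sizing guidance above can be turned into a quick check before provisioning a host. This is a hypothetical helper, not part of the project; it encodes only the figures quoted in the paragraph and assumes the 6 GB minimum refers to RAM. No minimum disk figure is given, so the minimum tier ignores disk.

```python
def sizing_tier(cpu_cores, ram_gb, disk_gb):
    """Classify a host against the sizing figures quoted in the text.

    Returns "recommended", "minimum", or "insufficient".
    """
    # Recommended: 8 cores, 16 GB RAM, 100 GB disk.
    if cpu_cores >= 8 and ram_gb >= 16 and disk_gb >= 100:
        return "recommended"
    # Minimum: 2 cores and 6 GB (assumed to be RAM); no disk minimum is stated.
    if cpu_cores >= 2 and ram_gb >= 6:
        return "minimum"
    return "insufficient"


# Example hosts, made up for illustration.
print(sizing_tier(cpu_cores=8, ram_gb=32, disk_gb=200))
print(sizing_tier(cpu_cores=4, ram_gb=8, disk_gb=50))
print(sizing_tier(cpu_cores=1, ram_gb=4, disk_gb=500))
```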

A managed commercial offering called Dots Agent is available from the same team, with web, iOS, and Android clients. The hosted version removes all infrastructure burden and is described as the quickest path to using the functionality. The self-hosted path gives you full data privacy, the ability to run local LLMs via Ollama instead of paying per-token to OpenAI or Google, and complete control over filtering logic, at the cost of operating a multi-service stack yourself. There is no documented SLA, enterprise support tier, or high-availability deployment guide for the open-source version.
