Trieve is an open-source platform that unifies semantic search, recommendation engines, and retrieval-augmented generation (RAG) into a single API-driven system. Designed for developers and AI teams building search-heavy applications, it solves the fragmentation problem of stitching together vector databases, embedding models, and LLMs by providing a cohesive, production-ready stack. Whether you’re powering a knowledge base, e-commerce product search, or an AI chatbot, Trieve delivers relevance-ranked results with minimal configuration.
Technically, Trieve is built in Rust using Actix-web and Diesel, with Qdrant for vector storage and PostgreSQL for metadata. It supports hybrid search via neural sparse vectors (SPLADE) and cross-encoder re-ranking (BAAI/bge-reranker-large), and integrates with OpenAI, Jina, and Groq for embeddings and LLMs. Deployment options include Docker Compose, Kubernetes, AWS, GCP, and self-hosted VPC environments, with TypeScript and Python SDKs for quick integration.
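As a sketch of what that API-driven integration looks like from a client, here is a minimal request-payload builder for a hybrid search call. The endpoint path, header names, and body fields below are illustrative assumptions for this sketch, not taken verbatim from Trieve's API reference.

```python
import json

# Assumed endpoint path for illustration; check Trieve's API docs for the real one.
TRIEVE_SEARCH_ENDPOINT = "/api/chunk/search"

def build_search_request(query: str, dataset_id: str, api_key: str) -> dict:
    """Assemble headers and body for a hypothetical hybrid-search request."""
    headers = {
        "Authorization": api_key,        # assumed auth scheme
        "TR-Dataset": dataset_id,        # assumed dataset-scoping header
        "Content-Type": "application/json",
    }
    body = {
        "query": query,
        "search_type": "hybrid",         # dense + sparse, per the description above
        "page_size": 10,
    }
    return {"endpoint": TRIEVE_SEARCH_ENDPOINT, "headers": headers, "body": body}

request = build_search_request("refund policy", "my-dataset-id", "tr-xxxx")
print(json.dumps(request["body"]))
```

In practice the TypeScript or Python SDK wraps this request for you; the point here is only that one authenticated, dataset-scoped POST covers what would otherwise require separate vector-database and keyword-index calls.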
What You Get
- đź”’ Self-Hosting in your VPC or on-prem - Full documentation for deploying Trieve on AWS, GCP, Kubernetes, and Docker Compose with no vendor lock-in.
- đź§ Semantic Dense Vector Search - Uses Qdrant and OpenAI/Jina embedding models to perform semantic search over text chunks with high accuracy.
- 🔍 Typo-Tolerant Full-Text/Neural Search - Leverages naver/efficient-splade-VI-BT-large-query for sparse vector search that handles typos and natural language queries.
- 🖊️ Sub-Sentence Highlighting - Automatically highlights exact matching sentences within search results using the simsearch crate for improved UX.
- 🌟 Recommendations - Recommends similar chunks or files based on user behavior like bookmarks or upvotes, powered by vector similarity.
- 🤖 Convenient RAG API Routes - Built-in endpoints for RAG with OpenRouter integration, supporting both fully-managed RAG with topic memory and custom context generation.
- 🔄 Hybrid Search with cross-encoder re-ranking - Combines dense and sparse vectors with BAAI/bge-reranker-large to improve result relevance.
- 📆 Recency Biasing - Automatically boosts recently created content in search results to prevent stale outputs.
- 🛠️ Tunable Merchandising - Adjust search relevance using signals like clicks, add-to-cart events, or citations as ranking factors.
- 🕳️ Filtering - Supports date-range, substring, tag, and numeric filters to narrow search results with precision.
- 👥 Grouping - Groups multiple chunks under a single file ID to avoid duplicate top-level results in search output.
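To make the hybrid-search and recency-biasing features above concrete, here is a self-contained sketch of the underlying ideas: fuse dense and sparse scores, then boost recent content with an exponential decay. The linear-fusion weight and 30-day half-life are illustrative assumptions; Trieve itself additionally re-ranks with a cross-encoder.

```python
import math
import time

def fuse_scores(dense: float, sparse: float, alpha: float = 0.5) -> float:
    """Linear fusion of dense and sparse relevance scores (alpha is an
    illustrative weight, not Trieve's actual fusion formula)."""
    return alpha * dense + (1 - alpha) * sparse

def recency_boost(score: float, created_at: float, now: float,
                  half_life_days: float = 30.0) -> float:
    """Boost a score by up to 2x, halving the boost every half_life_days
    (an assumed decay scheme for illustration)."""
    age_days = (now - created_at) / 86_400
    return score * (1 + math.exp(-age_days * math.log(2) / half_life_days))

now = time.time()
fresh = recency_boost(fuse_scores(0.8, 0.6), now, now)                # just created
stale = recency_boost(fuse_scores(0.8, 0.6), now - 90 * 86_400, now)  # 90 days old
print(fresh > stale)  # identical relevance, but the recent chunk ranks higher
```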
Common Use Cases
- Powering a knowledge base search - A SaaS company uses Trieve to enable natural language search over internal documentation with sub-sentence highlighting and recency biasing.
- Building an AI-powered e-commerce product finder - An online retailer implements Trieve’s hybrid search and recommendations to surface products based on user behavior and semantic similarity.
- Running a RAG-powered customer support chatbot - A support team deploys Trieve’s RAG API to pull relevant help articles and generate accurate responses using OpenAI or Groq LLMs.
- Creating a research paper discovery engine - A university lab uses Trieve to index and search academic PDFs with semantic search, grouping, and cross-encoder reranking for precision.
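The RAG use cases above reduce to a retrieve-then-generate loop: fetch the top-scoring chunks, splice them into a prompt, and hand that to an LLM. A minimal sketch of the context-assembly step (the chunk shape and prompt template are assumptions, not Trieve's internal format):

```python
def build_rag_prompt(question: str, chunks: list, max_chunks: int = 3) -> str:
    """Pick the highest-scoring chunks and assemble an LLM prompt.
    The chunk shape ({"text", "score"}) and template are illustrative."""
    top = sorted(chunks, key=lambda c: c["score"], reverse=True)[:max_chunks]
    context = "\n\n".join(c["text"] for c in top)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

chunks = [
    {"text": "Refunds are processed within 5 business days.", "score": 0.92},
    {"text": "Shipping takes 3-7 days.", "score": 0.41},
]
prompt = build_rag_prompt("How long do refunds take?", chunks)
print("Refunds are processed" in prompt)  # top chunk made it into the context
```

Trieve's managed RAG routes do this server-side (including topic memory), so the client only sends the question and receives the generated answer.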
Under The Hood
Architecture
- Modular microservice architecture with clearly delineated components (data ingestion, embedding processing, reranking) running as isolated containers connected via Docker networks and Redis-based task queues
- Event-driven producer-consumer workflow using Redis as a message broker, enabling asynchronous, scalable processing of crawling and embedding tasks
- Dependency injection implemented through environment variables, allowing flexible runtime configuration without hard-coded dependencies
- Infrastructure-as-code practices with environment-specific Docker overlays and Makefile automation ensure consistent deployment across environments
- Lightweight design prioritizes orchestration over rich domain modeling, with minimal service layer abstractions
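The Redis-backed producer-consumer workflow described above can be sketched with the standard library standing in for Redis (`queue.Queue` in place of LPUSH/BRPOP); the task shape and worker behavior are illustrative assumptions:

```python
import queue
import threading

task_queue = queue.Queue()   # stand-in for a Redis list used as a task queue
results = []

def embedding_worker() -> None:
    """Consumer: pops ingestion tasks and 'embeds' them until it sees the
    None sentinel, mirroring a long-running worker container."""
    while True:
        task = task_queue.get()
        if task is None:          # sentinel shuts the worker down
            break
        results.append(f"embedded:{task['chunk_id']}")
        task_queue.task_done()

worker = threading.Thread(target=embedding_worker)
worker.start()

# Producer: the ingestion service enqueues chunks as they arrive.
for chunk_id in ("a1", "a2", "a3"):
    task_queue.put({"chunk_id": chunk_id})
task_queue.put(None)
worker.join()
print(results)  # → ['embedded:a1', 'embedded:a2', 'embedded:a3']
```

Swapping the in-memory queue for Redis gives the same pattern durability across container restarts and lets multiple worker replicas drain one queue, which is the scalability property the architecture relies on.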
Tech Stack
- Rust backend with PostgreSQL for structured data, Qdrant for vector storage, and Redis for caching and job queuing
- Dockerized AI services leveraging Hugging Face embeddings (SPLADE, BGE, Jina) with CPU/GPU variants, MinIO for object storage, and Apache Tika for document parsing
- Keycloak for authentication and a monorepo frontend built with Solid.js, TypeScript, and Turbo for unified search, chat, and dashboard applications
- Firecrawl microservices powered by Puppeteer and Bull for web scraping, orchestrated through containerized pipelines
- CI/CD and release automation driven by release-please and Makefile-based Docker orchestration with dynamic configuration
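As a rough picture of how those services fit together locally, here is a hedged Docker Compose fragment; the service names, images, and settings are illustrative assumptions, not Trieve's actual compose file.

```yaml
# Illustrative only; see Trieve's repository for the real docker-compose.yml.
services:
  db:
    image: postgres:15        # structured metadata
    environment:
      POSTGRES_PASSWORD: example
  redis:
    image: redis:7            # caching and job queues
  qdrant:
    image: qdrant/qdrant      # vector storage
  minio:
    image: minio/minio        # object storage for uploaded files
    command: server /data
```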
Code Quality
- Test suites are auto-generated from OpenAPI specs with minimal validation logic, lacking meaningful assertions or edge-case coverage
- Code organization is fragmented across SDKs, APIs, and infrastructure manifests, with blurred boundaries between concerns
- Absence of structured error handling, custom error types, or consistent response formatting undermines robustness
- Inconsistent naming conventions and lack of type guards or runtime validation reduce reliability and maintainability
- No visible linting, static analysis, or code quality enforcement tooling in the repository
What Makes It Unique
- Native integration of vision models into ETL pipelines enables automated semantic transformation of unstructured product data using custom prompts, eliminating manual tagging
- Unified analytics dashboard provides real-time insights into search behavior and recommendation performance, creating a closed-loop feedback system for content optimization
- Server-side AI transformation layer dynamically restructures raw data based on user-defined prompts, bridging ingestion and search-ready content in a single pipeline
- Shopify extension surfaces unmet search queries directly within the merchant’s e-commerce platform, revealing hidden customer intent
- End-to-end traceability from S3 ingestion through AI transformation to search analytics and recommendation tuning, all unified in a single cohesive platform