Onyx is an open-source application layer for LLMs that combines retrieval-augmented generation (RAG), web search, code execution, and custom agents in a single, self-hostable interface. It’s designed for developers, enterprise teams, and AI practitioners who need reliable, context-aware AI responses grounded in their internal data and in current web information.
Built with Python and Next.js, Onyx can be deployed with Docker, Kubernetes, Helm, and Terraform. It integrates with hosted LLM providers such as OpenAI, Anthropic, and Gemini, as well as self-hosted model servers such as Ollama and vLLM. Its modular architecture includes vector and keyword indexing, Redis caching, MinIO blob storage, and a flexible Model Context Protocol (MCP) framework for integrating external tools.
What You Get
- 🔍 Agentic RAG - Combines hybrid vector and keyword search with AI agents to deliver high-accuracy answers by retrieving and synthesizing information from internal documents and external sources.
- 🔬 Deep Research - Performs multi-step, iterative research workflows to generate comprehensive reports; the project reports that it outperforms ChatGPT, Claude, and Notion AI in independent benchmarks.
- 🤖 Custom Agents - Build and deploy AI agents with custom instructions, knowledge bases, and actions tailored to specific roles like sales, engineering, or support.
- 🌍 Web Search - Integrates with Serper, Google PSE, Brave, SearXNG, Firecrawl, and Exa to fetch up-to-date web information during conversations.
- 💻 Code Execution - Safely runs Python code in a sandboxed environment to analyze data, generate visualizations, or modify files directly from chat.
- 📄 Artifacts - Generates downloadable documents, images, and other files as part of AI responses, enabling direct output for reports and presentations.
- 🎙️ Voice Mode - Enables voice-based interaction through text-to-speech and speech-to-text capabilities for hands-free AI access.
- 🎨 Image Generation - Creates images from text prompts using integrated generative models, enhancing visual communication within chats.
- 🔐 Enterprise SSO & RBAC - Supports Google OAuth, OIDC, SAML, SCIM user provisioning, and granular role-based access control for secure team deployment.
- 📊 Analytics & Query History - Tracks usage metrics by team, LLM, or agent, and logs all queries for audit compliance and usage optimization.
- 🖼️ Whitelabeling - Customize branding, logos, colors, and naming to align Onyx with your organization’s identity.
- 🔌 50+ Connectors - Pre-built integrations for Notion, Confluence, GitHub, Slack, Google Drive, and more, with real-time sync and access control preservation.
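Hybrid search has to merge a semantic (vector) ranking with a keyword ranking into one list. The sketch below uses reciprocal rank fusion (RRF), a common fusion technique swapped in here purely for illustration; Onyx ranks hybrid results inside its search index and may weight and combine scores differently. The document IDs and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    A document earns 1 / (k + rank) from every list it appears in (rank starts
    at 1), so items ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector (semantic) search and a keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF uses only ranks, it needs no calibration between the very different score scales of vector similarity and keyword relevance, which is why it is a popular default for this kind of merge.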
Common Use Cases
- Running a secure internal knowledge assistant - A tech company deploys Onyx to let employees ask questions about product docs, engineering specs, and past support tickets, with answers grounded in internal data and protected by SSO and RBAC.
- Empowering sales teams with real-time product intelligence - Sales reps use Onyx to instantly retrieve updated pricing, competitor comparisons, and customer conversation history from CRM and knowledge bases during client calls.
- Automating engineering documentation and debugging - Developers query Onyx to understand codebases, generate test cases, and execute code snippets to diagnose issues without leaving their chat interface.
- Building a compliant AI customer support bot - A SaaS company uses Onyx to power its helpdesk, ensuring responses are sourced from approved documentation, with audit logs and PII redaction for GDPR compliance.
Under The Hood
Architecture
- Modular codebase organized into isolated components (backend, model_server, ee, dev), each with scoped dependencies, allowing targeted deployment and testing
- Clear separation of concerns using FastAPI for HTTP, SQLAlchemy for data access, and Celery for async workflows, with centralized error handling via custom exceptions
- Dependency injection patterns enforce clean boundaries between repositories, services, and routers, while multi-tenant and lightweight modes are handled through configuration-aware initialization
- Pre-commit hooks and lazy import validation keep module boundaries intact and prevent unintended side effects at import time
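How Onyx's lazy import validation works is not described here. One plausible shape, sketched below as an assumption, is a static check that parses each module with Python's standard `ast` module and flags heavy packages imported at module load time instead of inside the functions that need them. `HEAVY_MODULES` and `find_eager_heavy_imports` are hypothetical names.

```python
import ast

# Hypothetical set of heavy packages that should only be imported lazily.
HEAVY_MODULES = frozenset({"torch", "transformers", "onnxruntime"})


class _EagerImportFinder(ast.NodeVisitor):
    """Collects heavy imports that execute as soon as the module is imported."""

    def __init__(self, heavy):
        self.heavy = heavy
        self.violations = []

    def visit_FunctionDef(self, node):
        # Function bodies run only when called, so imports inside them are lazy.
        # Returning without calling generic_visit() skips the body entirely.
        return

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Import(self, node):
        for alias in node.names:
            self._check(node.lineno, alias.name)

    def visit_ImportFrom(self, node):
        # Relative imports (level > 0) stay inside the project, so skip them.
        if node.level == 0 and node.module:
            self._check(node.lineno, node.module)

    def _check(self, lineno, module_name):
        if module_name.split(".")[0] in self.heavy:
            self.violations.append((lineno, module_name))


def find_eager_heavy_imports(source, heavy=HEAVY_MODULES):
    """Return (line, module) pairs for heavy imports that run at import time."""
    finder = _EagerImportFinder(heavy)
    finder.visit(ast.parse(source))
    return finder.violations


sample = "\n".join([
    "import os",
    "import torch",
    "",
    "def embed(text):",
    "    import transformers",
    "    return transformers",
    "",
    "if True:",
    "    from torch.nn import functional",
])
print(find_eager_heavy_imports(sample))
# [(2, 'torch'), (9, 'torch.nn')]
```

Class bodies and top-level `if`/`try` blocks are still visited because they run at import time; only function bodies are treated as lazy. A pre-commit hook could run a check like this over changed files and fail when any violation is reported.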
Tech Stack
- Python 3.11+ backend powered by FastAPI and SQLAlchemy with async PostgreSQL connectivity via asyncpg
- uv-based dependency management with pyproject.toml extras enables reproducible, environment-specific builds
- Helm charts orchestrate Kubernetes deployments for core services including Vespa, OpenSearch, and MinIO, with profiles for enterprise and lightweight deployments
- Comprehensive tooling for code quality including pre-commit hooks, structured logging, and observability integrations with Prometheus, OpenTelemetry, and Sentry
- Multi-modal document processing via Unstructured and PyPDF, with native integrations for enterprise collaboration platforms like Slack, Jira, and Salesforce
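Structured logging generally means emitting each log line as machine-parseable key/value data that observability tools can index and filter. Onyx's own logging configuration is not described here; the sketch below shows the general idea using only the standard `logging` and `json` modules, and the `fields` convention and logger name are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields are passed as logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, sort_keys=True, default=str)


handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("example.indexing")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

logger.info("indexed batch", extra={"fields": {"connector": "confluence", "docs": 42}})
# stderr: {"connector": "confluence", "docs": 42, "level": "INFO", "logger": "example.indexing", "message": "indexed batch"}
```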
Code Quality
- Extensive test coverage across unit, integration, and end-to-end scenarios with realistic mocking and real-service interactions
- Strong type safety, consistent naming, and explicit error code mappings ensure predictable behavior and reduce runtime failures
- Linting, URL validation, SSRF protection, and environment-aware test gating enforce robustness and security
- Dependency injection and mocking strategies isolate components while preserving real-world behavior in integration tests
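URL validation and SSRF protection matter whenever the server fetches a user-supplied URL, for example when crawling a web page on a user's behalf. Onyx's implementation is not shown here; the sketch below is a common pattern, assuming only `http` and `https` are allowed and that any host resolving to a non-public address is rejected. `validate_outbound_url` and `UnsafeURLError` are hypothetical names.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


class UnsafeURLError(ValueError):
    """Raised when a URL should not be fetched by the server."""


def validate_outbound_url(url):
    """Return url unchanged if it is safe to fetch server-side, else raise UnsafeURLError."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise UnsafeURLError(f"scheme not allowed: {parsed.scheme!r}")
    host = parsed.hostname  # lower-cased, with IPv6 brackets stripped
    if not host:
        raise UnsafeURLError("URL has no host")
    try:
        # IP literals need no DNS lookup.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # A hostname may resolve to several addresses; every one must be public.
        try:
            infos = socket.getaddrinfo(host, parsed.port or None)
        except socket.gaierror as exc:
            raise UnsafeURLError(f"cannot resolve {host!r}") from exc
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    for addr in addresses:
        # is_global is False for loopback, private, and link-local ranges,
        # including the 169.254.169.254 cloud metadata address.
        if not addr.is_global:
            raise UnsafeURLError(f"{host!r} maps to non-public address {addr}")
    return url
```

A check like this is only a first line of defense: a production guard should also connect to the address it validated (so a second DNS lookup cannot return a different, internal address) and re-validate every redirect target.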
What Makes It Unique
- Native LLM access control tied to role-based permissions, guiding non-admin users with contextual prompts rather than outright restrictions
- Unified text rendering system that dynamically adapts to themes and accessibility needs through semantic tokens, eliminating style duplication
- Context-aware API client that intelligently adjusts timeouts and detects infrastructure-level blocks like AWS ALB/WAF to surface issues to users
- Built-in Markdown-to-UI engine that preserves semantic structure without external dependencies, enabling rich rendering in constrained environments
- Storybook-driven component documentation that serves as both a design system and living test suite, enhancing onboarding and maintainability
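The context-aware API client described above can be illustrated with two small helpers: one picks a timeout based on the kind of request, and one inspects an error response for signs that it came from AWS infrastructure rather than the Onyx backend. This is a sketch written in Python for consistency with the other examples; the context names, timeout values, and the `Server: awselb` header heuristic are all assumptions, not Onyx's actual client logic.

```python
from typing import Mapping, Optional

# Hypothetical per-context timeouts in seconds; the real values are not documented here.
TIMEOUTS_S = {
    "default": 30.0,
    "file_upload": 300.0,
    "chat_stream": 600.0,
}


def timeout_for(context):
    """Pick a timeout for the kind of request being made, falling back to the default."""
    return TIMEOUTS_S.get(context, TIMEOUTS_S["default"])


def detect_infrastructure_block(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return a user-facing explanation if an AWS load balancer produced the response.

    Assumption: responses generated by an AWS Application Load Balancer (including
    WAF blocks enforced at the ALB) carry a Server header starting with "awselb".
    """
    # HTTP header names are case-insensitive, so normalise them before lookup.
    server = {k.lower(): v for k, v in headers.items()}.get("server", "").lower()
    if not server.startswith("awselb"):
        return None
    if status == 403:
        return ("The request was blocked by the load balancer or WAF before reaching "
                "the application. Ask an administrator to review the WAF rules.")
    if status in (502, 503, 504):
        return f"The load balancer returned {status}; the backend may be unavailable or timing out."
    return None
```

A client would call `detect_infrastructure_block` only on non-2xx responses and show the returned message in place of a generic error, so users can tell a network-level block apart from an application failure.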