Tabby is a self-hosted AI coding assistant designed for developers who need intelligent code completion, inline chat, and contextual Q&A without relying on cloud-based services. It empowers teams to maintain data sovereignty while leveraging state-of-the-art LLMs for code generation and knowledge retrieval. Built in Rust and optimized for consumer-grade GPUs, Tabby integrates seamlessly with VS Code, Vim, IntelliJ, and Cloud IDEs.
Tabby ships as a Docker container, exposes an OpenAPI interface for external integrations, and uses SQLite for local state, eliminating the need for an external database. It supports models such as CodeLlama, CodeGemma, CodeQwen, and Codestral, and offers RAG-based code completion grounded in repository context. The platform also includes an Answer Engine for internal documentation queries and a full admin UI for team management and usage analytics.
What You Get
- Self-hosted AI Code Completion - Delivers real-time, context-aware code suggestions using locally run LLMs like CodeLlama, CodeGemma, and Codestral, with support for RAG-based repository context to improve accuracy.
- Answer Engine - A centralized knowledge engine that answers coding questions using internal documentation, codebase context, and chat history—all accessible directly in your IDE without leaving your workflow.
- Inline Chat - Enables conversational coding within your editor; ask questions, refactor code, or generate tests without switching tabs or tools.
- OpenAPI Interface - Exposes a RESTful API for integrating Tabby with Cloud IDEs, CI/CD pipelines, or custom tooling, enabling programmatic access to code completion and chat features.
- GitLab & GitHub Integration - Connects to your code repositories to enable context-aware completions, index merge requests as chat context, and support SSO for enterprise teams.
- Admin UI with Team Analytics - Provides a full web-based dashboard to monitor usage, track team activity, view storage metrics, and manage user roles and permissions.
- Consumer-Grade GPU Support - Optimized to run on NVIDIA GPUs with CUDA and Apple M1/M2 with Metal, making high-performance AI coding assistance accessible without enterprise hardware.
- Docker-Based Deployment - One-command deployment via Docker with persistent data storage via volume mounts, enabling easy setup on any Linux, macOS, or cloud environment.
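As a concrete starting point, the Docker deployment described above can be sketched as follows. This is a minimal illustration, not the only supported configuration: the model name, `--device` flag, and GPU options vary by release and hardware, so check the official Tabby documentation for current values.

```shell
# Pull and run the Tabby server with NVIDIA GPU acceleration (CUDA).
# ~/.tabby is mounted as a volume so downloaded models and local state
# persist across container restarts.
docker run -it --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve \
  --model StarCoder-1B \
  --device cuda
```

Note that Docker on macOS cannot pass the GPU through to containers; on Apple Silicon the usual approach is to run the native Tabby binary with `--device metal` instead.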
Common Use Cases
- Running a secure internal developer portal - A DevOps team deploys Tabby on-premises to provide AI-powered code suggestions and documentation lookup without exposing proprietary code to third-party AI services.
- Accelerating onboarding for new engineers - A startup uses Tabby’s Answer Engine to let new hires ask questions about the codebase in their IDE and get instant, accurate answers from internal docs and past commits.
- Compliance-sensitive environments (finance, healthcare) - A regulated enterprise runs Tabby locally to comply with data governance policies while still benefiting from AI-assisted coding productivity.
- Customizing AI assistants for legacy codebases - A legacy software team tunes Tabby's repository context settings to prioritize long-standing code patterns and generates safe refactoring suggestions using locally hosted model weights.
Under The Hood
Architecture
- Rust-based monorepo with well-defined crates for core services, data access, and external integrations, enforcing modularity through Cargo workspaces
- Service layer pattern implemented via Axum with typed dependency injection, isolating business logic from HTTP routing
- Clean domain separation between schema definition, query execution, and database access using SQLx for type-safe operations
- Asynchronous, event-driven design built on async/await, with distributed tracing for non-blocking logging and cross-service observability
- Frontend and backend decoupled via GraphQL and OpenAPI, ensuring clean API contracts without shared internal state
- Automated build and deployment workflows with integrated schema migrations and documentation generation
Tech Stack
- Rust backend leveraging Axum, SQLx, and Tantivy for high-performance search and API serving
- TypeScript/React frontend built with Next.js and pnpm, coordinated via Turbo monorepo for unified tooling
- Embedded AI inference stack using llama-cpp-server and Ollama bindings with async-trait and tokio for concurrent model execution
- Comprehensive testing infrastructure with snapshot testing, structured assertions, and coverage tracking
- CI/CD pipelines automated through Makefile and Turborepo, with schema and documentation generation baked into the workflow
- Cross-platform IDE agents for VS Code, IntelliJ, and Eclipse, bundled with esbuild and linked via workspace dependencies
Code Quality
- Extensive test coverage with domain-specific utilities for validating code completion behavior across syntax contexts
- Consistent use of snapshot testing and golden files to ensure stable AI output across model iterations
- Clear separation of concerns between frontend, agents, and backend with type-safe serialization via serde
- Robust error handling and structured logging across both Rust and TypeScript layers, with thorough edge-case validation
- Strong type safety enforced through interfaces and structs, with exhaustive testing of parsing, filtering, and normalization logic
- Unified linting and test automation across languages, with deterministic execution via serial tests and fixed seeds
What Makes It Unique
- Native integration of code-aware LLM inference with real-time context extraction from source repositories, eliminating reliance on external APIs
- Unified interface that dynamically surfaces relevant code symbols and file contexts during conversational interactions
- Modular server architecture supporting both open-source and enterprise deployments from a single codebase
- Fine-grained UI feature gating based on license tier, with contextual tooltips explaining advanced capabilities
- Streaming chat endpoint with built-in structured logging and user-aware tracing for auditability
- Context-aware markdown renderer that enables IDE-like navigation and editing directly from chat responses
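To ground the streaming chat endpoint mentioned above, here is one way to exercise it over Tabby's OpenAPI surface. This is an illustrative sketch that assumes a Tabby server already running on localhost:8080 and an OpenAI-compatible `/v1/chat/completions` route; field names and authentication requirements may differ across versions, so treat the server's published OpenAPI spec as authoritative.

```shell
# Request a streamed chat completion from a locally running Tabby server.
# The -N flag disables curl's output buffering so streamed chunks
# appear as they arrive rather than all at once.
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain what this Rust function does: fn add(a: i32, b: i32) -> i32 { a + b }"}
    ],
    "stream": true
  }'
```

Because the route follows the OpenAI chat-completions shape, existing OpenAI-compatible client libraries can generally be pointed at a Tabby instance by overriding the base URL.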