Self-hosted AI coding assistant: an open-source alternative to GitHub Copilot that runs code completion on your own hardware with no cloud dependency.
Tabby is an open-source, self-hosted AI coding assistant that brings intelligent code completion, an inline chat interface, and a RAG-powered Answer Engine directly to your team’s development environment. Built with a Rust core and optimized for consumer-grade GPUs (CUDA, ROCm, Metal), it requires only a single Docker command to deploy and runs without any external database or cloud service.
The project is organized as a Cargo workspace with a clear separation between the open-source core (Apache 2.0) and enterprise features housed under the ee/ directory. The core handles inference, indexing, and code completion via an OpenAPI-compliant HTTP server. The enterprise layer, governed by a proprietary license, adds LDAP authentication, team management, other license-gated features, and a full GraphQL API (built with Juniper and served over the Axum web framework).
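To make the OpenAPI surface concrete, here is a minimal std-only sketch of the raw HTTP request a client might send to the completion endpoint. It assumes a request shape of `POST /v1/completions` with a `language` field and a `segments` object holding `prefix` and `suffix`; verify this against the OpenAPI spec your server publishes, and note that deployments with authentication enabled also expect a token header, omitted here. The helpers `json_escape` and `completion_request` are hypothetical.

```rust
/// Minimal JSON string escaper so the sketch needs no crates.
/// A real client would use serde_json instead.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use JSON's \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds a raw HTTP/1.1 completion request (hypothetical helper).
/// Assumed body shape: {"language": ..., "segments": {"prefix": ..., "suffix": ...}}.
pub fn completion_request(host: &str, language: &str, prefix: &str, suffix: &str) -> String {
    let body = format!(
        "{{\"language\":{},\"segments\":{{\"prefix\":{},\"suffix\":{}}}}}",
        json_escape(language),
        json_escape(prefix),
        json_escape(suffix)
    );
    // Content-Length is the body size in bytes, which is what str::len returns.
    format!(
        "POST /v1/completions HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}
```

A production client would use serde_json and an HTTP library; building the JSON by hand here only keeps the example dependency-free.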
Tabby integrates with VS Code, IntelliJ, Vim, and Eclipse through first-party clients built on a shared agent (tabby-agent) that exposes a Language Server Protocol-style interface. The Answer Engine uses repository-level context, including GitLab merge requests and GitHub issues, as retrieval sources, which lets it give workspace-aware answers. The model registry includes StarCoder, CodeLlama, CodeGemma, CodeQwen, and Codestral, and Tabby can also use any OpenAI-compatible endpoint as a backend.
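If tabby-agent follows the standard LSP base protocol over stdio (an assumption; the text above only calls it LSP-style), every JSON-RPC message is preceded by a `Content-Length` header that counts the body in bytes, followed by a blank line. The hypothetical `frame` and `unframe` helpers below sketch that framing. They operate on `&str` for brevity, whereas a real transport reads raw bytes from a stream.

```rust
/// Prefixes a JSON-RPC body with an LSP-style header.
/// Content-Length counts UTF-8 bytes, not characters.
pub fn frame(body: &str) -> String {
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body)
}

/// Splits one framed message off the front of `buf`, returning (body, rest).
/// Returns None if the header is missing or malformed, or the body is incomplete.
pub fn unframe(buf: &str) -> Option<(&str, &str)> {
    let header_end = buf.find("\r\n\r\n")?;
    let header = &buf[..header_end];
    // str::lines splits on "\n" and strips a trailing "\r" from each header line.
    let len = header
        .lines()
        .find_map(|line| line.strip_prefix("Content-Length: "))?
        .trim()
        .parse::<usize>()
        .ok()?;
    let body_start = header_end + 4;
    let body_end = body_start.checked_add(len)?;
    if buf.len() < body_end || !buf.is_char_boundary(body_end) {
        return None;
    }
    Some((&buf[body_start..body_end], &buf[body_end..]))
}
```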
Development is active: numbered releases ship roughly every few weeks, and the IDE plugins are versioned independently. Codecov integration, golden-file snapshot tests in Rust, and unit tests across the TypeScript packages give the project a sound quality baseline.
Architecture
Tabby is structured as a Rust Cargo workspace whose members separate cleanly by responsibility: a core serving crate (Axum-based HTTP, OpenAPI, routing), inference crates (code generation, embedding, chat, decoding), an indexing subsystem (Tantivy full-text search, structured document indexer, repository crawlers), and a fully optional enterprise layer under the ee/ directory. The enterprise layer adds a GraphQL API through Juniper, a SQLite data store accessed via SQLx with type-checked queries, background job scheduling (license checks, repository sync, model indexing), and a Next.js web UI. Because of this split, the community build runs as a single self-contained process without a database, while the enterprise build layers persistent team state on top in an embedded SQLite file; neither needs a separate database server. Dependency injection through trait objects keeps business logic decoupled from transport, which makes it straightforward to swap inference backends or storage drivers.
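A minimal sketch of the trait-object pattern described above. The names `CompletionBackend`, `CompletionService`, and `backend_from_config` are hypothetical and do not match Tabby's actual inference traits, and the stub backends return fixed strings instead of running a model. The point is that the service only knows `Arc<dyn CompletionBackend>`, so choosing a local or remote backend is a configuration decision rather than a change to the business logic.

```rust
use std::sync::Arc;

/// Hypothetical backend abstraction; real inference APIs are async and streaming.
pub trait CompletionBackend: Send + Sync {
    fn complete(&self, prompt: &str) -> String;
}

/// Stand-in for a local model backend (hypothetical).
pub struct LocalStub;
impl CompletionBackend for LocalStub {
    fn complete(&self, _prompt: &str) -> String {
        "local".to_string()
    }
}

/// Stand-in for a remote OpenAI-compatible backend (hypothetical).
pub struct RemoteStub {
    pub endpoint: String,
}
impl CompletionBackend for RemoteStub {
    fn complete(&self, _prompt: &str) -> String {
        format!("remote:{}", self.endpoint)
    }
}

/// Business logic depends only on the trait object, not on a concrete backend.
pub struct CompletionService {
    backend: Arc<dyn CompletionBackend>,
}

impl CompletionService {
    pub fn new(backend: Arc<dyn CompletionBackend>) -> Self {
        Self { backend }
    }

    /// Skips the backend call entirely for a blank prefix.
    pub fn suggest(&self, prefix: &str) -> Option<String> {
        if prefix.trim().is_empty() {
            return None;
        }
        Some(self.backend.complete(prefix))
    }
}

/// Chooses a backend from configuration (hypothetical config shape).
pub fn backend_from_config(endpoint: Option<&str>) -> Arc<dyn CompletionBackend> {
    match endpoint {
        Some(url) => Arc::new(RemoteStub { endpoint: url.to_string() }),
        None => Arc::new(LocalStub),
    }
}
```

Using `Arc` rather than `Box` lets one backend instance be shared across concurrent request handlers, which is the usual choice in an async server.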
Tech Stack
The server is written in Rust, using Axum for HTTP routing, Tokio for async concurrency, SQLx for compile-time-checked SQLite queries, and Tantivy as the embedded search engine for code and document indexing. AI inference is handled by pluggable backends: a bundled llama-cpp-server for local GGUF models, Ollama bindings, and an async-openai-alt client for OpenAI-compatible remote endpoints. The enterprise web UI is a Next.js/React application managed in a pnpm Turborepo workspace alongside the IDE clients for VS Code, IntelliJ, Vim, and Eclipse and the shared TypeScript tabby-agent, which implements Language Server Protocol-style communication and post-processing (completion deduplication, line trimming, context extraction). OpenTelemetry support is built in for distributed tracing.
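Post-processing is easiest to picture with an example. The sketch below is hypothetical: it is written in Rust for consistency with the rest of this document rather than in tabby-agent's TypeScript, and it is not the agent's actual algorithm. It removes the longest tail of a completion that duplicates text already after the cursor, then trims trailing whitespace on each line.

```rust
/// Hypothetical post-processing pass: suffix deduplication plus line trimming.
/// Note that str::lines drops a single trailing newline, so the result has none.
pub fn post_process(completion: &str, suffix: &str) -> String {
    drop_suffix_overlap(completion, suffix)
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}

/// Removes the longest tail of `completion` that the existing `suffix` already starts with.
fn drop_suffix_overlap<'a>(completion: &'a str, suffix: &str) -> &'a str {
    let max = completion.len().min(suffix.len());
    // Try the longest possible overlap first.
    for len in (1..=max).rev() {
        let start = completion.len() - len;
        // Only cut at UTF-8 character boundaries.
        if !completion.is_char_boundary(start) {
            continue;
        }
        if suffix.starts_with(&completion[start..]) {
            return &completion[..start];
        }
    }
    completion
}
```

For example, if the model proposes `foo(bar);  \n}` while a `}` already follows the cursor, the duplicated brace and the trailing spaces are removed.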
Code Quality
Test coverage spans multiple layers: golden-file tests in Rust compare code completion output against stored snapshots, so unintended changes in generation show up as test failures; migration tests in tabby-db verify schema evolution; TypeScript unit tests in tabby-agent and tabby-chat-panel cover post-processing pipelines and protocol handling; and UI tests cover the markdown renderer and remark plugins. CI runs separate Rust, pnpm, and IntelliJ pipelines, with automated release workflows for each client platform. Error handling in the Rust layer is explicit, using anyhow and typed CoreError variants, and serde is used throughout for type-safe serialization. The serial_test crate forces tests that share file-system state to run one at a time, keeping them deterministic.
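Typed error variants pay off at the transport boundary, where each variant can map to a specific HTTP status. This sketch uses only the standard library (no anyhow) and a hypothetical `ServeError` enum; its variants and the `parse_limit` helper are illustrative and are not Tabby's CoreError.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical typed error; variants are illustrative only.
#[derive(Debug, PartialEq)]
pub enum ServeError {
    NotFound(String),
    Unauthorized,
    InvalidInput(String),
}

impl fmt::Display for ServeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ServeError::NotFound(what) => write!(f, "not found: {what}"),
            ServeError::Unauthorized => write!(f, "unauthorized"),
            ServeError::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
        }
    }
}

impl std::error::Error for ServeError {}

// Lets `?` convert integer-parse failures into a typed variant.
impl From<ParseIntError> for ServeError {
    fn from(e: ParseIntError) -> Self {
        ServeError::InvalidInput(e.to_string())
    }
}

/// Parses a positive limit such as a max-token setting (hypothetical helper).
pub fn parse_limit(raw: &str) -> Result<u32, ServeError> {
    let n: u32 = raw.trim().parse()?;
    if n == 0 {
        return Err(ServeError::InvalidInput("limit must be positive".to_string()));
    }
    Ok(n)
}

/// Maps each variant to an HTTP status code at the transport boundary.
pub fn status_code(err: &ServeError) -> u16 {
    match err {
        ServeError::NotFound(_) => 404,
        ServeError::Unauthorized => 401,
        ServeError::InvalidInput(_) => 400,
    }
}
```

Because `status_code` matches exhaustively, adding a new variant fails to compile until it is given a status, a guarantee a string-based error would not provide.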
What Makes It Unique
Tabby’s most distinctive aspect is the tight co-design of an open-weight LLM inference runtime with a repository-aware retrieval pipeline. Code completions and chat answers are enriched at query time with Tantivy-indexed AST snippets, commit history, merge requests, and ingested documentation, which goes beyond simple file-context injection. The dual Apache 2.0 / enterprise license split, with a fully functional community tier for up to 5 seats, lets teams run production-quality AI assistance without any cloud dependency and adopt paid enterprise features incrementally. The shared tabby-agent TypeScript package keeps all IDE clients behaving consistently, and the OpenAPI + GraphQL surface makes it practical to integrate with existing CI/CD pipelines and cloud IDEs without custom coupling.
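A toy version of query-time enrichment: rank indexed snippets against the code before the cursor and prepend the best matches to the prompt as comments. Counting shared identifiers is a deliberately crude stand-in for Tantivy's BM25 ranking, and both the `Snippet` type and the `// Path:` comment layout are assumptions, not Tabby's documented prompt format.

```rust
use std::collections::HashSet;

/// A retrieved code snippet (hypothetical shape).
pub struct Snippet {
    pub path: String,
    pub body: String,
}

/// Lowercased identifier-like tokens; underscores stay inside tokens.
fn tokens(s: &str) -> HashSet<String> {
    s.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Scores snippets by identifiers shared with `prefix`, keeps the top `k`
/// with a nonzero score, and prepends them as line comments.
pub fn build_prompt(prefix: &str, corpus: &[Snippet], k: usize) -> String {
    let query = tokens(prefix);
    let mut scored: Vec<(usize, &Snippet)> = corpus
        .iter()
        .map(|s| (tokens(&s.body).intersection(&query).count(), s))
        .filter(|(score, _)| *score > 0)
        .collect();
    // Highest score first; sort_by is stable, so ties keep corpus order.
    scored.sort_by(|a, b| b.0.cmp(&a.0));

    let mut prompt = String::new();
    for (_, s) in scored.into_iter().take(k) {
        prompt.push_str(&format!("// Path: {}\n", s.path));
        for line in s.body.lines() {
            prompt.push_str("// ");
            prompt.push_str(line);
            prompt.push('\n');
        }
    }
    prompt.push_str(prefix);
    prompt
}
```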
Tabby uses a split licensing model. Code outside the ee/ directory is released under Apache 2.0, which permits free commercial use, modification, and redistribution with attribution. Code inside ee/ is governed by the Tabby Enterprise License: it can be run and modified for development and testing without a subscription, but production use requires a valid paid license from TabbyML. This is a common open-core arrangement — you get a fully functional community tier (up to 5 seats, core completion and Answer Engine features) under open-source terms, while team management, LDAP, SSO, expanded seat counts, and custom branding require a paid plan.
Running Tabby yourself is straightforward for small teams. A single Docker image covers the full server, and SQLite is the only persistence layer, so there is no separate database process to manage. GPU acceleration on NVIDIA (CUDA), AMD (ROCm), Apple Silicon (Metal), or Vulkan requires the appropriate drivers and enough VRAM for the model you choose. A 7B parameter model typically needs 8 GB of VRAM at 4-bit quantization: the weights alone take about 3.5 GB, and the remaining headroom covers the KV cache and runtime buffers. You are responsible for keeping the server updated, managing disk space for model weights and the search index, and arranging your own uptime monitoring. Because releases ship every few weeks, staying current takes active attention.
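The weight footprint behind the VRAM guidance is simple arithmetic: parameters × bits per weight ÷ 8. The hypothetical helpers below compute it. They deliberately exclude the KV cache and runtime buffers, which is why the practical requirement sits well above the raw weight size. Real 4-bit formats also store per-block scale factors (GGUF's Q4_0, for example, averages 4.5 bits per weight), so treat the plain 4-bit figure as a lower bound.

```rust
/// Bytes occupied by model weights alone: parameters × bits per weight ÷ 8.
/// Excludes the KV cache, activations, and runtime buffers.
pub fn weight_bytes(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0
}

/// Converts bytes to GiB (2^30 bytes).
pub fn to_gib(bytes: f64) -> f64 {
    bytes / (1u64 << 30) as f64
}
```

For a 7B model this gives 3.5e9 bytes (about 3.26 GiB) at 4 bits, 3.9375e9 bytes (about 3.67 GiB) at 4.5 bits, and 14e9 bytes (about 13.04 GiB) at 16 bits.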
Compared to a hosted SaaS coding assistant (GitHub Copilot, Cursor, Codeium), self-hosting Tabby means no managed uptime SLA, no automatic model updates, and no centralized billing portal. The enterprise tier adds support from TabbyML, higher seat counts, and additional authentication providers, but the operational burden — hardware provisioning, GPU driver maintenance, backup strategy, and incident response — remains yours. Teams with strict data-sovereignty requirements or air-gapped environments will find that trade-off worthwhile; teams that just want a quick productivity boost without infrastructure work are better served by a managed alternative.