Sourcebot

A self-hosted, AI-powered code search engine that indexes every repo across GitHub, GitLab, Bitbucket, Gitea, Gerrit, and Azure DevOps, so both engineers and coding agents can search, browse, and ask questions about your codebase from one place.

3.6K stars
315 forks
FSL-1.1-ALv2
TypeScript

Sourcebot is a self-hosted code search and code-intelligence platform built by Taqla Inc., a Y Combinator-backed (F2025) startup. It connects to GitHub, GitLab, Bitbucket, Gitea, Gerrit, and Azure DevOps, indexes every configured repository and branch, and gives engineers a single place to search and browse code that would otherwise be scattered across dozens of separate repos and hosts.

Under the hood, indexing and search execution are delegated to Zoekt, the trigram-based code search engine originally built at Google and later popularized by Sourcegraph, wrapped in a modern Next.js UI with a custom query grammar for regex, boolean, and path/language-filtered search. On top of that, Sourcebot layers an “Ask Sourcebot” AI chat that answers natural-language questions about a codebase with inline citations back to the actual source lines, powered by the Vercel AI SDK across a wide range of LLM providers. Structured code navigation (go-to-definition, find-references) and the AI chat sit behind Sourcebot’s paid plan; the free, no-registration self-hosted tier covers core multi-host indexing and full-text/regex search.
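To make the trigram idea concrete, here is a minimal sketch of how a trigram index can narrow a substring search to a few candidate files before confirming each match. It is a toy illustration only, not Zoekt's implementation or on-disk format; the `TrigramIndex` class, its methods, and the sample file names are all hypothetical.

```typescript
// Toy trigram index (illustrative only; NOT how Zoekt is actually implemented).
// Every 3-character window of a document is mapped to the set of documents
// that contain it, so a query only has to inspect documents that share all
// of its trigrams.

function trigrams(text: string): string[] {
  const seen = new Set<string>();
  for (let i = 0; i + 3 <= text.length; i++) {
    seen.add(text.slice(i, i + 3));
  }
  return Array.from(seen);
}

class TrigramIndex {
  private postings = new Map<string, Set<string>>(); // trigram -> document ids
  private docs = new Map<string, string>();          // document id -> content

  add(id: string, content: string): void {
    this.docs.set(id, content);
    for (const t of trigrams(content)) {
      let posting = this.postings.get(t);
      if (!posting) {
        posting = new Set<string>();
        this.postings.set(t, posting);
      }
      posting.add(id);
    }
  }

  // Case-sensitive substring search.
  search(query: string): string[] {
    const ids = Array.from(this.docs.keys());
    // Queries shorter than one trigram cannot use the index: scan everything.
    if (query.length < 3) {
      return ids.filter((id) => this.contains(id, query));
    }
    // Step 1: keep only documents that contain every trigram of the query.
    let candidates = ids;
    for (const t of trigrams(query)) {
      const posting = this.postings.get(t);
      if (!posting) return []; // a trigram that appears nowhere rules out any match
      candidates = candidates.filter((id) => posting.has(id));
    }
    // Step 2: confirm each candidate, since sharing trigrams does not
    // guarantee that they appear contiguously in the right order.
    return candidates.filter((id) => this.contains(id, query));
  }

  private contains(id: string, query: string): boolean {
    return (this.docs.get(id) ?? "").indexOf(query) !== -1;
  }
}

// Hypothetical sample data.
const index = new TrigramIndex();
index.add("a.ts", "function parseQuery(input) {}");
index.add("b.ts", "const query = parse(input)");

console.log(index.search("parseQuery")); // [ 'a.ts' ]
console.log(index.search("input"));      // [ 'a.ts', 'b.ts' ]
console.log(index.search("query"));      // [ 'b.ts' ]
```

The two-step shape (cheap candidate filtering, then exact verification) is why this family of indexes stays fast on large codebases. A production engine also has to handle regex queries, sharding, and ranking, none of which this sketch attempts.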

Sourcebot also ships its own Model Context Protocol (MCP) server, so external coding agents like Claude Code, Cursor, or GitHub Copilot can query an organization’s entire indexed codebase over Streamable HTTP rather than being limited to whatever files happen to be open in the local workspace. That use case pushes Sourcebot beyond being just a search box and toward being a shared code-context layer for both humans and agents.
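As a rough illustration of what "over Streamable HTTP" means on the wire, the sketch below builds the first message an MCP client sends: a JSON-RPC 2.0 `initialize` request posted to the server's MCP endpoint. The message shape and the `Accept` header follow the public MCP specification. The helper name `buildMcpPost`, the client name, and the bearer token are assumptions made for illustration, and Sourcebot's actual endpoint URL and auth details are not given here.

```typescript
// Sketch of an MCP "initialize" call over Streamable HTTP.
// buildMcpPost, the client info, and the token are hypothetical.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface HttpPost {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMcpPost(message: JsonRpcRequest, accessToken?: string): HttpPost {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP servers may reply with plain JSON or with an SSE stream,
    // so the client advertises that it accepts both.
    "Accept": "application/json, text/event-stream",
  };
  if (accessToken) {
    // Assumption: the deployment expects an OAuth bearer token.
    headers["Authorization"] = `Bearer ${accessToken}`;
  }
  return { method: "POST", headers, body: JSON.stringify(message) };
}

const initialize: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2025-03-26", // spec revision that introduced Streamable HTTP
    capabilities: {},
    clientInfo: { name: "example-agent", version: "0.1.0" }, // hypothetical client
  },
};

const post = buildMcpPost(initialize, "example-token");
console.log(post.headers["Accept"]); // application/json, text/event-stream
```

In practice the resulting object could be passed to `fetch(url, post)`, where `url` is whatever MCP endpoint the deployment exposes. Agents such as Claude Code or Cursor do this handshake themselves once the server URL is configured, so the sketch is mainly useful for understanding or debugging the exchange.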

Sourcebot’s core is source-available under the Functional Source License, not an OSI-approved open-source license, and a parallel ee/ directory of enterprise features (SSO, SCIM, audit logs, permission syncing, analytics, and more) requires a commercial license key on top of that.

What You Get

  • Full-text and regex code search across every configured repo and branch, powered by the Zoekt trigram index.
  • A single Docker image / Docker Compose stack bundling the web app, backend indexing worker, Postgres, and Redis.
  • Connectors for GitHub (including a dedicated GitHub App), GitLab, Bitbucket, Gitea, Gerrit, and Azure DevOps.
  • An “Ask Sourcebot” AI chat assistant that answers questions about your codebase with inline citations, once licensed.
  • A built-in MCP server so external coding agents can query your indexed codebase over Streamable HTTP, once licensed.
  • Prometheus metrics, OpenTelemetry tracing, and Sentry error tracking wired into both the web and backend packages out of the box.

Common Use Cases

  • Centralizing code search across a multi-repo, multi-host engineering organization.
  • Grounding AI coding agents (Claude Code, Cursor, Copilot) with whole-org code context via the MCP server.
  • Speeding up new-engineer onboarding by making every repo searchable and browsable from one URL.
  • Keeping proprietary source code and AI chat inputs inside company-controlled infrastructure instead of a third-party SaaS search tool.

Under The Hood

Architecture

Sourcebot is a Yarn 4 monorepo of six workspace packages (web, backend, db, shared, schemas, queryLanguage). The backend package runs as a long-lived worker: a ConnectionManager schedules BullMQ/Redis jobs that pull repository metadata from configured code hosts through per-host compile functions, while a RepoIndexManager shells out to the Zoekt binaries to build per-org, per-repo trigram index shards. The web package is a Next.js application serving search, code browsing, and the AI chat surface, backed by Prisma/Postgres via the shared db package, with search syntax parsed by a custom Lezer grammar in queryLanguage. The most consequential architectural decision is the open-core split: every enterprise-gated capability lives in a parallel ee/ subtree in both backend and web, checked at runtime against signed license entitlements. If that entitlements contract drifts from the license-issuing service, both the self-hosted gate and the hosted license server have to be updated together.

Tech Stack

The stack is TypeScript end-to-end. The web app pairs Next.js and NextAuth (with a Prisma adapter) for auth, Prisma ORM against Postgres for persistence, the Vercel AI SDK wired to a wide roster of LLM providers for its chat and MCP features, and CodeMirror 6 with an extensive set of language-grammar packages for the file viewer. Observability runs through PostHog, Sentry, OpenTelemetry, and Prometheus/Grafana Alloy. The backend worker relies on BullMQ/Redis for job scheduling and host-specific API clients to talk to each supported code host, while search execution itself is delegated to Zoekt rather than reimplemented. The whole stack ships as a single Docker image alongside a Postgres+Redis Compose file, with Kubernetes/Helm offered as an alternative deployment target.

Code Quality

The repository includes an extensive suite of Vitest unit tests spanning the backend connectors, git operations, Zoekt indexing helpers, and the custom search-query grammar, with dedicated test fixtures covering grammar edge cases like negation, grouping, and operator precedence. Configuration and license payloads are validated at runtime with Zod schemas rather than trusted at the type level alone, and error handling is centralized. CI enforces linting and the full test suite on every pull request, and also runs dedicated license-audit and vulnerability-triage workflows plus an automated bug-fixing workflow. Naming is consistent, and the codebase is organized around clear manager/service classes rather than loosely scoped utility files.

What Makes It Unique

Rather than building yet another code-indexing engine, Sourcebot builds its product layer (multi-host connectors, permissioning, AI chat, and its own MCP server) directly on top of Zoekt, a mature trigram search engine already used elsewhere in the industry. Its most distinctive feature is shipping its own MCP server so external coding agents such as Claude Code, Cursor, or Copilot can pull whole-organization code context rather than being limited to a single open workspace, an early and fairly complete take on “your code search tool as an agent-facing service.” The open-core boundary is also unusually rigorous, with a parallel ee/ source tree and cryptographically signed license enforcement rather than a simple config flag.

Self-Hosting

Licensing Model

Sourcebot’s core is source-available under the Functional Source License (FSL-1.1-ALv2), not an OSI-approved open-source license. The FSL permits use, modification, and self-hosting for any purpose except operating a competing commercial code-search/AI-chat product or service; two years after each version’s release date, that specific version automatically converts to the fully permissive Apache License 2.0. A separate ee/ subtree of the codebase (the enterprise features) is licensed even more restrictively: per ee/LICENSE, it may only be used for internal business purposes under a paid Sourcebot Enterprise license with a valid seat count.
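Because the conversion is per version, each release carries its own change date. The arithmetic can be sketched as below. The function names and the release date are hypothetical, and the license text itself remains the authority on exactly when a given version converts.

```typescript
// Sketch: the date on which a given release would convert to Apache 2.0,
// assuming the change date is exactly two years after that version's release.
// Function names and the sample release date are hypothetical.

function apacheConversionDate(releaseDate: Date): Date {
  const converted = new Date(releaseDate.getTime());
  converted.setUTCFullYear(converted.getUTCFullYear() + 2);
  return converted;
}

function isApacheLicensed(releaseDate: Date, now: Date): boolean {
  return now.getTime() >= apacheConversionDate(releaseDate).getTime();
}

const release = new Date("2025-01-15T00:00:00Z"); // hypothetical release date
console.log(apacheConversionDate(release).toISOString()); // 2027-01-15T00:00:00.000Z
console.log(isApacheLicensed(release, new Date("2026-06-01T00:00:00Z"))); // false
```

The practical consequence is that an older release can already be Apache-licensed while the newest release is still under the FSL's competing-use restriction.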

Self-Hosting Restrictions

  • The FSL prohibits redistributing Sourcebot, or a derivative, as a competing hosted or commercial code-search/AI-chat offering until each version’s change date has passed.
  • Everything under any ee/ folder needs an active commercial license key or subscription to legally use, regardless of whether you self-host it.

Enterprise Features

Per the project’s own documentation, the following require a paid plan (an online Activation Code or an offline License Key): SSO / external identity providers, SCIM provisioning, audit logs, analytics, permission syncing, role management, the GitHub App connector, the “Ask Sourcebot” AI chat, code navigation (go-to-definition / find-references), search contexts, and the built-in MCP server (plus its OAuth flow). The free, no-registration self-hosted tier covers core multi-host repository indexing and full-text/regex code search.

Cloud vs Self-Hosted

Sourcebot does not offer a general-purpose hosted SaaS product. A hosted “public demo” exists, but production deployments are expected to be self-hosted, with the paid tier unlocked either online (an Activation Code that periodically syncs with Sourcebot’s license server) or offline (a signed SOURCEBOT_EE_LICENSE_KEY environment variable).

License Key Required

Yes, for most of the enterprise feature list above. These are gated behind hasEntitlement() checks in code (packages/shared/src/entitlements.ts) that require either a synced online Activation Code or an offline SOURCEBOT_EE_LICENSE_KEY with a valid, unexpired, cryptographically signed payload. Without one, a self-hosted deployment automatically runs on the free plan.
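The general pattern of an offline, signed license check can be sketched as follows. This is not Sourcebot's code: the key format (base64url payload, a dot, base64url signature), the Ed25519 algorithm, the payload fields, the entitlement names, and the functions `issueKey` and `checkEntitlement` are all assumptions chosen to illustrate "valid, unexpired, cryptographically signed".

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical payload shape; the real license payload is not described here.
interface LicensePayload {
  entitlements: string[];
  expiresAt: string; // ISO 8601 timestamp
}

// Vendor side (hypothetical): sign the payload with a private key.
function issueKey(payload: LicensePayload, privateKey: KeyObject): string {
  const body = Buffer.from(JSON.stringify(payload), "utf8");
  const signature = sign(null, body, privateKey); // Ed25519 takes no digest name
  return `${body.toString("base64url")}.${signature.toString("base64url")}`;
}

// Deployment side (hypothetical): every failure falls back to "not entitled",
// which corresponds to running on the free plan.
function checkEntitlement(
  key: string | undefined,
  entitlement: string,
  publicKey: KeyObject,
  now: Date = new Date(),
): boolean {
  if (!key) return false;
  const [bodyPart, signaturePart] = key.split(".");
  if (!bodyPart || !signaturePart) return false;
  try {
    const body = Buffer.from(bodyPart, "base64url");
    const signature = Buffer.from(signaturePart, "base64url");
    // 1. Signed: the payload must verify against the vendor's public key.
    if (!verify(null, body, publicKey, signature)) return false;
    const payload = JSON.parse(body.toString("utf8")) as LicensePayload;
    // 2. Unexpired.
    const expires = new Date(payload.expiresAt).getTime();
    if (Number.isNaN(expires) || expires <= now.getTime()) return false;
    // 3. Valid for the requested feature.
    return Array.isArray(payload.entitlements) && payload.entitlements.includes(entitlement);
  } catch {
    return false; // malformed key material or JSON
  }
}

// Demo with a throwaway key pair and hypothetical entitlement names.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const licenseKey = issueKey(
  { entitlements: ["sso", "audit"], expiresAt: "2030-01-01T00:00:00Z" },
  privateKey,
);
const today = new Date("2026-01-01T00:00:00Z");

console.log(checkEntitlement(licenseKey, "sso", publicKey, today));  // true
console.log(checkEntitlement(licenseKey, "scim", publicKey, today)); // false
console.log(checkEntitlement(undefined, "sso", publicKey, today));   // false
```

Because only the public key ships with the deployment, editing the payload (for example, to extend the expiry) invalidates the signature. That property is what separates a signed key from a simple config flag.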
