nao

Name: nao
Rating: 5 (1377 reviews)

Build and deploy an open-source analytics agent that understands your data warehouse and answers business questions in plain English.

1.4K stars

197 forks

Apache License 2.0

TypeScript



nao is an open-source framework for building and deploying analytics agents on top of your existing data infrastructure. Data engineers and analytics engineers create a structured context — combining schema metadata, documentation, business rules, and query examples — that shapes how the AI agent reasons about your specific data. That context is versioned, testable, and fully under your control.
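As a rough illustration of how a file-system context can become model input, the sketch below walks a context folder and concatenates its files into one prompt string. The folder names (`schemas`, `docs`, `rules`, `examples`) and the `build_context` function are hypothetical; nao's real context layout and prompt assembly may differ.

```python
from pathlib import Path

# Hypothetical section folders; nao's actual context layout may differ.
CONTEXT_SECTIONS = ("schemas", "docs", "rules", "examples")


def build_context(root: Path) -> str:
    """Concatenate every file in each section folder into a single prompt string."""
    parts: list[str] = []
    for section in CONTEXT_SECTIONS:
        folder = root / section
        if not folder.is_dir():
            continue  # every section is optional in this sketch
        # Sorting makes the output deterministic for a given set of files.
        for path in sorted(p for p in folder.rglob("*") if p.is_file()):
            label = path.relative_to(root).as_posix()
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {label}\n{body}")
    return "\n\n".join(parts)
```

Because the result depends only on the files on disk, the context can be reviewed and versioned like any other code.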

Once the context is built with the nao-core Python CLI, the nao chat interface gives business users a natural-language front end for querying any connected warehouse: BigQuery, Snowflake, PostgreSQL, Databricks, Redshift, and more. Users type questions, the agent generates and runs SQL, and results come back with native visualizations; end users need no SQL knowledge.
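The question-to-answer loop can be pictured as generate, check, execute. The minimal sketch below assumes a pluggable `generate_sql` callable standing in for the LLM and uses an in-memory SQLite database in place of a real warehouse; `answer` is a hypothetical name, not part of nao's API.

```python
import sqlite3
from typing import Callable

# (question, context) -> SQL text; in a real deployment this would be an LLM call.
SqlGenerator = Callable[[str, str], str]


def answer(question: str, context: str, generate_sql: SqlGenerator,
           conn: sqlite3.Connection) -> tuple[str, list[str], list[tuple]]:
    """Generate SQL for a question, run it, and return (sql, column_names, rows)."""
    sql = generate_sql(question, context)
    # A deliberately crude read-only guard; read-only warehouse credentials
    # are a far stronger protection than inspecting the SQL text.
    if not sql.lstrip().lower().startswith(("select", "with")):
        raise ValueError(f"refusing to run non-read statement: {sql!r}")
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    return sql, columns, cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("EU", 10.0), ("EU", 5.0), ("US", 7.0)])

    def fake_llm(question: str, context: str) -> str:
        # Stub "LLM" that always returns the same query.
        return ("SELECT region, SUM(amount) AS revenue "
                "FROM orders GROUP BY region ORDER BY region")

    print(answer("What is revenue by region?", "", fake_llm, conn))
```

The column names and rows come back in a shape that a table or chart renderer can consume directly.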

What sets nao apart is its structured approach to agent reliability. Before deploying to users, data teams can write YAML-based unit tests that benchmark the agent against expected SQL outputs. Performance is tracked over time, and user thumbs-up/thumbs-down feedback flows back to the data team so the context can be improved iteratively.
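One common way to score such tests is to compare query results rather than SQL text, since two differently written queries can be equivalent. The sketch below assumes each test case has already been parsed (from YAML or elsewhere) into a hypothetical `EvalCase` with `question` and `expected_sql` fields; nao's actual test schema and scoring rules may differ.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    # Hypothetical fields standing in for one entry of a YAML test file.
    question: str
    expected_sql: str


def results_match(conn: sqlite3.Connection, generated_sql: str, expected_sql: str) -> bool:
    """True if both queries return the same rows, ignoring row order."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # generated SQL that fails to run counts as a failure
    want = conn.execute(expected_sql).fetchall()
    # Multiset comparison: duplicate rows matter, ordering does not.
    return Counter(got) == Counter(want)


def run_suite(conn: sqlite3.Connection, cases: list[EvalCase],
              generate_sql: Callable[[str], str]) -> float:
    """Return the fraction of cases whose generated SQL matches the expected results."""
    if not cases:
        return 0.0
    passed = sum(results_match(conn, generate_sql(c.question), c.expected_sql) for c in cases)
    return passed / len(cases)
```

Recording the pass rate of each run is enough to track accuracy over time. Ignoring row order is a design choice; a test that checks an ORDER BY would need an order-sensitive comparison instead.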

nao is backed by Y Combinator and ships with Docker support, Slack and Telegram bot integrations, MCP compatibility, and multi-LLM support across OpenAI, Anthropic, Google Gemini, Mistral, Azure, AWS Bedrock, and Ollama. The CLI, backend, and frontend are open source under Apache 2.0, except for enterprise features (SSO, branding, licensing), which sit behind a commercial tier.

What You Get

A Python CLI (nao-core) to initialize, sync, test, and deploy your analytics agent from the terminal
A self-hosted chat web UI where business users can ask questions in plain English and receive charts and tables instantly
A file-system-based context builder where you add schemas, docs, business rules, and example queries that the agent uses for reasoning
A built-in evaluation framework with YAML-based unit tests to measure agent accuracy against expected SQL before releasing to users
Native connectors for BigQuery, Snowflake, PostgreSQL, Databricks, Redshift, ClickHouse, DuckDB, Athena, Trino, and more via Ibis
Multi-LLM support with bring-your-own-keys for OpenAI, Anthropic, Google Gemini, Mistral, Azure OpenAI, AWS Bedrock, and Ollama
Slack and Telegram bot integrations so users can query data from within their existing messaging tools
Docker image with a bundled example project for one-command local setup

Common Use Cases

A data team at a SaaS company deploys nao so product managers can query event tables without writing SQL or waiting for analyst bandwidth
An analytics engineer at a retail company builds a nao context with sales schema metadata and regional business rules, then runs unit tests to validate query accuracy before going live
A startup uses nao on top of BigQuery to give the CEO a Slack bot that answers revenue and growth questions in real time
A data platform team self-hosts nao on Cloud Run with PostgreSQL to provide a secure, internal analytics chat layer without exposing warehouse credentials to end users
An analytics engineer versions the nao context in Git and iterates on agent performance by reviewing user feedback scores and re-running the test suite after each change
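The review loop in the last use case can be supported by two small reports: one ranking questions by their share of thumbs-down votes, and one listing which tests changed status between two runs of the suite. Both functions below are hypothetical helpers over plain Python data, not part of nao's CLI, and the feedback is assumed to arrive as (question, thumbs_up) pairs.

```python
from collections import Counter


def worst_rated(feedback: list[tuple[str, bool]], min_votes: int = 2) -> list[tuple[str, float]]:
    """Rank questions by share of thumbs-down votes, ignoring rarely rated ones."""
    ups, totals = Counter(), Counter()
    for question, thumbs_up in feedback:
        totals[question] += 1
        ups[question] += int(thumbs_up)
    scored = [(q, 1 - ups[q] / n) for q, n in totals.items() if n >= min_votes]
    # Highest thumbs-down share first; these are candidates for new test cases.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def diff_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Classify tests by how their pass/fail status changed between two suite runs."""
    report: dict[str, list[str]] = {"regressed": [], "fixed": [], "new": []}
    for name in sorted(after):
        if name not in before:
            report["new"].append(name)
        elif before[name] and not after[name]:
            report["regressed"].append(name)
        elif not before[name] and after[name]:
            report["fixed"].append(name)
    return report
```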

Under The Hood

Architecture nao separates concerns cleanly across three independently deployable layers: a Python CLI (nao-core) that owns context engineering and local developer workflows, a TypeScript/Fastify backend that orchestrates LLM agents and persists state, and a React/Vite frontend that handles user interaction. The backend follows a modular service pattern: agent orchestration, the scheduler, the Slack/Telegram bridges, MCP integration, and license validation each live in a dedicated service module, and a tRPC router wires them together. A request arrives over HTTP or WebSocket, passes through Fastify route handlers into tRPC procedures, and reaches the agent services, which fan out to LLM providers via the Vercel AI SDK. A separate FastAPI Python sidecar in the backend handles sandboxed Python code execution, keeping dynamic user-submitted code out of the main backend process. This architecture lets individual integrations (a new messaging platform, a new LLM provider) be added without touching core agent logic.
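Adding integrations without touching core agent logic is the classic registry (plug-in) pattern. nao's backend is TypeScript and routes model calls through the Vercel AI SDK; the Python sketch below only illustrates the pattern, and every name in it (`register_provider`, `run_agent`, `EchoProvider`) is hypothetical.

```python
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Anything with a complete() method can act as a provider."""
    def complete(self, prompt: str) -> str: ...


# Registry of provider factories, keyed by name.
_PROVIDERS: dict[str, Callable[[], LLMProvider]] = {}


def register_provider(name: str):
    """Decorator that adds a provider factory (e.g. a class) to the registry."""
    def decorator(factory: Callable[[], LLMProvider]) -> Callable[[], LLMProvider]:
        if name in _PROVIDERS:
            raise ValueError(f"provider {name!r} is already registered")
        _PROVIDERS[name] = factory
        return factory
    return decorator


def run_agent(provider_name: str, question: str) -> str:
    """Core agent logic: looks providers up by name and never imports them directly."""
    factory = _PROVIDERS.get(provider_name)
    if factory is None:
        raise ValueError(f"unknown provider {provider_name!r}")
    return factory().complete(f"Answer using the warehouse context: {question}")


# Adding an integration means one new self-registering class; run_agent is unchanged.
@register_provider("echo")
class EchoProvider:
    """Hypothetical stand-in provider that returns the prompt unchanged."""
    def complete(self, prompt: str) -> str:
        return prompt
```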

Tech Stack The backend runs on Bun with Fastify for HTTP and WebSocket handling and exposes its API entirely through tRPC, with Drizzle ORM providing type-safe access to either SQLite (development) or PostgreSQL (production). The frontend is a React 18 application built with Vite, using TanStack Router for file-based routing, TanStack Query for server state, and shadcn/ui components styled with Tailwind CSS. LLM calls are unified behind the Vercel AI SDK, which abstracts providers including Anthropic, OpenAI, Google Gemini, Mistral, Azure OpenAI, AWS Bedrock, Vertex AI, and Ollama. The Python CLI is built with Cyclopts for command parsing, Ibis for multi-warehouse SQL generation, and Pydantic for configuration validation; it ships as a PyPI package (nao-core) with optional extras for each database backend and LLM provider. The project ships a multi-stage Dockerfile and Docker Compose configurations for both development and production deployment.

Code Quality The backend has an extensive test suite: nearly 30 Vitest test files cover agent compaction logic, context recommendation reconciliation, OIDC authentication hooks, license validation, and utility functions. TypeScript is used in strict mode throughout the backend and frontend, and Zod schemas validate API boundaries via fastify-type-provider-zod. The Python CLI uses Ruff for linting and ty for type checking. ESLint with simple-import-sort enforces import order across the monorepo, and Husky pre-commit hooks enforce formatting on every commit. Error handling in the backend uses typed custom error classes (HandlerError, BudgetExceededError) propagated consistently through the tRPC layer. Together, typed API contracts, unit tests on key agent behaviours, and automated linting reflect a solid engineering foundation for a project at this stage.
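The typed-error pattern can be sketched as follows: expected failures subclass a common base that carries a status, and a single boundary function turns them into uniform responses while masking unexpected exceptions. This is a Python analogue for illustration only; the class names echo the ones mentioned above, but their fields, the status codes, and the `handle` function are assumptions, not nao's TypeScript implementation.

```python
from typing import Any, Callable


class HandlerError(Exception):
    """Expected, user-facing failure; the status value is a hypothetical choice."""
    status = 400


class BudgetExceededError(HandlerError):
    """Assumed meaning: a usage or spending budget has been exhausted."""
    status = 429


def handle(procedure: Callable[..., Any], *args: Any) -> dict[str, Any]:
    """Run a procedure and convert any error into a uniform response at the API boundary."""
    try:
        return {"ok": True, "data": procedure(*args)}
    except HandlerError as exc:
        # Typed errors keep their class name, status, and message all the way to the client.
        return {"ok": False, "status": exc.status,
                "error": type(exc).__name__, "message": str(exc)}
    except Exception:
        # Anything unexpected is masked so internal details do not leak.
        return {"ok": False, "status": 500,
                "error": "InternalError", "message": "internal error"}
```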

What Makes It Unique nao’s core innovation is the concept of structured “context engineering” as a first-class development workflow. Rather than prompting an LLM directly with raw schema dumps, data teams build a versioned, file-system-organised context that the agent references — similar to how a well-onboarded human analyst would maintain a runbook. The evaluation framework closes the feedback loop: YAML unit tests measure agent accuracy against expected SQL outputs before each deployment, and user thumbs-up/thumbs-down signals flow back into the same test pipeline. This positions nao closer to a software development lifecycle for analytics agents than to a simple text-to-SQL wrapper. The context compaction agent, which automatically manages long tool-call histories to stay within provider context windows, is a technically specific solution to a real-world deployment problem that generic analytics chatbots typically ignore.
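A much-simplified picture of context compaction: when a conversation's estimated size exceeds a budget, older messages are folded into one summary and the most recent turns are kept verbatim. In the sketch below, the word-count token estimate, the `keep_recent` policy, and the placeholder summarizer (where nao uses an agent) are all assumptions. A production version would also need to avoid separating a tool call from its result and to confirm that the compacted history actually fits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str      # e.g. "user", "assistant", "tool", "system"
    content: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: about one token per whitespace-separated word.
    return len(text.split())


def placeholder_summary(messages: list[Message]) -> str:
    # Stand-in for an LLM-written summary of the folded messages.
    return f"[summary of {len(messages)} earlier messages]"


def compact(history: list[Message], budget: int, keep_recent: int = 4,
            summarize: Callable[[list[Message]], str] = placeholder_summary) -> list[Message]:
    """Fold older messages into one summary message once the history exceeds the budget."""
    total = sum(estimate_tokens(m.content) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)  # within budget, or nothing old enough to fold
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [Message("system", summarize(older)), *recent]
```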

Self-Hosting

nao is primarily released under the Apache License 2.0, which permits free use, modification, and redistribution — including for commercial purposes — as long as attribution is maintained and modified files are marked. However, a subset of files in the repository carry a /* @license Enterprise */ comment and are governed by nao Labs’ commercial license instead. These include SSO/OIDC authentication, custom branding, and license management modules. The Apache 2.0 core can be used in production freely; the enterprise-licensed files require a valid commercial subscription from nao Labs to use in production.
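Before a production rollout, it can help to list which files carry the enterprise marker, so you know which code paths fall under the commercial license. The helper below is a hypothetical audit aid that searches source files for the `@license Enterprise` text quoted above; the set of file extensions and skipped directories are assumptions, and the repository's own license notices remain authoritative.

```python
from pathlib import Path

MARKER = "@license Enterprise"  # text of the comment quoted in the license notice
SUFFIXES = {".ts", ".tsx", ".js", ".jsx", ".py"}  # assumed set of source extensions
SKIP_DIRS = {"node_modules", ".git", "dist"}  # assumed vendor/build directories


def enterprise_files(repo_root: Path) -> list[str]:
    """Return repo-relative paths of source files containing the enterprise marker."""
    hits = []
    for path in sorted(repo_root.rglob("*")):
        rel = path.relative_to(repo_root)
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        if SKIP_DIRS.intersection(rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if MARKER in text:
            hits.append(rel.as_posix())
    return hits
```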

Running nao yourself requires two main components: the nao-core Python CLI (installable from PyPI) for managing context, and the nao chat application (a Fastify/tRPC backend plus a React/Vite frontend) deployable via Docker. The backend supports both SQLite (for development) and PostgreSQL (recommended for production) as its internal database. You are responsible for provisioning a database, configuring your warehouse credentials, managing LLM API keys, and handling deployments, SSL termination, and updates. The project ships a Docker Compose setup and a deployment guide targeting Cloud Run with PostgreSQL, which lowers the operational barrier, but ongoing maintenance — upgrades, backup strategies, and uptime — remains your responsibility.
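The development/production database split is usually driven by configuration. The sketch below shows one way to enforce the recommended setup by failing fast when production is not pointed at PostgreSQL. The environment variable names `DB_URI` and `APP_ENV` and the SQLite default path are hypothetical, so check nao's deployment guide for the settings it actually reads.

```python
import os
from typing import Mapping, Optional

SQLITE_DEV_DEFAULT = "sqlite:./dev.db"  # hypothetical default for local development


def database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the internal database: SQLite for development, PostgreSQL in production."""
    env = os.environ if env is None else env
    url = env.get("DB_URI")  # hypothetical variable name
    production = env.get("APP_ENV") == "production"  # hypothetical variable name
    if not url:
        if production:
            raise RuntimeError("DB_URI must point at PostgreSQL in production")
        return SQLITE_DEV_DEFAULT
    if production and not url.startswith(("postgres://", "postgresql://")):
        raise RuntimeError("production should use PostgreSQL, not " + url.split(":", 1)[0])
    return url
```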

Compared to a hypothetical managed nao cloud offering, self-hosting means no SLA, no managed upgrades, and no enterprise support channel unless you hold a commercial subscription. The enterprise tier adds SSO/OIDC for centralised identity management, custom branding for white-label deployments, and Microsoft Entra authentication. The active release cadence (67 releases in about six months, roughly 11 per month) means the codebase moves quickly, so keeping a self-hosted instance current requires regular attention to changelogs and migration scripts.


Repository Health

Pre-computed score based on development activity, maintenance, community, maturity, and trend momentum.

82/100 (Excellent)

Development Activity: 96

Maintenance: 100

Community: 64

Maturity: 28

Momentum: 40

Growing community support
Very active development
Well-maintained with consistent updates
Rapidly growing project

Technical Analysis

76/100 (Good)

Architecture: 82

Code Quality: 78

Innovation: 80

Learning Curve: 65

Repository Stats

Total Commits: 624

Repo Age: 6 months

Last Commit: 2 days ago

Built With

TypeScript: 74.3%

Python: 24.4%

Recent Releases

67 releases in total (~11.0 releases/month)

Topics

agentic-analytics analytics analytics-engineering bigquery business-intelligence chat-with-your-data context-engineering data data-analysis data-analyst data-engineering databricks