Jan

Run LLMs 100% locally with full privacy, or connect to cloud AI — your machine, your data, your control.

40.9K stars
2.6K forks
Other
TypeScript

Jan is an open-source desktop application that runs AI models directly on your machine. It lets you download and run large language models like Llama, Gemma, Qwen, and DeepSeek entirely offline using llama.cpp and a native Tauri backend, so no data leaves your computer when you choose to work locally. For users who need cloud capabilities, Jan also connects to hosted models such as GPT-4 and Claude, and to providers such as Mistral and Groq, through their standard APIs, all from a single interface.

Built on TypeScript, React, and Rust, Jan ships as a native desktop application for Windows, macOS (including Apple Silicon with MLX acceleration), and Linux. It exposes an OpenAI-compatible local API server at localhost:1337, making it a drop-in backend for other AI tools and scripts. The Model Context Protocol (MCP) integration enables agentic capabilities, letting Jan connect to external tools, browser extensions, and data sources for more powerful workflows.
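Because the local server follows the OpenAI chat completions format, any HTTP client can talk to it. The sketch below assumes the server is running on its default port with the conventional /v1/chat/completions route; the model id you pass is a placeholder for whatever model you have loaded, and the API key is only needed if you have configured the server to require one.

```typescript
// Minimal sketch of a client for Jan's OpenAI-compatible local server.
// Assumptions: default port 1337 and the conventional /v1/chat/completions route.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const JAN_BASE_URL = "http://localhost:1337/v1";

// Build the HTTP request without sending it, so it can be inspected or tested offline.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): PreparedRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    // Only send a bearer token if the server was configured to require one.
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${JAN_BASE_URL}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}

// Send a single-turn prompt and return the first choice's text (OpenAI response shape).
async function chat(model: string, prompt: string, apiKey?: string): Promise<string> {
  const { url, init } = buildChatRequest(model, [{ role: "user", content: prompt }], apiKey);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Jan server returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

Keeping request construction separate from sending means the same code can target a cloud OpenAI-compatible endpoint by swapping the base URL and key.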

Jan organizes work around projects and assistants, allowing you to create specialized AI personas with custom instructions and tool configurations. The built-in model hub connects to Hugging Face so you can browse and download hundreds of open-weight models without leaving the app. A local API server mode transforms Jan into a private, self-hosted inference endpoint for the rest of your toolchain.
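An assistant persona ultimately comes down to a system prompt plus a few generation settings attached to a model. The following sketch shows one way to merge such a persona into an OpenAI-style request; the AssistantProfile fields and the 0.7 default temperature are illustrative assumptions, not Jan's actual assistant schema.

```typescript
// Hypothetical persona shape for illustration; these field names are not Jan's schema.
interface AssistantProfile {
  displayName: string;
  model: string; // model id the assistant is pinned to
  instructions: string; // becomes the system prompt
  temperature?: number;
}

type TurnRole = "system" | "user" | "assistant";

interface Turn {
  role: TurnRole;
  content: string;
}

// Merge a persona with the conversation so far into an OpenAI-style request body.
function buildAssistantRequest(profile: AssistantProfile, history: Turn[]) {
  // Drop system turns from the history so the persona's instructions are the only system prompt.
  const turns = history.filter((t) => t.role !== "system");
  const system: Turn = { role: "system", content: profile.instructions };
  return {
    model: profile.model,
    temperature: profile.temperature ?? 0.7, // illustrative default
    messages: [system, ...turns],
  };
}
```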

What You Get

  • Local LLM Inference - Download and run GGUF models like Llama 3, Gemma 2, Qwen 2.5, and DeepSeek entirely offline using llama.cpp, with GPU acceleration for NVIDIA, AMD, and Intel Arc hardware.
  • Apple Silicon MLX Backend - A native MLX inference backend for Apple Silicon Macs, which can deliver faster inference on M-series chips than CPU-only or Metal-accelerated llama.cpp execution.
  • OpenAI-Compatible Local API - A built-in server at localhost:1337 that mirrors the OpenAI chat completions API, letting you use Jan as a drop-in backend for any tool or script built against the OpenAI SDK.
  • Cloud Model Integration - Connect to OpenAI, Anthropic, Mistral, Groq, MiniMax, and other cloud providers from within the same interface, with a unified model selector across local and remote models.
  • Model Context Protocol (MCP) - Full MCP support for agentic capabilities: connect to external tools, run browser MCP servers, grant file system access, and integrate Jan into multi-step AI workflows.
  • Projects and Custom Assistants - Organize conversations into projects with dedicated assistants, each with custom system prompts, tool configurations, and model assignments for specialized workflows.
  • Hugging Face Model Hub - Browse and download from hundreds of open-weight models directly inside the app, with automatic detection of embedding models, vision capabilities, and multimodal support.
  • File Attachments and RAG - Attach documents and files to conversations with built-in RAG support via the vector-db extension, enabling retrieval-augmented generation over your own data.
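Retrieval-augmented generation works by embedding document chunks, finding the chunks closest to the question, and placing them in the prompt. This toy in-memory version assumes the embeddings have already been computed by some embedding model; it uses plain cosine similarity and is not the vector-db extension's API.

```typescript
// A document chunk with a precomputed embedding (assumed to come from some embedding model).
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Place the retrieved chunks into the prompt as numbered sources.
function buildRagPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the sources below.\n\n${sources}\n\nQuestion: ${question}`;
}
```

A production setup would store the vectors in an index rather than scanning every chunk, but the ranking idea is the same.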

Common Use Cases

  • Private document analysis - A legal professional runs Llama 3 locally inside Jan to summarize and query confidential contracts, with the assurance that no text is transmitted to external servers.
  • Local AI backend for developer tooling - A developer points their internal CLI or IDE plugin at Jan’s localhost:1337 API to get OpenAI-compatible completions from a locally hosted model during offline development or in air-gapped environments.
  • Agentic research with MCP - A researcher connects Jan to browser and file system MCP tools to automate multi-step information gathering: searching the web, reading local files, and synthesizing findings in a single conversation.
  • Apple Silicon inference optimization - A macOS developer switches Jan’s backend to MLX to get faster generation on an M2/M3 Mac than with llama.cpp on Metal, without changing their workflow.
  • Multi-model comparison and evaluation - An AI practitioner uses Jan’s unified interface to switch between local Gemma 2 and cloud GPT-4 for the same prompt, comparing quality and latency without managing separate applications.
  • Privacy-safe productivity assistant - A healthcare worker runs Qwen2.5 locally with a custom assistant prompt configured to answer clinical queries, ensuring patient-related inputs never leave the local network.

Under The Hood

Architecture

Jan is organized as a Yarn workspace monorepo with a clean three-layer separation: a typed core library, a React/Vite web application, and a collection of independently packaged Tauri extensions. The core library defines abstract base classes and typed event contracts that extensions must implement, with a centralized event bus decoupling extension behavior from application logic entirely. Extensions ship as pre-packaged .tgz assets bundled into the Tauri binary and loaded at runtime, enabling the plugin system to remain modular without requiring external package registries. The Rust backend in src-tauri handles file I/O, process management, MCP server lifecycle, download orchestration, and the OpenAI-compatible HTTP server, all exposed to the TypeScript layer via Tauri’s IPC command system. This architecture means the JavaScript layer never touches the OS directly, and the Rust layer never contains UI logic, making each layer independently testable and replaceable.
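The extension contract described above can be sketched in a few dozen lines: a typed event bus that maps event names to payload types, and an abstract base class with onLoad/onUnload hooks. The event names, EventBus, and ExtensionBase below are hypothetical stand-ins for illustration, not the exports of Jan's core library.

```typescript
// Illustrative event contract: event name -> payload type. Event names are hypothetical.
interface AppEvents {
  "model:loaded": { modelId: string };
  "message:sent": { threadId: string; text: string };
}

type Handler<T> = (payload: T) => void;
type AnyHandler = (payload: unknown) => void;

// Minimal typed event bus: the public API is checked against the contract E,
// while handlers are stored loosely typed internally.
class EventBus<E> {
  private readonly handlers = new Map<keyof E, AnyHandler[]>();

  on<K extends keyof E>(event: K, handler: Handler<E[K]>): void {
    const list: AnyHandler[] = this.handlers.get(event) ?? [];
    list.push(handler as AnyHandler);
    this.handlers.set(event, list);
  }

  off<K extends keyof E>(event: K, handler: Handler<E[K]>): void {
    const list = this.handlers.get(event);
    if (list) this.handlers.set(event, list.filter((h) => h !== handler));
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    // Iterate over a snapshot so handlers that (un)subscribe during emit don't affect this dispatch.
    const snapshot: AnyHandler[] = (this.handlers.get(event) ?? []).slice();
    for (const h of snapshot) h(payload);
  }
}

// Hypothetical base class with the onLoad/onUnload lifecycle hooks.
abstract class ExtensionBase {
  protected readonly bus: EventBus<AppEvents>;

  constructor(bus: EventBus<AppEvents>) {
    this.bus = bus;
  }

  abstract onLoad(): void;
  abstract onUnload(): void;
}

// Example extension: records which models were loaded, knowing only the bus.
class ModelLogExtension extends ExtensionBase {
  readonly loaded: string[] = [];

  private readonly handleLoaded = (p: AppEvents["model:loaded"]): void => {
    this.loaded.push(p.modelId);
  };

  onLoad(): void {
    this.bus.on("model:loaded", this.handleLoaded);
  }

  onUnload(): void {
    this.bus.off("model:loaded", this.handleLoaded);
  }
}
```

Because the extension only talks to the bus, it can be loaded, unloaded, or tested against a fresh EventBus instance without touching the rest of the app.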

Tech Stack

The application is built on Tauri 2.x with a Rust backend and a React 19 + Vite frontend. TypeScript is used throughout the web layer with strict typing enforced via ESLint and custom type guards. The llamacpp extension integrates with a bundled llama.cpp binary via a custom Tauri plugin (tauri-plugin-llamacpp), while a separate MLX extension handles Apple Silicon inference through a Swift/Python MLX server. State management in the web app uses Zustand stores, and routing is handled by TanStack Router with file-based route generation. Vitest runs the test suite across core, web-app, and extension workspaces. The MCP integration uses the rmcp crate on the Rust side for protocol-compliant server communication, with a TypeScript layer for tool approval and configuration UI.

Code Quality

The codebase shows strong engineering discipline: comprehensive Vitest test coverage across core abstractions, the llamacpp extension’s backend logic, and UI component hooks; strict TypeScript interfaces with explicit discriminated unions for model types, message states, and extension contracts; and a consistent Arrange-Act-Assert pattern in test suites. The core library has JSDoc documentation on all public APIs, and the extension interface is well-specified with abstract lifecycle hooks (onLoad/onUnload) and typed settings descriptors. Error handling leans on TypeScript type narrowing and defensive fallbacks rather than custom exception hierarchies, which is pragmatic for a desktop app but means some failure paths surface as silent no-ops. CI runs linting, testing, and platform-specific builds across Windows, macOS, and Linux via GitHub Actions, with Husky pre-commit hooks enforcing code standards locally.
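The discriminated-union and type-narrowing style mentioned above looks roughly like the following. The MessageState variants are hypothetical; the point is that the compiler forces every status to be handled, and the parse function falls back to an explicit error state rather than silently doing nothing.

```typescript
// Hypothetical message lifecycle modeled as a discriminated union on "status".
type MessageState =
  | { status: "pending" }
  | { status: "streaming"; partial: string }
  | { status: "ready"; content: string }
  | { status: "error"; reason: string };

// Exhaustive narrowing: adding a new variant without handling it is a compile error.
function describeState(state: MessageState): string {
  switch (state.status) {
    case "pending":
      return "Thinking...";
    case "streaming":
      return state.partial;
    case "ready":
      return state.content;
    case "error":
      return `Error: ${state.reason}`;
    default: {
      const unreachable: never = state;
      return unreachable;
    }
  }
}

// Type guard for untrusted input, e.g. state read back from disk.
function isMessageState(value: unknown): value is MessageState {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  switch (v.status) {
    case "pending":
      return true;
    case "streaming":
      return typeof v.partial === "string";
    case "ready":
      return typeof v.content === "string";
    case "error":
      return typeof v.reason === "string";
    default:
      return false;
  }
}

// Defensive fallback: malformed input becomes an explicit error state, not a silent no-op.
function parseState(raw: string): MessageState {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isMessageState(parsed) ? parsed : { status: "error", reason: "malformed state" };
  } catch {
    return { status: "error", reason: "invalid JSON" };
  }
}
```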

What Makes It Unique

Jan’s most distinctive technical choice is its dual inference backend strategy: the llamacpp extension embeds a fully managed llama.cpp binary with automatic GPU backend detection (CUDA, ROCm, Metal, Vulkan), while the MLX extension provides a separate native server optimized for Apple Silicon that aims to bring local inference speeds closer to those of cloud providers. The GGUF preset generation system dynamically constructs llama-server router configurations from per-model YAML metadata, enabling fine-grained per-model control over context size, attention caching strategy, and MTP (Multi-Token Prediction) layers. The extension system’s typed API surface, in which each extension declares its capabilities through SettingComponentProps descriptors, allows the UI to render dynamic settings panels for any extension without hardcoded knowledge of specific backends. Combined with first-class MCP integration that handles tool approval, server lifecycle, and multi-server orchestration, this places Jan in a technically sophisticated position between a simple model runner and a full agentic AI platform.
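As a simplified illustration of turning per-model metadata into server configuration, the sketch below maps an already-parsed preset object to llama-server command-line flags. The ModelPreset fields and the 4096-token default are assumptions, and Jan's real generator emits router configuration rather than a flat argument list; the flags used (--model, --alias, --port, --ctx-size, --n-gpu-layers, --cache-type-k) are standard llama-server options.

```typescript
// Hypothetical per-model preset, as it might look after parsing a YAML metadata file.
interface ModelPreset {
  id: string; // name exposed through the API
  path: string; // location of the .gguf file
  ctxSize?: number; // context window in tokens
  gpuLayers?: number; // number of layers to offload to the GPU
  kCacheType?: "f16" | "q8_0" | "q4_0"; // K-cache quantization
}

const DEFAULT_CTX_SIZE = 4096; // illustrative default, not Jan's

// Map a preset to llama-server flags; optional settings are emitted only when present.
function toLlamaServerArgs(preset: ModelPreset, port: number): string[] {
  const args = [
    "--model", preset.path,
    "--alias", preset.id,
    "--port", String(port),
    "--ctx-size", String(preset.ctxSize ?? DEFAULT_CTX_SIZE),
  ];
  if (preset.gpuLayers !== undefined) {
    args.push("--n-gpu-layers", String(preset.gpuLayers));
  }
  if (preset.kCacheType) {
    // Only the K cache is set here; quantizing the V cache in llama.cpp
    // additionally requires flash attention, which this sketch leaves out.
    args.push("--cache-type-k", preset.kCacheType);
  }
  return args;
}
```

The resulting array could be handed to a process launcher such as Node's child_process.spawn("llama-server", args); in Jan, process management lives in the Rust backend instead.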

Self-Hosting

Jan is licensed under the Apache License 2.0, maintained by Menlo Research. This is a permissive open-source license with no copyleft requirements: you can use Jan commercially, modify the source code, and redistribute it without being required to open-source your changes. The only obligations are preserving copyright notices and the license text. The README notes Apache 2.0 explicitly, and the repo contains no separate enterprise or paid-tier source directories — every capability in the current release is available under the same open license to all users.

Running Jan yourself requires a moderately capable consumer machine. macOS users need version 13.6 or newer, with 8 GB RAM minimum for 3B-parameter models and 16–32 GB for larger ones. Windows and Linux users with NVIDIA or AMD GPUs get hardware-accelerated inference through CUDA and ROCm backends, which Jan installs on demand. You are responsible for your own storage (GGUF model files range from 2 GB to 70+ GB), network bandwidth for model downloads, and applying application and OS updates to keep Tauri and the bundled llama.cpp binary current. There is no built-in auto-update for model weights: you manage them through the in-app hub or by manually placing GGUF files in the data folder.
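To sanity-check the RAM guidance above, a common rule of thumb is that a quantized model's weights occupy about parameters × bits-per-weight ÷ 8 bytes, plus headroom for the KV cache and the rest of the system. The helper below encodes that estimate; the 3 GB headroom and the bits-per-weight figures are rough assumptions, not values Jan uses.

```typescript
// Rule-of-thumb memory estimate for quantized GGUF weights (not a Jan API).
// 1e9 params * bits / 8 = bytes, so billions of params * bits / 8 gives GB (1 GB = 1e9 bytes).
function estimateWeightsGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * bitsPerWeight) / 8;
}

// Assumed headroom covers the KV cache, runtime buffers, and the OS; tune for your setup.
function fitsInRam(
  paramsBillions: number,
  bitsPerWeight: number,
  ramGB: number,
  headroomGB = 3,
): boolean {
  return estimateWeightsGB(paramsBillions, bitsPerWeight) + headroomGB <= ramGB;
}
```

For instance, a 3B model at roughly 4.5 bits per weight needs about 1.7 GB for its weights, which fits comfortably in 8 GB, while a 70B model at the same quantization needs about 39 GB and does not fit in 32 GB.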

Jan does not offer a managed cloud tier or commercial support contracts from the maintainers. There is no SLA, no managed uptime, no automated backup of your conversation history or model files, and no enterprise SSO or audit logging. What you gain by self-hosting is complete data sovereignty: no telemetry is sent when using local models, and your conversation data never leaves the machine. The trade-off is that you own the operational burden — debugging inference issues, handling model compatibility across llama.cpp versions, and managing disk space. The community Discord is active and the GitHub issues tracker is responsive, but formal enterprise support channels do not exist at this time.
