PostgresML

Run ML training and LLM inference natively inside PostgreSQL with GPU acceleration — no data movement required.

6.7K stars | 357 forks | MIT License | Rust

PostgresML is a PostgreSQL extension that brings machine learning and large language model inference directly into the database engine. Built in Rust using the pgrx framework, it exposes SQL functions like pgml.train, pgml.predict, pgml.embed, pgml.rank, and pgml.transform that execute ML operations in the same process as PostgreSQL — eliminating external API calls, data serialization overhead, and the latency of moving data to separate model servers.

The core insight is architectural: data for ML systems is inherently larger and more dynamic than the models themselves, so it is more efficient to bring models to the data rather than constantly moving data to the models. PostgresML realizes this by embedding Python bindings (via PyO3) inside the Rust extension, enabling direct invocation of Hugging Face Transformers, scikit-learn, XGBoost, LightGBM, and other libraries from within SQL queries — with optional GPU acceleration through CUDA.

The ecosystem extends beyond the extension itself. The project includes a Rocket-based web dashboard for managing models and visualizing metrics, a multi-language SDK (Rust, Python, JavaScript, C) for building RAG pipelines, integration with pgvector for approximate nearest neighbor search, and pgcat, a PostgreSQL connection pooler that adds load balancing and sharding for horizontal scaling. Both self-hosted Docker deployments and a managed cloud service are available, covering everything from local development to production-scale workloads.

What You Get

  • In-Database ML Training - Train classification and regression models using pgml.train() with 47+ algorithms including XGBoost, LightGBM, SVM, and Logistic Regression directly from SQL, storing trained models in PostgreSQL tables.
  • LLM Inference via pgml.transform - Run Hugging Face language models for text generation, summarization, translation, classification, and question answering directly in SQL using the pgml.transform() function with GPU acceleration.
  • Vector Embeddings with pgml.embed - Generate dense vector embeddings from text using pre-trained sentence transformer models in-database, integrated with pgvector for efficient approximate nearest neighbor search.
  • Complete RAG Pipeline in SQL - Execute end-to-end Retrieval-Augmented Generation pipelines using four composable functions: pgml.chunk (text splitting), pgml.embed (vectorization), pgml.rank (cross-encoder re-ranking), and pgml.transform (text generation).
  • GPU-Accelerated Inference - Leverage NVIDIA CUDA for model inference; the project reports 8-40x faster inference than HTTP-based model serving architectures, attributing the gain to eliminating network round trips and serialization overhead.
  • Multi-Language SDK - Build RAG applications in Python, Rust, JavaScript, or C using the Korvus SDK, which wraps the full RAG pipeline into a single parameterized database query.
  • Web Dashboard - Manage ML projects, inspect trained models, compare performance metrics, and run SQL queries through a Rocket-powered web interface with live streaming capabilities.
  • Horizontal Scalability via pgcat - Scale to millions of transactions per second using pgcat, the companion PostgreSQL connection pooler with sharding, load balancing, and failover support.
  • Hugging Face Model Hub Access - Load any of thousands of pre-trained models from the Hugging Face hub directly in SQL, including quantized models (GPTQ) and custom embeddings with configurable initialization parameters.
  • Security-by-Default Architecture - Keep sensitive training data and model weights co-located inside the database perimeter, avoiding data exfiltration to external model APIs.
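To make the data flow of the four-stage RAG pipeline concrete, here is a toy, pure-Python analogue. The functions `chunk`, `embed`, `rank`, `generate`, and `rag` are hypothetical stand-ins, not PostgresML's API: a word-window splitter replaces pgml.chunk, bag-of-words counts replace dense embeddings, word overlap replaces a cross-encoder, and a string template replaces an LLM. In PostgresML each stage would instead be a SQL function call executed inside the database.

```python
import math
import re
from collections import Counter

# Toy stand-ins for the four pgml stages; none of these are the real pgml functions.

def chunk(text: str, max_words: int = 8) -> list[str]:
    """Split text into fixed-size word windows (stand-in for pgml.chunk)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Lowercased term counts (stand-in for a dense pgml.embed vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query: str, passages: list[str]) -> list[str]:
    """Re-score candidates by shared words (stand-in for cross-encoder pgml.rank)."""
    q = set(embed(query))
    return sorted(passages, key=lambda p: len(q & set(embed(p))), reverse=True)

def generate(query: str, context: list[str]) -> str:
    """Template 'answer' (stand-in for text generation with pgml.transform)."""
    return f"Q: {query}\nContext: {' | '.join(context)}"

def rag(query: str, documents: list[str], k: int = 3, top_n: int = 1) -> str:
    chunks = [c for doc in documents for c in chunk(doc)]         # 1. chunk
    q_vec = embed(query)                                           # 2. embed
    candidates = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:k]  # retrieve
    best = rank(query, candidates)[:top_n]                         # 3. re-rank
    return generate(query, best)                                   # 4. generate

docs = [
    "Refunds are processed within five business days of approval.",
    "Our office is closed on public holidays.",
]
print(rag("How long do refunds take?", docs))
```

The point of the sketch is the composition: each stage consumes the previous stage's output, which is what lets PostgresML express the whole chain as SQL over rows already in the database.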

Common Use Cases

  • Semantic search over application data - A SaaS company uses pgml.embed to generate embeddings for user-generated content stored in PostgreSQL and pgvector to serve low-latency semantic search results without maintaining a separate vector database service.
  • Real-time fraud scoring at transaction time - A payments processor trains an XGBoost model on historical transaction features with pgml.train and calls pgml.predict inside a trigger or application query to score each transaction before authorization.
  • Document RAG with knowledge-base chatbot - A developer builds a customer support bot by chunking documentation with pgml.chunk and embedding it with pgml.embed at ingest time, then retrieving relevant passages via vector search and generating answers with pgml.transform, with retrieval and generation issued as a single parameterized SQL query via Korvus.
  • Batch NLP enrichment pipelines - A media company runs pgml.transform over millions of articles stored in PostgreSQL to generate classifications, extract named entities, or produce summaries as a background SQL job, avoiding the overhead of exporting data to an external pipeline.
  • Personalized recommendation with ANN - An e-commerce platform embeds product descriptions and user interaction histories with pgml.embed, then uses pgvector ANN search to rank candidate products for each user during page load without a separate recommendation microservice.
  • LLM fine-tuning on private data - A healthcare organization uses PostgresML’s fine-tuning support (via trl and LoRA) to adapt open-source LLMs on medical records stored in PostgreSQL, keeping training data inside the database perimeter.
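The fraud-scoring and semantic-search cases above ultimately reduce to a few SQL calls. A minimal sketch, assuming a DB-API driver with `%s` placeholders such as psycopg, a hypothetical `transactions` table with an `is_fraud` label column, and a hypothetical `documents` table with a pgvector `embedding` column. The pgml.train positional order (project name, task, relation, label column) follows PostgresML's documented form, but verify it against the docs for your installed version; `intfloat/e5-small-v2` is only an example embedding model.

```python
# Hypothetical helpers that return (sql, params) pairs for cursor.execute().
# Table, column, and project names are illustrative assumptions.

def train_sql(project: str, relation: str, label_column: str) -> tuple[str, tuple]:
    """Train an XGBoost classifier on a table of historical features."""
    sql = (
        "SELECT * FROM pgml.train(%s, 'classification', %s, %s, "
        "algorithm => 'xgboost')"
    )
    return sql, (project, relation, label_column)

def predict_sql(project: str, features: list[float]) -> tuple[str, tuple]:
    """Score one feature vector with the project's currently deployed model."""
    return "SELECT pgml.predict(%s, %s::REAL[])", (project, features)

def semantic_search_sql(model: str, query: str, limit: int = 5) -> tuple[str, tuple]:
    """Embed the query in-database and order rows by pgvector cosine distance (<=>)."""
    sql = (
        "SELECT id, body FROM documents "
        "ORDER BY embedding <=> pgml.embed(%s, %s)::vector "
        "LIMIT %s"
    )
    return sql, (model, query, limit)

# Usage with psycopg would look like: cur.execute(*predict_sql("fraud_scoring", feats))
print(train_sql("fraud_scoring", "transactions", "is_fraud"))
```

Keeping values in the parameter tuple rather than formatting them into the SQL string leaves quoting to the driver, which matters when the query text comes from end users.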

Under The Hood

Architecture PostgresML is organized as a multi-package monorepo where the core PostgreSQL extension, web dashboard, multi-language SDK, and tooling components are developed together but deployed independently. The pgml-extension package implements the SQL-callable ML functions as a native PostgreSQL shared library using the pgrx crate, which provides Rust macros such as #[pg_extern] for defining SQL-callable functions, along with APIs for accessing shared memory and using SPI for in-extension SQL execution. Python ML capabilities are bridged through PyO3, which embeds a Python interpreter inside the Rust extension process, allowing direct invocation of Hugging Face Transformers, scikit-learn, XGBoost, and LightGBM from within the PostgreSQL executor. The dashboard is a separate Rocket-based web application that connects to the same PostgreSQL instance and provides project management and model visualization. The SDK layer (pgml-sdks) provides language-specific clients that issue parameterized SQL queries to the extension from application code. The architecture deliberately avoids microservice decomposition in favor of database co-location as the primary architectural primitive.
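The SDK layer's role can be illustrated with a deliberately thin client. This is a hypothetical sketch, not the Korvus API: `PgmlClient` accepts any DB-API 2.0 cursor that uses `%s` placeholders (psycopg, for example) and turns each ML call into a single parameterized query, so the model executes inside the PostgreSQL backend rather than in the application. `RecordingCursor` is an invented stand-in so the sketch runs without a database.

```python
from typing import Any

class PgmlClient:
    """Hypothetical SDK-style wrapper: one ML call == one parameterized SQL query."""

    def __init__(self, cursor: Any) -> None:
        self.cursor = cursor  # any DB-API 2.0 cursor with %s paramstyle

    def embed(self, model: str, text: str) -> list[float]:
        # The application ships only the model name and text; inference runs in the backend.
        self.cursor.execute("SELECT pgml.embed(%s, %s)", (model, text))
        (vector,) = self.cursor.fetchone()
        return list(vector)


class RecordingCursor:
    """Stand-in cursor that records queries and returns a canned row."""

    def __init__(self, row: tuple) -> None:
        self.row = row
        self.calls: list[tuple[str, tuple]] = []

    def execute(self, query: str, params: tuple = ()) -> None:
        self.calls.append((query, params))

    def fetchone(self) -> tuple:
        return self.row


# With a real database you would pass conn.cursor() from psycopg instead.
cursor = RecordingCursor(row=([0.1, 0.2, 0.3],))
client = PgmlClient(cursor)
vector = client.embed("intfloat/e5-small-v2", "hello world")
print(cursor.calls, vector)
```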

Tech Stack The PostgreSQL extension is written in Rust and built with pgrx and Cargo, targeting PostgreSQL versions 12 through 17. Python interoperability is provided by PyO3 with pyo3-asyncio for async-compatible ML operations; the embedded Python environment uses Hugging Face Transformers, sentence-transformers, scikit-learn, XGBoost, LightGBM, TRL, and PEFT for fine-tuning workflows. Vector storage and approximate nearest neighbor search are handled by pgvector, accessed from Rust via the pgvector crate. The web dashboard runs on the Rocket framework with SQLx for async PostgreSQL queries, Sailfish for server-side HTML templating, and a minimal JavaScript frontend using Hotwire Turbo and Stimulus for dynamic page updates without a SPA framework. The SDK uses SQLx and Tokio for async database access, with Neon (Node.js N-API) for the JavaScript binding and cbindgen for C header generation. Deployment is Docker-based with GPU passthrough via NVIDIA Container Toolkit.
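For reference, pgvector's `<=>` operator returns cosine distance, defined as 1 minus cosine similarity, and its HNSW and IVFFlat indexes approximate an exact nearest-neighbor scan. The sketch below computes that exact baseline by brute force; the row IDs and two-dimensional vectors are toy data, and raising on a zero vector is a choice made for this sketch rather than a description of pgvector's behavior.

```python
import math
from typing import Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator orders by."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm

def exact_knn(query: Sequence[float], rows: list[tuple[str, list[float]]], k: int):
    """Brute-force scan: the exact result an HNSW or IVFFlat index approximates."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]

rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
nearest = [row_id for row_id, _ in exact_knn([1.0, 0.1], rows, k=2)]
print(nearest)
```

A full scan like this costs one distance computation per row, which is why ANN indexes trade a small amount of recall for much lower query latency on large tables.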

Code Quality The extension has an SQL-based integration test suite in pgml-extension/tests/test.sql that covers training, prediction, embedding, and NLP pipelines by executing SQL against a live PostgreSQL instance. GitHub Actions workflows run CI across multiple PostgreSQL versions and operating systems, with dedicated workflows for the Python SDK (pytest), JavaScript SDK, and Docker image builds. Rust code follows cargo fmt conventions enforced in CI. Error handling in the Rust extension uses the anyhow crate for error propagation with unwrap_or_error! macros that convert Rust Results into PostgreSQL error conditions surfaced to the SQL caller. The Python bindings have limited unit tests but extensive integration coverage through the SQL test suite. Type safety is strong on the Rust side due to pgrx’s typed datum system; the Python interop layer relies on runtime validation.

What Makes It Unique The defining architectural decision is running ML inference as native PostgreSQL extension functions rather than as external services, using pgrx to compile Rust code as a PostgreSQL shared library and PyO3 to embed a Python interpreter within that same process. This means a call to pgml.embed or pgml.transform executes entirely within the PostgreSQL backend process handling the query: there is no IPC, no HTTP request, and no network serialization, only in-memory conversion between Rust and Python objects. The consequence is that ML operations can participate in transactions, access live table data without ETL, and work with the PostgreSQL query planner and standard PostgreSQL connection poolers such as pgcat. The approach also enables GPU utilization directly from SQL queries by passing data to CUDA-capable PyTorch backends within the embedded Python process, a pattern uncommon in both the PostgreSQL extension ecosystem and the ML serving landscape.
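One practical consequence of in-process inference is that it can be attached to ordinary write paths. A minimal sketch, kept as a Python migration string, assuming a hypothetical `documents` table with a text `body` column and a pgvector `embedding` column, and using `intfloat/e5-small-v2` purely as an example model: a trigger fills the embedding whenever the body is written, inside the same transaction as the write.

```python
# Hypothetical migration; table, column, function, and model names are illustrative.
EMBED_ON_WRITE_DDL = """
CREATE OR REPLACE FUNCTION documents_embed() RETURNS trigger AS $$
BEGIN
    -- Runs inside the INSERT/UPDATE's own transaction: if it rolls back,
    -- the row and its embedding are discarded together.
    NEW.embedding := pgml.embed('intfloat/e5-small-v2', NEW.body)::vector;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_embed_trigger
    BEFORE INSERT OR UPDATE OF body ON documents
    FOR EACH ROW EXECUTE FUNCTION documents_embed();
"""

def apply(cursor) -> None:
    """Run the DDL with any DB-API cursor; the caller is responsible for committing."""
    cursor.execute(EMBED_ON_WRITE_DDL)
```

The trade-off is that inference now runs synchronously on every qualifying write, adding model latency to each INSERT or UPDATE; large backfills may be better handled by a batched UPDATE.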

Self-Hosting

PostgresML is released under the MIT License, which is one of the most permissive open-source licenses available. You can use it commercially, modify the source code, distribute it, and incorporate it into proprietary systems without any copyleft obligations; the only condition is that the copyright and permission notice be included in all copies or substantial portions of the software. There are no restrictions on the number of users, database size, or production deployments. The MIT license does not require you to open-source your own application code when you use PostgresML as a dependency.

Self-hosting PostgresML requires a Linux environment with PostgreSQL 12 or later and a Python 3 environment for the transformer and scikit-learn bindings; building the extension from source additionally requires a Rust toolchain with cargo-pgrx installed. For GPU-accelerated inference, an NVIDIA GPU with CUDA drivers is required, plus the NVIDIA Container Toolkit when running under Docker. The quickest path is the official Docker image, which bundles PostgreSQL, the pgml extension, and all Python dependencies. Operationally, you are responsible for PostgreSQL availability, backups, storage scaling, and CUDA driver maintenance on the underlying host. The extension itself does not add significant operational overhead beyond what a standard PostgreSQL deployment requires, but the embedded Python environment and large model weights require careful disk and memory planning: LLMs can occupy tens of gigabytes on disk.
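A back-of-the-envelope estimate helps with that planning: raw weight size is roughly parameter count times bytes per parameter. The sketch below uses illustrative 7B and 70B parameter counts; it ignores tokenizer files, activations, KV cache, framework overhead, and the small scale metadata that 4-bit formats such as GPTQ add, so treat the numbers as a rough floor rather than a sizing guarantee.

```python
# Approximate storage per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Raw weight size in gigabytes (1 GB = 10**9 bytes), excluding all overheads."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for params, label in [(7e9, "7B"), (70e9, "70B")]:
    for dtype in ("fp16", "int4"):
        print(f"{label} @ {dtype}: ~{weight_footprint_gb(params, dtype):.1f} GB")
```

By this estimate a 7B model in fp16 needs about 14 GB and a 70B model about 140 GB, before counting the GPU memory consumed at inference time, which is why quantized variants are often the practical choice on a single GPU.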

PostgresML offers a managed cloud service at postgresml.org with a free tier that provides GPU access and pre-loaded models without any self-hosting burden. The cloud service handles PostgreSQL version upgrades, GPU driver management, high availability, and automated backups. Self-hosters forgo these guarantees and managed support SLAs. The cloud tier also offers a serverless database option that scales to zero, which is impractical to replicate in a self-hosted setup without significant infrastructure investment. Organizations evaluating self-hosting should weigh the operational simplicity of the managed offering against data residency requirements that may mandate on-premises deployment.
