Give your LLM agents on-the-job training—ART lets you apply GRPO reinforcement learning to any multi-step agentic workflow with minimal code changes.
ART (Agent Reinforcement Trainer) is an open-source Python framework by OpenPipe that brings reinforcement learning directly into multi-step LLM agent workflows. Rather than requiring practitioners to become RL researchers, ART wraps the complexity of GRPO training behind an ergonomic client/server architecture that slots into any existing Python application. You define your agent workflow, assign rewards at the end of each rollout, and ART handles the rest—gradient updates, LoRA checkpointing, and inference routing through vLLM.
The framework ships two execution modes: a local backend for teams with their own GPU infrastructure, and a serverless backend powered by W&B Training, which manages GPU provisioning, inference clusters, and checkpoint publishing automatically. Both expose an identical API so switching between them requires only a one-line backend swap. Out of the box, ART supports Qwen 3, Llama, Mistral, and any model compatible with vLLM and Unsloth.
Beyond vanilla GRPO, ART introduces RULER—a zero-labeled-data reward function that uses an LLM judge to relatively rank trajectories within each training group, removing the need for handcrafted reward engineering for most tasks. AutoRL extends this further by generating training inputs automatically, enabling training on tasks where no dataset exists at all. Integrations with LangGraph, MCP servers, W&B, Langfuse, and OpenPipe’s own observability platform provide production-grade debugging and monitoring throughout the training loop.
Since its release in March 2025 ART has accumulated over 10,000 GitHub stars and 58 releases, driven by demonstrated results—a Qwen 2.5 14B email-retrieval agent trained with ART outperformed OpenAI’s o3 on the benchmark the team designed.
Architecture ART is organized around a strict client/server boundary: the client library runs inside the user’s agent process and exposes an OpenAI-compatible chat completions proxy that records every system, user, and assistant message into typed Trajectory objects, while the server runs as a separate GPU-bearing process that owns the vLLM inference engine and the GRPO training loop. This separation means the agent code never directly touches the model weights or training state—it simply accumulates trajectories and assigns a scalar reward, then calls train() to hand off a batch. The Backend protocol abstraction allows the server to be either a local subprocess or a remote managed service (W&B Training) with zero changes to the agent code, making the architecture genuinely portable across infrastructure tiers.
Tech Stack The client library is pure Python 3.12 with OpenAI SDK, LiteLLM (for multi-provider LLM judge calls in RULER), Polars for trajectory data manipulation, and Pydantic for all data models. The backend server depends on PyTorch, the HuggingFace Transformers and PEFT stacks, TRL for GRPO loss computation, Unsloth for memory-efficient LoRA training, and vLLM for high-throughput inference. Megatron-LM support is available as an optional extra for multi-node training at scale. Build tooling uses uv with hatchling, ruff for linting, and ty for type checking. The project ships as the openpipe-art PyPI package with optional dependency groups separating client-only from full backend installs.
Code Quality ART demonstrates comprehensive testing with an extensive unit test suite covering trajectory manipulation, metrics taxonomy, GRPO preprocessing, LoRA weight merging, LangGraph integration, MCP client behavior, RULER metrics, and cost tracking—plus integration tests that exercise the local backend training pipeline end to end. The codebase uses full type annotations throughout, enforced by ty (Astral’s type checker) running in CI alongside ruff for linting and formatting. Error handling is explicit and typed via custom exception classes with structured tracebacks captured in PydanticException models. The Backend interface is defined as a Protocol, ensuring that LocalBackend and ServerlessBackend remain structurally compatible without inheritance coupling.
What Makes It Unique ART’s most distinctive technical contribution is making GRPO a drop-in capability for existing Python agent code rather than a research experiment requiring specialized infrastructure knowledge. RULER removes the labeled-data bottleneck that has historically made RL for agents impractical for most teams: by using relative LLM-judge ranking within trajectory groups, it exploits the same insight GRPO relies on (relative rewards within a group suffice) without requiring any reference answers. AutoRL compounds this by eliminating the training data requirement entirely. The serverless backend’s multiplexed shared-cluster design delivers economics previously unavailable to individual researchers or small teams, while the LangGraph and MCP integrations mean ART works with the agent frameworks developers are already using rather than requiring a rewrite.
ART is released under the Apache License 2.0, which is one of the most permissive open-source licenses available. You can use it commercially, modify it, distribute it, and embed it in proprietary products without any obligation to open-source your own code. There are no copyleft implications—your reward functions, agent workflows, and trained LoRA adapters remain entirely your intellectual property. The only requirements are that you preserve the copyright notice and license text when redistributing the library itself.
Running ART locally demands meaningful GPU resources: training even a 7B model with GRPO requires at least a single A100 or H100-class GPU for the backend server, and the vLLM inference engine needs additional VRAM for hot-swapping LoRA adapters between rollout and training phases. You are responsible for provisioning and maintaining these GPUs, managing vLLM process health, storing checkpoints durably, and handling CUDA errors or OOM conditions that arise from large batch sizes or long trajectories. The recommended setup uses Docker with NVIDIA container runtime, and the project provides SkyPilot configuration for spinning up cloud VMs, but the operational responsibility for uptime, scaling, and cost control rests entirely with you.
The W&B Training serverless backend (operated by Weights & Biases) offloads all of that operational burden: GPU provisioning, inference cluster health, checkpoint storage, and HA are managed for you. Pricing is consumption-based and benchmarked at 40% lower cost than running equivalent dedicated GPU instances, with 28% faster training due to multiplexed shared clusters. Every checkpoint is automatically published to W&B and available for immediate inference via W&B Inference. Teams evaluating self-hosting versus managed should weigh the full operational cost of GPU maintenance, monitoring, and on-call rotation against the serverless pricing—for most teams without existing ML infrastructure, the serverless path will have lower total cost of ownership despite higher nominal per-token pricing.
No Code Platforms · AI Development · Developer Tools
Visual LLM workflow platform with RAG pipelines, agent capabilities, and model management for building production AI applications.
AI Code Assistants · AI Development
Orchestrate an army of AI coding agents—Claude Code, Codex, Gemini CLI, and more—running simultaneously in isolated git worktrees from a single Electron desktop app.
AI Code Assistants · AI Development
The self-hosted developer control center for running AI coding agents — locally, in Docker, on VMs, or across cloud backends — with automation workflows for GitHub, Slack, and more.