GPU Hot is a lightweight, web-based dashboard for real-time monitoring of NVIDIA GPUs without requiring SSH access. Built with Python, Flask, and Docker, it leverages the NVIDIA Management Library (NVML) to collect detailed GPU metrics including utilization, temperature, memory usage, power draw, and process-level details. Designed for DevOps and MLOps teams managing GPU workloads, it eliminates the need to log into individual servers by providing a centralized web interface. Whether you’re monitoring one GPU or scaling across 100+ nodes, GPU Hot offers a unified view with live charts and an API for integration into existing monitoring pipelines.
What You Get
- Real-time GPU metrics - Collects sub-second updates for utilization, temperature, memory usage, power draw, fan speed, clock speeds, PCIe bandwidth, P-State, throttle status, and encoder/decoder sessions using NVML.
- Process-level monitoring - Tracks running processes on GPUs with PID, memory consumption, and process name visibility when the container is run with the --init and --pid=host flags.
- Multi-node cluster support - Aggregates metrics from multiple GPU servers into a single dashboard using hub mode with NODE_URLS environment variable.
- Web-based interface - No SSH needed; access metrics from a browser at http://localhost:1312 with interactive charts and live updates.
- System-wide metrics - Monitors host CPU and RAM usage alongside GPU data for holistic system insight.
- REST and WebSocket APIs - Exposes real-time GPU data via /api/gpu-data (JSON) and WebSocket events for custom dashboard integrations (see the polling example after this list).
- Docker-first deployment - Pre-built Docker images and docker-compose.yml enable quick setup with NVIDIA Container Toolkit integration.
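A single-node deployment typically just needs the flags and port from the list above wired into docker-compose.yml. The sketch below is illustrative rather than the project's shipped compose file: the image tag, service name, and the comma-separated NODE_URLS format are assumptions.

```yaml
# Illustrative docker-compose.yml; image tag and node URLs are assumptions.
services:
  gpu-hot:
    image: gpu-hot:latest          # assumed tag; use the project's published image
    init: true                     # equivalent of `docker run --init`
    pid: "host"                    # needed for process-level visibility
    ports:
      - "1312:1312"                # dashboard served at http://localhost:1312
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      # Hub mode (optional): aggregate remote nodes into this dashboard.
      # - NODE_URLS=http://gpu-node-1:1312,http://gpu-node-2:1312
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```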
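On the consumption side, /api/gpu-data can be polled from any HTTP client. The snippet below is a minimal example; the payload is printed raw because the exact JSON schema isn't documented in this overview.

```python
"""Poll GPU Hot's REST endpoint; only the endpoint path comes from the feature list."""
import time
import requests

DASHBOARD_URL = "http://localhost:1312/api/gpu-data"

def poll(interval_s: float = 2.0) -> None:
    while True:
        payload = requests.get(DASHBOARD_URL, timeout=5).json()
        # The response schema is not documented here, so dump it raw to inspect the shape.
        print(payload)
        time.sleep(interval_s)

if __name__ == "__main__":
    poll()
```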
Common Use Cases
- Building a multi-GPU MLOps training cluster - Monitoring hundreds of GPUs across distributed nodes during deep learning training jobs to detect thermal throttling or memory leaks in real time.
- DevOps teams managing AI inference servers - Observing GPU utilization and process activity on production inference hosts to optimize resource allocation and detect rogue processes.
- SSH access restricted in cloud environments - Organizations whose security policies block SSH can still monitor NVIDIA GPUs through a web dashboard deployed inside Docker containers.
- Research labs with shared GPU resources - Multiple users on the same GPU server can see live usage by process ID, preventing conflicts and enabling fair resource sharing.
Under The Hood
GPU Hot is a real-time GPU monitoring solution designed for NVIDIA GPUs, supporting both single-node and multi-node (hub) configurations. It provides live telemetry through a web interface, leveraging asynchronous processing and modular architecture to ensure scalability and performance in GPU-intensive environments.
Architecture
The system adopts a layered architecture that separates core monitoring logic from configuration and UI components, enabling modularity and reusability.
- Distinct modules for configuration, metrics collection, and WebSocket handling keep the layers loosely coupled.
- Strategy-based fallback mechanisms allow graceful degradation between NVML and nvidia-smi monitoring methods (see the sketch after this list).
- Event-driven architecture through WebSockets enables real-time updates and supports distributed cluster telemetry in hub mode.
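As a rough illustration of the NVML-to-nvidia-smi fallback mentioned above, the pattern might look like the sketch below. Function names and structure are illustrative, not the project's actual modules.

```python
"""Illustrative NVML -> nvidia-smi fallback; not GPU Hot's actual implementation."""
import subprocess

try:
    import pynvml
    _HAVE_NVML = True
except ImportError:
    _HAVE_NVML = False

def _utilization_via_nvml() -> list[int]:
    pynvml.nvmlInit()
    try:
        return [
            pynvml.nvmlDeviceGetUtilizationRates(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            ).gpu
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
    finally:
        pynvml.nvmlShutdown()

def _utilization_via_smi() -> list[int]:
    # Parse nvidia-smi's CSV output as the degraded path.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line.strip()) for line in out.splitlines() if line.strip()]

def read_utilization() -> list[int]:
    """Prefer NVML; degrade gracefully to nvidia-smi if it is unavailable or fails."""
    if _HAVE_NVML:
        try:
            return _utilization_via_nvml()
        except pynvml.NVMLError:
            pass  # e.g. driver/library mismatch; fall through to nvidia-smi
    return _utilization_via_smi()
```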
Tech Stack
The project is built using Python and modern web technologies to deliver real-time GPU telemetry with a responsive UI.
- Built primarily with Python and FastAPI, integrating NVIDIA’s GPU monitoring tools and JavaScript for frontend interactivity.
- Relies on key libraries such as pynvml, psutil, and aiohttp for GPU data collection, system metrics, and asynchronous operations (see the collection sketch after this list).
- Docker is used for containerization with multi-stage builds and health checks to support stable runtime environments.
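A minimal collection pass with those libraries might look like the following sketch; the dictionary keys are illustrative and not GPU Hot's actual payload format.

```python
"""Minimal metrics pass with pynvml and psutil; dict keys are illustrative."""
import psutil
import pynvml

def collect_snapshot() -> dict:
    pynvml.nvmlInit()
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml builds return bytes
                name = name.decode()
            gpus.append({
                "index": i,
                "name": name,
                "utilization_pct": util.gpu,
                "memory_used_mib": mem.used // (1024 ** 2),
                "memory_total_mib": mem.total // (1024 ** 2),
                "temperature_c": pynvml.nvmlDeviceGetTemperature(
                    handle, pynvml.NVML_TEMPERATURE_GPU),
                "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
            })
        # Host-level metrics sit alongside the GPU data, as the dashboard does.
        return {
            "gpus": gpus,
            "cpu_pct": psutil.cpu_percent(interval=None),
            "ram_pct": psutil.virtual_memory().percent,
        }
    finally:
        pynvml.nvmlShutdown()
```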
Code Quality
The codebase shows a balanced approach to testing and error handling, though some inconsistencies and technical debt are present.
- Comprehensive test coverage includes Docker-based load testing with realistic GPU simulation patterns and multi-node configurations.
- Extensive error handling is implemented via try/except blocks, though some broad exception catching limits diagnostic clarity (see the example after this list).
- Code consistency varies with mixed naming conventions and duplication in configuration handling between modules.
- Technical debt is evident in the form of hardcoded values, limited input validation in mock nodes, and unclear separation between monitoring and aggregation logic.
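As a generic illustration of the broad-exception point, catching NVML's specific error type keeps the underlying error code visible instead of swallowing it; the snippet below is not code from the repository.

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
except pynvml.NVMLError as err:
    # Narrow catch keeps the NVML error code (e.g. NOT_SUPPORTED) in the logs.
    print(f"power reading unavailable: {err}")
    power_w = None
```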
What Makes It Unique
GPU Hot stands out for supporting both local and distributed monitoring modes, so the same dashboard works on a single workstation or as a cluster-wide hub.
- It uniquely supports both single-node and multi-node (hub) monitoring, enabling cluster-wide GPU telemetry in real time.
- The integration of fallback strategies between NVML and nvidia-smi enhances resilience in different GPU environments.
- Real-time updates via asynchronous WebSockets provide a performant and interactive user experience for GPU monitoring.