Gemma Multimodal Fine-Tuner

Name: Gemma Multimodal Fine-Tuner
Rating: 5 (1480 reviews)

An Apple Silicon-native LoRA fine-tuning tool for Gemma on text, image, and audio data, with a wizard CLI, a live browser-based training visualizer, and streaming from GCS/BigQuery for datasets too large for local disk.

1.5K stars

102 forks

MIT License

Python



Gemma Multimodal Fine-Tuner fills a specific gap: most LoRA fine-tuning tools for Gemma (MLX-LM, Unsloth, axolotl) handle text-only CSV well, but support image+text or audio+text fine-tuning unevenly or not at all, and typically assume an NVIDIA GPU. This tool runs natively on Apple Silicon via MPS (no CUDA required) and supports text-only, image+text (captioning/VQA), and audio+text LoRA fine-tuning from local CSV data.
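
To make the three modalities concrete, here is a minimal sketch of routing rows from a local CSV into text-only, image+text, and audio+text examples. The column names (`text`, `image_path`, `audio_path`) and the `Example`/`read_examples` names are hypothetical illustrations, not the tool's documented CSV format.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical column names; the tool's real CSV schema may differ.
TEXT_COL, IMAGE_COL, AUDIO_COL = "text", "image_path", "audio_path"


@dataclass
class Example:
    modality: str              # "text", "image+text", or "audio+text"
    text: str
    media_path: Optional[str]  # None for text-only rows


def read_examples(csv_file) -> Iterator[Example]:
    """Yield one Example per CSV row, routed by which media column is filled."""
    for row in csv.DictReader(csv_file):
        image = (row.get(IMAGE_COL) or "").strip()
        audio = (row.get(AUDIO_COL) or "").strip()
        if image and audio:
            raise ValueError("row has both image and audio; expected at most one")
        if image:
            yield Example("image+text", row[TEXT_COL], image)
        elif audio:
            yield Example("audio+text", row[TEXT_COL], audio)
        else:
            yield Example("text", row[TEXT_COL], None)


sample = io.StringIO(
    "text,image_path,audio_path\n"
    "A cat on a sofa,imgs/cat.jpg,\n"
    "Hello world,,clips/hello.wav\n"
    "Plain sentence,,\n"
)
print([ex.modality for ex in read_examples(sample)])
# → ['image+text', 'audio+text', 'text']
```

Keeping one record type with a `modality` tag lets each modality get its own preprocessing path downstream while the loader stays a single pass over the file.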

For datasets too large to fit on a local SSD, it can stream training data directly from Google Cloud Storage or BigQuery instead of requiring a full download first. A wizard CLI walks through system checks, LoRA configuration, model selection, and dataset setup. A real-time training visualizer (loss curve, attention heatmap, gradient signal strength, memory pressure, and token-by-token predictions) runs in the browser during training and is enabled with a single config flag; no TensorBoard or notebook setup is required.
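
The streaming idea can be sketched independently of any cloud SDK: the trainer consumes a lazy iterator of rows and groups it into batches, so only one batch is held in memory at a time. In this sketch `fake_remote_rows` is a hypothetical stand-in for a GCS or BigQuery reader; the tool's actual streaming code and its client calls are not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def stream_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an arbitrarily long row iterator into batches without materializing it."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))  # pulls at most batch_size rows
        if not batch:
            return
        yield batch


def fake_remote_rows(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a remote reader; yields rows lazily, one at a time."""
    for i in range(n):
        yield f"row-{i}"


for batch in stream_batches(fake_remote_rows(7), batch_size=3):
    print(batch)
# → ['row-0', 'row-1', 'row-2']
# → ['row-3', 'row-4', 'row-5']
# → ['row-6']
```

Because the batcher only ever asks the source for the next few rows, peak memory depends on the batch size rather than the dataset size, which is what lets a remote source stand in for local disk.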

MIT licensed and open source, it’s positioned specifically for Mac-based ML practitioners who want multimodal Gemma fine-tuning without provisioning cloud GPU infrastructure.

What You Get

LoRA fine-tuning for Gemma on text-only, image+text (captioning/VQA), and audio+text data
Native Apple Silicon (MPS) execution with no NVIDIA GPU or CUDA requirement
Streaming training data from Google Cloud Storage or BigQuery for datasets larger than local disk
A real-time, browser-based training visualizer showing loss, attention, gradients, and memory live
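
For readers new to LoRA, the core idea behind the first bullet fits in a few lines: the pretrained weight W stays frozen, and a low-rank product B·A, scaled by alpha/r, is learned on top of it. This pure-Python toy follows the standard LoRA formulation; it is not the tool's code, and the tiny matrices and the 2048 x 2048 layer size are purely illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix, alpha: float) -> Matrix:
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained.

    Shapes: W is d x k, B is d x r, A is r x k, x is k x 1.
    """
    r = len(A)
    base = matmul(W, x)
    delta = matmul(B, matmul(A, x))
    scale = alpha / r
    return [[b + scale * d for b, d in zip(brow, drow)] for brow, drow in zip(base, delta)]


def lora_param_fraction(d: int, k: int, r: int) -> float:
    """Trainable LoRA parameters r*(d+k) as a fraction of the full d*k matrix."""
    return r * (d + k) / (d * k)


print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[0.5], [0]], [[2], [3]], alpha=1.0))
# → [[4.5], [3.0]]
print(lora_param_fraction(2048, 2048, 8))
# → 0.0078125
```

At rank 8, a 2048 x 2048 projection trains under 1% of its parameters, which is why LoRA fits on unified-memory Macs where full fine-tuning would not.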

Common Use Cases

Fine-tuning Gemma for image captioning or visual question answering on a Mac without cloud GPU access
Fine-tuning Gemma on audio+text pairs natively on Apple Silicon instead of requiring an NVIDIA machine
Training on datasets too large for local storage by streaming directly from GCS or BigQuery
Monitoring training progress visually in real time instead of parsing raw log output or configuring TensorBoard

Under The Hood

Architecture: The tool centers on an MPS-native (Apple Silicon) training pipeline for LoRA fine-tuning, with distinct data paths for text-only, image+text, and audio+text inputs rather than multimodal support bolted onto a text-only trainer. Streaming from GCS/BigQuery decouples dataset size from local disk capacity. The training visualizer is a browser-based frontend fed live data from the training loop and switched on with a single config flag, so observability is built in rather than requiring a separate TensorBoard setup.
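
One way such a live feed could be wired, sketched under stated assumptions: the training loop pushes small JSON events onto a thread-safe queue, and the web server behind the browser view drains it. `MetricsBus`, the event fields, and the use of the global L2 gradient norm as "gradient signal strength" are hypothetical choices for illustration; this listing does not describe the tool's actual transport or metric definitions.

```python
import json
import math
import queue
from typing import List, Sequence


def global_grad_norm(grads: Sequence[Sequence[float]]) -> float:
    """L2 norm over all gradient values, flattened across tensors."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


class MetricsBus:
    """Thread-safe hand-off from the training loop to a (hypothetical) web frontend."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled  # stands in for the single on/off config flag
        self._q: "queue.Queue[str]" = queue.Queue()

    def publish(self, step: int, loss: float, grads: Sequence[Sequence[float]]) -> None:
        """Called once per training step; does nothing when the visualizer is off."""
        if not self.enabled:
            return
        event = {"step": step, "loss": loss, "grad_norm": global_grad_norm(grads)}
        self._q.put(json.dumps(event))

    def drain(self) -> List[dict]:
        """Called by the server (e.g. on each browser poll) to collect pending events."""
        events = []
        while True:
            try:
                events.append(json.loads(self._q.get_nowait()))
            except queue.Empty:
                return events


bus = MetricsBus(enabled=True)
bus.publish(step=1, loss=2.5, grads=[[3.0], [4.0]])
print(bus.drain())
# → [{'step': 1, 'loss': 2.5, 'grad_norm': 5.0}]
```

Serializing to JSON at publish time keeps the queue payload framework-agnostic, so the frontend never has to touch training tensors.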

Tech Stack: Python, built on Apple's MPS backend for Apple Silicon GPU acceleration (no CUDA path), with LoRA as the fine-tuning method and integrations for streaming data from Google Cloud Storage and BigQuery. The wizard CLI handles system checks, LoRA config, model selection, and dataset setup interactively.
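
The wizard's system checks and the MPS-only device choice can be illustrated with a small sketch. Note the assumptions: this listing says the tool uses Apple's MPS backend but not through which framework, so the sketch assumes PyTorch (whose `torch.backends.mps.is_available()` is a real API), and `apple_silicon_checks`/`pick_device` are hypothetical names, not the wizard's actual implementation.

```python
import platform
from typing import List


def apple_silicon_checks() -> List[str]:
    """Return human-readable problems; an empty list means the host looks suitable."""
    problems = []
    system, machine = platform.system(), platform.machine()
    if system != "Darwin":
        problems.append(f"expected macOS, found {system or 'unknown OS'}")
    if machine != "arm64":
        # An x86_64 Python under Rosetta on an Apple Silicon Mac also lands here.
        problems.append(f"expected arm64 (Apple Silicon), found {machine or 'unknown CPU'}")
    return problems


def pick_device() -> str:
    """Prefer PyTorch's MPS backend when present; with no CUDA path, fall back to CPU."""
    try:
        import torch  # assumption: MPS is reached through PyTorch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent in very old PyTorch releases
    return "mps" if mps is not None and mps.is_available() else "cpu"


for problem in apple_silicon_checks():
    print("warning:", problem)
print("training device:", pick_device())
```

Reporting problems as a list rather than failing on the first one lets a wizard show every issue in one pass before the user commits to a training run.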

Code Quality: The README includes a direct feature-comparison table against MLX-LM, Unsloth, and axolotl, which suggests the author surveyed established alternatives rather than building in isolation. The commit history shows active, consistent maintenance, a sign of ongoing investment in a young project.

What Makes It Unique: Most Gemma fine-tuning tools are text-first, with image and audio support added as an afterthought (or missing entirely), and they assume NVIDIA hardware. This tool specifically targets Apple Silicon with genuine multimodal (text/image/audio) LoRA support and a built-in live visualizer, filling a gap that the comparison table in its own README makes explicit.

Self-Hosting

Licensing Model: MIT licensed; fully open source, with no license key.

Self-Hosting Restrictions: Not applicable; it's a local training tool run on your own Mac, with optional GCS/BigQuery streaming using your own cloud credentials.

License Key Required: No.