Gemma Gem

A Chrome extension running Google's Gemma 4 model entirely on-device via WebGPU — a browser AI assistant that reads pages, clicks buttons, fills forms, and runs JavaScript with no API keys or cloud dependency.

950 stars

103 forks

Apache License 2.0

TypeScript

Gemma Gem is a browser-native AI agent that runs entirely on-device: Google’s Gemma 4 model executes via WebGPU inside a Chrome offscreen document, with no API key, account, or cloud service involved. Once loaded (and cached after first run), it can read the content of any page you visit, take screenshots, execute JavaScript, and interact with the DOM directly — clicking elements, filling forms, scrolling — driven by natural-language requests in a chat overlay.

The extension’s architecture splits responsibility across three browser contexts: an offscreen document hosts the model and the agent’s reasoning loop via @huggingface/transformers and WebGPU, a service worker routes messages between contexts and handles screenshot/JavaScript-execution requests, and a content script injects the chat UI (in a shadow DOM to avoid clashing with page styles) and executes DOM-manipulation tools directly on the page.
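The reasoning loop hosted in the offscreen document can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the names runAgent, generate, executeTool, ModelTurn, and the maxSteps cap are all hypothetical, and the model and tool executor are passed in as plain functions so the loop can run without a browser or a loaded model.

```typescript
// Hypothetical sketch of an agent loop; none of these names come from the project.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// A model turn is either a request to run a tool or a final answer.
type ModelTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "answer"; text: string };

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

async function runAgent(
  userRequest: string,
  generate: (history: Message[]) => Promise<ModelTurn>, // stands in for the on-device model
  executeTool: (call: ToolCall) => Promise<string>, // stands in for the tool dispatcher
  maxSteps = 8, // assumed safety cap so the loop always terminates
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userRequest }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await generate(history);
    if (turn.kind === "answer") {
      return turn.text;
    }
    // Record the tool request and its result so the next turn can use them.
    history.push({ role: "assistant", content: JSON.stringify(turn.call) });
    const result = await executeTool(turn.call);
    history.push({ role: "tool", content: result });
  }
  return "Stopped: step limit reached before the model produced an answer.";
}
```

In the extension, generate would presumably wrap the model running under @huggingface/transformers, and executeTool would send a message through the service worker to whichever context owns the tool.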

Gemma Gem is Apache-2.0 licensed. It requires Chrome with WebGPU support and roughly 500 MB to 1.5 GB of disk space for the cached model, depending on whether the E2B or E4B variant is used, but otherwise has no external dependencies once installed.
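Because WebGPU support is a hard requirement, an extension like this would likely check for it before downloading a model of that size. The sketch below is one way to do that, not the project's own check: hasUsableWebGPU, GpuLike, and NavigatorLike are hypothetical names, and the structural types exist only so the function can also run outside a browser. The one real API it relies on is navigator.gpu.requestAdapter(), which resolves to null when no suitable adapter is available.

```typescript
// Hypothetical helper; structural types keep the sketch runnable outside a browser.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

async function hasUsableWebGPU(nav: NavigatorLike): Promise<boolean> {
  // Browsers without WebGPU do not expose `navigator.gpu` at all.
  if (!nav.gpu) {
    return false;
  }
  try {
    // The property can exist while no adapter is available (resolves to null).
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while probing as "not usable".
    return false;
  }
}
```

In an extension page the argument would be the global navigator; depending on the TypeScript DOM typings in use, a cast to NavigatorLike may be needed.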

What You Get

On-device Gemma 4 inference via WebGPU inside the browser, with no API key or cloud dependency
Page-reading and DOM interaction tools: click elements, type text, scroll, and read page content
Screenshot capture and JavaScript execution as agent tools, run from the service worker
A shadow-DOM chat overlay injected into any page, so the assistant works on whatever site you’re viewing

Common Use Cases

Getting quick, privacy-preserving answers about the content of a page without sending it to a cloud AI service
Automating simple browser interactions (clicking, form-filling, scrolling) through natural-language requests
Running an AI assistant in offline or air-gapped browsing contexts where no external API calls are possible
Experimenting with on-device WebGPU LLM inference in a real browser extension rather than a bare demo

Under The Hood

Architecture: Gemma Gem splits work across three Chrome extension contexts by design. The offscreen document hosts the model and the agent’s reasoning loop (via @huggingface/transformers and WebGPU), the service worker acts as a message router between contexts and handles screenshot/JavaScript-execution requests, and the content script injects a shadow-DOM chat UI plus DOM tools (read_page_content, click_element, type_text, scroll_page) that run directly against the page. This three-context split follows Chrome’s extension architecture constraints (WebGPU inference needs the offscreen document, while DOM manipulation needs the content script) rather than trying to force everything into one context.
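The split described above implies a routing decision for every tool call: DOM tools go to the content script, while screenshot and JavaScript-execution requests are handled by the service worker. A minimal sketch of that lookup follows. The four DOM tool names are the ones listed above; take_screenshot and run_javascript are assumed names for illustration, and routeToolCall, ExtensionContext, and TOOL_CONTEXT are likewise hypothetical rather than taken from the project.

```typescript
// Hypothetical routing table: which extension context executes each tool.
type ExtensionContext = "content-script" | "service-worker";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

const TOOL_CONTEXT = new Map<string, ExtensionContext>([
  // DOM tools run directly against the page.
  ["read_page_content", "content-script"],
  ["click_element", "content-script"],
  ["type_text", "content-script"],
  ["scroll_page", "content-script"],
  // Assumed tool names; the listing only says these requests go to the service worker.
  ["take_screenshot", "service-worker"],
  ["run_javascript", "service-worker"],
]);

function routeToolCall(call: ToolCall): ExtensionContext {
  const target = TOOL_CONTEXT.get(call.name);
  if (target === undefined) {
    // Reject unknown names instead of guessing a destination.
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return target;
}
```

A Map is used instead of a plain object so that inherited property names such as toString are not mistaken for registered tools.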

Tech Stack: TypeScript, @huggingface/transformers for running Gemma 4 via WebGPU entirely client-side, and Chrome’s Manifest V3 extension APIs (offscreen documents, service workers, content scripts) for the browser integration. Built with pnpm.

Code Quality: As a young, single-purpose project, Gemma Gem does not appear to emphasize formal test coverage. The README’s explicit three-context architecture diagram and tool table nonetheless indicate deliberate design around Chrome’s extension constraints rather than an ad hoc implementation.

What Makes It Unique: Most browser AI assistants proxy requests to a cloud LLM. Gemma Gem instead runs the full model on-device via WebGPU, so page content and actions never leave the browser. That is a meaningfully different privacy and offline-capability trade-off from typical cloud-backed browser copilots.