Handy is a privacy-focused, offline speech-to-text tool designed for users who need reliable transcription without cloud uploads or subscriptions. Built for accessibility, it targets individuals with mobility impairments, writers, developers, and anyone seeking control over their voice data. It works by capturing audio from the system microphone, filtering silence with Silero VAD, transcribing with local Whisper or Parakeet models, and pasting the result into the active text field.
Handy is built with Tauri, combining a React + TypeScript frontend with a Rust backend for high-performance audio processing and system integration. It uses whisper-rs and transcribe-rs for ML inference, cpal for cross-platform audio, and rdev for global hotkeys. It runs on Windows, macOS, and Linux, supports GPU acceleration, and offers CLI control, Raycast integration, and extensibility through its open-source code.
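The capture, filter, transcribe, and paste stages described above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not Handy's actual code: the `Transcriber` and `TextSink` traits, `run_pipeline`, and the test doubles are hypothetical names, and a simple RMS energy threshold stands in for Silero VAD, which is a neural model.

```rust
/// Root-mean-square energy of one audio frame.
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Stand-in for voice activity detection (NOT Silero VAD): keeps only the
/// frames whose RMS energy exceeds `threshold`.
fn drop_silence(frames: &[Vec<f32>], threshold: f32) -> Vec<Vec<f32>> {
    frames
        .iter()
        .filter(|frame| rms(frame) > threshold)
        .cloned()
        .collect()
}

/// Hypothetical abstraction over a local speech model.
trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Hypothetical abstraction over "paste into the focused text field".
trait TextSink {
    fn paste(&mut self, text: &str);
}

/// Runs the stages in order: filter silence, transcribe, paste.
/// Returns false when every frame was classified as silence.
fn run_pipeline(
    frames: &[Vec<f32>],
    threshold: f32,
    model: &dyn Transcriber,
    sink: &mut dyn TextSink,
) -> bool {
    let voiced = drop_silence(frames, threshold);
    if voiced.is_empty() {
        return false;
    }
    let samples: Vec<f32> = voiced.concat();
    let text = model.transcribe(&samples);
    sink.paste(text.trim());
    true
}

/// Test double: reports how many samples it received instead of running a model.
struct EchoModel;

impl Transcriber for EchoModel {
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("  {} samples \n", samples.len())
    }
}

/// Test double: collects pasted text in memory instead of typing it.
struct Buffer(Vec<String>);

impl TextSink for Buffer {
    fn paste(&mut self, text: &str) {
        self.0.push(text.to_string());
    }
}
```

Keeping the model and the paste target behind traits means the flow can be exercised with in-memory doubles, without a microphone or a focused window.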
What You Get
- Offline Speech-to-Text - Transcribes speech using local Whisper or Parakeet V3 models without sending audio to the cloud, ensuring complete privacy.
- Direct Text Pasting - Automatically pastes transcribed text into the currently focused application, eliminating manual copy-paste steps.
- Whisper & Parakeet Model Support - Choose between high-accuracy Whisper (Small/Medium/Turbo/Large) or CPU-optimized Parakeet V3 with automatic language detection.
- Global Keyboard Shortcuts - Configure custom keybindings (e.g., Ctrl+Z) to start/stop transcription from any app, with support for the macOS Globe key and for both Wayland and X11 systems.
- Push-to-Talk Mode - Hold a key to record and release to transcribe, ideal for quick, interruptible dictation without toggling.
- Cross-Platform Support - Runs natively on Windows (x64), macOS (Intel/Apple Silicon), and Linux (X11/Wayland) with platform-specific input tools like xdotool, wtype, or dotool.
- CLI Remote Control - Control Handy from the terminal with flags like --toggle-transcription, --start-hidden, and --debug for automation and headless use.
- Raycast Integration - Control transcription, view history, and manage models directly from the Raycast launcher on macOS through an official extension.
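The difference between toggling with a shortcut and push-to-talk can be modelled as a small state machine. The sketch below is illustrative only; `Mode`, `KeyEvent`, `Action`, and `Recorder` are hypothetical names rather than types from Handy's codebase, and it assumes a single shortcut key.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Toggle,
    PushToTalk,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyEvent {
    Pressed,
    Released,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    StartRecording,
    StopAndTranscribe,
    Nothing,
}

struct Recorder {
    mode: Mode,
    recording: bool,
}

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, recording: false }
    }

    /// Maps a shortcut key event to the action the app should take.
    fn on_key(&mut self, event: KeyEvent) -> Action {
        let should_record = match (self.mode, event) {
            // Push-to-talk: record exactly while the key is held down.
            (Mode::PushToTalk, KeyEvent::Pressed) => true,
            (Mode::PushToTalk, KeyEvent::Released) => false,
            // Toggle: each press flips the state; releases are ignored.
            // A real implementation would also need to filter out
            // auto-repeated press events, which this sketch does not do.
            (Mode::Toggle, KeyEvent::Pressed) => !self.recording,
            (Mode::Toggle, KeyEvent::Released) => self.recording,
        };
        if should_record == self.recording {
            return Action::Nothing;
        }
        self.recording = should_record;
        if should_record {
            Action::StartRecording
        } else {
            Action::StopAndTranscribe
        }
    }
}
```

In push-to-talk mode a repeated `Pressed` event while the key is held yields `Action::Nothing`, so keyboard auto-repeat does not restart the recording.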
Common Use Cases
- Writing with a broken hand - A user with a cast uses Handy to dictate documents without typing, relying on offline transcription to maintain productivity.
- Privacy-conscious journalists - Reporters transcribe interviews in real time without uploading audio to third-party servers, protecting source confidentiality.
- Developers building voice interfaces - Engineers fork Handy to add custom models or integrate with home automation systems using its extensible Rust backend.
- Linux users on Wayland - Users with limited speech-to-text options leverage Handy’s wtype/dotool support to enable voice input in tiling window managers like Sway or Hyprland.
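On Linux, the choice among xdotool, wtype, and dotool depends on the display server. One way this selection could work is sketched below. The preference order is an assumption made for illustration (xdotool first on X11, wtype first on Wayland, dotool as the fallback on both), and `Session`, `parse_session`, and `pick_input_tool` are hypothetical names. The session string is the kind of value found in the `XDG_SESSION_TYPE` environment variable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Session {
    X11,
    Wayland,
}

/// Parses a session-type string such as the value of XDG_SESSION_TYPE.
/// Returns None for anything other than "x11" or "wayland".
fn parse_session(value: &str) -> Option<Session> {
    match value.trim().to_ascii_lowercase().as_str() {
        "x11" => Some(Session::X11),
        "wayland" => Some(Session::Wayland),
        _ => None,
    }
}

/// Picks the first preferred tool that is installed.
/// The preference order here is an illustrative assumption.
fn pick_input_tool<'a>(session: Session, installed: &[&'a str]) -> Option<&'a str> {
    let preferred: &[&str] = match session {
        Session::X11 => &["xdotool", "dotool"],
        Session::Wayland => &["wtype", "dotool"],
    };
    preferred
        .iter()
        .find_map(|want| installed.iter().copied().find(|have| have == want))
}
```

A caller would typically build the `installed` list by probing `PATH` for each binary, then report a clear error to the user when `pick_input_tool` returns `None`.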
Under The Hood
Architecture
- Clean, component-based React frontend with decoupled UI elements composed via props, enabling high reuse across settings and model workflows
- Centralized state management via Zustand with dedicated stores and hooks that eliminate prop drilling and enforce predictable updates
- Tauri backend integrated through type-safe plugin APIs that abstract native system operations, ensuring clear separation between UI and platform-specific logic
- Modular file structure with path aliases and scoped components that promote logical grouping and reduce cognitive overhead
- Tailwind CSS with design tokens and platform-aware theming that maintains visual consistency while respecting OS-native UI conventions
Tech Stack
- React 18 with TypeScript forming a type-safe frontend foundation, with Vite providing fast development builds
- Tauri 2.x as the native runtime, bridging Rust system capabilities with TypeScript frontend via plugins for file system, SQL, and OS interactions
- SQLite for local persistence, paired with Zustand and Immer to enable reactive, immutable state updates
- Playwright for end-to-end testing and Tauri CLI for cross-platform packaging, ensuring reliable desktop deployment
- Comprehensive tooling including ESLint, Prettier, i18next, and TypeScript 5.x with optimized module resolution for maintainable, internationalized code
Code Quality
- Strong TypeScript usage with comprehensive interfaces and type guards that enforce correctness in state and system bindings
- Consistent, domain-aligned naming conventions that improve readability and reduce mental mapping
- Defensive programming patterns with null checks and fallbacks that support stability, though granular error classification is limited
- Clean component architecture with encapsulated logic and styling, promoting maintainability and testability
- Extensive linting and formatting discipline suggests strict tooling configuration, contributing to code uniformity and reliability
What Makes It Unique
- Seamless integration with macOS accessibility APIs for native permission handling, eliminating external dependencies and enhancing user trust
- Dynamic GPU detection and auto-populated accelerator dropdowns empower users to leverage hardware capabilities without manual configuration
- Unified audio feedback system with customizable themes and real-time previews, a feature that is uncommon in transcription tools
- Platform-aware theming that renders native scrollbars on macOS and custom ones elsewhere, delivering OS-native fidelity without third-party libraries
- Model-aware settings that dynamically adapt UI options based on loaded model capabilities, ensuring context-sensitive and future-proof configuration
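The idea of settings that adapt to the loaded model can be sketched as a capability record that determines which options the UI shows. Everything here is hypothetical: the `Setting` variants, the `Capabilities` fields, and the capability values used in the usage example are assumptions for illustration, not a description of what any particular Handy model supports.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Setting {
    LanguagePicker,
    TranslateToEnglish,
    Accelerator,
}

/// What a loaded model can do (hypothetical fields).
#[derive(Debug, Clone, Copy)]
struct Capabilities {
    manual_language: bool,
    translation: bool,
    gpu: bool,
}

/// Returns the settings the UI should display for the given capabilities,
/// in a fixed display order.
fn visible_settings(caps: Capabilities) -> Vec<Setting> {
    let mut out = Vec::new();
    if caps.manual_language {
        out.push(Setting::LanguagePicker);
    }
    if caps.translation {
        out.push(Setting::TranslateToEnglish);
    }
    if caps.gpu {
        out.push(Setting::Accelerator);
    }
    out
}
```

For example, a model described with every capability enabled would show all three settings, while a model that only auto-detects language and runs on the CPU would show none of them. Adding a new model then means describing its capabilities rather than editing the settings screen.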