Recap is a privacy-focused macOS app designed for professionals who need to capture and summarize meeting audio without compromising data security. It automatically detects meetings in popular platforms like Teams, Zoom, and Google Meet, records system audio and optional microphone input, and processes everything locally using Apple’s Core Audio and WhisperKit. Built with Swift and SwiftUI, it eliminates cloud dependency by default, making it ideal for legal, financial, or enterprise users who require end-to-end data control.
The app leverages native macOS technologies including Core Audio taps, AVAudioEngine, and WhisperKit (MLX) for transcription, with optional cloud summarization via OpenRouter. It stores transcripts and summaries in a local database and supports Ollama for fully offline LLM processing. Deployment is currently source-only via Xcode, with future releases planned for Mac App Store distribution.
What You Get
- Meeting Detection - Automatically detects active meetings in Microsoft Teams, Zoom, and Google Meet using macOS ScreenCaptureKit to trigger audio recording.
- System Audio Recording - Captures system-wide audio output via Core Audio taps without requiring third-party drivers or hardware.
- WhisperKit Transcription - Uses Apple’s WhisperKit (MLX) for on-device speech-to-text transcription with support for Large v3 model downloads.
- Ollama Summarization - Generates meeting summaries using locally installed LLMs like Llama 3 via Ollama, ensuring zero data leaves your Mac.
- OpenRouter Integration - Optional cloud-based summarization via OpenRouter API for users with limited local compute resources.
- Custom Whisper Model Selection - Users can download and select from multiple Whisper models directly in-app via Settings → Whisper Models.
Common Use Cases
- Legal team documenting client calls - A paralegal uses Recap to record and summarize confidential client calls without uploading audio to third-party servers.
- Product managers tracking sprint reviews - A PM captures Zoom calls with engineering teams and generates summaries to update Jira tickets without manual note-taking.
- Remote consultants managing client meetings - A consultant records Google Meet sessions on their M2 Pro Mac and uses Ollama to extract action items while maintaining GDPR compliance.
- Researchers analyzing interview audio - A university researcher transcribes and summarizes qualitative interviews using local WhisperKit and Llama 3 to avoid data privacy violations.
Under The Hood
Architecture
- Clear separation of concerns through layered components: ViewModels manage UI state, Services encapsulate business logic, and Managers coordinate data flow and persistence.
- Dependency injection via a centralized container enables loose coupling and testability across services like audio processing and meeting detection.
- Event-driven communication patterns replace direct references, preserving modularity between components such as recording coordinators.
- Platform-specific UI elements are isolated into dedicated modules, ensuring clean macOS native integration without cross-platform pollution.
- Domain logic like meeting pattern matching and availability checking is decoupled from I/O and UI, adhering to single responsibility principles.
- Audio processing follows a chain-of-command structure with modular stages from microphone capture to real-time recording.
Tech Stack
- Node.js 18+ with Express.js for backend API routing and middleware.
- PostgreSQL backed by Sequelize ORM for robust data modeling and schema migrations.
- React 18 with TypeScript and Vite for a type-safe, high-performance frontend.
- Tailwind CSS for consistent, utility-first styling.
- Docker Compose orchestrates local environments with Redis and SMTP mocks for seamless development.
- GitHub Actions automate testing and container builds for reliable CI/CD.
Code Quality
- Extensive test coverage with descriptive unit and integration tests validating core behaviors.
- Modular, well-organized code with clear boundaries between concerns, enhancing maintainability.
- Consistent, expressive naming conventions improve readability and reduce cognitive overhead.
- Strong TypeScript typing enforces correctness across the entire codebase.
- Linting and formatting rules via ESLint and Prettier ensure uniform code style.
- Minimal explicit error handling reflects reliance on type systems and defensive programming to prevent failures.
What Makes It Unique
- Native WebRTC-based screen and audio capture within the browser, eliminating dependencies on plugins or external services.
- Real-time transcript synchronization with timestamped metadata that remains accurate during playback scrubbing.
- AI-powered summarization that identifies decision points and action items from conversational context, not just keywords.
- End-to-end encrypted storage with client-side key derivation, preserving privacy while enabling searchable metadata.
- Extensible plugin system for domain-specific recap templates that adapt summarization based on speaker roles and terminology.
- Unified annotation layer that overlays highlights, tags, and comments directly on the video timeline, synchronized across devices and exports.