Speakr is a self-hosted web application designed to transcribe audio recordings into organized, searchable notes using AI. Built for privacy-conscious individuals and teams, it runs entirely on your infrastructure to ensure sensitive conversations remain confidential. Unlike cloud-based transcription services, Speakr gives you full control over your data while offering advanced features like speaker identification, intelligent tagging, and automated retention policies. It’s ideal for researchers, legal teams, podcasters, educators, and anyone who needs to turn voice recordings into structured, actionable insights without relying on third-party platforms.
The platform supports multiple transcription engines, including OpenAI’s gpt-4o-transcribe-diarize and self-hosted WhisperX, and provides a rich interface for interactive analysis, semantic search, and integration with external tools like Obsidian and Logseq. With features like tag stacking, group-based sharing, and incognito mode, Speakr adapts to both personal use cases like family memories and enterprise workflows such as legal compliance or sales call analysis.
What You Get
- AI Transcription with Speaker Diarization - Accurately transcribes audio with speaker identification using either OpenAI’s gpt-4o-transcribe-diarize or self-hosted WhisperX ASR with GPU support.
- Voice Profiles - AI-powered speaker recognition that learns and identifies voices across recordings. Requires WhisperX ASR with embedding support.
- Smart Tagging & Prompt Stacking - Apply custom AI prompts to tags (e.g., ‘Recipe’ or ‘Code Review’) to transform raw transcripts into formatted outputs like step-by-step instructions or action items; stack multiple tags for layered transformations.
- Audio-Transcript Sync - Click any part of the transcript to jump to that moment in the audio, with auto-highlighting and follow mode for hands-free playback.
- Interactive Chat & Inquire Mode - Ask natural language questions about your recordings to get AI-powered summaries or answers without re-listening.
- REST API v1 - Full-featured API with Swagger UI for automation via n8n, Zapier, or Make; includes an /api/v1/upload endpoint for programmatic uploads.
- Single Sign-On (OIDC) - Authenticate using Keycloak, Azure AD, Google, Auth0, or any OIDC-compatible identity provider.
- Group Management & Granular Permissions - Create groups with shared access, set view/edit/reshare permissions, and auto-share recordings via group tags.
- Public Sharing with Admin Control - Generate secure, shareable links to recordings while retaining administrative oversight over external access.
- Auto-Deletion & Retention Policies - Set custom retention periods per group or tag (e.g., 7-year legal compliance, 14-day standup cleanup) with optional tag protection.
- Incognito Mode - Process transcriptions without storing them in the database, enabling ephemeral use cases with no data persistence.
- Obsidian/Logseq Auto-Export - Automatically write transcripts to your note-taking system using custom templates with variables like {{ai_title}} and {{date}}.
- Transcription Usage Tracking - Monitor per-user transcription minutes and estimated costs, and set monthly limits with a warning at 80% of the limit and a block at 100%.
- Custom Title Templates - Define formatting rules using variables ({{filename}}, {{date}}) to auto-generate titles; the AI call is skipped when the template does not use {{ai_title}}.
- Multi-Select Batch Operations - Select multiple recordings to delete, tag, reprocess, or toggle inbox/highlight status in bulk.
- Playback Speed Control - Adjust audio playback from 0.5x to 3x with persistent user preferences across sessions.
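The template variables mentioned above ({{filename}}, {{date}}, {{ai_title}}) suggest a simple placeholder substitution. The following is a minimal sketch of how such a template could be rendered and how the AI call could be skipped when {{ai_title}} is absent. The function names `needs_ai_title` and `render_title` and the regular expression are illustrative assumptions, not Speakr's actual implementation.

```python
import re
from datetime import date

# Matches placeholders such as {{filename}}, {{date}} or {{ai_title}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def needs_ai_title(template: str) -> bool:
    """Return True only if the template references {{ai_title}}."""
    return "ai_title" in PLACEHOLDER.findall(template)


def render_title(template: str, values: dict) -> str:
    """Substitute known variables; leave unknown placeholders untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


template = "{{date}} - {{filename}}"
values = {"filename": "standup.mp3", "date": date(2025, 1, 15).isoformat()}

if needs_ai_title(template):
    # Hypothetical: this is where an LLM call would generate the title.
    values["ai_title"] = "(AI-generated title)"

title = render_title(template, values)
print(title)
```

Because this template contains no {{ai_title}}, the branch that stands in for the AI call never runs, which mirrors the cost-saving behavior described in the feature list.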
Common Use Cases
- Building a legal compliance system for client consultations - Use group tags with 7-year retention policies and view-only sharing to ensure recordings are preserved for audit purposes while preventing accidental edits.
- Creating a research archive of interview transcripts - Apply ‘Protected’ tags and Obsidian auto-export to preserve audio and structured notes indefinitely, enabling longitudinal analysis.
- Automating meeting summaries for engineering teams - Tag recordings with ‘Code Review’ to extract action items and technical suggestions directly into team wikis or issue trackers.
- Managing family memories across generations - Create a ‘Family’ group with auto-sharing and no deletion policy so all members can access recordings of events without manual curation.
- Streamlining sales call analysis for remote teams - Use ‘Client Meeting’ tags with 1-year retention and AI-powered summaries to help reps learn from past conversations without manual note-taking.
- Transcribing lecture recordings for students - Apply ‘Lecture’ + ‘Biology 301’ tags to generate study notes with key concepts and definitions, reducing manual review time.
- Running a podcast production workflow - Use SRT subtitle templates and auto-export to generate captions for video uploads directly from audio recordings.
- Managing transcription pipelines for DevOps teams - Deploy WhisperX ASR with GPU support and configure OpenAI fallbacks to balance cost, accuracy, and privacy in a hybrid architecture.
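Several of these use cases depend on tag-based retention periods and protected tags. As a rough sketch of that logic, the example below computes when a recording would become eligible for auto-deletion. The `Tag` class and `deletion_date` function are hypothetical, and the rule that the longest retention period wins when several tags apply is an assumption made for illustration; Speakr's actual precedence rules may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    name: str
    retention_days: Optional[int] = None  # None means no retention period set
    protected: bool = False               # protected tags block auto-deletion


def deletion_date(created: datetime, tags: List[Tag]) -> Optional[datetime]:
    """Return when a recording becomes eligible for deletion, or None if never.

    Assumption: when several tags apply, the longest retention period wins.
    """
    if any(tag.protected for tag in tags):
        return None
    periods = [t.retention_days for t in tags if t.retention_days is not None]
    if not periods:
        return None
    return created + timedelta(days=max(periods))


created = datetime(2025, 1, 1)
standup = Tag("Standup", retention_days=14)
legal = Tag("Legal", retention_days=7 * 365)
protected = Tag("Protected", protected=True)

print(deletion_date(created, [standup]))             # 14 days after creation
print(deletion_date(created, [standup, legal]))      # longest period applies
print(deletion_date(created, [standup, protected]))  # None: never deleted
```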
Under The Hood
Speakr is a modular, Flask-based platform designed for audio transcription and summarization, offering flexible integration with multiple ASR services and LLMs. It provides a scalable architecture that supports extensible transcription pipelines and robust middleware for authentication and rate limiting.
Architecture
Speakr adopts a monolithic structure with well-defined layers and modules, enabling clear separation of concerns across API handling, database interactions, and service logic.
- Within the monolith, a service layer manages transcription and LLM workflows
- A strategy pattern is implemented in transcription connectors for flexible service selection
- Middleware components handle authentication and rate limiting, ensuring secure and controlled access
- Configuration-driven design allows for extensive customization without code changes
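To illustrate the strategy pattern mentioned for transcription connectors, here is a minimal sketch. The class names, the `get_connector` helper, and the registry keys are hypothetical and do not reflect Speakr's real module layout; the connector bodies are placeholders rather than real API calls.

```python
from abc import ABC, abstractmethod


class TranscriptionConnector(ABC):
    """Common interface that every transcription strategy implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        ...


class OpenAIConnector(TranscriptionConnector):
    def transcribe(self, audio_path: str) -> str:
        # Placeholder: a real connector would call the hosted API here.
        return f"[openai] transcript of {audio_path}"


class WhisperXConnector(TranscriptionConnector):
    def transcribe(self, audio_path: str) -> str:
        # Placeholder: a real connector would call a self-hosted ASR service.
        return f"[whisperx] transcript of {audio_path}"


# Registry mapping a configuration value to a strategy class.
CONNECTORS = {"openai": OpenAIConnector, "whisperx": WhisperXConnector}


def get_connector(name: str) -> TranscriptionConnector:
    """Select a strategy by name, as a configuration setting might."""
    try:
        return CONNECTORS[name.lower()]()
    except KeyError:
        raise ValueError(f"Unknown transcription connector: {name!r}") from None


connector = get_connector("whisperx")
text = connector.transcribe("meeting.wav")
print(text)
```

Calling code depends only on the `TranscriptionConnector` interface, so adding a provider means writing one new class and registering it.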
Tech Stack
The platform is built on Python with Flask as its primary web framework, supporting a wide range of external integrations.
- Core backend is powered by Flask for API routing and request handling
- Extensive use of third-party transcription and LLM services through configurable connectors
- Modular design enables easy addition of new providers or services
- Type annotations and structured directory layout support long-term maintainability
Code Quality
Code quality in Speakr reflects a mature development approach with strong testing and consistent error handling practices.
- Comprehensive test coverage includes API endpoints, edge cases, and transcription workflows
- Error handling is consistently applied with appropriate fallbacks and exception types
- Code style remains largely consistent, though some legacy integrations show signs of technical debt
- Well-organized structure and documentation support ongoing development and onboarding
What Makes It Unique
Speakr distinguishes itself through its extensible transcription pipeline and seamless integration of multiple services.
- The platform supports a variety of transcription providers through a unified connector interface
- A flexible job queue system enables asynchronous processing and scalability
- Customizable configuration via environment variables allows deployment flexibility across environments
- Modular design enables easy expansion with new transcription or summarization services
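The combination of environment-variable configuration and an asynchronous job queue can be sketched with the standard library alone. This is an illustrative example only: the variable names `EXAMPLE_ASR_PROVIDER` and `EXAMPLE_WORKER_COUNT` are invented for this sketch, and Speakr's actual queue implementation and settings are not shown in the text above.

```python
import os
import queue
import threading

# Hypothetical variable names; Speakr's real settings may be named differently.
PROVIDER = os.environ.get("EXAMPLE_ASR_PROVIDER", "whisperx")
WORKERS = int(os.environ.get("EXAMPLE_WORKER_COUNT", "2"))

jobs = queue.Queue()
results = {}
results_lock = threading.Lock()


def worker() -> None:
    """Pull audio paths off the queue until a None sentinel arrives."""
    while True:
        path = jobs.get()
        try:
            if path is None:  # sentinel: stop this worker
                return
            # Stand-in for a real transcription call.
            transcript = f"[{PROVIDER}] transcript of {path}"
            with results_lock:
                results[path] = transcript
        finally:
            jobs.task_done()


threads = [threading.Thread(target=worker, daemon=True) for _ in range(WORKERS)]
for t in threads:
    t.start()

for path in ["a.wav", "b.wav", "c.wav"]:
    jobs.put(path)  # returns immediately; work happens in the background

jobs.join()  # wait until every queued job has been processed

for _ in threads:
    jobs.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results))
```

Uploads return as soon as the job is enqueued, while the worker count and provider can be changed per deployment without touching the code.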