Stirling PDF is a self-hostable, open-core PDF platform that lets users edit, convert, and automate PDF workflows entirely on their own infrastructure. Designed for developers, enterprises, and privacy-conscious users, it reduces the risk of exposing sensitive documents by removing third-party cloud processing from the workflow. With 74k+ GitHub stars and 25M+ downloads, it is widely adopted by organizations that require data sovereignty.
Built on a Java/Spring Boot backend and a React/TypeScript frontend with a Docker-first architecture, Stirling PDF offers a web UI, a desktop app, and REST APIs. It integrates with local LLMs for AI-native processing and supports deployment on bare metal, in private clouds, or in air-gapped environments. The platform includes enterprise features such as SSO and audit logs, and its UI is localized into 40+ languages.
What You Get
- 50+ PDF Tools - Edit, merge, split, sign, redact, compress, convert to/from images, extract text, and apply OCR — all in one unified interface with no external dependencies.
- AI-Native Processing - Run local neural networks and LLMs to summarize, redact, and extract data from PDFs without sending documents to the cloud.
- Private API Endpoints - Access all PDF operations via REST APIs for integration into existing systems, enabling automation of batch processing and document pipelines.
- Enterprise SSO & Audit Logs - Enforce role-based access control, integrate with SAML/OAuth providers, and track all user actions for compliance and security audits.
- Air-Gapped Deployment - Run 100% offline using Docker containers; no internet connection required, ideal for military, government, and healthcare environments.
- Multi-Language UI - Full interface localization in 40+ languages to support global teams and non-English-speaking users.
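As an illustration of the REST API access listed above, here is a minimal TypeScript sketch of a client that uploads several PDFs to a self-hosted instance for merging. The endpoint path, the "fileInput" form field, and the "X-API-KEY" header are assumptions made for this example, and mergePdfs, buildMergeForm, and endpointUrl are hypothetical helper names; confirm the actual contract against the API documentation served by your own instance.

```typescript
// Assumed endpoint path and form field name; verify against your instance's API docs.
const MERGE_PATH = "/api/v1/general/merge-pdfs";
const FILE_FIELD = "fileInput";

interface PdfUpload {
  fileName: string;
  data: Blob;
}

/** Joins a base URL and an endpoint path without doubling or dropping slashes. */
function endpointUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

/** Builds the multipart body: one form part per input PDF, in merge order. */
function buildMergeForm(uploads: readonly PdfUpload[]): FormData {
  const form = new FormData();
  for (const upload of uploads) {
    form.append(FILE_FIELD, upload.data, upload.fileName);
  }
  return form;
}

/** Posts the PDFs to the (assumed) merge endpoint and returns the merged file. */
async function mergePdfs(
  baseUrl: string,
  uploads: readonly PdfUpload[],
  apiKey?: string,
): Promise<Blob> {
  const headers: Record<string, string> = {};
  if (apiKey !== undefined) {
    headers["X-API-KEY"] = apiKey; // assumed header name for API-key auth
  }
  const response = await fetch(endpointUrl(baseUrl, MERGE_PATH), {
    method: "POST",
    headers,
    body: buildMergeForm(uploads),
  });
  if (!response.ok) {
    throw new Error(`Merge request failed with HTTP ${response.status}`);
  }
  return response.blob();
}
```

A batch pipeline could call mergePdfs("http://localhost:8080", uploads) in a loop over input folders; because the request only targets the self-hosted base URL, documents stay on the internal network.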
Common Use Cases
- Military & Defense Document Processing - Intelligence agencies process classified PDFs in air-gapped environments without risking data leaks.
- Healthcare HIPAA Compliance - Hospitals automate redaction of patient identifiers from medical records while maintaining full data control.
- Legal e-Discovery Automation - Law firms accelerate contract review and document analysis using local AI to extract clauses and summarize case files.
- Financial KYC Workflows - Banks automate loan application processing by extracting and validating identity documents with zero data leaving the internal network.
Under The Hood
Architecture
- Clear separation of concerns between Java/Spring backend and React frontend, connected via well-defined REST APIs
- Layered backend structure with dedicated packages for controllers, services, and repositories, promoting modularity and maintainability
- PDF processing logic encapsulated in reusable utility classes using strategy-like patterns for extensibility
- Dependency injection via Spring annotations ensures loose coupling and testability across core components
- Frontend components are designed as reusable, typed modules with clear prop interfaces, enabling consistent UI composition
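The strategy-like encapsulation and dependency injection described above can be sketched in a few lines. This is an illustrative TypeScript sketch, not the project's Java code: the PdfDocument model, the two operations, and PdfToolService are hypothetical names, and in a Spring backend the container would supply the operations through annotations rather than the manual constructor call shown here.

```typescript
// Simplified stand-in for a PDF: each page is represented by its text.
interface PdfDocument {
  readonly pages: readonly string[];
}

// Strategy interface: every tool exposes the same apply() contract.
interface PdfOperation {
  readonly operationName: string;
  apply(doc: PdfDocument): PdfDocument;
}

class ReversePagesOperation implements PdfOperation {
  readonly operationName = "reverse";
  apply(doc: PdfDocument): PdfDocument {
    return { pages: [...doc.pages].reverse() };
  }
}

class RemoveBlankPagesOperation implements PdfOperation {
  readonly operationName = "remove-blanks";
  apply(doc: PdfDocument): PdfDocument {
    return { pages: doc.pages.filter((page) => page.trim().length > 0) };
  }
}

// The service depends only on the interface; concrete operations are
// injected through the constructor, which keeps it easy to test.
class PdfToolService {
  private readonly operations = new Map<string, PdfOperation>();

  constructor(injected: readonly PdfOperation[]) {
    for (const operation of injected) {
      this.operations.set(operation.operationName, operation);
    }
  }

  run(operationName: string, doc: PdfDocument): PdfDocument {
    const operation = this.operations.get(operationName);
    if (operation === undefined) {
      throw new Error(`Unknown operation: ${operationName}`);
    }
    return operation.apply(doc);
  }
}

const toolService = new PdfToolService([
  new ReversePagesOperation(),
  new RemoveBlankPagesOperation(),
]);
const cleaned = toolService.run("remove-blanks", { pages: ["cover", "  ", "body"] });
console.log(cleaned.pages); // the whitespace-only page is dropped
```

Adding a new tool in this arrangement means writing one more class that implements PdfOperation and registering it; the service itself does not change.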
Tech Stack
- Java backend powered by Spring Boot and Gradle, organized into modular core and proprietary components
- Hybrid JVM/Python toolchain with Python scripts integrated for auxiliary tasks and Ruff for code formatting
- React frontend with Mantine UI, i18next for internationalization, and pdfjs-legacy for high-fidelity PDF rendering
- Static assets and backend services designed for containerized deployment, with comprehensive CI/CD pipelines
- Automated quality enforcement via GitHub Actions, pre-commit hooks, and security scanning tools
Code Quality
- Extensive test coverage spanning unit, integration, and end-to-end workflows with dynamic endpoint discovery
- Clean, modular code organization with hooks encapsulating tool-specific state and reusable utilities
- Robust error handling with validation, fallbacks, and graceful degradation for edge cases
- Consistent, intent-driven naming conventions across components, hooks, and tests
- Strong TypeScript typing applied to state, props, and responses, enhancing reliability and developer experience
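To make the combination of typed responses, validation, and fallbacks concrete, here is a minimal sketch of a runtime type guard with graceful degradation. The PageCountResponse shape and both function names are hypothetical and chosen only for illustration; they are not taken from the project's code.

```typescript
// Hypothetical response shape; a real endpoint's fields may differ.
interface PageCountResponse {
  fileName: string;
  pageCount: number;
}

/** Type guard: narrows untrusted JSON to PageCountResponse at runtime. */
function isPageCountResponse(value: unknown): value is PageCountResponse {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const { fileName, pageCount } = value as {
    fileName?: unknown;
    pageCount?: unknown;
  };
  return (
    typeof fileName === "string" &&
    typeof pageCount === "number" &&
    Number.isInteger(pageCount) &&
    pageCount >= 0
  );
}

/** Parses a raw body and returns the fallback instead of throwing on bad input. */
function parsePageCount(
  rawBody: string,
  fallback: PageCountResponse,
): PageCountResponse {
  try {
    const parsed: unknown = JSON.parse(rawBody);
    return isPageCountResponse(parsed) ? parsed : fallback;
  } catch {
    return fallback; // malformed JSON degrades to the fallback value
  }
}
```

Keeping the guard separate from the parser means the same check can back both UI state and tests, so the static type and the runtime validation cannot drift apart silently.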
What Makes It Unique
- Native PDF.js integration with legacy cmaps and bidirectional text support enables accurate rendering of complex scripts without external dependencies
- Dynamic TSA configuration via JSON presets allows enterprise-grade timestamping with zero code changes for new CAs
- SaaS billing and usage metrics are surfaced in the UI through a centralized connection service with real-time updates
- Tooltip system with dual interaction modes, viewport-aware positioning, and sidebar-specific rendering that remains accessible
- License compliance enforced at the dependency level via automated whitelist validation against OSI-approved licenses
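The allowlist check in the last point can be sketched as a small pure function. This is a hedged illustration, not the project's actual tooling: the DependencyInfo shape, the sample report, and the package names are hypothetical, and the allowlist is only a small subset of OSI-approved SPDX identifiers.

```typescript
interface DependencyInfo {
  packageName: string;
  license: string; // SPDX identifier as reported by the build tooling
}

// Illustrative subset of OSI-approved licenses, keyed by SPDX identifier.
const ALLOWED_LICENSES: ReadonlySet<string> = new Set([
  "MIT",
  "Apache-2.0",
  "BSD-2-Clause",
  "BSD-3-Clause",
  "ISC",
  "MPL-2.0",
]);

/** Returns every dependency whose license is not on the allowlist. */
function findLicenseViolations(
  dependencies: readonly DependencyInfo[],
  allowed: ReadonlySet<string> = ALLOWED_LICENSES,
): DependencyInfo[] {
  return dependencies.filter((dep) => !allowed.has(dep.license.trim()));
}

// Hypothetical dependency report, e.g. parsed from a build plugin's output.
const sampleReport: DependencyInfo[] = [
  { packageName: "example-ui-kit", license: "MIT" },
  { packageName: "example-pdf-core", license: "Apache-2.0" },
  { packageName: "example-closed-lib", license: "Proprietary" },
];

const violations = findLicenseViolations(sampleReport);
if (violations.length > 0) {
  const summary = violations
    .map((dep) => `${dep.packageName} (${dep.license})`)
    .join(", ");
  // A CI job would fail the build at this point.
  console.error(`Disallowed licenses: ${summary}`);
}
```

Running a check like this in CI turns license policy into a build gate: a new dependency with an unlisted license is flagged before it can be merged.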