Healthchecks is an open-source monitoring tool built with Python and Django that verifies your cron jobs, backups, data pipelines, and scheduled tasks run as expected. It detects failures by tracking pings from your jobs and sends alerts when they’re late or missing, so failures in critical operations don’t go unnoticed. It is aimed at DevOps engineers, system administrators, and developers managing automated workflows.
The system uses PostgreSQL, MySQL, or MariaDB for storage, Django 6.0 for the backend, and supports deployment via Docker or direct installation. It offers a web dashboard, REST API, SMTP listener for email-based pings, and 25+ notification integrations including Slack, Discord, PagerDuty, and Webhooks. Self-hosting is fully supported with environment-based configuration and Django admin access.
What You Get
- Live-Updating Dashboard - Real-time status view of all cron jobs with color-coded states (Up, Late, Down) and tag-based organization for easy monitoring.
- Configurable Period & Grace Time - Define how often a ping is expected (period) and how long a late ping is tolerated before an alert is sent (grace time), using simple time values or cron expressions.
- 25+ Notification Integrations - Send alerts via email, Slack, Discord, Microsoft Teams, PagerDuty, Opsgenie, Telegram, SMS, WhatsApp, Pushover, ntfy, and more with one-click setup.
- Public Status Badges - Generate hard-to-guess PNG/SVG status badges for checks or tags to embed in READMEs, dashboards, or status pages without authentication.
- Cron Expression Support - Use standard cron syntax (e.g., “0 2 * * 1”, meaning 02:00 every Monday) to define the exact schedule on which pings are expected, parsed by the cronsim library.
- Email and HTTP Ping Endpoints - Receive pings via HTTP GET/POST requests or SMTP emails to monitor any system that can make a request or send mail, including legacy scripts.
- WebAuthn 2FA Support - Enable FIDO2/WebAuthn two-factor authentication for secure access to the dashboard without relying on TOTP apps.
- Team Management & Read-Only Access - Create projects, invite team members, and assign read-only permissions for monitoring without edit access.
- Django Admin Panel - Full administrative control to manage users, delete accounts, adjust ping log limits, and inspect database tables directly.
- SMTP Listener for Email Pings - Run ./manage.py smtpd to accept ping emails, enabling monitoring of systems without HTTP access (e.g., legacy scripts or air-gapped servers).
- Monthly/Weekly Reports & Reminders - Automate email summaries of job health, including failed runs and uptime trends, sent by the sendreports management command.
- Database Cleanup Tools - Automated pruning of old pings, token buckets, stale users, and external storage objects to maintain performance and reduce storage costs.
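The period and grace time settings described above reduce to a simple rule for the three dashboard states. The Python sketch below is an illustrative model of that rule, not the project's actual implementation; the function name check_status and the lowercase state strings are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def check_status(last_ping: datetime, period: timedelta,
                 grace: timedelta, now: datetime) -> str:
    """Classify a check as 'up', 'late', or 'down' (hypothetical helper).

    - up:   the next ping is not due yet
    - late: the ping is overdue but still inside the grace time
    - down: the grace time has also run out
    """
    expected = last_ping + period
    if now < expected:
        return "up"
    if now < expected + grace:
        return "late"
    return "down"


# Example: a nightly job with a 1-day period and a 1-hour grace time.
last = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
status = check_status(last, timedelta(days=1), timedelta(hours=1),
                      now=last + timedelta(hours=24, minutes=30))
```

In this model an alert would only be sent on the transition to "down", which is why a generous grace time reduces false alerts for jobs with variable run times.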
Common Use Cases
- Monitoring nightly database backups - A DevOps engineer uses Healthchecks to ensure PostgreSQL backups run every night; if the backup script fails or hangs, they receive a Slack alert within minutes.
- Tracking data pipeline health in ETL workflows - A data analyst sets up Healthchecks to ping after each ETL job in a Python script; if a job doesn’t complete by 3 AM, they get a PagerDuty alert to investigate.
- Validating SSL certificate renewals - A sysadmin configures a cron job to ping Healthchecks after certbot renews certificates; if the expected ping doesn’t arrive, they’re notified before the cert expires.
- Monitoring Jenkins and GitHub Actions workflows - A CI/CD team adds a curl command to the end of their pipelines to ping Healthchecks; failed builds trigger SMS alerts to on-call engineers.
- Embedding system health in public dashboards - A SaaS company uses Healthchecks status badges in their public status page to show real-time uptime of background services like email queues and data syncs.
- Running monitoring on air-gapped systems - A government IT team sends ping emails from an internal server to Healthchecks’ SMTP listener, enabling monitoring without outbound HTTP access.
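Most of the use cases above follow the same pattern: run the job, then report the outcome with an HTTP request. The Python sketch below shows one way to do that, assuming a ping URL copied from the dashboard and the "/fail" suffix for reporting failures; the hostname shown is a placeholder, and the helper names (ping_url, send_ping, run_and_ping) are hypothetical.

```python
import urllib.request
from typing import Callable


def ping_url(base_url: str, outcome: str = "success") -> str:
    """Build the URL to request: the bare ping URL on success, '/fail' otherwise."""
    base = base_url.rstrip("/")
    return base if outcome == "success" else f"{base}/fail"


def send_ping(url: str, timeout: float = 10.0) -> None:
    """Best-effort HTTP GET; a monitoring outage should not break the job."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except OSError:
        # URLError and socket timeouts are OSError subclasses; ignore them.
        pass


def run_and_ping(job: Callable[[], object], base_url: str,
                 send: Callable[[str], None] = send_ping) -> bool:
    """Run job(), then signal success or failure. Returns True on success."""
    try:
        job()
    except Exception:
        send(ping_url(base_url, "fail"))
        return False
    send(ping_url(base_url))
    return True


# Usage (placeholder URL; replace with the ping URL of your own check):
# run_and_ping(run_backup, "https://hc.example.org/ping/your-check-uuid")
```

Passing the sender as a parameter keeps the wrapper testable without network access. If the job hangs and never reaches either request, the missing ping is still caught once the period and grace time have elapsed.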
Under The Hood
Architecture
- Django-based MVC structure with clear separation between models, views, and services, enforcing modular boundaries and reusable components
- Service layer encapsulates external integrations and business logic, allowing consistent consumption by both API endpoints and command-line interfaces
- Dependency injection via Django’s built-in resolution and test doubles enables isolated unit testing and predictable behavior
- Type safety is reinforced through mypy with the mypy_django_plugin, checking model and query usage across the codebase
- Modular test suites mirror production components, enabling comprehensive coverage without redundancy
Tech Stack
- Django with django-stubs-ext and mypy_django_plugin for type-safe development at scale
- Python 3.12+ (the minimum version supported by Django 6.0) with comprehensive type hints and static analysis for robust error prevention
- PostgreSQL, MySQL, MariaDB, or SQLite with Django ORM migrations for reliable data persistence
- REST API served by Django views, paired with a minimal HTML/CSS administrative interface
- aiosmtpd for asynchronous email handling and statsd for real-time metrics instrumentation
- Deployment via Docker or direct installation, with environment-based configuration
Code Quality
- Extensive test coverage spanning unit, integration, and end-to-end scenarios with realistic HTTP flows and state validation
- Robust error handling with structured exceptions, detailed logging, and graceful degradation for OAuth and API failures
- Consistent naming, modular organization, and clear separation of concerns across accounts, integrations, and core systems
- Strong type annotations throughout the codebase enhance maintainability and IDE support
- Effective mocking and patching isolate external dependencies, ensuring fast, deterministic test execution
- Configuration-driven testing with environment overrides enables validation across diverse deployment states
What Makes It Unique
- Multi-tenant architecture with project-scoped checks and channels provides seamless team isolation without separate instances
- Flexible ping endpoint with configurable timeout and grace periods supports diverse monitoring use cases
- Unified notification system for multiple channels with built-in deduplication and failure recovery
- Automatic alert suppression during maintenance windows eliminates manual intervention
- Lightweight, stateless ping API optimized for high throughput and resource-constrained environments
- Granular permission system tied to project roles enables fine-grained access without full authentication infrastructure