Healthchecks.io Self-Hosted: Cron Monitoring

By SumGuy 9 min read

Your Backup Cron Failed Silently. You’ll Find Out in Six Months.

Here’s the thing: exit codes don’t email you. Your monitoring stack doesn’t know if a job didn’t run. It only sees what you explicitly tell it about. And that’s where most silent backup failures hide.

You’ve got restic or borgmatic set up, right? Running every night at 2 AM via cron. It fails three times in a row due to network flakiness, but there’s no dashboard screaming about it. You keep sleeping. By the time you notice (usually when you need to restore), six months of “backups” are actually just corruption logs.

This is the dead-man-switch problem. Not “something went wrong” — but “something didn’t happen at all.”

Healthchecks.io solves this. The self-hosted version runs on your own hardware, integrates with everything from restic to systemd timers, and sends you an alert as soon as a periodic job misses its check-in window.

What Is a Dead-Man-Switch?

Picture this: you’re piloting a plane. There’s a button in your hand. While you’re conscious and holding it, all is well. The moment you fall asleep (or worse), your grip loosens. The button releases. Alarm sounds.

That’s a dead-man-switch. In monitoring terms: I expect you to ping me every 24 hours. If you don’t, something’s wrong.

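It’s the inverse of traditional alerting: instead of firing when something goes wrong, it fires when something expected fails to happen.

Cron jobs are the poster child for this problem because they expose no metrics and no Prometheus scrape endpoint, and nobody reads their output. They just… run (or don’t). Your monitoring won’t know the difference.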

How Healthchecks.io Works

You create a “check”: essentially a ping URL with a schedule and a grace period. Your cron job (or systemd timer, or Kubernetes CronJob) sends an HTTP request to that URL after it finishes (GET, HEAD, and POST all count). Healthchecks waits for the ping to arrive.

If the ping shows up on time? Green. Late? Yellow. Missing? Red. Alert fires.

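You can also send start and fail signals by appending /start or /fail to the ping URL. A start signal lets Healthchecks measure how long the job ran, and a fail signal turns the check red immediately instead of waiting for the grace period to run out.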
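As a rough sketch of those signals in a script: Healthchecks accepts /start and /fail appended to the ping URL, and the bare URL counts as success. The URL is the placeholder used elsewhere in this post, and the function names are made up for illustration.

```shell
#!/bin/bash
# Placeholder ping URL; replace with the one from your dashboard.
PING_URL="${PING_URL:-https://checks.example.com/ping/abc123def456}"

# Build the URL for a signal: "start", "fail", or "" for plain success.
ping_url_for() {
    local signal="$1"
    if [ -n "$signal" ]; then
        printf '%s/%s\n' "${PING_URL%/}" "$signal"
    else
        printf '%s\n' "${PING_URL%/}"
    fi
}

# Send one signal. A failed ping must never abort the job itself.
send_ping() {
    curl -fsS -m 10 --retry 5 -o /dev/null "$(ping_url_for "$1")" || true
}

# Wrap any command between a start ping and a success or fail ping.
run_with_ping() {
    local status=0
    send_ping start
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        send_ping ""
    else
        send_ping fail
    fi
    return "$status"
}
```

Usage would look like `run_with_ping restic -r s3://bucket/backup backup /home/user/important-stuff`. The wrapper passes the command's exit status through, so cron still sees the real result.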

The Healthchecks dashboard shows you exactly when each job last pinged, how long it took, and whether it’s healthy or alarming. It’s not flashy, but it’s useful.

Why Self-Hosted?

Healthchecks.io has a free SaaS tier. It’s good. But your check history and alert routing live on someone else’s servers.

Self-hosted Healthchecks runs on Docker, uses PostgreSQL (or SQLite for tiny setups), and sends alerts through your channels: email, Slack, ntfy.sh, webhook, Telegram, PagerDuty, whatever. Total control. And it’s dead simple to deploy.

Docker Compose Setup

Here’s a working stack (PostgreSQL + Healthchecks + Caddy reverse proxy):

docker-compose.yml
version: '3.8'
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: healthchecks
      POSTGRES_USER: healthchecks
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - healthchecks
    restart: unless-stopped
  healthchecks:
    image: healthchecks/healthchecks:latest
    environment:
      DEBUG: "False"
      ALLOWED_HOSTS: "checks.example.com"
      SECRET_KEY: ${SECRET_KEY}
      # The image selects PostgreSQL when DB is "postgres"
      DB: postgres
      DB_HOST: postgres
      DB_USER: healthchecks
      DB_PASSWORD: ${DB_PASSWORD}
      DB_NAME: healthchecks
      EMAIL_HOST: smtp.example.com
      EMAIL_PORT: 587
      # Placeholder addresses; use your own SMTP login and sender
      EMAIL_HOST_USER: "healthchecks@example.com"
      EMAIL_HOST_PASSWORD: ${SMTP_PASSWORD}
      EMAIL_USE_TLS: "True"
      DEFAULT_FROM_EMAIL: "healthchecks@example.com"
      SITE_NAME: "Healthchecks"
      SITE_ROOT: "https://checks.example.com"
    ports:
      - "8000:8000"
    depends_on:
      - postgres
    networks:
      - healthchecks
    restart: unless-stopped
    volumes:
      - healthchecks_data:/opt/healthchecks
  caddy:
    image: caddy:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
    networks:
      - healthchecks
    restart: unless-stopped
volumes:
  postgres_data:
  healthchecks_data:
  caddy_data:
networks:
  healthchecks:

Caddyfile for reverse proxy and HTTPS:

Caddyfile
checks.example.com {
    reverse_proxy healthchecks:8000
    encode gzip
}

Spin it up:

# Generate secrets (keep these safe)
export SECRET_KEY=$(openssl rand -base64 32)
export DB_PASSWORD=$(openssl rand -base64 32)
export SMTP_PASSWORD="your-smtp-password"
docker-compose up -d

Visit https://checks.example.com, create an account, log in. You’re done.

Integrating with Your Backups

Let’s say you’ve got a restic backup script that runs nightly. You create a check in Healthchecks (grab the URL from the dashboard — looks like https://checks.example.com/ping/abc123def456/).

After your backup finishes, curl that URL:

backup.sh
#!/bin/bash
# No "set -e" here: the script has to survive a failed backup
# long enough to send the /fail ping.
PING_URL="https://checks.example.com/ping/abc123def456"

# ... your restic backup commands here ...
if restic -r s3://bucket/backup backup /home/user/important-stuff; then
    # Ping Healthchecks to say "I finished successfully"
    curl -fsS -m 10 --retry 5 -o /dev/null "$PING_URL/"
    echo "Backup succeeded"
else
    # Notify Healthchecks of failure
    curl -fsS -m 10 --retry 5 -o /dev/null "$PING_URL/fail"
    exit 1
fi

In crontab:

# Run at 2 AM every day
0 2 * * * /opt/backup.sh >> /var/log/backup.log 2>&1

That’s it. Healthchecks now knows whether your backup ran, whether it succeeded, and exactly when. With a one-day period and a two-hour grace time, you get an email once the script has gone 26 hours without checking in.

Borgmatic and Rclone Examples

If you’re using borgmatic (which abstracts Borg backup):

/etc/borgmatic/config.yaml
hooks:
  after_backup:
    - curl --silent --show-error --max-time 10 https://checks.example.com/ping/abc123def456/
  on_error:
    - curl --silent --show-error --max-time 10 https://checks.example.com/ping/abc123def456/fail

For rclone sync jobs (replicating to cloud):

#!/bin/bash
# Send the success ping only if rclone succeeded; otherwise report failure
if rclone sync /local/photos gdrive:/backup/photos --delete-during; then
    curl -fsS -m 10 --retry 5 -o /dev/null https://checks.example.com/ping/sync-photos-uuid/
else
    curl -fsS -m 10 --retry 5 -o /dev/null https://checks.example.com/ping/sync-photos-uuid/fail
fi
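The success-or-fail branching can also be collapsed into a single ping. Healthchecks accepts the job's exit status appended to the ping URL, treating 0 as success and anything from 1 to 255 as failure. A minimal sketch, with hypothetical function names and the same placeholder URL:

```shell
#!/bin/bash
# Placeholder ping URL; replace with the one from your dashboard.
PING_URL="${PING_URL:-https://checks.example.com/ping/sync-photos-uuid}"

# Build the ping URL that carries an exit status (0 = success, 1-255 = failure).
exit_status_url() {
    printf '%s/%d\n' "${PING_URL%/}" "$1"
}

# Run a command, then report its exit status in one request.
run_and_report() {
    local status=0
    "$@" || status=$?
    curl -fsS -m 10 --retry 5 -o /dev/null "$(exit_status_url "$status")" || true
    return "$status"
}
```

With that in place, the rclone job becomes `run_and_report rclone sync /local/photos gdrive:/backup/photos --delete-during`.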

Schedule and Grace Syntax

When you create a check, you define its expected cadence either as a simple period (for example, one day between pings) or as a standard cron expression such as 0 2 * * * with a time zone.

The grace period is how late a ping can be before Healthchecks raises the alarm. Set it generously enough for network jitter and occasional slowness, but tight enough to catch real problems.

The period (the API calls it timeout) is separate: it is the expected time between pings, and the grace time is added on top of it. If your job starts at 2:00 AM and sometimes doesn’t finish until 4:30 AM, the grace time has to cover that run, because the ping only arrives at the end. Grace is your safety net: “If I still haven’t heard by the expected time plus grace, send the alert.”
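To make the schedule and grace settings concrete, here is a sketch that creates a check through the Healthchecks Management API instead of the dashboard. It assumes a read-write project API key in API_KEY and the site root used in this post. The check name and the three-hour grace are illustrative, and the JSON is built naively, so it only suits values without quotes in them.

```shell
#!/bin/bash
# Assumed API root for the self-hosted instance from this post.
HC_API="${HC_API:-https://checks.example.com/api/v1}"

# Print the JSON body for a cron-scheduled check.
# Arguments: name, cron expression, time zone, grace in hours.
# The API expects grace in seconds, hence the multiplication.
check_payload() {
    local name="$1" schedule="$2" tz="$3" grace_hours="$4"
    printf '{"name": "%s", "schedule": "%s", "tz": "%s", "grace": %d}\n' \
        "$name" "$schedule" "$tz" "$((grace_hours * 3600))"
}

# Create the check. Requires API_KEY to be set in the environment.
create_check() {
    curl -fsS -m 10 \
        -H "X-Api-Key: $API_KEY" \
        -H "Content-Type: application/json" \
        --data "$(check_payload "$@")" \
        "${HC_API%/}/checks/"
}
```

Calling `create_check nightly-backup "0 2 * * *" UTC 3` would describe a job expected at 2:00 AM that alerts if nothing has arrived by 5:00 AM.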

Alert Channels

Once a check goes red, Healthchecks can notify you via email, Slack, ntfy, webhooks, Telegram, PagerDuty, and other integrations.

Set up a Slack channel #monitoring and route all backup alerts there. Bonus: the alert includes a link back to the Healthchecks dashboard with the exact ping history.

Complementary, Not Replacement

Healthchecks is a dead-man-switch, not a full monitoring system. It answers one question: “Did this job run?”

It doesn’t collect metrics, scrape endpoints, or graph resource usage.

But it’s perfect at what it does: catching the silent failures that traditional monitoring misses. Use it alongside Prometheus for the complete picture.

The Cron + Alertmanager Integration

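If you’re already running Prometheus + Alertmanager, you can have Healthchecks push its down events into Alertmanager through a webhook integration.

Add a Webhook integration in the Healthchecks UI that fires on check failure. The settings look roughly like this (recent Alertmanager releases serve the alerts API under /api/v2 rather than /api/v1, and the alert name and labels here are illustrative):

Webhook integration (configured in the UI, not read from a file)
URL: http://alertmanager:9093/api/v2/alerts
Method: POST
Request header: Content-Type: application/json
Request body: [{"labels": {"alertname": "HealthcheckDown", "check": "$NAME"}}]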

When Healthchecks detects a failure, it POSTs an alert to Alertmanager, which routes it alongside your other alerts. Now your on-call dashboard treats a missed backup the same as a failing API endpoint.
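Before relying on the webhook, it can help to post a test alert by hand and watch it show up in Alertmanager. This sketch assumes Alertmanager's v2 alerts endpoint, which takes a JSON array of alerts with a labels object. The alert name, label names, and function names are made up for illustration.

```shell
#!/bin/bash
# Assumed Alertmanager address on the same Docker network.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://alertmanager:9093}"

# Print a one-alert JSON array for the given check name.
alert_payload() {
    local check_name="$1"
    printf '[{"labels": {"alertname": "HealthcheckDown", "check": "%s"}}]\n' \
        "$check_name"
}

# Post the test alert to Alertmanager.
post_test_alert() {
    curl -fsS -m 10 \
        -H "Content-Type: application/json" \
        --data "$(alert_payload "$1")" \
        "${ALERTMANAGER_URL%/}/api/v2/alerts"
}
```

Running `post_test_alert nightly-backup` should make the alert appear in the Alertmanager UI, which confirms the routing before a real backup ever fails.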

Maintenance Windows and Pausing

Sometimes you need to take a server offline for maintenance. If you don’t pause its checks first, Healthchecks will alarm the moment the grace period expires.

The dashboard has a “Pause” button on each check. Use it:

  1. Click “Pause” before you shut down for maintenance.
  2. Do your work.
  3. Come back and resume the check manually, or let the next ping from the job resume it.

Pro tip: for recurring maintenance windows you can script the pause through the Management API, but for one-off maintenance, the manual button is clearer.
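For scripted maintenance, pausing can also go through the Management API, which has a pause endpoint per check. A minimal sketch, assuming a read-write API key in API_KEY and the site root from this post (the function names are hypothetical):

```shell
#!/bin/bash
# Assumed API root for the self-hosted instance from this post.
HC_API="${HC_API:-https://checks.example.com/api/v1}"

# Build the pause endpoint for a check, given its UUID.
pause_url() {
    printf '%s/checks/%s/pause\n' "${HC_API%/}" "$1"
}

# Pause the check. Requires API_KEY to be set in the environment.
pause_check() {
    curl -fsS -m 10 -X POST \
        -H "X-Api-Key: $API_KEY" \
        --data "" \
        "$(pause_url "$1")"
}
```

A maintenance script could call `pause_check` for each affected check before shutting down. The checks pick up again when their jobs send the next ping.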

Comparing to Alternatives

Cronitor (SaaS): Feature-rich, beautiful dashboard, but you’re paying per check and your data lives with them.

Custom Prometheus blackbox exporter: You could run blackbox probes against these URLs and scrape the results into Prometheus. Overkill for simple cron monitoring, but flexible if you’re already heavy on Prometheus.

Systemd notify: Built into systemd timers, but only notifies systemd-journald, not external systems. Useful locally, not sufficient for distributed alerting.

Dead simple: logger + log shipping: Pipe cron output to syslog, ship to Loki, alert on missing logs. Works, but requires more infrastructure.

Healthchecks wins on simplicity and purpose-built design. It’s the HTTP ping philosophy: minimal overhead, maximum clarity.

The Crons That Need Watching

Every periodic job that matters deserves a check: backups, sync jobs, and anything else that runs on a timer.

Create a Healthchecks check for each. Green dashboard = peace of mind. Red dashboard = you know exactly what’s broken before it bites you.

Your backups are worthless if you don’t know they’re running. Healthchecks makes sure you do.

