PagerDuty Costs $19/User. You Have One User.
Let’s be honest: PagerDuty is brilliant. It’s also $19 per user per month, and if you’re running a home lab with one person on call, that’s a coffee and a half you could spend on cloud credits instead.
Here’s the thing about alerting at home: you’ve got Prometheus firing alerts into Alertmanager, everything’s going to a Slack webhook, and when something actually breaks at 2 AM, your phone’s sitting on your desk silenced because you were tired of false positives at dinner time. By the time you check Slack, your Plex server’s been down for 20 minutes.
Grafana OnCall exists to fix this. It’s not a toy. It’s the same engine behind Grafana Cloud’s hosted OnCall service. And since 2022, there’s a fully functional open-source edition you can run as a small Docker Compose stack in your home lab.
This is your guide to setting up proper on-call paging without the PagerDuty bill.
What Grafana OnCall Actually Is
OnCall is Grafana’s incident response platform. At its core: alert aggregation → escalation policies → notifications to your phone.
Here’s the mental model: Alertmanager sends a webhook to OnCall. OnCall groups similar alerts (because you don’t need 40 notifications for the same broken service). It applies your escalation policy (first, it tries the primary on-call person’s phone number; if they don’t acknowledge in 5 minutes, it pages their backup). If it’s 3 AM on Sunday, maybe you skip the escalation and go straight to SMS. OnCall handles all of that logic instead of you building it in shell scripts and cron jobs.
The OSS edition (open-sourced in 2022) runs the full feature set: schedules, escalation policies, integrations, webhook receivers. What you don’t get: Grafana Cloud hosting (you self-host it), enterprise auth (no Okta/LDAP), and the mobile app is a web PWA instead of native (honestly, it’s fine).
Key fact: OnCall is purpose-built for this. If you use just ntfy + Alertmanager webhooks, you get HTTP-to-phone notifications. You don’t get acknowledgment, routing, escalation, or schedules. That’s the gap OnCall fills.
The Home Lab Reality Check
Solo home lab ops are weird. You’re the primary, secondary, and backup. You’re also the human with a day job who can’t be glued to a phone. So here’s what OnCall does for you:
1. Escalation policies (even for one person)
- Primary: try SMS to your main number, 5-minute timeout
- Fallback: try your backup phone (partner’s, parent’s, a friend’s)
- Ultimate fallback: escalate to a Telegram group or Slack channel for visibility
You’ll never actually use the third step, but it’s there. More importantly, your primary escalation is SMS to your actual phone, not a Slack message on a muted channel.
2. Schedules and overrides
- You’re on-call Mon–Fri 9–5. Someone else (or no one) is on-call nights and weekends.
- You’re on vacation next week; override the schedule to point to your backup.
- OnCall handles the scheduling logic. You don’t need to hack Alertmanager configs.
3. Deduplication and grouping
- 50 Prometheus alerts fire for “database-pool-exhausted.” OnCall groups them into one incident and sends one notification. You acknowledge once, and that acknowledgment covers the whole group.
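To make the grouping idea concrete, here is a minimal sketch in Python. It is not OnCall’s actual implementation; the `group_alerts` helper and the choice of `alertname` as the grouping label are assumptions made for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, keys=("alertname",)):
    """Bucket alerts that share the same values for the grouping labels."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        group_key = tuple(labels.get(k, "") for k in keys)
        groups[group_key].append(alert)
    return dict(groups)


# 50 alerts with the same name from different instances, plus one unrelated alert.
alerts = [
    {"labels": {"alertname": "database-pool-exhausted", "instance": f"db-{i}"}}
    for i in range(50)
]
alerts.append({"labels": {"alertname": "disk-full", "instance": "nas-01"}})

incidents = group_alerts(alerts)
for key, members in incidents.items():
    print(key, len(members))
```

Fifty-one alerts collapse into two incidents, so you get two notifications instead of fifty-one.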
How to Deploy Grafana OnCall (Docker)
Fastest path: OnCall + Postgres + Redis in a compose file.
version: '3.8'

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: oncall
      POSTGRES_USER: oncall
      POSTGRES_PASSWORD: supersecretpassword
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - oncall-net

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - oncall-net

  oncall:
    image: grafana/oncall:latest
    depends_on:
      - postgres
      - redis
    environment:
      DATABASE_URL: postgresql://oncall:supersecretpassword@postgres:5432/oncall
      REDIS_URL: redis://redis:6379/0
      SECRET_KEY: ${SECRET_KEY:-your-secret-key-change-this}
      GRAFANA_API_URL: http://grafana:3000
      GRAFANA_API_TOKEN: ${GRAFANA_API_TOKEN}
      ONCALL_BACKEND_URL: http://oncall:8080
      ONCALL_FRONTEND_URL: http://oncall.yourdomain.local
    ports:
      - "8080:8080"
    networks:
      - oncall-net
    volumes:
      - oncall_data:/var/lib/oncall

  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    ports:
      - "3000:3000"
    networks:
      - oncall-net
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  postgres_data:
  redis_data:
  oncall_data:
  grafana_data:

networks:
  oncall-net:
    driver: bridge

Spin it up:

docker compose up -d

Visit http://localhost:8080 and sign up. You’re done with deployment.
The first time you log in, create an admin user. OnCall will guide you through it.
Wiring Alertmanager to OnCall
OnCall exposes a webhook endpoint. Alertmanager sends alerts there. Here’s the Alertmanager config:
global:
  resolve_timeout: 5m

route:
  receiver: 'grafana-oncall'
  group_by: ['alertname', 'instance']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h

receivers:
  - name: 'grafana-oncall'
    webhook_configs:
      - url: 'http://oncall:8080/api/v1/integrations/grafana/notify/'
        send_resolved: true

That’s it. Prometheus fires → Alertmanager routes to OnCall → OnCall applies your escalation policy and pages you.
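Before waiting for a real outage, it can help to push a fake alert through the pipeline by hand. The sketch below builds a payload in the general shape of Alertmanager’s webhook format and prints it. The `build_test_payload` and `send` helpers are hypothetical names, the URL is simply the one from the config in this guide, and the labels are made up for the test.

```python
import json
import urllib.request
from datetime import datetime, timezone

# URL copied from the Alertmanager config in this guide; adjust to your setup.
ONCALL_URL = "http://oncall:8080/api/v1/integrations/grafana/notify/"


def build_test_payload(alertname="TestAlert", instance="lab-01", status="firing"):
    """Build a payload shaped like an Alertmanager webhook notification."""
    now = datetime.now(timezone.utc).isoformat()
    labels = {"alertname": alertname, "instance": instance, "severity": "warning"}
    annotations = {"summary": "Manual smoke test"}
    return {
        "version": "4",
        "status": status,
        "receiver": "grafana-oncall",
        "groupLabels": {"alertname": alertname, "instance": instance},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "alerts": [
            {
                "status": status,
                "labels": labels,
                "annotations": annotations,
                "startsAt": now,
            }
        ],
    }


def send(payload, url=ONCALL_URL, timeout=5):
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Print the payload for inspection; call send(...) to actually deliver it.
print(json.dumps(build_test_payload(), indent=2))
```

Run `send(build_test_payload())` from a machine that can reach OnCall, then send the same payload with `status="resolved"` to check that resolution flows through too.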
The webhook is unauthenticated by default (it’s your internal network). If you expose OnCall over the internet, add authentication:
webhook_configs:
  - url: 'http://oncall:8080/api/v1/integrations/grafana/notify/'
    http_config:
      authorization:
        type: Bearer
        credentials: 'your-webhook-token'

Generate a token in OnCall under Settings → API Tokens.
Setting Up Escalation Policies and Schedules
Log into OnCall. Navigate to Escalation Policies.
Create a policy:
- Step 1 (Primary): Notify users in on-call schedule. Wait 5 minutes.
- Step 2 (Fallback): Notify a Telegram group or Slack channel. Wait 15 minutes.
- Step 3 (Last resort): Notify integration (we’ll cover this below).
Your “on-call schedule” is just you. Add yourself, set your availability (e.g., always on, or specific days/hours).
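The schedule-plus-override logic is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how OnCall stores schedules; the `Shift` and `Override` types, the names `me` and `backup`, and the dates are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Shift:
    person: str
    weekdays: set   # 0 = Monday ... 6 = Sunday
    start: time
    end: time


@dataclass
class Override:
    person: str
    start: datetime
    end: datetime


def who_is_on_call(now, shifts, overrides, default=None):
    """Overrides win over regular shifts; fall back to default if nobody matches."""
    for o in overrides:
        if o.start <= now < o.end:
            return o.person
    for s in shifts:
        if now.weekday() in s.weekdays and s.start <= now.time() < s.end:
            return s.person
    return default


# Mon-Fri 9-5 is you; the first week of July 2024 is overridden to a backup.
shifts = [Shift("me", {0, 1, 2, 3, 4}, time(9, 0), time(17, 0))]
overrides = [Override("backup", datetime(2024, 7, 1), datetime(2024, 7, 8))]

print(who_is_on_call(datetime(2024, 6, 26, 10, 30), shifts, overrides))  # a Wednesday
print(who_is_on_call(datetime(2024, 7, 2, 10, 0), shifts, overrides))    # inside the override
print(who_is_on_call(datetime(2024, 6, 29, 10, 0), shifts, overrides))   # a Saturday
```

The Saturday lookup returns `None`, which is the “no one is on-call” case. In a real policy you would want the escalation chain to have somewhere to go when that happens.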
Example escalation:
- 5 min: SMS to +1-555-0100 (your phone, via Twilio)
- 10 min: Telegram to your personal group chat
- 20 min: Slack #incidents channel (so someone else notices)
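If you want to sanity-check when each step of a policy fires, a small sketch like the following works. It models the three-step policy described earlier in this section (wait 5 minutes, then wait 15 minutes); the `Step` type and both helper functions are hypothetical and only approximate how an escalation chain behaves.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    wait_minutes: int  # how long to wait for an acknowledgment before the next step


def timeline(steps):
    """Return (minutes_after_alert, action) pairs for an unacknowledged alert."""
    elapsed = 0
    schedule = []
    for step in steps:
        schedule.append((elapsed, step.action))
        elapsed += step.wait_minutes
    return schedule


def actions_fired(steps, acked_after_minutes):
    """Actions that fire before the alert is acknowledged."""
    return [action for at, action in timeline(steps) if at < acked_after_minutes]


policy = [
    Step("notify on-call schedule", 5),
    Step("notify Telegram group or Slack channel", 15),
    Step("notify integration", 0),
]

for minute, action in timeline(policy):
    print(f"+{minute:>2} min: {action}")

# Acknowledging after 7 minutes means the third step never fires.
print(actions_fired(policy, acked_after_minutes=7))
```

Changing a wait time in `policy` and re-running is a quick way to see how long an unacknowledged alert takes to reach your last-resort channel.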
Phone Alerts Without PagerDuty: ntfy, Telegram, Pushover, Twilio
OnCall integrates with several phone-alert systems. Here’s what actually works for home labs:
Telegram (free, instant, reliable)
- Add OnCall’s Telegram bot to a private chat.
- In OnCall, create a Telegram notification channel.
- Use it in your escalation policy.
- When an alert fires, you get a Telegram message. Tap “Acknowledge” in the message and the incident is acknowledged, which stops further escalation.
Setup: In OnCall, go to Settings → Integrations and connect Telegram. Copy the bot token, add it to a Telegram group.
ntfy.sh (free, no account needed)
- OnCall can send HTTP POST to any webhook.
- Create a custom integration that sends to https://ntfy.sh/your-topic.
- Install the ntfy app on your phone, subscribe to your-topic.
- Alerts arrive as phone notifications.
The catch: ntfy doesn’t have acknowledgment. You acknowledge in OnCall’s web UI, not from the phone. Less polished, but it works.
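Publishing to ntfy is just an HTTP POST with the message as the body, so you can test your topic before wiring OnCall to it. The sketch below builds the request without sending it; `build_ntfy_request` is a hypothetical helper, and the topic and message are placeholders.

```python
import urllib.request


def build_ntfy_request(topic, message, title="Home lab alert", priority="high",
                       server="https://ntfy.sh"):
    """Build a POST request that publishes a message to an ntfy topic."""
    return urllib.request.Request(
        f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


req = build_ntfy_request("your-topic", "Plex is down on media-01")
print(req.get_method(), req.full_url)

# To actually send it:
# urllib.request.urlopen(req, timeout=5)
```

Anyone who knows the topic name on the public ntfy.sh server can subscribe to it, so pick something hard to guess or point `server` at a self-hosted instance.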
Pushover ($5 one-time)
- One-time payment, iOS and Android apps, dead simple.
- OnCall supports Pushover natively.
- Set it in your escalation policy. Done.
Twilio SMS (costs money per SMS, ~$0.02 each)
- Real phone call or SMS, not app-based.
- Works without mobile data or Wi-Fi, and doesn’t depend on an app being installed or logged in. (A phone that’s switched off still won’t ring.)
- OnCall + Twilio: set it up under Settings → Integrations.
- SMS is blunt: no acknowledgment, no escalation logic, just the message.
Use Twilio for critical policies (database down, payment system down). Use Telegram/Pushover for normal stuff.
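The “paid channel for critical, free channel for everything else” split boils down to a lookup on the alert’s severity label. In OnCall you would express this with routes and separate escalation policies; the Python below is only a sketch of the decision, and the channel names and severity values are assumptions.

```python
# Hypothetical mapping from severity label to notification channels.
CHANNELS_BY_SEVERITY = {
    "critical": ["twilio_sms", "telegram"],
    "warning": ["telegram"],
    "info": ["pushover"],
}


def channels_for(alert, default=("telegram",)):
    """Pick notification channels from the alert's severity label."""
    severity = alert.get("labels", {}).get("severity", "").lower()
    return list(CHANNELS_BY_SEVERITY.get(severity, default))


print(channels_for({"labels": {"alertname": "PostgresDown", "severity": "critical"}}))
print(channels_for({"labels": {"alertname": "DiskFilling", "severity": "warning"}}))
print(channels_for({"labels": {"alertname": "NoSeverityLabel"}}))
```

Alerts without a severity label fall back to the free channel, so a mislabeled rule costs you nothing in SMS fees.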
Authentication and the OAuth Headache
OnCall has OAuth setup, but it’s optional. For a home lab, just disable it:
docker exec -it oncall ./manage.py shell
Then:
from django.contrib.auth.models import User
You get a basic user/pass login. No Okta, no LDAP, no fancy stuff.
If you expose OnCall to the internet, consider:
- Running it behind Caddy or Nginx with HTTP Basic Auth or Authelia
- Using a VPN (WireGuard) to access it
- Reverse-proxy with OAuth2-proxy
Honestly, most home labs run OnCall on the local network only, behind a firewall. The auth is a non-issue because your mom isn’t trying to log in to your monitoring stack.
Webhook Receivers and Custom Integrations
OnCall can ingest webhooks from anywhere. If you have a custom monitoring tool, a cron job, or a Kubernetes operator that fires alerts, you can point them at OnCall’s webhook.
Generic webhook receiver:
- In OnCall, create a new integration: Settings → Integrations → Webhooks
- Copy the webhook URL.
- POST a JSON payload:
{
  "title": "Database backup failed",
  "description": "Automated backup on backup-01 failed: permission denied",
  "severity": "critical",
  "status": "firing"
}

OnCall ingests it, applies your escalation policy. You get paged.
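For a cron job, the usual pattern is a wrapper that runs the real command and only builds this payload when the command fails. The following is a sketch under that assumption: `incident_from_command` and `post_json` are hypothetical helpers, and the demo uses a command that always fails so you can see the resulting JSON without touching the network.

```python
import json
import subprocess
import sys
import urllib.request


def incident_from_command(title, cmd):
    """Run cmd; return a webhook payload if it failed, or None if it succeeded."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    # Keep only the tail of the output so the payload stays small.
    detail = (result.stderr or result.stdout).strip()[-500:]
    return {
        "title": title,
        "description": f"exit code {result.returncode}: {detail}",
        "severity": "critical",
        "status": "firing",
    }


def post_json(url, payload, timeout=5):
    """POST the payload to the webhook URL copied from OnCall."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Demo: a stand-in "backup" command that always fails.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('permission denied'); sys.exit(1)"]
incident = incident_from_command("Database backup failed", failing)
print(json.dumps(incident, indent=2))
```

In a real job you would replace `failing` with your backup command and call `post_json(webhook_url, incident)` whenever `incident` is not `None`.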
Real example: Healthchecks.io failure webhook
Healthchecks.io (dead-man’s-switch monitoring) can POST to a webhook when a check fails. Point that to OnCall. Your cron job doesn’t report in? OnCall pages you.
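The other half of that setup is the job checking in. Healthchecks.io gives each check a ping URL; requesting it signals success, and appending `/fail` signals failure. A minimal wrapper might look like this, where `your-check-uuid` is a placeholder and `run_and_ping` is a hypothetical helper. The demo passes `send=False` so nothing leaves your machine.

```python
import subprocess
import sys
import urllib.request

# Placeholder: replace with the ping URL shown for your check.
PING_URL = "https://hc-ping.com/your-check-uuid"


def ping_url_for(returncode, base=PING_URL):
    """Success pings the base URL; any non-zero exit pings the /fail endpoint."""
    base = base.rstrip("/")
    return base if returncode == 0 else f"{base}/fail"


def run_and_ping(cmd, send=True, timeout=10):
    """Run the job, then report its outcome. Returns the URL that was chosen."""
    returncode = subprocess.run(cmd).returncode
    url = ping_url_for(returncode)
    if send:
        try:
            urllib.request.urlopen(url, timeout=timeout)
        except OSError:
            # A ping that never arrives just looks like a missed check-in.
            pass
    return url


# Demo with a stand-in job that exits non-zero; send=False skips the network call.
print(run_and_ping([sys.executable, "-c", "raise SystemExit(3)"], send=False))
```

If the machine is down entirely, no ping arrives at all, and that silence is what trips the check and ends up paging you.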
Mobile App Reality
The OSS edition doesn’t have a native mobile app. You get:
- A PWA (web app you can add to your home screen)
- Push notifications via your Telegram/Pushover/ntfy app
- Acknowledge/resolve in the PWA
The native Grafana OnCall app exists for the Cloud edition. For self-hosted OSS, the PWA is actually decent. You can acknowledge incidents without leaving the app.
When OnCall is Overkill
If all you care about is “wake up the human,” OnCall might be excessive. You could just:
- Alertmanager + ntfy: Prometheus → Alertmanager → ntfy webhook → app notification. Done in 5 minutes. Free. Scales to 10 people (one ntfy topic, everyone subscribes).
- Healthchecks.io + Telegram: Monitor your services, get Telegram pings if they fail. No escalation, no schedules, just instant notification.
- Raw Slack + custom escalation: A Lambda function watches Slack messages, escalates to SMS after 5 minutes. Works, but you’re writing code.
Use OnCall if you have:
- Multiple people on-call at different times (schedules matter)
- Different escalation rules for different severity levels
- Complex integrations (Grafana alerts → OnCall → Twilio → your team lead’s phone)
Use ntfy if you have:
- One person always on-call
- No acknowledgment needed (“send me alerts, I’ll check the app eventually”)
- Simplicity over features
The Missing Piece: Mobile Push from OnCall
OnCall’s OSS edition doesn’t push directly to iOS/Android. You need an intermediary: Telegram bot, Pushover client, ntfy app. This is a design choice (Grafana Cloud pays for push service costs). For home labs, it’s fine. Telegram is instant and free.
When You Need Real Paging
If you run infrastructure for a team (a small SaaS, a side project with users), real paging matters:
- Someone needs to own each incident.
- Escalation can’t wait for you to check Slack.
- Acknowledgment proves someone’s actually handling it.
- Schedules rotate people fairly.
That’s OnCall’s home. For a solo home lab running Docker and Kubernetes for fun, OnCall is luxurious. But the luxury is cheap: free software, one container, your own hardware.
The alternative—PagerDuty at $19/month or $228/year—buys you a hosted service and a native app. If you’re profitable, pay it. If you’re self-hosting because you like the craft (and the coffee savings), OnCall is the move.
Deploy it, point your alerts at it, and sleep better knowing your home lab will actually wake you up when it matters.