Grafana OnCall + Webhooks: Paging Without PagerDuty

By SumGuy 9 min read
PagerDuty Costs $19/User. You Have One User.

Let’s be honest: PagerDuty is brilliant. It’s also $19 per user per month, and if you’re running a home lab with one person on call, that’s a coffee and a half you could spend on cloud credits instead.

Here’s the thing about alerting at home: you’ve got Prometheus firing alerts into Alertmanager, everything’s going to a Slack webhook, and when something actually breaks at 2 AM, your phone’s sitting on your desk silenced because you were tired of false positives at dinner time. By the time you check Slack, your Plex server’s been down for 20 minutes.

Grafana OnCall exists to fix this. It’s not a toy. It’s built for real production on-call rotations, the same job PagerDuty does. And since 2022, there’s a fully functional open-source edition you can run as a small Docker Compose stack in your home lab.

This is your guide to setting up proper on-call paging without the PagerDuty bill.


What Grafana OnCall Actually Is

OnCall is Grafana’s incident response platform. At its core: alert aggregation → escalation policies → notifications to your phone.

Here’s the mental model: Alertmanager sends a webhook to OnCall. OnCall groups similar alerts (because you don’t need 40 notifications for the same broken service). It applies your escalation policy (first, it tries the primary on-call person’s phone number; if they don’t acknowledge in 5 minutes, it pages their backup). If it’s 3 AM on Sunday, maybe you skip the escalation and go straight to SMS. OnCall handles all of that logic instead of you building it in shell scripts and cron jobs.
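To make the grouping step concrete, here is a minimal sketch in Python. It is not OnCall’s implementation; the function name `group_alerts` and the sample label values are made up for illustration, and the grouping keys are an assumption that simply mirrors the `alertname` and `instance` labels used in the Alertmanager config later in this post.

```python
from collections import defaultdict


def group_alerts(alerts, keys=("alertname", "instance")):
    """Bucket alerts that share the same values for the grouping labels.

    Illustration only: this mimics the idea of alert grouping,
    not how OnCall does it internally.
    """
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Alerts missing a label fall into the bucket with "" for that key.
        group_key = tuple(labels.get(k, "") for k in keys)
        groups[group_key].append(alert)
    return dict(groups)
```

With this sketch, forty `PlexDown` alerts for the same instance end up in one bucket, which is the behavior you want from a pager: one notification per broken thing, not one per alert.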

The OSS edition (open-sourced in 2022) runs the full feature set: schedules, escalation policies, integrations, webhook receivers. What you don’t get: Grafana Cloud hosting (you self-host it), enterprise auth (no Okta/LDAP), and the mobile app is a web PWA instead of native (honestly, it’s fine).

Key fact: OnCall is purpose-built for this. If you use just ntfy + Alertmanager webhooks, you get HTTP-to-phone notifications. You don’t get acknowledgment, routing, escalation, or schedules. That’s the gap OnCall fills.


The Home Lab Reality Check

Solo home lab ops are weird. You’re the primary, secondary, and backup. You’re also the human with a day job who can’t be glued to a phone. So here’s what OnCall does for you:

1. Escalation policies (even for one person)

You’ll never actually use the third step, but it’s there. More importantly, your primary escalation is SMS to your actual phone, not a Slack message on a muted channel.

2. Schedules and overrides

3. Deduplication and grouping


How to Deploy Grafana OnCall (Docker)

Fastest path: OnCall + Postgres + Redis in a compose file.

docker-compose.yml
version: '3.8'
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: oncall
      POSTGRES_USER: oncall
      POSTGRES_PASSWORD: supersecretpassword
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - oncall-net
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    networks:
      - oncall-net
  oncall:
    image: grafana/oncall:latest
    depends_on:
      - postgres
      - redis
    environment:
      DATABASE_URL: postgresql://oncall:supersecretpassword@postgres:5432/oncall
      REDIS_URL: redis://redis:6379/0
      SECRET_KEY: ${SECRET_KEY:-your-secret-key-change-this}
      GRAFANA_API_URL: http://grafana:3000
      GRAFANA_API_TOKEN: ${GRAFANA_API_TOKEN}
      ONCALL_BACKEND_URL: http://oncall:8080
      ONCALL_FRONTEND_URL: http://oncall.yourdomain.local
    ports:
      - "8080:8080"
    networks:
      - oncall-net
    volumes:
      - oncall_data:/var/lib/oncall
  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    ports:
      - "3000:3000"
    networks:
      - oncall-net
    volumes:
      - grafana_data:/var/lib/grafana
volumes:
  postgres_data:
  redis_data:
  oncall_data:
  grafana_data:
networks:
  oncall-net:
    driver: bridge

Spin it up:

Terminal window
docker compose up -d

Visit http://localhost:8080, sign up. You’re done with deployment.

The first time you log in, create an admin user. OnCall will guide you through it.
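One thing worth doing before you leave the stack running: the compose file falls back to `your-secret-key-change-this` when `SECRET_KEY` is unset. A small sketch for generating a random value and the matching `.env` lines is below. It assumes Docker Compose picks up a `.env` file next to `docker-compose.yml` for variable substitution (its default behavior); the helper name `make_env_lines` and the `replace-me` token placeholder are hypothetical.

```python
import secrets


def make_env_lines(grafana_token="replace-me"):
    """Return .env lines for the two variables the compose file interpolates.

    token_urlsafe only emits letters, digits, '-' and '_', so the value
    contains no '$' that Compose could try to interpolate.
    """
    secret_key = secrets.token_urlsafe(48)
    return [
        f"SECRET_KEY={secret_key}",
        # Placeholder: paste a real Grafana API token here.
        f"GRAFANA_API_TOKEN={grafana_token}",
    ]
```

Write the returned lines to `.env`, then run `docker compose up -d` again so the `oncall` service picks up the new values.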


Wiring Alertmanager to OnCall

OnCall exposes a webhook endpoint. Alertmanager sends alerts there. Here’s the Alertmanager config:

alertmanager.yml
global:
  resolve_timeout: 5m
route:
  receiver: 'grafana-oncall'
  group_by: ['alertname', 'instance']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
receivers:
  - name: 'grafana-oncall'
    webhook_configs:
      - url: 'http://oncall:8080/api/v1/integrations/grafana/notify/'
        send_resolved: true

That’s it. Prometheus fires → Alertmanager routes to OnCall → OnCall applies your escalation policy and pages you.
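If you want to check the wiring without waiting for something to break, you can hand-build a payload shaped like Alertmanager’s webhook format (version 4) and POST it yourself. The sketch below is a rough test harness, not an official tool: the URL is the one from the config above, the label values are invented, and whether your OnCall instance accepts this exact body at that path is an assumption you should verify against your own integration.

```python
import json
import urllib.request
from datetime import datetime, timezone

# URL taken from the Alertmanager config above; adjust to your integration.
ONCALL_URL = "http://oncall:8080/api/v1/integrations/grafana/notify/"


def build_test_payload(alertname="TestAlert", instance="homelab:9100"):
    """Build a body shaped like Alertmanager's version 4 webhook payload."""
    labels = {"alertname": alertname, "instance": instance, "severity": "warning"}
    annotations = {"summary": "Manual test alert"}
    return {
        "version": "4",
        "status": "firing",
        "receiver": "grafana-oncall",
        "groupLabels": {"alertname": alertname, "instance": instance},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": datetime.now(timezone.utc).isoformat(),
                # Alertmanager uses a zero timestamp for alerts still firing.
                "endsAt": "0001-01-01T00:00:00Z",
            }
        ],
    }


def build_request(url, payload):
    """Wrap the payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send(build_request(ONCALL_URL, build_test_payload()))` from a container on the same Docker network should produce a test alert group in OnCall if the integration is wired correctly.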

The webhook is unauthenticated by default (it’s your internal network). If you expose OnCall over the internet, add authentication:

webhook_configs:
  - url: 'http://oncall:8080/api/v1/integrations/grafana/notify/'
    http_config:
      authorization:
        type: Bearer
        credentials: 'your-webhook-token'

Generate a token in OnCall under Settings → API Tokens.


Setting Up Escalation Policies and Schedules

Log into OnCall. Navigate to Escalation Policies.

Create a policy:

  1. Step 1 (Primary): Notify users in on-call schedule. Wait 5 minutes.
  2. Step 2 (Fallback): Notify a Telegram group or Slack channel. Wait 15 minutes.
  3. Step 3 (Last resort): Notify integration (we’ll cover this below).

Your “on-call schedule” is just you. Add yourself, set your availability (e.g., always on, or specific days/hours).

Example escalation:
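One rough way to picture the three steps above is as plain data plus a lookup. This is a sketch in Python, not something OnCall consumes: you configure the real policy in the UI, and `Step`, `POLICY`, and `active_step` are hypothetical names used only to show how the wait times stack up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    target: str
    wait_minutes: Optional[int]  # None means "last step, nothing after it"


# Mirrors the three-step policy described above.
POLICY = [
    Step("on-call schedule (you)", 5),
    Step("Telegram group or Slack channel", 15),
    Step("last-resort integration", None),
]


def active_step(minutes_unacknowledged, policy=POLICY):
    """Return the step in effect after this many unacknowledged minutes."""
    elapsed = 0
    for step in policy:
        if step.wait_minutes is None:
            return step
        if minutes_unacknowledged < elapsed + step.wait_minutes:
            return step
        elapsed += step.wait_minutes
    return policy[-1]
```

Under these assumptions, an alert that sits unacknowledged for 4 minutes is still on step 1, at 5 minutes it moves to the fallback channel, and at 20 minutes (5 + 15) it reaches the last-resort step.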


Phone Alerts Without PagerDuty: ntfy, Telegram, Pushover, Twilio

OnCall integrates with several phone-alert systems. Here’s what actually works for home labs:

Telegram (free, instant, reliable)

Setup: In OnCall, go to Settings → Integrations and connect Telegram. Copy the bot token, then add the bot to a Telegram group.

ntfy.sh (free, no account needed)

The catch: ntfy doesn’t have acknowledgment. You acknowledge in OnCall’s web UI, not from the phone. Less polished, but it works.
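For reference, publishing to ntfy is just an HTTP POST with the message as the body and a few optional headers. The sketch below uses only the standard library; the topic name `homelab-alerts` in the usage note is hypothetical, and on the public ntfy.sh server anyone who knows the topic name can subscribe to it, so pick something hard to guess.

```python
import urllib.request


def build_ntfy_request(topic, message, title="Home lab alert",
                       priority="high", server="https://ntfy.sh"):
    """Build (but do not send) a publish request for an ntfy topic."""
    return urllib.request.Request(
        f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers={
            "Title": title,        # notification title
            "Priority": priority,  # e.g. min, low, default, high, urgent
            "Tags": "warning",     # rendered as an emoji in the ntfy app
        },
        method="POST",
    )


def publish(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Something like `publish(build_ntfy_request("homelab-alerts", "Plex is down"))` should pop a notification on any phone subscribed to that topic. Acknowledgment still happens in OnCall, as noted above.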

Pushover ($5 one-time)

Twilio SMS (costs money per SMS, ~$0.02 each)

Use Twilio for critical policies (database down, payment system down). Use Telegram/Pushover for normal stuff.


Authentication and the OAuth Headache

OnCall has OAuth setup, but it’s optional. For a home lab, skip it and create a local superuser from the Django shell instead:

Terminal window
docker compose exec oncall ./manage.py shell

Then:

from django.contrib.auth.models import User
User.objects.create_superuser('admin', '[email protected]', 'password')

You get a basic user/pass login. No Okta, no LDAP, no fancy stuff.

If you expose OnCall to the internet, consider:

Honestly, most home labs run OnCall on the local network only, behind a firewall. The auth is a non-issue because your mom isn’t trying to log in to your monitoring stack.


Webhook Receivers and Custom Integrations

OnCall can ingest webhooks from anywhere. If you have a custom monitoring tool, a cron job, or a Kubernetes operator that fires alerts, you can point them at OnCall’s webhook.

Generic webhook receiver:

  1. In OnCall, create a new integration: Settings → Integrations → Webhooks
  2. Copy the webhook URL.
  3. POST a JSON payload:
{
  "title": "Database backup failed",
  "description": "Automated backup on backup-01 failed: permission denied",
  "severity": "critical",
  "status": "firing"
}

OnCall ingests it, applies your escalation policy. You get paged.
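As a sketch of how a cron job might use this, the Python below runs a command and, only if it exits non-zero, builds a payload with the same four fields shown above. The function names are hypothetical, the webhook URL is whatever you copied from OnCall in step 2, and the assumption that OnCall maps these fields the way you expect depends on the integration’s templates.

```python
import json
import subprocess
import urllib.request


def payload_for_failure(job_name, command):
    """Run a command; return an alert payload if it fails, else None."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    # Prefer stderr for the detail, trimmed so the alert stays readable.
    detail = (result.stderr or result.stdout).strip()[:200]
    return {
        "title": f"{job_name} failed",
        "description": f"exit code {result.returncode}: {detail}",
        "severity": "critical",
        "status": "firing",
    }


def post_json(url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A wrapper script would call `payload_for_failure(...)` with the backup command and pass any non-`None` result to `post_json(...)` along with the copied webhook URL.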

Real example: Healthchecks.io failure webhook

Healthchecks.io (dead-man’s-switch monitoring) can POST to a webhook when a check fails. Point that to OnCall. Your cron job doesn’t report in? OnCall pages you.


Mobile App Reality

The OSS edition doesn’t have a native mobile app. You get:

The native Grafana OnCall app exists for the Cloud edition. For self-hosted OSS, the PWA is actually decent. You can acknowledge incidents without leaving the app.


When OnCall is Overkill

If all you care about is “wake up the human,” OnCall might be excessive. You could just:

  1. Alertmanager + ntfy: Prometheus → Alertmanager → ntfy webhook → app notification. Done in 5 minutes. Free. Scales to 10 people (one ntfy topic, everyone subscribes).

  2. Healthchecks.io + Telegram: Monitor your services, get Telegram pings if they fail. No escalation, no schedules, just instant notification.

  3. Raw Slack + custom escalation: A Lambda function watches Slack messages, escalates to SMS after 5 minutes. Works, but you’re writing code.

Use OnCall if you have:

Use ntfy if you have:


The Missing Piece: Mobile Push from OnCall

OnCall’s OSS edition doesn’t push directly to iOS/Android. You need an intermediary: Telegram bot, Pushover client, ntfy app. This is a design choice (Grafana Cloud pays for push service costs). For home labs, it’s fine. Telegram is instant and free.


When You Need Real Paging

If you run infrastructure for a team (a small SaaS, a side project with users), real paging matters:

That’s OnCall’s home. For a solo home lab running Docker and Kubernetes for fun, OnCall is luxurious. But the luxury is cheap: free software, one compose file, your own hardware.

The alternative—PagerDuty at $19/month or $228/year—buys you a hosted service and a native app. If you’re profitable, pay it. If you’re self-hosting because you like the craft (and the coffee savings), OnCall is the move.

Deploy it, point your alerts at it, and sleep better knowing your home lab will actually wake you up when it matters.

