Pingdom Sells Pings. You Already Have Prometheus.
You’re sitting in Prometheus. You’ve got nodes exporting their guts — CPU, memory, disk, all the internal metrics you can eat. But here’s the thing: what happens when your API is up and humming on the inside, but nobody on the internet can reach it? Internal metrics won’t catch that. You need someone standing outside your firewall, actually pinging your services, checking if they’re dead to the world.
That’s what Blackbox Exporter does. It’s Prometheus’s synthetic monitoring leg — a standalone tool that probes your services from the outside (or from different vantage points) and reports back success or failure. No SaaS fees. No Pingdom. No Datadog uptime monitoring tab. Just a Go binary and a config file.
If you’re paying $30/month for a hosted uptime checker to alert you when your Nextcloud falls over, or you’ve got Pingdom bleeding $200/year just to know when DNS fails, stop. Blackbox Exporter does this for free. And if you’re already running Prometheus + Alertmanager, you’ve already solved the hard part.
What Blackbox Actually Does
Blackbox Exporter is not an agent. It doesn’t run on your target. It runs somewhere with a clear view outbound: your monitoring station, a VPS, a Pi in another timezone, whatever. From that vantage point, it runs:
- HTTP/HTTPS probes: hits a URL, checks for response code, TLS cert validity, response time
- TCP connect checks: does port 443 answer?
- DNS lookups: does example.com resolve? Are you getting split-horizon DNS results across networks?
- ICMP (ping): old-school ping to see if a host is alive (spoiler: requires root or CAP_NET_RAW)
Every probe result becomes a set of Prometheus metrics: probe_success (0 or 1), probe_duration_seconds, probe_http_status_code, probe_ssl_earliest_cert_expiry, etc. Feed that into Alertmanager, get a Slack message when things break. Done.
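To make the probe-to-metric idea concrete, here is a minimal Python sketch of what a TCP connect check boils down to. It is not how Blackbox Exporter is implemented (that is a Go binary); the function name tcp_probe and the returned dict are made up for illustration, and only the dict keys borrow the real metric names.

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> dict:
    """Try to open a TCP connection and report the outcome like a probe would.

    Hypothetical helper for illustration; the keys mirror Blackbox metric names.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            success = 1
    except OSError:
        # Refused, unreachable, DNS failure, and timeout all count as a failed probe
        success = 0
    return {
        "probe_success": success,
        "probe_duration_seconds": time.monotonic() - start,
    }
```

Calling tcp_probe("api.myapp.local", 443) from your monitoring box answers the “does port 443 answer?” question; the timeout keeps a dead host from stalling the check.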
The Modules: What You’re Actually Probing
Blackbox comes with pre-built probe definitions called modules. Think of them as profiles for different kinds of checks:
    modules:
      http_2xx:
        prober: http
        timeout: 5s
        http:
          valid_status_codes: [200, 201]

      http_post_2xx:
        prober: http
        timeout: 5s
        http:
          method: POST
          body: '{"key": "value"}'
          valid_status_codes: [200]

      tcp_connect:
        prober: tcp
        timeout: 5s

      dns:
        prober: dns
        timeout: 5s
        dns:
          preferred_ip_protocol: "ip4"
          query_name: "example.com"

      icmp:
        prober: icmp
        timeout: 5s
        icmp:
          preferred_ip_protocol: "ip4"

The common pattern: define what you’re checking (HTTP response codes, TCP port, DNS record, ICMP), set a timeout (don’t wait forever for dead services), and let Prometheus scrape the results.
Installing and Running Blackbox
Download the binary from the Prometheus releases page, or use your distro’s package manager:
    # Debian/Ubuntu
    sudo apt install prometheus-blackbox-exporter

    # Or download manually
    wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
    tar xzf blackbox_exporter-0.25.0.linux-amd64.tar.gz
    sudo mv blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter /usr/local/bin/

Create a config file (blackbox.yml):
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          valid_status_codes: [200, 201, 202, 204]
          follow_redirects: true
          preferred_ip_protocol: "ip4"

      http_post_2xx:
        prober: http
        timeout: 10s
        http:
          method: POST
          valid_status_codes: [200, 201]
          preferred_ip_protocol: "ip4"

      tcp_connect:
        prober: tcp
        timeout: 5s
        tcp:
          preferred_ip_protocol: "ip4"

      dns_lookup:
        prober: dns
        timeout: 5s
        dns:
          preferred_ip_protocol: "ip4"
          query_name: "example.com"

      icmp_ping:
        prober: icmp
        timeout: 5s
        icmp:
          preferred_ip_protocol: "ip4"

Run it:
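    blackbox_exporter --config.file=blackbox.yml

By default, it listens on port 9115. http://localhost:9115/metrics only reports on the exporter itself; the probe results Prometheus actually cares about come from the /probe endpoint, which takes the target and module as URL parameters.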
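The exporter answers in the Prometheus plain-text exposition format, so it is easy to inspect by hand. As a rough sketch, this is how you might pull the headline numbers out of a response in Python. The helper name parse_probe_metrics is made up, the parser deliberately ignores labelled samples, and SAMPLE is an illustrative response, not captured output.

```python
def parse_probe_metrics(text: str) -> dict:
    """Parse unlabelled samples from Prometheus text exposition output.

    Simplified on purpose: skips comments and any sample that carries labels.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        fields = line.split()
        # fields[0] is the metric name, fields[1] the value (a timestamp may follow)
        metrics[fields[0]] = float(fields[1])
    return metrics


# Illustrative sample, not captured from a real exporter run
SAMPLE = """\
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.142
probe_http_status_code 200
"""

result = parse_probe_metrics(SAMPLE)
print(result["probe_success"], result["probe_http_status_code"])  # 1.0 200.0
```

In practice Prometheus does this parsing for you; the sketch is only useful when you curl the exporter while debugging a module.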
Wiring It Into Prometheus
Here’s the trick: Blackbox is one exporter, but you’re probing many targets. You need to pass the target and module to it via URL parameters. In Prometheus config:
    scrape_configs:
      - job_name: "blackbox-http"
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - https://sumguy.com
              - https://example.com
              - https://api.myapp.local
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: "localhost:9115"

      - job_name: "blackbox-tcp"
        metrics_path: /probe
        params:
          module: [tcp_connect]
        static_configs:
          - targets:
              - "api.myapp.local:443"
              - "db.myapp.local:5432"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: "localhost:9115"

      - job_name: "blackbox-dns"
        metrics_path: /probe
        params:
          module: [dns_lookup]
        static_configs:
          # For the dns prober, each target is a DNS server to query.
          # The name being looked up comes from query_name in the module.
          - targets:
              - "1.1.1.1"
              - "8.8.8.8"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: "localhost:9115"

The relabel_configs magic here is critical: it takes your target list, passes each one to Blackbox as the target URL parameter (via __param_target), sets the instance label to the actual target (not localhost:9115), and finally points the scrape itself at the exporter. This way your alerts show “https://sumguy.com is down” instead of “blackbox exporter host is down.”
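If the relabeling still feels like magic, it helps to trace it for a single target. The Python sketch below mimics the three relabel steps and prints the URL Prometheus ends up requesting. The function probe_request is hypothetical and only models this one pattern, not Prometheus relabeling in general.

```python
from urllib.parse import urlencode


def probe_request(target: str, module: str, exporter: str = "localhost:9115") -> dict:
    """Mimic the three relabel steps for a single target (illustrative only)."""
    labels = {"__address__": target}
    # Step 1: copy the target into the ?target= URL parameter
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the real target as the instance label
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at the exporter
    labels["__address__"] = exporter
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return {
        "url": f"http://{labels['__address__']}/probe?{query}",
        "instance": labels["instance"],
    }


req = probe_request("https://sumguy.com", "http_2xx")
print(req["url"])
# http://localhost:9115/probe?module=http_2xx&target=https%3A%2F%2Fsumguy.com
```

You can paste that URL into curl to run the same probe by hand, which is the quickest way to debug a module before blaming your relabel rules.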
Real-World Example: Certificate Expiry Alerts
One of the most useful Blackbox probes: catching TLS certificate expiry before your customers notice. Every HTTPS probe includes probe_ssl_earliest_cert_expiry, a Unix timestamp of when the earliest-expiring cert in the chain runs out.
Add this alert rule:
    groups:
      - name: blackbox_alerts
        rules:
          - alert: SSLCertificateExpiring
            expr: |
              (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
            for: 1h
            annotations:
              summary: 'SSL cert for {{ $labels.instance }} expires in {{ printf "%.0f" $value }} days'
              description: 'Certificate expires in {{ printf "%.0f" $value }} days. Renew now.'

          - alert: HTTPProbeDown
            expr: probe_success == 0
            for: 2m
            annotations:
              summary: "{{ $labels.instance }} is unreachable"
              description: "Probe to {{ $labels.instance }} has failed for 2 minutes. Check logs."

          - alert: HTTPProbeHighLatency
            expr: probe_duration_seconds > 2
            for: 5m
            annotations:
              summary: "{{ $labels.instance }} is slow"
              description: "{{ $labels.instance }} responding in {{ $value }}s. Check performance."

Bam. Now you get notified 14 days before your cert dies, instead of having your API go dark at 3 AM on a Friday.
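The arithmetic behind the certificate alert is simple enough to check by hand. Here is a small Python sketch of the same calculation: subtract the current time from the expiry timestamp, divide by 86400 seconds, and compare with the threshold. The function names are invented for this example, and it ignores the for: 1h hold-off that Prometheus applies before firing.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: float, now: float) -> float:
    """Days left, given the cert expiry and current time as Unix timestamps."""
    return (not_after - now) / SECONDS_PER_DAY


def should_alert(not_after: float, now: float, threshold_days: float = 14) -> bool:
    """Same comparison as the alert expression: fewer than threshold_days left."""
    return days_until_expiry(not_after, now) < threshold_days


now = 1_700_000_000                       # arbitrary "current" timestamp
expires = now + 10 * SECONDS_PER_DAY      # a cert with ten days left
print(days_until_expiry(expires, now))    # 10.0
print(should_alert(expires, now))         # True
```

A cert with 30 days left gives 30.0 and no alert; once it drops under 14, the condition flips and stays true until you renew.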
DNS Probes for Split-Horizon Shenanigans
Maybe you’ve got internal DNS returning a private IP and external DNS returning a public IP (classic setup for self-hosting). Blackbox can check both:
    scrape_configs:
      - job_name: "blackbox-dns-internal"
        metrics_path: /probe
        params:
          module: [dns_lookup_internal]   # module sets query_name: "myapp.internal"
        static_configs:
          - targets:
              - "192.168.1.1"   # placeholder: your internal resolver
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: "monitoring-box.internal:9115"

      - job_name: "blackbox-dns-external"
        metrics_path: /probe
        params:
          module: [dns_lookup_external]   # module sets query_name: "myapp.com"
        static_configs:
          - targets:
              - "1.1.1.1"   # a public resolver
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: "vps-in-aws:9115"

One detail that trips people up: for the dns prober, the target is the DNS server being queried, and the name to resolve lives in the module’s query_name. Run one Blackbox instance inside your network, one on a VPS, each asking the resolver it would naturally see. You’ll catch DNS misconfigs immediately.
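What “correct” looks like for split-horizon DNS is easy to state in code: the internal answer should be a private address and the external answer should not be. The Python sketch below checks exactly that with the standard ipaddress module. It is a hypothetical helper for sanity-checking answers you have already collected, not something Blackbox runs for you.

```python
import ipaddress


def check_split_horizon(internal_answer: str, external_answer: str) -> list:
    """Return a list of problems; an empty list means the split looks right."""
    problems = []
    if not ipaddress.ip_address(internal_answer).is_private:
        problems.append(
            f"internal resolver returned public address {internal_answer}"
        )
    if ipaddress.ip_address(external_answer).is_private:
        problems.append(
            f"external resolver leaked private address {external_answer}"
        )
    return problems


# Example addresses are made up for illustration
print(check_split_horizon("192.168.1.10", "8.8.8.8"))   # []
print(check_split_horizon("192.168.1.10", "10.0.0.5"))
```

The second call reports a leaked private address, which is the classic symptom of an internal zone file ending up on a public nameserver.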
ICMP Pings and the CAP_NET_RAW Headache
Ping (ICMP) is useful for “is this host even alive?” checks. But it requires elevated permissions:
    # Option 1: Run blackbox as root (bad)
    sudo blackbox_exporter --config.file=blackbox.yml

    # Option 2: Grant CAP_NET_RAW (better)
    sudo setcap cap_net_raw=ep /usr/local/bin/blackbox_exporter
    blackbox_exporter --config.file=blackbox.yml

    # Option 3: Run in a container with --cap-add=NET_RAW
    # (the image looks for config.yml by default, so point it at the mounted file)
    docker run --cap-add=NET_RAW -p 9115:9115 \
      -v /path/to/blackbox.yml:/etc/blackbox_exporter/blackbox.yml \
      prom/blackbox-exporter:latest \
      --config.file=/etc/blackbox_exporter/blackbox.yml

Honestly, Option 2 (setcap) is the cleanest for a bare-metal setup. If you’re running Docker Compose, the equivalent of --cap-add=NET_RAW is a cap_add: [NET_RAW] entry on the service.
Synthetic Monitoring Across Multiple Sites
Here’s where Blackbox gets spicy: run instances in different places. Monitoring box in your home lab? Run another one on a DigitalOcean droplet or Hetzner box. Both scrape your production API. Now you catch:
- Local network failures (home lab → API)
- ISP issues (does your internet actually work?)
- Geolocation-based failures (maybe your CDN is broken in one region)
- DNS propagation issues (nameservers disagree)
All from one Prometheus instance. Blackbox won’t tag metrics with its own location, so give each exporter a site or datacenter label in the scrape config and you’ll see exactly which vantage point failed.
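    scrape_configs:
      - job_name: "blackbox-global"
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          # The exporter label is an assumption of this example:
          # it carries the address of the Blackbox instance at each site.
          - targets: ["https://myapp.com", "https://api.myapp.com"]
            labels:
              site: "homelab"
              exporter: "monitoring-box.internal:9115"
          - targets: ["https://myapp.com", "https://api.myapp.com"]
            labels:
              site: "aws"
              exporter: "vps-in-aws:9115"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - source_labels: [exporter]
            target_label: __address__

When site: "aws" fails but site: "homelab" passes, you know the problem sits on the path from AWS, not in your service.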
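The reasoning you apply to multi-site results fits in a few lines. This Python sketch takes the latest probe_success value per site for one target and says where to look first. The function name diagnose and the wording of its verdicts are invented for illustration.

```python
def diagnose(results: dict) -> str:
    """results maps a site label to probe_success (0 or 1) for one target."""
    failed = sorted(site for site, ok in results.items() if ok == 0)
    if not failed:
        return "healthy from every site"
    if len(failed) == len(results):
        # Every vantage point agrees, so the target itself is the likely culprit
        return "down everywhere: suspect the target itself"
    # Only some vantage points fail, so look at the network path from those sites
    return "failing only from: " + ", ".join(failed)


print(diagnose({"homelab": 1, "aws": 0}))   # failing only from: aws
```

In a real setup the same logic lives in your alert rules, for example by alerting differently when every site reports a failure versus just one.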
Blackbox vs. Uptime Kuma: When Do You Use Which?
Uptime Kuma is a dashboard. Pretty UI, simple setup, good for “show the boss our uptime.” But:
- No native Prometheus integration (unless you scrape its metrics endpoint)
- Alerts go to Discord/Slack directly (no Alertmanager routing)
- No easy way to do multi-site probing
- Another service to maintain
Blackbox Exporter is plumbing. Config-as-code, tight Prometheus integration, routes through your alerting rules. But:
- No built-in dashboard (use Grafana instead)
- Config is YAML, not UI clicks
- Requires Prometheus already running
Real talk: If you’ve got Prometheus and Alertmanager, use Blackbox. If you want a standalone, low-touch “show me the uptime” dashboard and don’t have Prometheus yet, Uptime Kuma is faster to spin up. But they’re not mutually exclusive — some teams run both (Blackbox for deep integration, Kuma for executive status page).
Gotchas and Tuning
Timeout mismatch: If your Blackbox timeout is 10s but the Prometheus scrape timeout is 5s, Prometheus gives up on the scrape before the probe finishes, and you get a failed scrape instead of a clean probe_success 0. Set Blackbox timeouts lower than the scrape timeout:
    # In prometheus.yml
    scrape_interval: 30s
    scrape_timeout: 15s

    # In blackbox.yml
    timeout: 10s

IPv6 by default: Blackbox prefers IPv6 when a target offers it. If IPv6 doesn’t actually work between your exporter and your targets, add preferred_ip_protocol: "ip4" to every module:
    http_2xx:
      prober: http
      timeout: 10s
      http:
        preferred_ip_protocol: "ip4"

Redirect loops: HTTP modules follow redirects by default, so a broken redirect chain makes the probe fail or time out. Explicitly set follow_redirects: false if you want to check the first response only.
TLS verification: By default, Blackbox verifies TLS certificates. If you’re probing internal services with self-signed certs, add:
    http_2xx:
      prober: http
      http:
        tls_config:
          insecure_skip_verify: true

(Yeah, it’s insecure. But it’s better than not knowing your internal API is down.)
Useful PromQL Queries
    # Recent probe success rate (last 5 minutes)
    avg_over_time(probe_success[5m])

    # 95th percentile probe duration over the last hour
    quantile_over_time(0.95, probe_duration_seconds[1h])

    # Which targets are currently down?
    probe_success == 0

    # Certificate expiry in days
    (probe_ssl_earliest_cert_expiry - time()) / 86400

    # Fraction of the last hour each target spent down
    1 - avg_over_time(probe_success[1h])

The Probes Worth Setting Up Today
If you’re running Prometheus in a home lab or small production, start with:
- HTTP/HTTPS checks on your public-facing services — catch outages early
- TLS cert expiry alerts — 14-day warnings save Fridays
- DNS checks across internal and external resolvers — catch split-brain disasters
- TCP port checks on critical services — database, cache, queue
Skip ICMP ping unless you specifically need to know “is this host alive?” If your services respond to HTTP, HTTP probes are better.
And yes, you can run all of this on the same Pi that’s running Prometheus. One Blackbox instance, a 10-line config, and you’re done. That’s how you replace Pingdom without spending a dime.
Your 2 AM self will appreciate not having to pay for monitoring. SaaS uptime checks smell like rent extraction anyway.