Blackbox Exporter: HTTP/TCP/DNS/ICMP Probes

Pingdom Sells Pings. You Already Have Prometheus.

Prometheus is running. You’ve got nodes exporting their guts: CPU, memory, disk, all the internal metrics you can eat. But what happens when your API is humming along on the inside and nobody on the internet can reach it? Internal metrics won’t catch that. You need someone standing outside your firewall, actually pinging your services and checking whether they’re dead to the world.

That’s what Blackbox Exporter does. It’s Prometheus’s synthetic monitoring leg, a standalone tool that probes your services from the outside (or from different vantage points) and reports back success or failure. No SaaS fees. No Pingdom. No Datadog uptime monitoring tab. Just a Go binary and a config file.

If you’re paying $30/month for a hosted uptime checker to alert you when your Nextcloud falls over, or you’ve got Pingdom bleeding $200/year just to know when DNS fails, stop. Blackbox Exporter does this for free. And if you’re already running Prometheus + Alertmanager, you’ve already solved the hard part.

What Blackbox Actually Does

Blackbox Exporter is not an agent. It doesn’t run on your target. It runs somewhere with a clear outbound view: your monitoring station, a VPS, a Pi in another timezone, whatever. From that vantage point, it runs four kinds of probes:

HTTP/HTTPS probes: hits a URL, checks for response code, TLS cert validity, response time
TCP connect checks: does port 443 answer?
DNS lookups: does example.com resolve? Are you getting split-horizon DNS results across networks?
ICMP (ping): old-school ping to see if a host is alive (spoiler: requires root or CAP_NET_RAW)

Every probe result becomes a Prometheus metric: probe_success (0 or 1), probe_duration_seconds, probe_http_status_code, probe_ssl_earliest_cert_expiry, etc. Write alert rules against those, let Alertmanager route the alerts, and get a Slack message when things break. Done.
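To get from alert to Slack, Alertmanager needs a receiver. Below is a minimal sketch of alertmanager.yml, assuming a Slack incoming webhook. The webhook URL and channel are placeholders, not real values. Prometheus also needs an alerting: section pointing at Alertmanager (port 9093 by default) and a rule_files: entry that loads your alert rules.

```yaml
# alertmanager.yml: send every alert to a single Slack receiver
route:
  receiver: slack-alerts
  group_by: [alertname, instance]   # batch notifications by alert name and target
receivers:
  - name: slack-alerts
    slack_configs:
      - api_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"   # placeholder webhook
        channel: "#alerts"                                         # placeholder channel
        send_resolved: true   # also notify when the probe recovers
```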

The Modules: What You’re Actually Probing

Blackbox organizes its checks into modules: named probe definitions in its config file (the stock blackbox.yml ships with a few examples). Think of them as profiles for different kinds of checks:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200, 201]

  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      body: '{"key": "value"}'
      valid_status_codes: [200]

  tcp_connect:
    prober: tcp
    timeout: 5s

  dns:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "example.com"

  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

The common pattern: define what you’re checking (HTTP response codes, TCP port, DNS record, ICMP), set a timeout (don’t wait forever for dead services), and let Prometheus scrape the results.
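Modules can check more than a status code. The sketch below is an HTTP module that also requires TLS and a specific response body. It assumes a hypothetical health endpoint that returns JSON containing "status": "ok". The module name and regex are illustrative only.

```yaml
modules:
  http_2xx_health:   # hypothetical module name
    prober: http
    timeout: 5s
    http:
      # valid_status_codes is left unset, which accepts any 2xx
      fail_if_not_ssl: true              # fail if the response was not served over TLS
      fail_if_body_not_matches_regexp:   # fail unless the body matches this pattern
        - '"status":\s*"ok"'            # assumes a JSON health payload
```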

Installing and Running Blackbox

Download the binary from the project’s GitHub releases page, or use your distro’s package manager:

# Debian/Ubuntu
sudo apt install prometheus-blackbox-exporter

# Or download manually
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.28.0/blackbox_exporter-0.28.0.linux-amd64.tar.gz
tar xzf blackbox_exporter-0.28.0.linux-amd64.tar.gz
sudo mv blackbox_exporter-0.28.0.linux-amd64/blackbox_exporter /usr/local/bin/

Create a config file (blackbox.yml):

modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_status_codes: [200, 201, 202, 204]
      follow_redirects: true
      preferred_ip_protocol: "ip4"

  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200, 201]
      preferred_ip_protocol: "ip4"

  tcp_connect:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"

  dns_lookup:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "example.com"

  icmp_ping:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

Run it:

blackbox_exporter --config.file=blackbox.yml

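By default, it listens on port 9115 on all interfaces. http://localhost:9115/metrics serves the exporter’s own metrics. Probe results come from the /probe endpoint, which Prometheus calls with a target and a module.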

Wiring It Into Prometheus

Here’s the trick: Blackbox is one exporter, but you’re probing many targets. You need to pass the target and module to it via URL parameters. In Prometheus config:

scrape_configs:
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://sumguy.com
        - https://example.com
        - https://api.myapp.local
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"

  - job_name: "blackbox-tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - "api.myapp.local:443"
        - "db.myapp.local:5432"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"

  - job_name: "blackbox-dns"
    metrics_path: /probe
    params:
      module: [dns_lookup]
    static_configs:
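      - targets:
        # dns prober targets are DNS servers; the name to resolve is the module's query_name
        - "1.1.1.1"        # public resolver
        - "192.168.1.53"   # hypothetical LAN resolver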
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"

The relabel_configs magic here is critical: it takes your target list, passes each one as __param_target to Blackbox, then sets the instance label to the actual target (not localhost:9115). This way your alerts show “https://sumguy.com is down” instead of “blackbox exporter host is down.”
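The three jobs above differ only in their module and target list. If that repetition bugs you, one option is a single job that carries the module as a label on each target group and copies it into the module URL parameter during relabeling. This is a sketch that reuses the modules and targets above.

```yaml
scrape_configs:
  - job_name: "blackbox"
    metrics_path: /probe
    static_configs:
      - targets: ["https://sumguy.com", "https://example.com"]
        labels:
          module: http_2xx
      - targets: ["api.myapp.local:443", "db.myapp.local:5432"]
        labels:
          module: tcp_connect
    relabel_configs:
      # ?target=<scrape target>
      - source_labels: [__address__]
        target_label: __param_target
      # ?module=<module label of the target group>
      - source_labels: [module]
        target_label: __param_module
      # Show the probed target, not the exporter, as the instance
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the Blackbox exporter
      - target_label: __address__
        replacement: "localhost:9115"
```

The trade-off is that every series lands under one job label. Your alerts and dashboards then filter on the module label instead of the job name.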

Real-World Example: Certificate Expiry Alerts

One of the most useful Blackbox probes: catching TLS certificate expiry before your customers notice. Every HTTPS probe exposes probe_ssl_earliest_cert_expiry, the Unix timestamp at which the soonest-expiring certificate in the served chain runs out.

Add this alert rule:

groups:
  - name: blackbox_alerts
    rules:
      - alert: SSLCertificateExpiring
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        annotations:
          summary: 'SSL cert for {{ $labels.instance }} expires in {{ printf "%.0f" $value }} days'
          description: 'Certificate expires in {{ printf "%.1f" $value }} days. Renew now.'

      - alert: HTTPProbeDown
        expr: probe_success == 0
        for: 2m
        annotations:
          summary: "{{ $labels.instance }} is unreachable"
          description: "Probe to {{ $labels.instance }} has failed for 2 minutes. Check logs."

      - alert: HTTPProbeHighLatency
        expr: probe_duration_seconds > 2
        for: 5m
        annotations:
          summary: "{{ $labels.instance }} is slow"
          description: "{{ $labels.instance }} responding in {{ $value }}s — check performance."

Bam. Now you get notified 14 days before your cert dies, instead of having your API go dark at 3 AM on a Friday.
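A single 14-day threshold tends to get snoozed. One way to escalate is a second rule that fires when less than 3 days remain. The sketch below assumes a severity label that your Alertmanager routes would need to match; the group name and threshold are illustrative. It uses probe_ssl_earliest_cert_expiry, the Blackbox metric holding the soonest certificate expiry as a Unix timestamp.

```yaml
groups:
  - name: blackbox_cert_critical   # hypothetical group name
    rules:
      - alert: SSLCertificateExpiryCritical
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 3
        for: 10m
        labels:
          severity: critical   # hypothetical label; route on it in Alertmanager
        annotations:
          summary: 'SSL cert for {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days'
```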

DNS Probes for Split-Horizon Shenanigans

Maybe you’ve got internal DNS returning a private IP and external DNS returning a public IP (classic setup for self-hosting). Blackbox can check both:

scrape_configs:
  - job_name: "blackbox-dns-internal"
    metrics_path: /probe
    params:
      module: [dns_lookup_internal]
    static_configs:
      - targets:
        - "192.168.1.53"   # hypothetical internal resolver (the DNS server to query)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "monitoring-box.internal:9115"

  - job_name: "blackbox-dns-external"
    metrics_path: /probe
    params:
      module: [dns_lookup_external]
    static_configs:
      - targets:
        - "1.1.1.1"        # public resolver (the DNS server to query)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "vps-in-aws:9115"

Run one Blackbox instance inside your network and one on a VPS, each asking its own resolver about the same name. You’ll catch DNS misconfigs on either side immediately.
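The two jobs reference dns_lookup_internal and dns_lookup_external, which the blackbox.yml earlier doesn’t define. For the dns prober, the scrape target is the DNS server to query, and query_name sets the name to look up.

Here is a sketch of the two modules. It assumes myapp.com (a placeholder) should resolve to an RFC 1918 address internally and to a non-private address externally. The regexes run against answer records rendered as tab-separated text (name, TTL, class, type, value), the format Blackbox’s own example config matches against.

```yaml
modules:
  dns_lookup_internal:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "myapp.com"   # placeholder: the split-horizon name
      query_type: "A"
      valid_rcodes: [NOERROR]
      validate_answer_rrs:
        # Fail unless at least one A record is a private address
        fail_if_none_matches_regexp:
          - '\tA\t(10|192\.168|172\.(1[6-9]|2[0-9]|3[01]))\.'

  dns_lookup_external:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "myapp.com"
      query_type: "A"
      valid_rcodes: [NOERROR]
      validate_answer_rrs:
        # Fail if any A record leaks a private address to the outside world
        fail_if_matches_regexp:
          - '\tA\t(10|192\.168|172\.(1[6-9]|2[0-9]|3[01]))\.'
```

Both exporters can load the same blackbox.yml, because each job only asks for the module it needs.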

ICMP Pings and the CAP_NET_RAW Headache

Ping (ICMP) is useful for “is this host even alive?” checks. But it requires elevated permissions:

# Option 1: Run blackbox as root (bad)
sudo blackbox_exporter --config.file=blackbox.yml

# Option 2: Grant CAP_NET_RAW (better)
sudo setcap cap_net_raw=ep /usr/local/bin/blackbox_exporter
blackbox_exporter --config.file=blackbox.yml

# Option 3: Run in a container with --cap-add=NET_RAW
docker run --cap-add=NET_RAW -p 9115:9115 \
  -v /path/to/blackbox.yml:/etc/blackbox_exporter/blackbox.yml \
  prom/blackbox-exporter:latest \
  --config.file=/etc/blackbox_exporter/blackbox.yml

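Honestly, Option 2 (setcap) is the cleanest for a bare-metal setup. Just re-run it after upgrades, because replacing the binary drops the capability. If you’re using Docker Compose, the equivalent of --cap-add=NET_RAW is cap_add: [NET_RAW] on the service.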
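For Compose, here is a sketch of the equivalent service, assuming blackbox.yml sits next to the compose file. The --config.file flag is passed explicitly so the container loads your mounted file rather than whatever default path the image uses.

```yaml
services:
  blackbox:
    image: prom/blackbox-exporter:latest
    command:
      - "--config.file=/etc/blackbox_exporter/blackbox.yml"
    cap_add:
      - NET_RAW   # lets the ICMP prober open raw sockets
    ports:
      - "9115:9115"
    volumes:
      - ./blackbox.yml:/etc/blackbox_exporter/blackbox.yml:ro
    restart: unless-stopped
```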

Synthetic Monitoring Across Multiple Sites

Here’s where Blackbox gets spicy: run instances in different places. Monitoring box in your home lab? Run another one on a DigitalOcean droplet or Hetzner box. Both probe your production API, and one Prometheus scrapes both. Now you catch:

Local network failures (home lab → API)
ISP issues (does your internet actually work?)
Geolocation-based failures (maybe your CDN is broken in one region)
DNS propagation issues (nameservers disagree)

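All from one Prometheus instance. Blackbox doesn’t know where it’s running, so you add the vantage point yourself. Give each exporter’s target group a site label and point that group at the site’s exporter, and you’ll see exactly which vantage point failed.

scrape_configs:
  - job_name: "blackbox-global"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ["https://myapp.com", "https://api.myapp.com"]
        labels: {site: "homelab", __tmp_exporter: "monitoring-box.internal:9115"}
      - targets: ["https://myapp.com", "https://api.myapp.com"]
        labels: {site: "aws", __tmp_exporter: "vps-in-aws:9115"}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      # __tmp_* labels are dropped after relabeling; this routes each group to its site's exporter
      - source_labels: [__tmp_exporter]
        target_label: __address__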

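When site: "aws" fails but site: "homelab" passes, the service itself is probably fine. The problem sits on the path from that vantage point: the VPS’s network, a regional CDN edge, or upstream routing.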
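That distinction can be encoded directly in alerts. The sketch below assumes the blackbox-global job above and uses hypothetical alert names. One rule fires when every site sees the target down; the other fires when sites disagree.

```yaml
groups:
  - name: blackbox_multisite   # hypothetical group name
    rules:
      # Every vantage point agrees the target is down: probably really down
      - alert: ProbeFailingFromAllSites
        expr: max by (instance) (probe_success{job="blackbox-global"}) == 0
        for: 2m
        annotations:
          summary: "{{ $labels.instance }} is failing from every site"
      # Some sites fail while others pass: a path, region, or DNS problem
      - alert: ProbeFailingFromSomeSites
        expr: |
          min by (instance) (probe_success{job="blackbox-global"}) == 0
          and
          max by (instance) (probe_success{job="blackbox-global"}) == 1
        for: 5m
        annotations:
          summary: "{{ $labels.instance }} is failing from some sites but not others"
```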

Blackbox vs. Uptime Kuma: When Do You Use Which?

Uptime Kuma is a dashboard. Pretty UI, simple setup, good for “show the boss our uptime.” But:

No native Prometheus integration (unless you scrape its metrics endpoint)
Alerts go to Discord/Slack directly (no Alertmanager routing)
No easy way to do multi-site probing
Another service to maintain

Blackbox Exporter is plumbing. Config-as-code, tight Prometheus integration, routes through your alerting rules. But:

No built-in dashboard (use Grafana instead)
Config is YAML, not UI clicks
Requires Prometheus already running

Real talk: If you’ve got Prometheus and Alertmanager, use Blackbox. If you want a standalone, low-touch “show me the uptime” dashboard and don’t have Prometheus yet, Uptime Kuma is faster to spin up. They’re not mutually exclusive, though: some teams run both (Blackbox for deep integration, Kuma for the executive status page).

Gotchas and Tuning

Timeout mismatch: Keep each module’s timeout below Prometheus’s scrape_timeout. If a probe is still running when the scrape times out, Prometheus records a failed scrape (up == 0) instead of a clean probe_success 0, and alerts written against probe_success may never fire:

# In prometheus.yml (globally or per job)
scrape_interval: 30s
scrape_timeout: 15s

# In blackbox.yml (per module)
timeout: 10s

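IPv6 by default: Blackbox’s preferred_ip_protocol defaults to "ip6". It falls back to IPv4 (ip_protocol_fallback defaults to true) only when a name has no AAAA record. A target that publishes AAAA but isn’t reachable over IPv6 from your probe host therefore just fails. If your targets or network don’t do IPv6, add preferred_ip_protocol: "ip4" to every module: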

http_2xx:
  prober: http
  timeout: 10s
  http:
    preferred_ip_protocol: "ip4"

Redirect loops: Blackbox follows redirects by default. A broken redirect chain or loop fails the probe (or burns the whole timeout) without telling you which hop went wrong. Set follow_redirects: false if you want to check the first response only.
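With redirects off, you can also assert that the redirect is the right one. The sketch below checks an HTTP-to-HTTPS redirect, and the module name is hypothetical. It assumes your site answers plain HTTP with a 301 or 308; a 302 would fail it as written. Point it at an http:// target.

```yaml
modules:
  http_redirects_to_https:   # hypothetical module name
    prober: http
    timeout: 5s
    http:
      follow_redirects: false          # judge the first response, not the final one
      valid_status_codes: [301, 308]   # permanent redirects only
      fail_if_header_not_matches:
        - header: Location
          regexp: '^https://'          # the redirect must point at HTTPS
```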

TLS verification: By default, Blackbox verifies TLS certificates. If you’re probing internal services with self-signed certs, add:

http_2xx:
  prober: http
  http:
    tls_config:
      insecure_skip_verify: true

(Yeah, it’s insecure. But it’s better than not knowing your internal API is down.)
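If your internal services use a private CA, a less blunt option is to hand Blackbox the CA certificate. Verification then stays on. In the sketch below, the module name and CA path are hypothetical, and the file must be readable by the exporter.

```yaml
modules:
  http_2xx_internal:   # hypothetical module name
    prober: http
    timeout: 5s
    http:
      tls_config:
        ca_file: /etc/blackbox_exporter/internal-ca.pem   # hypothetical path to your private CA cert
        # server_name: api.myapp.local   # set if the cert's name differs from the target host
```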

Useful PromQL Queries

# Fraction of successful probes over the last 5 minutes (1 = every probe passed)
avg_over_time(probe_success[5m])

# 95th-percentile probe duration across all targets
# (probe_duration_seconds is a gauge, not a histogram — aggregate with quantile())
quantile(0.95, probe_duration_seconds)

# Which targets are currently down?
probe_success == 0

# Certificate expiry in days
(probe_ssl_earliest_cert_expiry - time()) / 86400

# Approximate downtime per target over the last hour, in seconds (assumes evenly spaced scrapes)
(1 - avg_over_time(probe_success[1h])) * 3600
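Queries you run constantly can be precomputed as recording rules. The sketch below uses rule names that follow the level:metric:operations naming convention but are otherwise hypothetical.

```yaml
groups:
  - name: blackbox_recording   # hypothetical group name
    rules:
      # Fraction of successful probes per target over the last day (1 = no failures)
      - record: instance:probe_success:avg_over_time_1d
        expr: avg_over_time(probe_success[1d])
      # Days until the soonest-expiring certificate in each target's chain
      - record: instance:probe_ssl_earliest_cert_expiry:days
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400
```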

The Probes Worth Setting Up Today

If you’re running Prometheus in a home lab or small production, start with:

HTTP/HTTPS checks on your public-facing services: catch outages early
TLS cert expiry alerts: 14-day warnings save Fridays
DNS checks across internal and external resolvers: catch split-brain disasters
TCP port checks on critical services: database, cache, queue
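A bare TCP connect only proves the port is open. For a cache, the tcp prober’s query_response can go one step further. The sketch below targets Redis and assumes no AUTH is configured; a protected instance answers with an error, so the probe fails. The module name is hypothetical, and you’d probe a host:6379 target with it.

```yaml
modules:
  tcp_redis_ping:   # hypothetical module name
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
      query_response:
        - send: "PING"        # Blackbox sends this line followed by a newline
        - expect: '^\+PONG'   # fails on an error reply such as -NOAUTH
```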

Skip ICMP ping unless you specifically need to know “is this host alive?” If your services respond to HTTP, HTTP probes are better.

And yes, you can run all of this on the same Pi that’s running Prometheus. One Blackbox instance, a 10-line config, and you’re done. That’s how you replace Pingdom without spending a dime.

Your 2 AM self will appreciate not having to pay for monitoring. SaaS uptime checks smell like rent extraction anyway.
