Blackbox Exporter: HTTP/TCP/DNS/ICMP Probes

By SumGuy 9 min read
Pingdom Sells Pings. You Already Have Prometheus.

You’re sitting in Prometheus. You’ve got nodes exporting their guts — CPU, memory, disk, all the internal metrics you can eat. But here’s the thing: what happens when your API is up and humming on the inside, but nobody on the internet can reach it? Internal metrics won’t catch that. You need someone standing outside your firewall, actually pinging your services, checking if they’re dead to the world.

That’s what Blackbox Exporter does. It’s Prometheus’s synthetic monitoring leg — a standalone tool that probes your services from the outside (or from different vantage points) and reports back success or failure. No SaaS fees. No Pingdom. No Datadog uptime monitoring tab. Just a Go binary and a config file.

If you’re paying $30/month for a hosted uptime checker to alert you when your Nextcloud falls over, or you’ve got Pingdom bleeding $200/year just to know when DNS fails, stop. Blackbox Exporter does this for free. And if you’re already running Prometheus + Alertmanager, you’ve already solved the hard part.


What Blackbox Actually Does

Blackbox Exporter is not an agent. It doesn’t run on your target. It runs somewhere with a clear view outbound: your monitoring station, a VPS, a Pi in another timezone, whatever. From that vantage point, it sends HTTP, TCP, DNS, and ICMP probes at the targets you name and records what comes back.

Every probe result becomes a set of Prometheus metrics: probe_success (0 or 1), probe_duration_seconds, probe_http_status_code, probe_ssl_earliest_cert_expiry, etc. Write alert rules on those, let Alertmanager route them, get a Slack message when things break. Done.


The Modules: What You’re Actually Probing

Blackbox comes with pre-built probe definitions called modules. Think of them as profiles for different kinds of checks:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200, 201]
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      body: '{"key": "value"}'
      valid_status_codes: [200]
  tcp_connect:
    prober: tcp
    timeout: 5s
  dns:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "example.com"
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

The common pattern: define what you’re checking (HTTP response codes, TCP port, DNS record, ICMP), set a timeout (don’t wait forever for dead services), and let Prometheus scrape the results.


Installing and Running Blackbox

Download the binary from the Prometheus releases page, or use your distro’s package manager:

# Debian/Ubuntu
sudo apt install prometheus-blackbox-exporter
# Or download manually
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar xzf blackbox_exporter-0.25.0.linux-amd64.tar.gz
sudo mv blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter /usr/local/bin/

Create a config file (blackbox.yml):

modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_status_codes: [200, 201, 202, 204]
      follow_redirects: true
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200, 201]
      preferred_ip_protocol: "ip4"
  tcp_connect:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
  dns_lookup:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "example.com"
  icmp_ping:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

Run it:

blackbox_exporter --config.file=blackbox.yml

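By default, it listens on port 9115. http://localhost:9115/metrics only reports on the exporter itself; the probes are served from http://localhost:9115/probe, and that is the endpoint Prometheus will scrape.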


Wiring It Into Prometheus

Here’s the trick: Blackbox is one exporter, but you’re probing many targets. You need to pass the target and module to it via URL parameters. In Prometheus config:

scrape_configs:
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://sumguy.com
          - https://example.com
          - https://api.myapp.local
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"
  - job_name: "blackbox-tcp"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - "api.myapp.local:443"
          - "db.myapp.local:5432"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"
  - job_name: "blackbox-dns"
    metrics_path: /probe
    params:
      module: [dns_lookup]
    static_configs:
      - targets:
          # For the dns prober, the target is the DNS server to query.
          # The name being looked up is query_name in the module.
          # These two public resolvers are examples; use your own.
          - "1.1.1.1"
          - "8.8.8.8"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"

The relabel_configs magic here is critical: it takes your target list, passes each one as __param_target to Blackbox, then sets the instance label to the actual target (not localhost:9115). This way your alerts show “https://sumguy.com is down” instead of “blackbox exporter host is down.”
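The three jobs above differ only in their module, so the same relabeling trick can carry the module too. Here is a minimal sketch of a single job that picks the module per target. The `module` target label is just a convention chosen for this example, not something Blackbox requires, and the targets are placeholders taken from the jobs above.

```yaml
scrape_configs:
  - job_name: "blackbox"
    metrics_path: /probe
    static_configs:
      - targets: ["https://example.com"]
        labels:
          module: http_2xx      # must match a module name in blackbox.yml
      - targets: ["db.myapp.local:5432"]
        labels:
          module: tcp_connect
    relabel_configs:
      # Copy the per-target label into the ?module= URL parameter.
      - source_labels: [module]
        target_label: __param_module
      # Same three steps as before: target, instance, exporter address.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "localhost:9115"
```

The `module` label stays on the resulting series, which is handy when you want to alert on HTTP and TCP probes differently.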


Real-World Example: Certificate Expiry Alerts

One of the most useful Blackbox probes: catching TLS certificate expiry before your customers notice. Every HTTPS probe exposes probe_ssl_earliest_cert_expiry, the Unix timestamp at which the earliest-expiring certificate in the chain runs out.

Add this alert rule:

groups:
  - name: blackbox_alerts
    rules:
      - alert: SSLCertificateExpiring
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        annotations:
          # $value is the expression result, i.e. days remaining
          summary: 'SSL cert for {{ $labels.instance }} expires in {{ $value | printf "%.0f" }} days'
          description: 'Certificate expires in {{ $value | printf "%.0f" }} days. Renew now.'
      - alert: HTTPProbeDown
        expr: probe_success == 0
        for: 2m
        annotations:
          summary: "{{ $labels.instance }} is unreachable"
          description: "Probe to {{ $labels.instance }} has failed for 2 minutes. Check logs."
      - alert: HTTPProbeHighLatency
        expr: probe_duration_seconds > 2
        for: 5m
        annotations:
          summary: "{{ $labels.instance }} is slow"
          description: '{{ $labels.instance }} responding in {{ $value | printf "%.2f" }}s; check performance.'

Bam. Now you get notified 14 days before your cert dies, instead of having your API go dark at 3 AM on a Friday.
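For Prometheus to evaluate those rules, the file has to be listed in prometheus.yml. A minimal sketch, assuming the rules were saved as blackbox_alerts.yml next to prometheus.yml and that Alertmanager runs on the same host on its default port 9093. Both the filename and the address are assumptions; swap in your own.

```yaml
# prometheus.yml (fragment)
rule_files:
  - "blackbox_alerts.yml"        # hypothetical filename for the rules above

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # assumed Alertmanager address
```

Running `promtool check rules blackbox_alerts.yml` before reloading Prometheus catches indentation and template mistakes early.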


DNS Probes for Split-Horizon Shenanigans

Maybe you’ve got internal DNS returning a private IP and external DNS returning a public IP (classic setup for self-hosting). Blackbox can check both:

scrape_configs:
  - job_name: "blackbox-dns-internal"
    metrics_path: /probe
    params:
      module: [dns_lookup_internal]
    static_configs:
      - targets:
          # Target = the DNS server to ask. 192.168.1.1 is a placeholder
          # for your internal resolver.
          - "192.168.1.1"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "monitoring-box.internal:9115"
  - job_name: "blackbox-dns-external"
    metrics_path: /probe
    params:
      module: [dns_lookup_external]
    static_configs:
      - targets:
          # A public resolver, queried from outside your network.
          - "1.1.1.1"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: "vps-in-aws:9115"

Run one Blackbox instance inside your network and one on a VPS, both resolving the same name. You’ll catch DNS misconfigs immediately.
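The jobs above reference modules named dns_lookup_internal and dns_lookup_external, so those need to exist in blackbox.yml on the respective exporters. With the dns prober, the name being resolved comes from query_name in the module, not from the scrape target. Here is a rough sketch, assuming the record is myapp.com and that internal answers live in 10.0.0.0/8. Both are assumptions for illustration; adjust the name and the regex to your setup.

```yaml
modules:
  dns_lookup_internal:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "myapp.com"     # assumed record name
      query_type: "A"
      validate_answer_rrs:
        # Fail if any answer is NOT a 10.x.x.x address (assumed private range).
        fail_if_not_matches_regexp:
          - '.*\s10\.\d+\.\d+\.\d+'
  dns_lookup_external:
    prober: dns
    timeout: 5s
    dns:
      preferred_ip_protocol: "ip4"
      query_name: "myapp.com"
      query_type: "A"
      validate_answer_rrs:
        # Fail if a private 10.x.x.x address leaks to the outside world.
        fail_if_matches_regexp:
          - '.*\s10\.\d+\.\d+\.\d+'
```

Without the validate_answer_rrs checks, both probes would only confirm that the resolver answered, not that it handed out the right side of the split.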


ICMP Pings and the CAP_NET_RAW Headache

Ping (ICMP) is useful for “is this host even alive?” checks. But it requires elevated permissions:

# Option 1: Run blackbox as root (bad)
sudo blackbox_exporter --config.file=blackbox.yml
# Option 2: Grant CAP_NET_RAW (better)
sudo setcap cap_net_raw=ep /usr/local/bin/blackbox_exporter
blackbox_exporter --config.file=blackbox.yml
# Option 3: Run in a container with --cap-add=NET_RAW
# (mounted as config.yml, the path the image loads by default)
docker run --cap-add=NET_RAW -p 9115:9115 \
  -v /path/to/blackbox.yml:/etc/blackbox_exporter/config.yml \
  prom/blackbox-exporter:latest

Honestly, Option 2 (setcap) is the cleanest for a bare-metal setup. If you’re running Docker Compose, the equivalent of --cap-add=NET_RAW is a cap_add entry on the service.
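For the Compose route, something like the following should do. It is a sketch, assuming the official prom/blackbox-exporter image, which loads /etc/blackbox_exporter/config.yml unless told otherwise. The service name and host path are placeholders.

```yaml
# docker-compose.yml (fragment)
services:
  blackbox:                       # hypothetical service name
    image: prom/blackbox-exporter:latest
    cap_add:
      - NET_RAW                   # needed for ICMP probes
    ports:
      - "9115:9115"
    volumes:
      # Mount your config over the image's default config path.
      - /path/to/blackbox.yml:/etc/blackbox_exporter/config.yml:ro
```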


Synthetic Monitoring Across Multiple Sites

Here’s where Blackbox gets spicy: run instances in different places. Monitoring box in your home lab? Run another one on a DigitalOcean droplet or Hetzner box. Both probe your production API, so you catch failures that only show up from one vantage point.

All from one Prometheus instance. Blackbox itself doesn’t know where it’s running, so attach a site label to each group of targets in the scrape config and point that group at the matching exporter. You’ll see exactly which vantage point failed.

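scrape_configs:
  - job_name: "blackbox-global"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      # __tmp_exporter is a throwaway helper label (an assumption of this
      # example); Prometheus drops __-prefixed labels after relabeling.
      - targets: [https://myapp.com, https://api.myapp.com]
        labels: {site: "homelab", __tmp_exporter: "monitoring-box.internal:9115"}
      - targets: [https://myapp.com, https://api.myapp.com]
        labels: {site: "aws", __tmp_exporter: "vps-in-aws:9115"}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      # Send each group to the exporter at its own site.
      - source_labels: [__tmp_exporter]
        target_label: __address__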

When site: "aws" fails but site: "homelab" passes, you know it’s AWS’s problem, not yours.


Blackbox vs. Uptime Kuma: When Do You Use Which?

Uptime Kuma is a dashboard. Pretty UI, simple setup, good for “show the boss our uptime.” But:

Blackbox Exporter is plumbing. Config-as-code, tight Prometheus integration, routes through your alerting rules. But:

Real talk: If you’ve got Prometheus and Alertmanager, use Blackbox. If you want a standalone, low-touch “show me the uptime” dashboard and don’t have Prometheus yet, Uptime Kuma is faster to spin up. But they’re not mutually exclusive — some teams run both (Blackbox for deep integration, Kuma for executive status page).


Gotchas and Tuning

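Timeout mismatch: Blackbox never lets a probe outlive the scrape. It reads the scrape timeout Prometheus sends with each request, shaves off a small offset (0.5s by default), and uses whichever is shorter: that or the module timeout. So if your module timeout is 10s but Prometheus’s scrape_timeout is 5s, probes get cut off at roughly 4.5s and slow-but-alive targets show up as failures. Set Blackbox timeouts lower than the scrape timeout:

# In prometheus.yml (global or per job)
scrape_interval: 30s
scrape_timeout: 15s
# In blackbox.yml (inside each module)
timeout: 10s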

IPv6 by default: Blackbox prefers IPv6 and only falls back to IPv4 when the target has no AAAA record. If your targets publish AAAA records but aren’t actually reachable over IPv6 from the exporter, add preferred_ip_protocol: "ip4" to every module:

http_2xx:
  prober: http
  timeout: 10s
  http:
    preferred_ip_protocol: "ip4"

Redirect loops: HTTP modules follow redirects by default, so a broken redirect chain fails the probe even when the first response was fine. Explicitly set follow_redirects: false if you want to check the first response only.

TLS verification: By default, Blackbox verifies TLS certificates. If you’re probing internal services with self-signed certs, add:

http_2xx:
  prober: http
  http:
    tls_config:
      insecure_skip_verify: true

(Yeah, it’s insecure. But it’s better than not knowing your internal API is down.)
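If the self-signed certs come from your own internal CA, there is a less drastic option: tell the module to trust that CA instead of skipping verification. A minimal sketch, where the module name and the CA file path are both hypothetical:

```yaml
modules:
  http_2xx_internal:              # hypothetical module name
    prober: http
    timeout: 10s
    http:
      preferred_ip_protocol: "ip4"
      tls_config:
        # Hypothetical path; point this at your internal CA certificate.
        ca_file: "/etc/blackbox_exporter/internal-ca.pem"
```

Verification stays on, so an expired or mismatched cert on an internal service still fails the probe.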


Useful PromQL Queries

# Probe success ratio over the last 5 minutes (probe_success is a gauge, so average it)
avg_over_time(probe_success[5m])
# 95th-percentile probe duration over the last hour
quantile_over_time(0.95, probe_duration_seconds[1h])
# Which targets are currently down?
probe_success == 0
# Certificate expiry in days
(probe_ssl_earliest_cert_expiry - time()) / 86400
# Approximate downtime in the last hour, in seconds
(1 - avg_over_time(probe_success[1h])) * 3600

The Probes Worth Setting Up Today

If you’re running Prometheus in a home lab or small production, start with:

  1. HTTP/HTTPS checks on your public-facing services — catch outages early
  2. TLS cert expiry alerts — 14-day warnings save Fridays
  3. DNS checks across internal and external resolvers — catch split-brain disasters
  4. TCP port checks on critical services — database, cache, queue

Skip ICMP ping unless you specifically need to know “is this host alive?” If your services respond to HTTP, HTTP probes are better.

And yes, you can run all of this on the same Pi that’s running Prometheus. One Blackbox instance, a 10-line config, and you’re done. That’s how you replace Pingdom without spending a dime.

Your 2 AM self will appreciate not having to pay for monitoring. SaaS uptime checks smell like rent extraction anyway.

