
Push vs Pull Metrics: Pushgateway, Pushprox, and Why

By SumGuy · 10 min read

Prometheus Pulls. Sometimes Things Want to Push.

Here’s the thing about Prometheus: it’s built on a pull model. Your Prometheus server scrapes HTTP endpoints on a regular schedule, collects metrics, and stores them. It’s a fishing rod, not a postman. This design is mostly brilliant: you get built-in liveness detection (if a target goes down, the next scrape fails and Prometheus records it), you control the scrape cadence, and targets that disappear from service discovery simply stop being scraped, with their series marked stale.

But then your 3 AM cron job finishes backing up 2 TB of NAS data, wants to record “backup_duration_seconds=847”, and disappears into the void before Prometheus even notices it exists.

Or you’ve got a container running behind a NAT firewall on a remote site, and your Prometheus server can’t reach it. Your laptop running a heavy data-processing task generates a few metrics and then shuts down. A batch job on a CI/CD system generates metrics for 30 seconds and exits.

These are the moments when the pull model breaks. And that’s when people reach for the push side of monitoring.


Why Pull, Anyway?

Before we talk about pushing metrics, let’s understand why Prometheus chose the pull model in the first place—because it matters.

Service discovery without a registry. Pull means Prometheus scans your infrastructure on a schedule. If you spin up a new service, Prometheus sees it (via DNS, Kubernetes API, Consul, whatever). You don’t need the service to know that Prometheus exists. The server figures out where things are.

Dead-host detection. Prometheus marks a scrape target as “down” when it can’t reach it. No agent is sitting there wondering if it’s still alive. The absence of a heartbeat is data. This is surprisingly valuable in a chaotic infrastructure.
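The mechanism is simple enough to imitate by hand. Every scrape also writes a synthetic `up` series for the target: 1 if the scrape succeeded, 0 if it failed or timed out. A rough shell approximation (not how Prometheus is implemented; the target address is a hypothetical example, and the 10-second timeout mirrors Prometheus’s default scrape timeout):

```shell
# Rough imitation of what one Prometheus scrape records for a target:
# fetch /metrics with a timeout, emit up=1 on success and up=0 on failure.
# $1 = host:port of the target (e.g. the hypothetical nas-01.local:9100).
probe_up() {
  if curl -fsS --max-time 10 "http://$1/metrics" >/dev/null 2>&1; then
    echo "up{instance=\"$1\"} 1"
  else
    # No response is still a data point: the target is down.
    echo "up{instance=\"$1\"} 0"
  fi
}

# Usage: probe_up nas-01.local:9100
```

In Prometheus itself, an alert on `up == 0` is the usual way to act on this signal.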

Centralized scrape control. You decide how often to scrape, what timeout to use, and when to back off. Your Prometheus server is the single source of scrape truth. No rogue agent hammering your Prometheus with metrics every 100 milliseconds.

No agents to manage. You don’t need a Prometheus “push daemon” running on every host. You just expose a metrics endpoint, and Prometheus finds it.

All of this is elegant. And it works great until it doesn’t.


Where Pull Breaks Down

Batch jobs and ephemeral work. A cron job runs every night, does its work, and exits. The numbers you actually care about (how long it took, whether it succeeded, how many bytes it moved) only exist at the very end, a moment before the process disappears. Even if Prometheus discovered the job and scraped it every 15–30 seconds while it ran, those final values would land after the last scrape and die with the process. A job that runs for only a few seconds may never be scraped at all. That’s not telemetry; that’s a coin flip.

NAT and firewalls. You’ve got monitoring agents on a remote site, behind a NAT’d router. Your Prometheus server can’t initiate a connection back to them. You could open port forwards, but now you’re managing firewall rules and hoping they don’t break. Or you could ask the remote site to push metrics—but then you’re running a metrics-ingestion service, and that’s a different beast.

Short-lived containers. Your Kubernetes cluster spawns a Job pod that runs for 3 seconds, emits a metric, and dies. Even if Prometheus discovers the pod in time, a 3-second lifetime against a typical 30-second scrape interval means most runs are never scraped at all, and those metrics are gone. Kubernetes Jobs are inherently pull-hostile.
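The odds are easy to put a number on. Assuming Prometheus already knows about the target and its scrape schedule is randomly phased relative to the job’s start, a job that lives for d seconds under a scrape interval of T seconds is scraped at least once with probability d/T, capped at 1. A small sketch of that arithmetic:

```shell
# Probability that at least one scrape lands inside a job's lifetime.
# Assumes the target is already discovered and the scrape phase is uniformly
# random relative to the job start. $1 = job duration (s), $2 = interval (s).
scrape_odds() {
  awk -v d="$1" -v t="$2" 'BEGIN {
    p = d / t
    if (p > 1) p = 1
    printf "%.1f%%\n", p * 100
  }'
}

scrape_odds 3 30    # the 3-second Job pod: prints 10.0%
scrape_odds 600 30  # a 10-minute job: prints 100.0%
```

Even the 10-minute job only helps if Prometheus can find it, and whatever it records after the last scrape (up to one full interval, T/2 on average) is still lost when the process exits.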

Privacy and trust boundaries. Sometimes you don’t want to give Prometheus direct network access to a system. You’d rather have the system push metrics to a single, controlled endpoint.


Enter Pushgateway

The first reaction to “I need to push metrics” is almost always: Pushgateway.

Pushgateway is a simple service that acts as a metrics broker. Instead of pushing directly to Prometheus, your job pushes to Pushgateway, which holds the metrics in memory. Prometheus then scrapes Pushgateway like it would any other target. Push gets converted back to pull.

This works. And for some use cases, it’s exactly right.

Good use case: batch job exit status.

Your backup script finishes. It pushes a few metrics to Pushgateway: exit code, duration, bytes transferred, and when it finished. Pushgateway holds them. On its next scrape, seconds later, Prometheus pulls them from Pushgateway. Hours later, you query them and see the backup ran successfully.

Here’s what that looks like:

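```shell
# Backup script finishes, calculates exit metrics
BACKUP_DURATION=847
BACKUP_SIZE_BYTES=2147483648
EXIT_CODE=0
# Push to Pushgateway. --data-binary keeps the newlines the text format needs
# (plain -d would strip them). job and instance come from the URL path (the
# grouping key), so the sample lines don't repeat them as labels.
cat <<EOF | curl --data-binary @- http://pushgateway.local:9091/metrics/job/backup_script/instance/nas-01
# HELP backup_duration_seconds Duration of backup job
# TYPE backup_duration_seconds gauge
backup_duration_seconds $BACKUP_DURATION
# HELP backup_size_bytes Total bytes backed up
# TYPE backup_size_bytes gauge
backup_size_bytes $BACKUP_SIZE_BYTES
# HELP backup_exit_code Exit code of backup job
# TYPE backup_exit_code gauge
backup_exit_code $EXIT_CODE
# HELP backup_last_completion_timestamp_seconds Unix time the backup finished
# TYPE backup_last_completion_timestamp_seconds gauge
backup_last_completion_timestamp_seconds $(date +%s)
EOF
```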

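And Prometheus scrapes Pushgateway on its normal schedule (with `honor_labels: true` set on that scrape job, so the `job` and `instance` labels from the push URL are kept instead of being overwritten with Pushgateway’s own), ingests those metrics, and everything works.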

The catch: Pushgateway is a metrics sink that never forgets.

This is where people get sloppy. Pushgateway is not designed for long-running services to push metrics continuously. If your web service is pushing metrics every second to Pushgateway instead of exposing a /metrics endpoint, you’ve made a mistake.

Here’s why: Pushgateway never learns that your service died. If your service pushes metrics every second and then crashes, Pushgateway keeps serving the last batch it received. An hour later, every Prometheus scrape still returns the dead service’s metrics, and the `up` series for that scrape stays at 1, because it describes Pushgateway, not your service. You’ve lost dead-host detection.
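Pushgateway does record one thing that helps: for every group it holds, it exposes a `push_time_seconds` metric with the Unix time of the last successful push. A sketch of turning that into a staleness check through Prometheus’s HTTP query API; the Prometheus address and the threshold are assumptions, and the result carries the pushed job’s name in its `job` label only if Prometheus scrapes Pushgateway with `honor_labels: true`:

```shell
# Hypothetical Prometheus address; override with PROM_URL.
PROM_URL="${PROM_URL:-http://prometheus.local:9090}"

# List Pushgateway groups whose last successful push is older than $1 seconds.
# push_time_seconds is maintained by Pushgateway itself, one series per group.
stale_groups() {
  max_age="${1:?usage: stale_groups MAX_AGE_SECONDS}"
  curl -fsS -G "$PROM_URL/api/v1/query" \
    --data-urlencode "query=time() - push_time_seconds > $max_age"
}

# Usage: stale_groups 93600   # groups with no push in the last 26 hours
```

For a nightly batch job, a threshold of a day plus some slack works; for anything pushing every second, the fact that you need this check at all is the sign it should expose /metrics instead.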

Worse, if you’re rotating job names (like pushing with job=backup_script one day and job=backup_script_v2 the next), old groups accumulate in Pushgateway until you delete them by hand, and every scrape keeps feeding Prometheus series for jobs that no longer exist.
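Cleaning up means deleting the group explicitly: Pushgateway’s API accepts an HTTP DELETE on the same URL path (the grouping key) the metrics were pushed to. A minimal sketch, assuming the `pushgateway.local:9091` address and the `backup_script`/`nas-01` grouping key from the push example earlier:

```shell
# Hypothetical Pushgateway address; override with PUSHGATEWAY_URL.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgateway.local:9091}"

# Delete the group identified by job (and optional instance).
# Only the group with exactly this grouping key is removed.
delete_group() {
  job="${1:?usage: delete_group JOB [INSTANCE]}"
  path="/metrics/job/$job"
  if [ -n "${2:-}" ]; then
    path="$path/instance/$2"
  fi
  curl -fsS -X DELETE "$PUSHGATEWAY_URL$path"
}

# Usage: delete_group backup_script nas-01   # retire the old group
```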

The Prometheus way: Use Pushgateway only for batch jobs and short-lived tasks. Expose a /metrics endpoint for anything long-running.


PushProx: Pull Through a Firewall

NAT breaks pull. You’ve got agents on remote sites, and your Prometheus can’t reach them. Pushgateway works, but you’re back to the metrics-sink problem if you have multiple remote sites with long-running services.

Enter PushProx.

PushProx is clever. It flips the direction of the network connection, but keeps pull’s semantics intact.

Here’s the architecture:

  1. You run a PushProx proxy on a publicly accessible server (your monitoring infrastructure).
  2. On each remote site, you run a PushProx client. The client connects outbound to the proxy—a reverse tunnel.
  3. When Prometheus wants to scrape a remote target, it doesn’t talk to the target directly. It talks to the proxy.
  4. The proxy says, “Hey client, scrape your local service and send me the metrics.”
  5. The client scrapes its local /metrics endpoint and sends the results back through the tunnel.

The end result: Prometheus still pulls, but it pulls through a firewall-friendly reverse tunnel.

It’s like a game of telephone through a NAT’d network. The semantics stay the same (pull, scrape intervals, dead-host detection). The transport changes.

Here’s a simple PushProx setup:

```shell
# On your monitoring server, run the proxy. It accepts client connections and
# scrape requests on the same address (:8080 is the default).
./pushprox-proxy --web.listen-address=:8080
```

```shell
# On the remote site, run the client. It connects outbound to the proxy and
# registers under the FQDN that Prometheus will use as the scrape target.
./pushprox-client --proxy-url=http://proxy.monitoring-site.com:8080/ \
  --fqdn=remote-site-service.internal
```

On the Prometheus side, the proxy is configured as an HTTP proxy (`proxy_url`), and the target is the client’s FQDN, as if it were reachable directly:

```yaml
# Prometheus config
scrape_configs:
  - job_name: 'remote-site-service'
    proxy_url: 'http://proxy.monitoring-site.com:8080/'  # talk to the PushProx proxy
    static_configs:
      # Host must match the client's --fqdn; the port is where the service
      # exposes /metrics on the remote host (9100 is an assumption).
      - targets: ['remote-site-service.internal:9100']
```
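To check the path by hand before pointing Prometheus at it, you can send the same kind of request Prometheus sends: an ordinary HTTP proxy request to the proxy, naming the client’s FQDN as the host. A sketch assuming the hostnames from the PushProx setup above, a remote /metrics endpoint on port 9100 (an assumption), and that the proxy accepts standard proxy requests the way Prometheus’s `proxy_url` uses it:

```shell
# Hypothetical addresses matching the PushProx example; override as needed.
PUSHPROX_URL="${PUSHPROX_URL:-http://proxy.monitoring-site.com:8080}"
CLIENT_FQDN="${CLIENT_FQDN:-remote-site-service.internal}"
TARGET_PORT="${TARGET_PORT:-9100}"   # assumed port of the remote /metrics endpoint

# Fetch the remote /metrics through PushProx, using it as an HTTP proxy (-x).
scrape_via_pushprox() {
  curl -fsS -x "$PUSHPROX_URL" "http://$CLIENT_FQDN:$TARGET_PORT/metrics"
}

# Usage: scrape_via_pushprox | head
```

If the client is not connected, the proxy has nobody to hand the request to and the request fails, which is exactly the failure Prometheus records as `up == 0`.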

PushProx is underrated. It solves the “I can’t pull because of NAT” problem without the footguns of Pushgateway. But it’s also less obvious than Pushgateway, so people don’t reach for it first.


OpenTelemetry and Push Collectors

If you’re building fresh infrastructure, another option is OpenTelemetry with a push-native collector.

OTel is a newer observability standard. It’s language-agnostic, covers metrics, traces, and logs, and its collector has both pull and push receivers. An OTel collector can ingest metrics via OTLP (the OpenTelemetry Protocol, a push protocol carried over gRPC or HTTP) and export them to Prometheus, InfluxDB, Datadog, or anywhere else.

For short-lived jobs and batch work, OTel’s push model is more natural:

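```shell
# Batch job exports metrics via OTLP. The OTEL_* variables are standard SDK
# settings; run_backup_job is a placeholder for a job instrumented with an
# OpenTelemetry SDK in whatever language it is written in.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.local:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_SERVICE_NAME=backup-job
# Job runs and emits metrics to the collector
run_backup_job --export-metrics
```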

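The collector buffers the metrics and exports them to your time-series database on a schedule. No Pushgateway, no reverse tunnels: just a standard push flow. One catch for short jobs: OpenTelemetry SDKs export metrics on a timer (every 60 seconds by default), so the job has to shut down its meter provider cleanly before exiting, which flushes whatever hasn’t been sent yet.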
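If there is no SDK handy (a plain shell script, say), OTLP also has an HTTP transport that accepts JSON. A minimal sketch of pushing one gauge to a collector’s OTLP/HTTP receiver, assuming that receiver is enabled on the conventional port 4318 at the hypothetical `collector.local`; the service name `backup-job` is illustrative:

```shell
# Hypothetical collector address; 4318 is the conventional OTLP/HTTP port.
OTLP_HTTP_URL="${OTLP_HTTP_URL:-http://collector.local:4318}"

# Build an OTLP/JSON metrics request carrying a single gauge data point.
# $1 = metric name, $2 = numeric value; timestamp is "now" in Unix nanoseconds.
otlp_gauge_json() {
  now_ns="$(date +%s)000000000"
  cat <<EOF
{"resourceMetrics":[{
  "resource":{"attributes":[{"key":"service.name","value":{"stringValue":"backup-job"}}]},
  "scopeMetrics":[{"metrics":[{
    "name":"$1",
    "gauge":{"dataPoints":[{"asDouble":$2,"timeUnixNano":"$now_ns"}]}
  }]}]
}]}
EOF
}

# POST the payload to the collector's OTLP/HTTP metrics endpoint.
push_otlp_gauge() {
  otlp_gauge_json "$1" "$2" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$OTLP_HTTP_URL/v1/metrics"
}

# Usage: push_otlp_gauge backup_duration_seconds 847
```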

OTel is more heavyweight than Prometheus + Pushgateway/PushProx, but if you’re already using traces and logs alongside metrics, it’s worth considering.


StatsD: The Old Guard

Before Prometheus, there was StatsD. It’s a dead-simple push-based metrics daemon that listens on UDP, accepts metrics from anywhere, aggregates them, and flushes them to Graphite (via its Carbon daemon) or other backends.

StatsD is fire-and-forget. You generate a metric, send it over UDP to localhost:8125, and move on. No retry logic, no guarantees. It’s lightweight and stupid, exactly what you want for high-volume metric emission from thousands of processes.

The Prometheus project maintains a StatsD exporter, statsd_exporter. If you’re running legacy infrastructure that already uses StatsD, point your StatsD clients at the exporter and let Prometheus scrape it:

```yaml
# Prometheus config
scrape_configs:
  - job_name: 'statsd'
    static_configs:
      # statsd_exporter serves /metrics on :9102; StatsD packets go to :9125
      - targets: ['localhost:9102']
```

The exporter listens to StatsD packets, aggregates them, and exposes them as Prometheus metrics.
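The StatsD wire format is just `name:value|type` in a UDP datagram. A sketch that sends metrics to the exporter’s StatsD input, assuming statsd_exporter runs on localhost with its default StatsD port 9125 and that a netcat supporting `-u` and `-w` is installed; the metric names are illustrative:

```shell
STATSD_HOST="${STATSD_HOST:-127.0.0.1}"
STATSD_PORT="${STATSD_PORT:-9125}"   # statsd_exporter's default StatsD port

# Format one StatsD line: name:value|type (g = gauge, c = counter, ms = timer).
statsd_line() {
  printf '%s:%s|%s\n' "$1" "$2" "$3"
}

# Fire-and-forget over UDP: no acknowledgement, so a lost packet is a lost metric.
statsd_send() {
  statsd_line "$1" "$2" "$3" | nc -u -w1 "$STATSD_HOST" "$STATSD_PORT"
}

# Usage:
#   statsd_send backup.duration 847000 ms   # timer value in milliseconds
#   statsd_send backup.runs 1 c
```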

StatsD is simpler than OTel and lower-overhead than Pushgateway. But it also does less: no service discovery, no lifecycle semantics, just metrics.


When Push Is Worth the Footnote

So when should you actually push metrics?

Batch jobs and cron work. This is the canonical use case. Your job runs, pushes metrics to Pushgateway, and exits. Simple, direct, works.

Short-lived tasks in Kubernetes. Use Pushgateway or OTel. Job pods rarely live long enough to be scraped reliably; push is the right fit.

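NAT’d remote sites with mixed workloads. If you have both long-running services and batch work on a remote site, use PushProx for the long-running services so they keep pull semantics and dead-host detection, and push the batch results (to Pushgateway, for example) like any other batch job.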

High-frequency metric emission. If you’re emitting thousands of metrics per second and UDP fire-and-forget is acceptable, StatsD is lighter weight than making an HTTP push for every update.

Privacy-sensitive environments. If you don’t want to give Prometheus direct network access, push to a metrics collector behind a firewall. The collector exports to your central Prometheus on a schedule.

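What you should NOT do: have a long-running service push to Pushgateway instead of exposing a /metrics endpoint (you lose dead-host detection), rotate job names without deleting the old groups (stale series pile up), or reach for Pushgateway to get around NAT when PushProx would let you keep pulling.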


The Real Question

Here’s what nobody tells you: the decision between push and pull isn’t really about push vs pull. It’s about whether your workload is long-lived or ephemeral.

Long-lived service? Expose metrics. Let Prometheus pull. You get dead-host detection, a single source of scrape truth, and clean semantics.

Ephemeral job? Nowhere to pull from. Push to a broker (Pushgateway, OTel, StatsD). Accept that you won’t get dead-host detection, but you will get your metrics recorded.

NAT or firewall between you and Prometheus? Use PushProx and pretend you’re still pulling. It’s elegant and weird in equal measure.

The tool you pick (Pushgateway, PushProx, OTel, StatsD) is secondary. Pick the one that fits your infrastructure and your patience for configuration. But the decision—push or pull—is really about whether the thing emitting metrics exists long enough for Prometheus to find it.

If it does, pull. If it doesn’t, push. Everything else is footnotes.

