Prometheus Is Great, Until You Have 200 ESP32s
You set up Prometheus. You scraped a few services. You felt very DevOps about it. Then you added your smart home stuff — a dozen ESP32 temperature sensors, a handful of Zigbee devices reporting via Zigbee2MQTT, a Pi monitoring your UPS. And suddenly Prometheus starts looking at you the way your dog looks at a vacuum cleaner: confused, vaguely hostile.
Here’s the thing about Prometheus: it’s a pull-based system. It goes out and scrapes your targets on a schedule. That model is phenomenal for containerized microservices that register themselves, maintain a /metrics endpoint, and live on a stable IP. It’s less great when your “targets” are:
- A sensor glued to your furnace, running on a LiPo battery
- A fleet of ESP32s on random DHCP leases behind NAT
- A Zigbee coordinator that speaks MQTT, not HTTP
- Something that only has data when it has data (IoT devices, event-based telemetry)
That’s where the TIG stack comes in. Telegraf (the collector), InfluxDB (the time-series database), and Grafana (the dashboard). Push-based, sensor-friendly, and honest about what it is. No service discovery drama. No scrape interval mismatches. Just data flowing in, stored efficiently, and displayed on a beautiful dashboard you built at 2 AM while your spouse questioned your life choices.
Pull vs Push: Why Push Wins Here
Pull-based collection (Prometheus’s model) requires your targets to be reachable, stable, and willing to serve HTTP. For server infrastructure this is totally fine — your containers stay up, your IPs are predictable, and Prometheus is excellent at this job.
But IoT is a different animal. Consider:
- NAT traversal: Your ESP32 in the garage can reach the outside world, but the outside world can’t reach it. A pull scraper can’t open a connection to a device behind NAT without a lot of networking gymnastics.
- Ephemeral devices: Sensors go offline. Batteries die. The device on `192.168.1.73` today might be `.74` tomorrow. A push model doesn’t care: the device sends data when it has data, and that’s that.
- No persistent HTTP server: Running a metrics endpoint on a microcontroller uses RAM and CPU you’d rather spend on actual sensing. Pushing over MQTT or HTTP is way cheaper.
- Event-driven telemetry: Some devices only have interesting data occasionally. A moisture sensor that fires when the soil is dry doesn’t need to be scraped every 15 seconds, producing 5,760 “nothing to report” data points a day.
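The event-driven idea is easy to see in code. Below is a minimal sketch, written in Python for readability, of a report-on-change policy: the device only publishes when a reading moves more than a deadband away from the last value it actually sent. The `DeadbandReporter` class and the 0.5 threshold are hypothetical illustrations; on real ESP32 firmware, ESPHome’s `delta` sensor filter offers a similar behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeadbandReporter:
    """Report-on-change policy: push only when the value moves past the deadband."""
    deadband: float
    last_sent: Optional[float] = None

    def offer(self, value: float) -> Optional[float]:
        """Return the value if it should be pushed now, otherwise None."""
        if self.last_sent is None or abs(value - self.last_sent) > self.deadband:
            self.last_sent = value  # compare future readings against what was sent
            return value
        return None


if __name__ == "__main__":
    reporter = DeadbandReporter(deadband=0.5)
    readings = [21.0, 21.1, 21.2, 21.8, 21.9, 20.9, 21.0]
    sent = [v for v in readings if reporter.offer(v) is not None]
    print(sent)  # [21.0, 21.8, 20.9]
```

Seven readings become three messages; a pull scraper would have stored all seven whether or not anything changed.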
Push wins here, full stop. The TIG stack is built around this model. Telegraf sits on your server, subscribes to MQTT topics, listens for incoming data, and shovels everything into InfluxDB. Your devices don’t need to know or care what’s storing their data.
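Telegraf subscribes with ordinary MQTT topic filters: `+` matches exactly one topic level, and `#` (valid only as the last level) matches that level’s parent and everything below it. The hypothetical `topic_matches` helper below is a sketch of those matching rules, not code from Telegraf or Mosquitto, and it ignores special cases such as `$SYS` topics. It is handy for checking a `topics` list before you deploy it.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter (+ and # wildcards)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            # Multi-level wildcard: only legal as the final level; matches the
            # parent itself ("a/#" matches "a") and any depth below it.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal level mismatch ("+" accepts any single level)
    return len(f_levels) == len(t_levels)


if __name__ == "__main__":
    filters = ["zigbee2mqtt/+", "tele/+/SENSOR", "esphome/+/sensor/+/state"]
    for topic in ["zigbee2mqtt/kitchen_sensor",
                  "zigbee2mqtt/bridge/state",
                  "esphome/garage/sensor/temperature/state"]:
        print(topic, any(topic_matches(f, topic) for f in filters))
```

Note that `zigbee2mqtt/+` does not match `zigbee2mqtt/bridge/state`: a single `+` never spans two levels.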
Telegraf: The Agent That Does Everything
Telegraf is InfluxData’s collection agent and it is, frankly, absurdly capable: over 300 plugins (the bulk of them inputs, with dozens of outputs), and a config format that won’t make you want to quit infrastructure forever.
The core idea is simple: input plugins collect data, processor plugins transform it (optional), and output plugins write it somewhere. For the TIG stack, the output is almost always InfluxDB. The inputs are where it gets fun.
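To make the input, processor, output flow concrete, here is a toy model in Python. It is not Telegraf’s code, and the function names and `Metric` shape are hypothetical. It does mirror two real `[agent]` settings: each output keeps a buffer capped at `metric_buffer_limit` (when it overflows, the oldest metrics are dropped), and each write sends at most `metric_batch_size` metrics.

```python
from collections import deque
from typing import Callable, Dict, List

# Toy stand-in for a Telegraf metric: a name plus fields (and optional tags).
Metric = Dict[str, object]
Input = Callable[[], List[Metric]]
Processor = Callable[[Metric], Metric]


def run_pipeline(inputs: List[Input], processors: List[Processor],
                 buffer_limit: int, batch_size: int) -> List[List[Metric]]:
    """One gather/process/flush cycle: returns the batches an output would write."""
    buffer: deque = deque(maxlen=buffer_limit)  # a full deque drops its oldest entry
    for gather in inputs:
        for metric in gather():
            for process in processors:  # processors run in order on every metric
                metric = process(metric)
            buffer.append(metric)
    batches: List[List[Metric]] = []
    while buffer:
        size = min(batch_size, len(buffer))
        batches.append([buffer.popleft() for _ in range(size)])  # oldest first
    return batches


def fake_cpu() -> List[Metric]:
    return [{"name": "cpu", "fields": {"usage_idle": 97.0}}]


def fake_mqtt() -> List[Metric]:
    return [{"name": "mqtt_consumer", "fields": {"value": v}} for v in (20.0, 21.0, 22.0)]


def tag_site(metric: Metric) -> Metric:
    return {**metric, "tags": {"site": "homelab"}}


if __name__ == "__main__":
    # Four metrics gathered into a buffer of three: the oldest (cpu) is dropped.
    for batch in run_pipeline([fake_cpu, fake_mqtt], [tag_site], buffer_limit=3, batch_size=2):
        print([m["fields"] for m in batch])
```

The real agent does this continuously on its `interval` and `flush_interval` timers, but the ordering and the drop-oldest behavior are the part worth internalizing.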
Some inputs you’ll actually use in a homelab:
- `inputs.cpu`, `inputs.mem`, `inputs.disk`: classic system metrics
- `inputs.mqtt_consumer`: subscribe to MQTT topics (huge for ESPHome, Zigbee2MQTT, Tasmota)
- `inputs.snmp`: query your router/switch/NAS
- `inputs.exec`: run a script, parse the output as metrics (cursed but useful)
- `inputs.modbus`: industrial sensors, solar inverters, the fun stuff
- `inputs.ping`: latency monitoring for your entire network
- `inputs.http`: scrape a JSON endpoint, because sometimes things speak REST
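As an example of the `inputs.exec` pattern: Telegraf runs the command on every interval and parses its stdout with the block’s `data_format`, and with `data_format = "influx"` the script simply prints line protocol. The sketch below assumes a hypothetical `read_ups()` helper standing in for whatever actually talks to your UPS, and an `[[inputs.exec]]` block with something like `commands = ["python3 /etc/telegraf/ups.py"]` (that path is made up).

```python
import time


def read_ups() -> dict:
    """Hypothetical stand-in: replace with a real query (NUT, serial, SNMP, ...)."""
    return {"battery_charge": 100.0, "load_percent": 23.5, "on_battery": False}


def to_line_protocol(reading: dict, ups_name: str, ts_ns: int) -> str:
    """Format one reading as a single line-protocol point.

    Assumes keys and ups_name contain no spaces, commas, or equals signs.
    """
    parts = []
    for key, value in sorted(reading.items()):
        if isinstance(value, bool):
            parts.append(f"{key}={'true' if value else 'false'}")  # booleans: true/false
        else:
            parts.append(f"{key}={float(value)}")  # numbers written as floats
    return f"ups,name={ups_name} {','.join(parts)} {ts_ns}"


def main() -> None:
    # Telegraf parses whatever lands on stdout. If read_ups() raises, the uncaught
    # exception exits non-zero and Telegraf logs a failed run instead of bad data.
    print(to_line_protocol(read_ups(), "closet_ups", time.time_ns()))


if __name__ == "__main__":
    main()
```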
One gotcha: the default telegraf.conf includes nearly every plugin as a commented-out example. It’s a great reference, but it’s also a loaded footgun. If you enable inputs carelessly, you’ll flood InfluxDB with metrics you’ll never look at, bloat your disk, and then spend an evening figuring out why your cardinality is out of control. Start minimal. Add what you actually need.
InfluxDB: Time-Series Done Right (Mostly)
InfluxDB is purpose-built for time-series data. Timestamps are first-class citizens. Queries assume you’re asking about things over time. Storage is optimized for sequential writes from many sources. For sensor data and metrics, it’s genuinely the right tool.
A quick vocabulary lesson:
- Measurement: Like a table. “cpu”, “temperature”, “power_draw”
- Tags: Indexed key-value strings. “host”, “room”, “sensor_id”. Use these for things you’ll filter or group by.
- Fields: The actual numeric (or string) values. “usage_percent”, “celsius”, “watts”
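- Line protocol: The wire format. Each point is the measurement, comma-separated tags, a space, the fields, a space, and an optional nanosecond timestamp:

```
temperature,room=garage,sensor=esp32_01 celsius=21.5 1746700000000000000
```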
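To see the escaping rules in action, here is a minimal line-protocol formatter in Python. It is a sketch following the documented rules (commas and spaces escaped in measurement names; commas, equals signs, and spaces escaped in tag keys, tag values, and field keys; string field values double-quoted; integers suffixed with `i`), not the official client library, and the `to_line` name is hypothetical. Tags are sorted by key, which InfluxData recommends for write performance.

```python
from typing import Dict, Optional, Union

FieldValue = Union[float, int, bool, str]


def _escape(text: str, specials: str) -> str:
    # Prefix each special character with a backslash.
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields need the i suffix
    if isinstance(value, float):
        return repr(value)  # NaN and inf are not valid line protocol; not handled here
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, FieldValue],
            ts_ns: Optional[int] = None) -> str:
    """Format one point; omit ts_ns to let the server assign the timestamp."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags):
        line += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    line += " " + ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    if ts_ns is not None:
        line += f" {ts_ns}"
    return line


if __name__ == "__main__":
    print(to_line("temperature", {"room": "garage", "sensor": "esp32_01"},
                  {"celsius": 21.5}, 1746700000000000000))
```

Run as-is, it reproduces the example point above exactly.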
The v1/v2/v3 situation in 2026:
This is where you need to pay attention, because InfluxDB’s version history is a bit of a saga.
- InfluxDB 1.x: SQL-like query language (InfluxQL), simple, battle-tested, still widely used. The `influxdb:1.8` Docker image still works great and is many people’s choice for homelabs.
- InfluxDB 2.x: Rewrote everything. Introduced Flux, a new functional query language that is powerful but genuinely alien-feeling. Buckets replaced databases. Organizations replaced… everything. The UI is nice. The migration path from 1.x was rough.
- InfluxDB 3.x (OSS, still maturing): Here’s the twist. InfluxData has put Flux into maintenance mode, and 3.x doesn’t support it at all. The new engine is built on Apache Arrow, DataFusion, and Parquet, with SQL (plus InfluxQL) as the query languages. The open-source 3.x release is still in active development, but the direction is clear: SQL is back, and Flux is being sunsetted.
For a homelab in 2026, the pragmatic choice is InfluxDB 2.x (2.7.x is stable). You get a solid UI, good Grafana integration, and Flux still works even if it’s being deprecated. If you’re starting fresh and want to be forward-compatible, keep an eye on 3.x OSS — but it’s not quite “plug it in on a Sunday afternoon” stable yet.
Retention policies and downsampling: InfluxDB handles data retention natively. You can keep raw data for 30 days and downsampled (hourly averages) data for a year. On InfluxDB 2.x, retention is a per-bucket setting, and downsampling is done via Tasks: scheduled Flux queries that aggregate one bucket and write the results to another (say, a 30-day raw bucket feeding a 365-day hourly bucket). Indispensable if you’re pushing data every 10 seconds and don’t want to buy more NVMe.
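Here is a minimal Python sketch of what a downsampling Task computes: raw points are grouped into fixed windows and each window is reduced to its mean. Like Flux’s `aggregateWindow()`, whose default `timeSrc` is the window stop, the sketch stamps each result with the window’s end time, and like `createEmpty: false` it skips windows with no data. The `downsample` function is hypothetical and works in whole seconds for readability.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (unix_seconds, value)


def downsample(points: List[Point], every_s: int) -> List[Point]:
    """Mean per fixed window of `every_s` seconds, labelled with the window stop time."""
    windows: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        start = ts - (ts % every_s)  # windows aligned to the Unix epoch
        windows[start].append(value)
    # Empty windows never get a key, so they are skipped (cf. createEmpty: false).
    return [(start + every_s, fmean(vals)) for start, vals in sorted(windows.items())]


if __name__ == "__main__":
    raw = [(5, 1.0), (15, 3.0), (65, 10.0)]
    print(downsample(raw, 60))  # [(60, 2.0), (120, 10.0)]
```

The savings are what make the two-bucket layout worthwhile: at a 10-second push interval, 30 days of raw data is 259,200 points per series, while a full year of hourly means is only 8,760.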
Cardinality gotcha: This one bites people hard. In InfluxDB, series cardinality is the number of unique combinations of measurement, tag set, and field key. If you use sensor_id as a tag and you have 500 sensors, that’s fine. If you use something like a full UUID or a raw timestamp as a tag, you’ve just created millions of unique series, and InfluxDB will eat your RAM like it’s a buffet. Tags should be low-cardinality; high-cardinality data belongs in fields. Tattoo that on your forearm before you start tagging things.
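Series cardinality is easy to count yourself: every distinct combination of measurement, tag set, and field key is one series. The hypothetical `series_cardinality` helper below does exactly that for a list of points, and the demo shows why a per-reading ID tag is so destructive compared with a stable `sensor_id`.

```python
import uuid
from typing import Dict, Iterable, Set, Tuple

# A point as (measurement, tags, field keys); field values don't affect cardinality.
Point = Tuple[str, Dict[str, str], Tuple[str, ...]]


def series_cardinality(points: Iterable[Point]) -> int:
    """Count unique (measurement, tag set, field key) combinations."""
    series: Set[Tuple[str, Tuple[Tuple[str, str], ...], str]] = set()
    for measurement, tags, field_keys in points:
        tag_set = tuple(sorted(tags.items()))  # tag order doesn't create new series
        for field_key in field_keys:
            series.add((measurement, tag_set, field_key))
    return len(series)


if __name__ == "__main__":
    # 500 sensors, 100 readings each, one field: a stable sensor_id tag.
    good = [("temperature", {"sensor_id": f"s{i}"}, ("celsius",))
            for i in range(500) for _ in range(100)]
    # Same data, but every reading also carries a unique reading_id tag.
    bad = [("temperature", {"sensor_id": f"s{i}", "reading_id": str(uuid.uuid4())}, ("celsius",))
           for i in range(500) for _ in range(100)]
    print(series_cardinality(good), series_cardinality(bad))  # 500 50000
```

Same readings, same fields, a hundred times the series, and it keeps growing with every new reading.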
Grafana: The Dashboard Layer You Already Know
If you’ve been in the homelab space for more than six months, you’ve probably already used Grafana. It connects to almost everything and makes charts that look like you know what you’re doing.
For InfluxDB, you’ll add it as a data source in Grafana’s settings. With InfluxDB 2.x, you configure it with the bucket, org, and an API token. Pick the query language (Flux or InfluxQL, both supported via the datasource settings). Flux gives you more power; InfluxQL feels more familiar if you’ve used SQL.
Alerting: Grafana’s built-in alerting is solid for homelab use. Set a threshold on a temperature sensor, get notified via Telegram or email when the server closet hits 40°C. No PagerDuty subscription required.
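Grafana alert rules can require a condition to hold for a pending period before they fire, so one noisy reading doesn’t wake you up. The Python sketch below models that idea with a hypothetical `PendingAlert` class (threshold 40 °C, three consecutive breaching evaluations); it illustrates the logic only, not how Grafana implements it.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAlert:
    """Fire only after `pending_evals` consecutive evaluations breach `threshold`."""
    threshold: float
    pending_evals: int
    breaches: int = field(default=0, init=False)

    def evaluate(self, value: float) -> str:
        if value > self.threshold:
            self.breaches += 1
            return "firing" if self.breaches >= self.pending_evals else "pending"
        self.breaches = 0  # one normal reading resets the pending window
        return "normal"


if __name__ == "__main__":
    closet = PendingAlert(threshold=40.0, pending_evals=3)
    readings = [38.0, 41.0, 39.5, 41.0, 42.0, 43.0]
    print([closet.evaluate(t) for t in readings])
    # ['normal', 'pending', 'normal', 'pending', 'pending', 'firing']
```

With one evaluation per minute, three breaches in a row is roughly a three-minute pending period.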
The Working Compose Stack
Here’s a full Compose setup running Telegraf, InfluxDB 2.x, and Grafana, plus Mosquitto as the MQTT broker (for all your ESPHome/Zigbee2MQTT devices).
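```yaml
services:
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    restart: unless-stopped
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb2
      - influxdb_config:/etc/influxdb2
    environment:
      # The DOCKER_INFLUXDB_INIT_* variables only take effect on the very first start
      DOCKER_INFLUXDB_INIT_MODE: setup
      DOCKER_INFLUXDB_INIT_USERNAME: admin
      DOCKER_INFLUXDB_INIT_PASSWORD: changeme_please
      DOCKER_INFLUXDB_INIT_ORG: homelab
      DOCKER_INFLUXDB_INIT_BUCKET: metrics
      DOCKER_INFLUXDB_INIT_ADMIN_TOKEN: my-super-secret-token

  mosquitto:
    image: eclipse-mosquitto:2
    container_name: mosquitto
    restart: unless-stopped
    ports:
      - "1883:1883"   # MQTT
      - "9001:9001"   # WebSockets, only if mosquitto.conf defines that listener
    volumes:
      - mosquitto_data:/mosquitto/data
      - mosquitto_log:/mosquitto/log
      - ./mosquitto.conf:/mosquitto/config/mosquitto.conf:ro

  telegraf:
    image: telegraf:1.33
    container_name: telegraf
    restart: unless-stopped
    depends_on:
      - influxdb
      - mosquitto
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/hostfs:ro   # read-only view of the host for the system inputs
    environment:
      # Point cpu/mem/disk/etc. at the host instead of the container
      HOST_ETC: /hostfs/etc
      HOST_PROC: /hostfs/proc
      HOST_SYS: /hostfs/sys
      HOST_VAR: /hostfs/var
      HOST_RUN: /hostfs/run
      HOST_MOUNT_PREFIX: /hostfs
    # GID must match the group owning the socket: stat -c '%g' /var/run/docker.sock
    user: "telegraf:993"

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: changeme_please
    depends_on:
      - influxdb

volumes:
  influxdb_data:
  influxdb_config:
  mosquitto_data:
  mosquitto_log:
  grafana_data:
```

Create mosquitto.conf and telegraf.conf next to the Compose file before starting it. Mosquitto 2.x only listens on localhost and rejects anonymous clients unless its config says otherwise, so on a trusted LAN the minimum is `listener 1883` plus `allow_anonymous true` (use a password file if anything untrusted can reach the broker).

And the Telegraf config to go with it, scraping host metrics and subscribing to MQTT for your sensor data: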
```toml
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

# Output to InfluxDB 2.x
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "my-super-secret-token"
  organization = "homelab"
  bucket = "metrics"

# System metrics
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]

[[inputs.net]]

[[inputs.system]]

# Docker container metrics
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  source_tag = false
  container_state_include = ["created", "restarting", "running", "removing", "paused", "exited", "dead"]
  timeout = "5s"
  perdevice = false
  total = false
  docker_label_include = []

# MQTT input for JSON payloads: Zigbee2MQTT, Tasmota telemetry, anything sending JSON objects
[[inputs.mqtt_consumer]]
  servers = ["tcp://mosquitto:1883"]
  topics = [
    "homeassistant/sensor/+/state",
    "zigbee2mqtt/+",
    "tele/+/SENSOR",
  ]
  data_format = "json"
  # Nested objects are flattened: Tasmota's {"ENERGY": {"Power": 42}} becomes ENERGY_Power.
  # json_query applies to every topic in this block, so only set one in a block
  # dedicated to a single payload shape.
  qos = 0
  connection_timeout = "30s"
  persistent_session = false
  client_id = "telegraf-json"

# MQTT input for ESPHome: sensor state topics carry a bare number, not JSON
[[inputs.mqtt_consumer]]
  servers = ["tcp://mosquitto:1883"]
  topics = ["esphome/+/sensor/+/state"]
  data_format = "value"
  data_type = "float"
  qos = 0
  client_id = "telegraf-esphome"   # must differ from the block above
```

Boot it:

```shell
docker compose up -d
```

InfluxDB UI will be at http://your-server:8086. Grafana at :3000. Log into Grafana, add InfluxDB as a data source (type: InfluxDB, query language: Flux, URL: http://influxdb:8086, org: homelab, token: your token, default bucket: metrics).
Sample Queries: CPU and a Sensor
CPU usage over time (Flux). Telegraf’s cpu input has no single usage_percent field (it reports usage_user, usage_system, usage_idle, and friends), so this derives overall usage as 100 minus idle:

```flux
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_idle")
  |> filter(fn: (r) => r.cpu == "cpu-total")
  |> map(fn: (r) => ({r with _value: 100.0 - r._value}))
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
  |> yield(name: "mean")
```

Temperature from an ESPHome sensor over MQTT (Flux). The value parser stores each reading in a field named value, and the topic tag tells you which device and sensor it came from:

```flux
from(bucket: "metrics")
  |> range(start: -6h)
  |> filter(fn: (r) => r._measurement == "mqtt_consumer")
  |> filter(fn: (r) => r._field == "value")
  |> filter(fn: (r) => r.topic =~ /^esphome\/garage\/sensor\/temperature\/state$/)
  |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
  |> yield(name: "garage_temp")
```

Paste these into Grafana’s query editor (in the “Script editor” mode for Flux), tweak the measurement name, field, and topic pattern to match what Telegraf actually indexed (the garage regex assumes a node publishing under esphome/garage with a sensor whose object ID is temperature), and you’ve got a dashboard.
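Not every device needs the MQTT hop. A Pi script can also push line protocol straight to InfluxDB 2.x over its HTTP write API: `POST /api/v2/write` with `org`, `bucket`, and `precision` query parameters and an `Authorization: Token ...` header. The sketch below uses only the Python standard library; `build_write_request` and `send` are hypothetical helper names, and the URL, org, bucket, and token simply mirror the Compose example as placeholders.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_write_request(base_url: str, org: str, bucket: str, token: str,
                        lines: List[str], precision: str = "s") -> Request:
    """Build (but don't send) an InfluxDB 2.x line-protocol write request."""
    query = urlencode({"org": org, "bucket": bucket, "precision": precision})
    body = "\n".join(lines).encode("utf-8")  # one point per line
    return Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


def send(request: Request, timeout_s: float = 5.0) -> int:
    # A successful write returns 204 No Content; urlopen raises
    # urllib.error.HTTPError for 4xx/5xx (bad token, malformed line, ...).
    with urlopen(request, timeout=timeout_s) as response:
        return response.status


if __name__ == "__main__":
    req = build_write_request("http://influxdb:8086", "homelab", "metrics",
                              "my-super-secret-token",
                              ["temperature,room=garage celsius=21.5 1746700000"])
    print(req.get_method(), req.full_url)
```

From inside the Compose network the hostname is `influxdb`; from elsewhere on the LAN, use the server’s address and port 8086, and call `send()` from cron or a systemd timer.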
TIG vs. Prometheus/LGTM: Honest Comparison
Prometheus (and Grafana’s broader LGTM stack: Loki, Grafana, Tempo, Mimir) is genuinely excellent. If you’re running Kubernetes, a fleet of servers with stable IPs, or anything where targets register themselves, Prometheus is probably the better choice. The ecosystem is massive. The alerting via Alertmanager is robust. Service discovery for Kubernetes, Consul, and EC2 is native.
But here’s where TIG pulls ahead for the homelab/IoT scenario:
| | TIG Stack | Prometheus Stack |
|---|---|---|
| Collection model | Push (devices send data in) | Pull (server scrapes targets) |
| IoT/MQTT native | Yes (Telegraf MQTT plugin) | No (needs exporter) |
| Devices behind NAT | Works fine | Problematic |
| Ephemeral devices | No problem | Discovery config required |
| Time-series storage | Built-in (InfluxDB) | Built-in local TSDB; remote write to Thanos/Mimir for long-term |
| Downsampling/retention | Native (bucket retention + Tasks) | Retention flag; downsampling via recording rules |
| Query language | Flux / InfluxQL / SQL (v3) | PromQL |
| Setup complexity | Low-medium | Medium-high |
If you’re already running Prometheus for your servers, you don’t have to pick. Many homelabbers run both: Prometheus for infra, TIG for sensors and home automation. They’re not mutually exclusive, and Grafana talks to both.
Common Gotchas
Cardinality explosions: Covered above, but worth repeating. Tagging with things like raw MAC addresses, full file paths, or any value with thousands of unique entries will blow up InfluxDB’s memory usage. Keep tags low-cardinality. Use fields for the actual measurements.
Telegraf’s “kitchen sink” config: The default config file runs to thousands of lines of commented-out plugins. This is fantastic documentation and a terrible starting config. Delete everything you don’t use. You’ll thank yourself the first time you need to debug what’s being collected.
Grafana datasource versioning: When adding InfluxDB as a Grafana datasource, you get to choose the query language: Flux, InfluxQL, or (for InfluxDB 3.x) SQL. Make sure you pick the right one for your InfluxDB version and stick with it. Dashboards built with Flux queries don’t translate to InfluxQL and vice versa.
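MQTT payload parsing: Telegraf’s mqtt_consumer plugin with data_format = "json" works great for well-structured JSON payloads. ESPHome is the exception: its sensor state topics publish a bare number, so they need a consumer with data_format = "value" instead. Tasmota’s tele/+/SENSOR payloads are nested JSON; Telegraf flattens them into fields like ENERGY_Power, and json_query can extract a specific subtree, but it applies to every topic in that consumer block. Zigbee2MQTT payloads vary by device. Budget time for this.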
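To predict which field names a nested payload turns into, here is a Python sketch approximating how Telegraf’s JSON parser flattens objects: nested keys are joined with underscores, numbers become float fields, and strings are dropped unless you list them in `json_string_fields`. It is a simplified model under those assumptions (the `flatten_numeric` name is hypothetical, and it skips arrays and booleans entirely), not Telegraf’s parser. The sample is a made-up Tasmota-style energy report.

```python
import json
from typing import Dict


def flatten_numeric(obj: dict, prefix: str = "") -> Dict[str, float]:
    """Flatten nested JSON objects into underscore-joined keys, keeping only numbers."""
    fields: Dict[str, float] = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten_numeric(value, name))  # recurse into nested objects
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[name] = float(value)  # numbers become float fields
        # strings, booleans, arrays, and null are skipped in this sketch
    return fields


if __name__ == "__main__":
    payload = json.loads(
        '{"Time": "2025-05-08T12:00:00", '
        '"ENERGY": {"Total": 12.3, "Power": 42, "Voltage": 230}}'
    )
    print(flatten_numeric(payload))
    # {'ENERGY_Total': 12.3, 'ENERGY_Power': 42.0, 'ENERGY_Voltage': 230.0}
```

A Tasmota temperature sensor works the same way: a payload like `{"AM2301": {"Temperature": 21.5}}` shows up as a field named `AM2301_Temperature`, which is the name your Flux filter needs.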
InfluxDB token management: The DOCKER_INFLUXDB_INIT_ADMIN_TOKEN env var only works on first init. If you need to rotate or create new tokens after setup, use the InfluxDB UI or CLI. Don’t lose your admin token — there’s no “forgot password” for programmatic access.
Closing Thoughts
If you’re running Mosquitto for your smart home, a few ESP32s reporting temperatures around the house, and maybe a Pi or two doing who-knows-what, you basically already have TIG-shaped problems. Your devices are pushing data, they don’t want to be scraped, and you need something to store 8,640 data points per sensor per day (one every 10 seconds) without eating your disk or requiring a PhD in PromQL.
TIG is that thing. It’s been around long enough to be stable, has a Docker Compose setup that fits on a single server, and won’t look at your garage temperature sensor like it’s a weird edge case.
Set it up. Point your ESPHome devices at MQTT. Watch the data flow. Inevitably spend three hours tweaking a Grafana dashboard theme at midnight. Your 2 AM self will appreciate having the metrics.