
TIG: Telegraf + InfluxDB + Grafana

By SumGuy 12 min read

Prometheus Is Great, Until You Have 200 ESP32s

You set up Prometheus. You scraped a few services. You felt very DevOps about it. Then you added your smart home stuff — a dozen ESP32 temperature sensors, a handful of Zigbee devices reporting via Zigbee2MQTT, a Pi monitoring your UPS. And suddenly Prometheus starts looking at you the way your dog looks at a vacuum cleaner: confused, vaguely hostile.

Here’s the thing about Prometheus: it’s a pull-based system. It goes out and scrapes your targets on a schedule. That model is phenomenal for containerized microservices that register themselves, maintain a /metrics endpoint, and live on a stable IP. It’s less great when your “targets” are:

That’s where the TIG stack comes in. Telegraf (the collector), InfluxDB (the time-series database), and Grafana (the dashboard). Push-based, sensor-friendly, and honest about what it is. No service discovery drama. No scrape interval mismatches. Just data flowing in, stored efficiently, and displayed on a beautiful dashboard you built at 2 AM while your spouse questioned your life choices.


Pull vs Push: Why Push Wins Here

Pull-based collection (Prometheus’s model) requires your targets to be reachable, stable, and willing to serve HTTP. For server infrastructure this is totally fine — your containers stay up, your IPs are predictable, and Prometheus is excellent at this job.

But IoT is a different animal. Consider:

Push wins here, full stop. The TIG stack is built around this model. Telegraf sits on your server, subscribes to MQTT topics, listens for incoming data, and shovels everything into InfluxDB. Your devices don’t need to know or care what’s storing their data.
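To make the push flow concrete, here is a rough Python sketch of what happens to one message: a device publishes a small JSON payload to a topic, and Telegraf turns it into an InfluxDB line-protocol record with the topic as a tag and the numeric keys as fields. The device name, readings, and helper functions are made up for illustration, and real Telegraf output also carries a host tag, so treat this as an approximation rather than Telegraf's actual code.

```python
import json


def escape_tag(value: str) -> str:
    """Backslash-escape the characters line protocol treats as delimiters in tag values."""
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


def to_line_protocol(topic: str, payload: str, timestamp_ns: int) -> str:
    """Approximate the record Telegraf's mqtt_consumer input would emit for one JSON message."""
    readings = json.loads(payload)
    # Mirror the JSON parser's default: numbers become float fields,
    # strings and booleans are dropped.
    fields = ",".join(
        f"{key}={float(value)}"
        for key, value in sorted(readings.items())
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )
    return f"mqtt_consumer,topic={escape_tag(topic)} {fields} {timestamp_ns}"


# Hypothetical Zigbee2MQTT-style message from a garage sensor.
payload = json.dumps({"temperature": 21.5, "humidity": 40, "state": "ok"})
line = to_line_protocol("zigbee2mqtt/garage", payload, 1700000000000000000)
print(line)
# mqtt_consumer,topic=zigbee2mqtt/garage humidity=40.0,temperature=21.5 1700000000000000000
```

The device only ever sees the topic and the payload. Everything to the right of that (measurement name, tags, storage) is decided on the server side.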


Telegraf: The Agent That Does Everything

Telegraf is InfluxData’s collection agent and it is, frankly, absurdly capable. Over 300 input plugins, dozens of output plugins, and a config format that won’t make you want to quit infrastructure forever.

The core idea is simple: input plugins collect data, processor plugins transform it (optional), and output plugins write it somewhere. For the TIG stack, the output is almost always InfluxDB. The inputs are where it gets fun.

Some inputs you’ll actually use in a homelab:

One gotcha: the default telegraf.conf ships with nearly every plugin included as a commented-out example. It’s a great reference, but it’s also a loaded footgun. If you uncomment inputs carelessly, you’ll flood InfluxDB with metrics you’ll never look at, bloat your disk, and then spend an evening figuring out why your cardinality is out of control. Start minimal. Add what you actually need.


InfluxDB: Time-Series Done Right (Mostly)

InfluxDB is purpose-built for time-series data. Timestamps are first-class citizens. Queries assume you’re asking about things over time. Storage is optimized for sequential writes from many sources. For sensor data and metrics, it’s genuinely the right tool.

A quick vocabulary lesson:

The v1/v2/v3 situation in 2026:

This is where you need to pay attention, because InfluxDB’s version history is a bit of a saga.

For a homelab in 2026, the pragmatic choice is InfluxDB 2.x (2.7.x is stable). You get a solid UI, good Grafana integration, and Flux still works even if it’s being deprecated. If you’re starting fresh and want to be forward-compatible, keep an eye on 3.x OSS — but it’s not quite “plug it in on a Sunday afternoon” stable yet.

Retention policies and downsampling: InfluxDB handles data retention natively. You can keep raw data for 30 days and downsampled (hourly averages) data for a year. On InfluxDB 2.x, this is done via Tasks — scheduled Flux queries that aggregate and write to a different bucket. Indispensable if you’re pushing data every 10 seconds and don’t want to buy more NVMe.
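To show what that downsampling actually does to your data, here is a plain-Python illustration of hourly averaging. It is not Flux and not how you would write the Task itself (that would be a scheduled Flux script using something like aggregateWindow and to()); the function name and the sample readings are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_s=3600):
    """Group (unix_seconds, value) pairs into fixed windows and average each window."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Key each reading by the start of the window it falls into.
        buckets[ts - ts % window_s].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# 10-second readings spanning two hours: 720 raw points become 2 stored points.
raw = [(t, 20.0 if t < 3600 else 22.0) for t in range(0, 7200, 10)]
hourly = downsample(raw)
print(len(raw), len(hourly))  # 720 2
print(hourly)                 # {0: 20.0, 3600: 22.0}
```

At a 10-second interval that is a 360x reduction per hour, which is why the long-retention bucket stays small.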

Cardinality gotcha: This one bites people hard. In InfluxDB, cardinality = the number of unique tag value combinations. If you use sensor_id as a tag and you have 500 sensors, that’s fine. If you use something like a full UUID or a raw timestamp as a tag, you’ve just created millions of unique series and InfluxDB will eat your RAM like it’s a buffet. Tags should be low-cardinality. Fields can hold high-cardinality data. Tattoo that on your forearm before you start tagging things.
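A small Python sketch makes the difference visible. It counts distinct measurement-plus-tag-set pairs, which is roughly what InfluxDB has to track as separate series (the real series key also includes the field key). The sensor names and the reading_id tag are hypothetical.

```python
import uuid


def series_cardinality(points):
    """Count distinct (measurement, tag set) pairs across a list of points."""
    return len({(measurement, tuple(sorted(tags.items()))) for measurement, tags in points})


sensors = ["garage", "attic", "office"]

# 100 readings per sensor, tagged only by sensor_id: one series per sensor.
good = [("temperature", {"sensor_id": s}) for s in sensors for _ in range(100)]

# Same readings, but each one also carries a unique ID as a tag:
# every single point becomes its own series.
bad = [
    ("temperature", {"sensor_id": s, "reading_id": str(uuid.uuid4())})
    for s in sensors
    for _ in range(100)
]

print(series_cardinality(good))  # 3
print(series_cardinality(bad))   # 300
```

If the unique ID matters, store it as a field instead. Fields are not part of the series index, so the count stays at 3.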


Grafana: The Dashboard Layer You Already Know

If you’ve been in the homelab space for more than six months, you’ve probably already used Grafana. It connects to almost everything and makes charts that look like you know what you’re doing.

For InfluxDB, you’ll add it as a data source in Grafana’s settings. With InfluxDB 2.x, you configure it with the bucket, org, and an API token. Pick the query language (Flux or InfluxQL, both supported via the datasource settings). Flux gives you more power; InfluxQL feels more familiar if you’ve used SQL.

Alerting: Grafana’s built-in alerting is solid for homelab use. Set a threshold on a temperature sensor, get notified via Telegram or email when the server closet hits 40°C. No PagerDuty subscription required.


The Working Compose Stack

Here’s a full Compose setup running Telegraf, InfluxDB 2.x, and Grafana, plus Mosquitto as the MQTT broker (for all your ESPHome/Zigbee2MQTT devices).

docker-compose.yml
services:
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    restart: unless-stopped
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb2
      - influxdb_config:/etc/influxdb2
    environment:
      DOCKER_INFLUXDB_INIT_MODE: setup
      DOCKER_INFLUXDB_INIT_USERNAME: admin
      DOCKER_INFLUXDB_INIT_PASSWORD: changeme_please
      DOCKER_INFLUXDB_INIT_ORG: homelab
      DOCKER_INFLUXDB_INIT_BUCKET: metrics
      DOCKER_INFLUXDB_INIT_ADMIN_TOKEN: my-super-secret-token

  mosquitto:
    image: eclipse-mosquitto:2
    container_name: mosquitto
    restart: unless-stopped
    ports:
      - "1883:1883"
      - "9001:9001"
    volumes:
      - mosquitto_data:/mosquitto/data
      - mosquitto_log:/mosquitto/log
      - ./mosquitto.conf:/mosquitto/config/mosquitto.conf:ro

  telegraf:
    image: telegraf:1.33
    container_name: telegraf
    restart: unless-stopped
    depends_on:
      - influxdb
      - mosquitto
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    user: "telegraf:993" # adjust GID to match your docker group

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: changeme_please
    depends_on:
      - influxdb

volumes:
  influxdb_data:
  influxdb_config:
  mosquitto_data:
  mosquitto_log:
  grafana_data:

And the Telegraf config to go with it — scraping host metrics and subscribing to MQTT for your sensor data:

telegraf.conf
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false

# Output to InfluxDB 2.x
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "my-super-secret-token"
  organization = "homelab"
  bucket = "metrics"

# System metrics
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]

[[inputs.net]]

[[inputs.system]]

# Docker container metrics
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  source_tag = false
  container_state_include = ["created", "restarting", "running", "removing", "paused", "exited", "dead"]
  timeout = "5s"
  perdevice = false
  total = false
  docker_label_include = []

# MQTT input — for ESPHome, Zigbee2MQTT, Tasmota, whatever pushes to your broker
[[inputs.mqtt_consumer]]
  servers = ["tcp://mosquitto:1883"]
  topics = [
    "homeassistant/sensor/+/state",
    "zigbee2mqtt/+",
    "esphome/+/sensor/+/state",
    "tele/+/SENSOR",
  ]
  data_format = "json"
  # For Tasmota SENSOR payloads, you may need json_query = "StatusSNS"
  # Tune per your setup
  qos = 0
  connection_timeout = "30s"
  persistent_session = false
  client_id = "telegraf"

Boot it:

docker compose up -d

InfluxDB UI will be at http://your-server:8086. Grafana at :3000. Log into Grafana, add InfluxDB as a data source (type: InfluxDB, query language: Flux, URL: http://influxdb:8086, org: homelab, token: your token, bucket: metrics).
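Before wiring up real sensors, it can help to confirm that InfluxDB accepts writes at all. Below is a minimal smoke-test sketch using only the Python standard library against the InfluxDB 2.x write endpoint. It assumes the org, bucket, and token from the Compose file above and that you run it on the Docker host (hence localhost); the smoke_test measurement and the helper name are invented for this example.

```python
import urllib.parse
import urllib.request


def build_write_request(base_url, org, bucket, token, line):
    """Build (but do not send) a POST to InfluxDB 2.x's /api/v2/write endpoint."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return urllib.request.Request(
        url=f"{base_url}/api/v2/write?{query}",
        data=line.encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


request = build_write_request(
    "http://localhost:8086",
    "homelab",
    "metrics",
    "my-super-secret-token",
    "smoke_test,source=manual value=1",  # no timestamp: the server assigns one
)
print(request.full_url)

# Uncomment to actually send; InfluxDB answers 204 No Content on success.
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```

If the write succeeds, the smoke_test measurement should show up in the InfluxDB Data Explorer under the metrics bucket.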


Sample Queries: CPU and a Sensor

CPU usage over time (Flux):

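from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  // Telegraf's cpu input has no usage_percent field; derive busy % from usage_idle
  |> filter(fn: (r) => r._field == "usage_idle")
  |> filter(fn: (r) => r.cpu == "cpu-total")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
  |> map(fn: (r) => ({r with _value: 100.0 - r._value}))
  |> yield(name: "mean")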

Temperature from an MQTT sensor (Flux):

from(bucket: "metrics")
  |> range(start: -6h)
  |> filter(fn: (r) => r._measurement == "mqtt_consumer")
  |> filter(fn: (r) => r._field == "temperature")
  |> filter(fn: (r) => r.topic =~ /esphome\/garage/)
  |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
  |> yield(name: "garage_temp")

Paste these into Grafana’s query editor (in the “Script editor” mode for Flux), tweak the measurement name and field to match what Telegraf actually indexed, and you’ve got a dashboard.


TIG vs. Prometheus/LGTM: Honest Comparison

Prometheus (and its extended Grafana LGTM stack — Loki, Grafana, Tempo, Mimir) is genuinely excellent. If you’re running Kubernetes, a fleet of servers with stable IPs, or anything where targets register themselves, Prometheus is probably the better choice. The ecosystem is massive. The alerting via Alertmanager is robust. Service discovery for Kubernetes, Consul, and EC2 is native.

But here’s where TIG pulls ahead for the homelab/IoT scenario:

| | TIG Stack | Prometheus Stack |
|---|---|---|
| Collection model | Push (devices send data in) | Pull (server scrapes targets) |
| IoT/MQTT native | Yes (Telegraf MQTT plugin) | No (needs exporter) |
| Devices behind NAT | Works fine | Problematic |
| Ephemeral devices | No problem | Discovery config required |
| Time-series optimization | Built-in (InfluxDB) | Remote write to Thanos/Mimir |
| Downsampling/retention | Native (Tasks) | Via recording rules |
| Query language | Flux / InfluxQL / SQL (v3) | PromQL |
| Setup complexity | Low-medium | Medium-high |

If you’re already running Prometheus for your servers, you don’t have to pick. Many homelabbers run both: Prometheus for infra, TIG for sensors and home automation. They’re not mutually exclusive, and Grafana talks to both.


Common Gotchas

Cardinality explosions: Covered above, but worth repeating. Tagging with things like raw MAC addresses, full file paths, or any value with thousands of unique entries will crater InfluxDB’s memory usage. Keep tags low-cardinality. Use fields for the actual measurements.

Telegraf’s “kitchen sink” config: The default config file is 600+ lines of commented-out plugins. This is fantastic documentation and a terrible starting config. Delete everything you don’t use. You’ll thank yourself the first time you need to debug what’s being collected.

Grafana datasource versioning: When adding InfluxDB as a Grafana datasource, you get to choose the query language: Flux, InfluxQL, or (in newer versions) SQL. Make sure you pick the right one for your InfluxDB version and stick with it. Dashboards built with Flux queries don’t translate to InfluxQL and vice versa. Pick one, be consistent.

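MQTT payload parsing: Telegraf’s mqtt_consumer plugin with data_format = "json" works great for well-structured JSON payloads. ESPHome is usually clean, though its default state topics publish bare values rather than JSON objects, so those may need a separate mqtt_consumer block with data_format = "value" and a matching data_type. Tasmota’s tele/+/SENSOR payloads are nested JSON and may need json_query to extract the right subtree. Zigbee2MQTT payloads vary by device. Budget time for this.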
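Here is a rough Python illustration of why nested payloads need attention. By default Telegraf's JSON parser flattens nested objects into field names joined with underscores and drops strings; selecting a subtree first (which is what json_query is for) gives you shorter field names. The payload below is a hypothetical one shaped like a Tasmota sensor message, and flatten() is a stand-in for the parser's behavior, not its real implementation.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into underscore-joined numeric fields, dropping strings and booleans."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[name] = float(value)
    return fields


# Hypothetical payload, shaped like what a Tasmota device with a BME280 might send.
payload = json.loads(
    '{"Time": "2026-01-01T00:00:00",'
    ' "BME280": {"Temperature": 21.5, "Humidity": 40.1},'
    ' "TempUnit": "C"}'
)

# Whole document: field names carry the sensor prefix.
print(flatten(payload))            # {'BME280_Temperature': 21.5, 'BME280_Humidity': 40.1}

# Subtree only, similar in spirit to json_query = "BME280".
print(flatten(payload["BME280"]))  # {'Temperature': 21.5, 'Humidity': 40.1}
```

Whichever shape you pick, your Flux filters on _field have to match it exactly, so check what actually landed in the bucket before building dashboards.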

InfluxDB token management: The DOCKER_INFLUXDB_INIT_ADMIN_TOKEN env var only works on first init. If you need to rotate or create new tokens after setup, use the InfluxDB UI or CLI. Don’t lose your admin token — there’s no “forgot password” for programmatic access.


Closing Thoughts

If you’re running Mosquitto for your smart home, a few ESP32s reporting temperatures around the house, and maybe a Pi or two doing who-knows-what, you basically already have TIG-shaped problems. Your devices are pushing data, they don’t want to be scraped, and you need something to store up to 86,400 data points per sensor per day (that’s one reading per second; it’s 8,640 at a 10-second interval) without eating your disk or requiring a PhD in PromQL.

TIG is that thing. It’s been around long enough to be stable, has a Docker Compose setup that fits on a single server, and won’t look at your garage temperature sensor like it’s a weird edge case.

Set it up. Point your ESPHome devices at MQTT. Watch the data flow. Inevitably spend three hours tweaking a Grafana dashboard theme at midnight. Your 2 AM self will appreciate having the metrics.

