Prometheus Drops Your Data After 15 Days. Surprise.
You’ve been running Prometheus for three months. Dashboards look good. Alerts fire when they should. Then someone asks: “Can you show me what CPU usage looked like last month?” You open the Prometheus UI, query for data from 40 days ago, and get back nothing. No flat line, no error, just an empty graph. Your metrics are gone.
This isn’t a bug. It’s a design choice.
Prometheus retains 15 days of data by default. You can raise that, but pushing retention toward “store forever” on a single box gets expensive in disk, memory, and query time, and on a busy instance you can start to feel it as early as the 30-40 day mark. Prometheus stores everything in a local time-series database optimized for speed, not capacity. It’s a car, not a truck. Great for real-time dashboards and alerting. Terrible for capacity planning, year-over-year trend analysis, or compliance audits that ask for 12 months of data.
This is where long-term storage comes in. And if you’re running a home lab or small business, Grafana Mimir is probably the right move.
Why Prometheus Ain’t a Data Warehouse
Before we talk solutions, let’s understand the problem. Prometheus’s local storage is optimized for write speed and query latency. Incoming samples are compressed into blocks on disk, and those blocks get compacted into larger ones over time. This design scales vertically (more RAM, bigger SSD) until you hit the ceiling where a single machine can’t keep up. There’s no prize for running a monitoring system so resource-heavy that it becomes a single point of failure (SPOF) itself.
The solution isn’t to buy a bigger server. It’s to offload old data somewhere cheaper and keep Prometheus fast.
When Bumping Retention Is Actually Fine
Before you jump to Mimir, be honest: do you actually need long-term storage?
If your home lab has maybe 500 metrics total and you’re only concerned with the last 30 days of data, bumping Prometheus’s --storage.tsdb.retention.time flag to 30d might just work. You’ll need more disk space and a bit more RAM, but it’s simple. No extra services. No S3 bill. No debugging a distributed system at 2 AM.
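The math is simple: roughly 1 KB per metric per day on average (varies wildly based on cardinality and scrape interval). 500 metrics × 30 days × 1 KB ≈ 15 MB of disk. Even if your real per-series cost turns out to be ten times that, you’re still only at around 150 MB. Cheap. Easy.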
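If Prometheus runs under Docker Compose, the retention bump is a one-line change to the container’s command. The fragment below is a minimal sketch, not a complete file: the service layout is assumed, and the second flag, --storage.tsdb.retention.size, is an optional extra with an illustrative value.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      # Keep 30 days of samples instead of the 15-day default.
      - "--storage.tsdb.retention.time=30d"
      # Optional safety net (illustrative value): once the TSDB grows past
      # this size, the oldest blocks are dropped even if they are younger
      # than 30 days.
      - "--storage.tsdb.retention.size=20GB"
```

When both flags are set, whichever limit is reached first wins, so the size cap protects a small disk from an unexpected cardinality spike.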
But if you’re tracking thousands of metrics, or you need historical data for capacity planning, or compliance says “keep 12 months”, then a single Prometheus box becomes a money pit. This is where the long-term storage options split into camps.
The Players: Mimir vs Thanos vs VictoriaMetrics
You’ve got three main paths forward.
Thanos is the old guard. It works by sidecar: you run a Thanos sidecar container next to your Prometheus, and the sidecar uploads blocks to object storage (S3, GCS, whatever). Then you run a separate query layer, a store gateway, and a compactor to stitch it all together. It works, but it’s got more moving parts than a Swiss watch. Each component can fail independently. You’ll spend time debugging why your query layer can’t talk to the store gateway at 3 AM.
VictoriaMetrics is a separate beast entirely. It’s a time-series database built to be long-term storage from day one. Different architecture, different query language quirks, different operational model. That deserves its own article, so we’ll skip it here.
Mimir is Grafana’s answer. It’s Thanos’s more organized cousin. Instead of sidecars and separate components, Mimir runs in a few flavors: monolithic mode (everything in one binary for labs), or fully distributed mode (horizontal scaling for production). It uses object storage as a backing layer (S3, MinIO, Google Cloud Storage, whatever), but the operational story is cleaner. Fewer moving parts than Thanos, more opinionated, better documentation.
For a home lab or small-to-mid company? Mimir wins on simplicity.
How Mimir Actually Works
Mimir is a long-term storage system that sits alongside your existing Prometheus setup. Your Prometheus stays exactly as it is. Mimir doesn’t replace it; it supplements it.
Here’s the flow:
- Prometheus scrapes metrics and stores them locally (15-day default retention).
- Prometheus is configured with a remote_write endpoint pointing to Mimir.
- Every metric Prometheus sees gets sent to Mimir in real-time.
- Mimir accepts the data, compresses it, and stores it in object storage (S3, MinIO, etc.).
- Your Grafana dashboard points to Prometheus for recent data (fast, cached), and can query Mimir for older data via a separate Mimir datasource.
Or, to keep dashboards simple, you point Grafana only at Mimir. Mimir never queries Prometheus, and it doesn’t need to: samples sent via remote_write become queryable within seconds, because Mimir’s queriers read recent data straight from the ingesters and older data from object storage. Either way works.
The architecture is this: Mimir is built to scale horizontally. You can run it in monolithic mode (a single binary with the distributor, ingester, querier, compactor, and store-gateway all in one process) for a home lab, or split the components into separate deployments for production. It doesn’t scale itself, but adding capacity is a matter of running more replicas. With ingester replication enabled, if one ingester crashes, the others keep serving its series. If you need more query throughput, you spin up more queriers. This is how you avoid the ceiling Prometheus hits.
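To make the two modes concrete, here is a rough Compose fragment for the monolithic case. It is a sketch under assumptions: the config path is a placeholder, and the -target flag is spelled out only for illustration.

```yaml
services:
  mimir:
    image: grafana/mimir:latest
    command:
      - "-config.file=/etc/mimir/mimir.yaml"
      # "all" runs every component in one process (monolithic mode).
      # It is also the default, so this flag is shown only for clarity.
      - "-target=all"
```

In a distributed deployment you would run the same image several times with a different target each (for example -target=ingester or -target=querier) and scale each one independently.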
The Minimum Viable Mimir Setup
Let’s deploy Mimir for a home lab using Docker Compose. You’ll need:
- Mimir (obviously)
- A backing store (we’ll use MinIO, which is S3-compatible and runs on a single machine)
- Prometheus with remote_write configured
- Grafana
Here’s a minimal docker-compose.yml:
    version: "3.8"

    services:
      minio:
        image: minio/minio:latest
        ports:
          - "9000:9000"
          - "9001:9001"
        environment:
          MINIO_ROOT_USER: minioadmin
          MINIO_ROOT_PASSWORD: minioadmin
        command: server /data --console-address ":9001"
        volumes:
          - minio_data:/data

      mimir:
        image: grafana/mimir:latest
        ports:
          - "9009:9009"
        volumes:
          - ./mimir-config.yaml:/etc/mimir/mimir.yaml
        command: -config.file=/etc/mimir/mimir.yaml
        depends_on:
          - minio

      prometheus:
        image: prom/prometheus:latest
        ports:
          - "9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - prometheus_data:/prometheus
        command:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
          - "--storage.tsdb.retention.time=15d"

      grafana:
        image: grafana/grafana:latest
        ports:
          - "3000:3000"
        environment:
          GF_SECURITY_ADMIN_PASSWORD: admin
        volumes:
          - grafana_data:/var/lib/grafana

    volumes:
      minio_data:
      prometheus_data:
      grafana_data:

Now the Mimir config (mimir-config.yaml):
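    multitenancy_enabled: false

    server:
      # Mimir listens on 8080 by default; this matches the 9009 port
      # published in docker-compose.yml.
      http_listen_port: 9009

    ingester:
      ring:
        kvstore:
          store: inmemory
        # The default is 3; a single ingester needs a replication factor of 1.
        replication_factor: 1

    blocks_storage:
      backend: s3
      s3:
        endpoint: minio:9000
        access_key_id: minioadmin
        secret_access_key: minioadmin
        insecure: true              # plain HTTP to MinIO
        bucket_name: mimir-blocks   # the bucket must already exist
      tsdb:
        dir: /tmp/mimir-tsdb
      bucket_store:
        sync_dir: /tmp/mimir-sync

    compactor:
      data_dir: /tmp/mimir-compactor
      sharding_ring:
        kvstore:
          store: inmemory

    store_gateway:
      sharding_ring:
        kvstore:
          store: inmemory
        replication_factor: 1

    limits:
      # Cache freshness is a per-tenant limit, not a query_scheduler option.
      max_cache_freshness: 10m
      # Mimir limits active series per tenant, not samples.
      max_global_series_per_user: 10000000

And wire up Prometheus’s remote_write to Mimir. In your prometheus.yml: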
    global:
      scrape_interval: 15s

    remote_write:
      # Mimir's push endpoint is /api/v1/push.
      - url: http://mimir:9009/api/v1/push
        queue_config:
          capacity: 10000
          min_backoff: 100ms
          max_backoff: 5s

    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['localhost:9090']

That’s it. Prometheus now ships every sample to Mimir. Mimir buffers and compresses it in the ingesters, then periodically uploads blocks to MinIO. You can query 15 days of Prometheus data plus everything Mimir has seen since you turned it on.
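One gap worth flagging: nothing in the files above creates the mimir-blocks bucket, and Mimir expects it to exist. A possible fix, sketched here as an assumption rather than part of the original setup, is a one-shot helper service built on the minio/mc client image. The service name createbucket and the alias local are made up for this example.

```yaml
services:
  createbucket:
    image: minio/mc:latest
    depends_on:
      - minio
    # Retry until MinIO answers, then create the bucket if it is missing.
    entrypoint: >
      /bin/sh -c "
      until mc alias set local http://minio:9000 minioadmin minioadmin; do sleep 1; done;
      mc mb --ignore-existing local/mimir-blocks;
      "
```

Add this under the existing services: key in docker-compose.yml. Alternatively, log in to the MinIO console on port 9001 and create the bucket by hand once.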
Querying Across Time
In Grafana, you add a Prometheus-type datasource pointing to http://mimir:9009/prometheus. Mimir speaks the same PromQL as Prometheus, so your dashboard queries don’t change. Grafana won’t switch datasources for you, though: each panel queries whichever datasource it’s configured with.
Want to compare CPU usage across the last 12 months? Query Mimir. Want to see the last hour? Prometheus is faster (local SSD). You pick the datasource per panel, expose it as a dashboard datasource variable, or point everything at Mimir and stop thinking about it.
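If you’d rather not click through the UI, both datasources can be declared in a Grafana provisioning file. This is a minimal sketch that assumes the service names and ports from the Compose setup above; the datasource names and the file name are arbitrary.

```yaml
# datasources.yaml
apiVersion: 1

datasources:
  # Short-term data, served from Prometheus's local TSDB.
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  # Long-term data. Mimir exposes a Prometheus-compatible API under
  # /prometheus, so the datasource type is still "prometheus".
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
```

Mount the file into the Grafana container under /etc/grafana/provisioning/datasources/ and Grafana loads it at startup.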
The Cost Question
This is the part nobody talks about openly: running long-term storage costs money.
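If you’re using S3 in AWS, you’re paying for object storage (~$0.023/GB/month in the US), plus API calls (LIST and PUT are cheap, GET is cheaper per call). A year of metrics for a moderately busy system (10K samples/sec) is roughly 315 billion samples. At the 1-2 bytes per sample that Prometheus-format blocks typically compress to, that’s a few hundred gigabytes; budget 2-3 TB if you want generous headroom for indexes, uncompacted blocks, and growth, and you’re looking at roughly $45-70/month in storage alone, plus data transfer if Grafana queries it often. That adds up.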
MinIO in a home lab? Disk costs nothing extra (you own the drives), electricity is pennies, and complexity is low.
If you’re at a company and metrics are a compliance requirement, you’ll swallow the cost. S3 for a year of metrics is cheaper than a second full-time ops person.
But if you’re a home lab and just curious about historical trends, bumping Prometheus retention to 30d and calling it a day might be the smarter move.
When You Actually Need Long-Term Storage
You need Mimir (or Thanos, or VictoriaMetrics) when:
- Capacity planning: You need to see how resource usage scales month-to-month, year-to-year. You can’t make budget decisions with 15 days of data.
- Compliance: Your org says “keep 12 months of audit data.” A single Prometheus box can’t store it cheaply. Mimir + S3 gives you compliance on a realistic budget.
- Incident investigation: It’s 2 AM, production is on fire, and someone asks “when did this start?” Historical data from 8 weeks ago answers the question.
- Performance tuning: You’re optimizing alerting thresholds and you need to see how metrics behaved over seasons (summer spike vs winter baseline).
- SLO tracking: You’re calculating availability and you need 90-day windows of data. Prometheus can’t hold it.
If none of these apply, Prometheus’s 15-day default is fine. Running Mimir adds operational overhead. You maintain MinIO or S3, tune compactors, debug query latency across two systems. Only add complexity if the benefit is real.
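If the 12-month compliance case is the one that applies to you, note that Mimir keeps blocks indefinitely unless told otherwise. A retention cap can be set in the limits section of the Mimir config. The snippet below is a small sketch; the one-year value is just the example figure from this article.

```yaml
limits:
  # Have the compactor delete blocks older than one year.
  # The default (0) disables retention, so data is kept forever.
  compactor_blocks_retention_period: 1y
```

This goes in the same limits: block as any other per-tenant limits in mimir-config.yaml, and it also keeps the object storage bill from growing without bound.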
For a small home lab, Prometheus + a bigger disk usually wins. For anything larger, Mimir + object storage becomes cheaper and simpler than scaling Prometheus vertically.
Pick the tool that matches your scale. And if 15 days is enough? Sleep better at night with a simpler system.