You’ve Got Four Pis and a Dream
It’s 2026, and someone on your homelab Discord just posted a picture of four Raspberry Pi 5 boards stacked like a production rack. They’ve got k3s running. They’re talking about “distributed storage” and “ha control plane.” Your first thought: “Is this actually a Kubernetes cluster, or just a very expensive toy?”
Here’s the thing—it’s both, and neither. k3s on Pi 5 works. I’ve built them. They’re snappy for their size, dirt cheap compared to mini-PCs, and they’ll run actual workloads. But they’ll also eat your lunch in ways you won’t see coming until you’re at 2 AM wondering why your Vaultwarden pod keeps OOMing.
This is the real talk: what actually sticks on Pi clusters, what falls apart, and when you should stop Tetris-ing into smaller hardware and just buy the mini-PC.
Pi 5 Specs: The Good News and the Catch
Let’s get baseline expectations straight:
| Spec | Pi 5 (8GB) | What This Means |
|---|---|---|
| CPU | 2.4 GHz quad-core ARM64 | ~40% of a Xeon E-2388G’s per-core perf (roughly) |
| RAM | 8 GB (the model used here) | Swap kills k3s; treat 8GB as the minimum |
| Storage (onboard) | microSD (slow) | Pure liability for k3s—you’ll thrash |
| Storage (NVMe HAT) | PCIe 2.0 (250 MB/s writes) | Night-and-day difference; basically mandatory |
| Network | Gigabit + PoE (via add-on HAT) | Good; no bottleneck at cluster scale |
| Power draw | ~5-8W (idle), ~15W (load) | Cheap to run, but every Pi still needs its own supply or PoE port |
| Thermal headroom | 85°C throttle point | Passive heatsink + airflow = you’re fine at home |
The catch: That quad-core is still ARM, and it’s still a single core handling your I/O on a Pi board. microSD is not the bottleneck—it’s the absolute show-stopper. If you skip the NVMe HAT, you’re not running k3s. You’re watching a slideshow.
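Before installing k3s, it's worth confirming where a node's root filesystem actually lives. Here's a minimal sketch: `storage_kind` is a hypothetical helper name, and it assumes the usual Linux device naming (`mmcblk*` for the SD card, `nvme*` for NVMe).

```shell
# storage_kind DEVICE: classify a block device path by its name.
# Assumption: standard Linux naming (/dev/mmcblk* = SD card, /dev/nvme* = NVMe).
storage_kind() {
  case "$1" in
    /dev/mmcblk*) echo "microsd" ;;
    /dev/nvme*)   echo "nvme" ;;
    *)            echo "other" ;;
  esac
}
```

On a node you could feed it the real root device with `storage_kind "$(findmnt -n -o SOURCE /)"`. If that says `microsd`, fix it before going further.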
Workloads That Actually Work
Here’s what you can realistically run on a 3-node k3s Pi cluster:
Lightweight & Stable
- Adguard Home — DNS blocking, 0 drama, uses ~200MB RAM per node
- Jellyfin (streaming front-end + one transcode worker) — Works for 1-2 concurrent users, 1080p only
- Pi-hole — Overkill for k3s but it works
- Home Assistant (small setup) — Runs fine with local DB; don’t send telemetry to HA Cloud
- Paperless-NGX — OCR is CPU-bound but manageable; set workers=1
- WireGuard — Perfectly suited; runs as a persistent Pod exposed through a UDP Service (NodePort or LoadBalancer), since a standard Ingress only routes HTTP(S)
- Nextcloud (small-ish, with NVMe) — 5–10 users, moderate sharing; uses object storage for large uploads
- Vaultwarden — 2-3 vaults with org sharing; nothing fancy
- Uptime Kuma (dashboard, not full monitoring) — ~150MB cluster-wide
- Wiki.js — Pure frontend + SQLite backed to NVMe; reads are snappy
| Workload | CPU Pressure | RAM Pressure | Storage I/O | Verdict |
|---|---|---|---|---|
| DNS/Adguard | Idle | Very light | Minimal | Rock solid ✓ |
| Jellyfin (1 transcode) | 60–80% one core | 1.2GB per node | Moderate | ✓ Limited |
| Nextcloud (small) | 20–30% | 2GB | High | ⚠ NVMe needed |
| Paperless OCR | 90% (batched) | 1.5GB | Very high | ⚠ Single job |
| Vaultwarden | 5–10% | 400MB | Minimal | Rock solid ✓ |
| Prometheus scrape | 10–15% | 800MB (small) | Moderate | ⚠ Retention limits |
| Loki (logs only) | 20% | 1GB | Very high | ✗ Don’t try |
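The “retention limits” verdict for Prometheus mostly comes down to disk. The Prometheus storage docs give a rough sizing rule: retention in seconds × ingested samples per second × bytes per sample (about 1 to 2). A sketch of that arithmetic follows; `prom_disk_mib` is a hypothetical helper, and the 1,000 samples/s figure is an assumption for a small homelab, not something I measured.

```shell
# prom_disk_mib DAYS SAMPLES_PER_SEC BYTES_PER_SAMPLE
# Rough TSDB size in MiB, using the sizing rule from the Prometheus docs.
prom_disk_mib() {
  echo $(( $1 * 86400 * $2 * $3 / 1048576 ))
}

# 15 days of retention, 1000 samples/s (assumed), 2 bytes per sample (worst case)
prom_disk_mib 15 1000 2
```

Once you have a number you can live with, cap it using Prometheus's `--storage.tsdb.retention.time` and `--storage.tsdb.retention.size` flags so it can't quietly fill the NVMe.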
Workloads you should not run:
- Elasticsearch / OpenSearch — You need 2GB+ heap per replica; you’ve got 8GB per node. Do the math.
- Postgres on anything slower than NVMe, such as microSD or a USB HDD (see the NVMe section below).
- CI/CD runners (GitHub Actions, Gitea) — Buildah + arm64 builds take forever; expect 15–45 min per job.
- Kafka / Message queues — Memory and GC pauses will destroy you.
- Any ML inference at scale — Training is a joke; inference on quantized small models (7B) is slow but viable.
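“Do the math” for Elasticsearch looks roughly like this. The sketch assumes each replica's container needs about twice its heap (Elastic's guidance is to keep heap at or below half of available memory) and that about 1.5GB per node is reserved for the OS, kubelet and k3s. That reserve is my assumption, not a measured figure, and `pods_per_node` is a hypothetical helper.

```shell
# pods_per_node NODE_MIB RESERVED_MIB POD_MIB
# How many pods of a given memory size fit on one node after the system reserve.
pods_per_node() {
  echo $(( ($1 - $2) / $3 ))
}

# 8GB node, 1.5GB reserved (assumed), 4GB per Elasticsearch replica (2GB heap x 2)
pods_per_node 8192 1536 4096
```

One replica per node, with about 2.5GB left over for everything else you wanted to run there. That's the whole problem.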
The NVMe HAT Game-Changer
This deserves its own section because it will make or break your cluster.
Without NVMe HAT:
    microSD → k3s datastore (etcd) thrashing
            → Any stateful workload (Postgres, Redis) → 50ms latencies
            → Logs pile up on root, you hit 90% disk usage and panic
With NVMe HAT:
    NVMe (Samsung 970 EVO or equivalent)
    ├─ /var/lib/rancher/k3s → datastore + node state (actual storage)
    ├─ /var/log → logs don't fill your root
    └─ PVC backing (local-path-provisioner) → workload data
The HAT sits on the PCIe connector. Real throughput? ~250 MB/s writes, ~500 MB/s reads (PCIe 2.0 ×1 lane bottleneck, not the drive). That’s plenty for k3s: you’re not NAS-ing, you’re just keeping etcd happy.
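Setup (per node, with k3s stopped first):
    sudo k3s-killall.sh                          # installed with k3s; stops k3s and its containers
    sudo mkdir -p /mnt/nvme                      # mount point has to exist
    sudo mount /dev/nvme0n1p1 /mnt/nvme
    sudo mv /var/lib/rancher /mnt/nvme/rancher
    sudo ln -s /mnt/nvme/rancher /var/lib/rancher
    sudo systemctl start k3s                     # k3s-agent on agent nodes
Cost? £35–50 per Pi for a 250GB 970 EVO Plus. Do it.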
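A manual mount won't survive a reboot, so the NVMe partition also needs an `/etc/fstab` entry. A minimal sketch, assuming the partition is formatted ext4; `fstab_line` is a hypothetical helper that only prints the entry.

```shell
# fstab_line UUID MOUNTPOINT: print an /etc/fstab entry.
# Assumption: the partition is ext4; adjust the type and options for anything else.
fstab_line() {
  printf 'UUID=%s %s ext4 defaults,noatime 0 2\n' "$1" "$2"
}
```

On a node, something like `fstab_line "$(sudo blkid -s UUID -o value /dev/nvme0n1p1)" /mnt/nvme | sudo tee -a /etc/fstab` appends it. Check the result with `sudo mount -a` before you reboot.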
Cluster Architecture That Doesn’t Fall Over
Three-Node HA (recommended minimum)
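    Node 1 (k3s server, started with --cluster-init)
    ├─ embedded etcd member
    ├─ API server
    └─ Controller manager + Scheduler

    Node 2 (k3s server, joined to node 1)
    ├─ embedded etcd member + control plane
    └─ kubelet + kube-proxy

    Node 3 (k3s server, joined to node 1)
    ├─ embedded etcd member + control plane
    └─ kubelet + kube-proxy
Embedded etcd only runs on server nodes, so an HA control plane means all three Pis run as k3s servers. With one server and two agents, losing the server takes the whole control plane with it. k3s servers schedule workloads by default, so you don’t give up capacity.
Why three? You want quorum for etcd writes. Two servers = no HA (lose either one and the survivor has no majority). Three = lose one, still alive. Five = lose two, still alive, but you’re paying for two more nodes.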
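The quorum arithmetic behind “why three” fits in a few lines. etcd needs a majority of members to accept writes, so the number of failures you can absorb is whatever is left after that majority. The function names here are just illustrative.

```shell
# quorum N: members needed for a majority out of N.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated N: members that can fail while a majority survives.
tolerated() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

Note that two members tolerate exactly as many failures as one: zero. The even-numbered node buys you nothing.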
Storage architecture:
    # local-path-provisioner (built into k3s)
    StorageClass: local-path
    ├─ Uses node's local NVMe
    ├─ No replication (single-node failure = data loss for that PVC)
    └─ Fine for: app configs, cache, temp logs
       NOT for: databases you care about, backups
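A claim against the built-in class might look like the following. This is a sketch: `print_pvc` is a hypothetical helper that only prints a manifest, and the claim name and size in the usage line are placeholders.

```shell
# print_pvc NAME SIZE: emit a PersistentVolumeClaim for the k3s local-path class.
# The volume lives on whichever node the consuming pod is first scheduled to.
print_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: $2
EOF
}
```

Usage would be along the lines of `print_pvc app-config 1Gi | kubectl apply -f -`. Remember the data stays pinned to one node's NVMe, so the pod is pinned there too.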
    # If you need resilient storage:
    # Option A: Longhorn (lightweight, ARM-native)
    #   - 3 replicas across 3 nodes = 1 node can die
    #   - Costs ~600MB RAM cluster-wide for metadata
    #   - Adds 15–20% I/O overhead (replication writes)
    # Option B: just use Postgres on node 1, back it to R2/S3
    #   - Simpler, faster, but couples your data to one node
Real Performance: What You’ll Actually See
I built a three-node cluster (Pi 5, 8GB each, Samsung 970 EVO Plus per node, PoE). Here’s what it feels like:
Pod startup time:
- Simple app (nginx, Vaultwarden): 2–3 seconds
- Larger container (Nextcloud): 8–12 seconds
- (This is fine; compare to VM boot)
Database queries (SQLite on NVMe):
    Local machine: 5ms
    Pi cluster query: 12–18ms
    (Network + Pi I/O, within reason)
Image pulls (first time):
    100MB image, gigabit network: ~5–8 seconds
    (Pi's CPU + etcd contention: not instant)
Sustained workload (Paperless OCR):
    One node @ 100% CPU for 45 sec per PDF
    (10-page doc, balanced across cluster with Pod limits)
    Other nodes: unaffected, 5–10% load
Memory pressure (hitting 7GB on a node):
    kubelet starts evicting non-critical pods gracefully
    → 10–15 sec later, Pod is moved to another node
    → Very civilized; you won't notice
You won’t hit anything that breaks unless you’re stupid about scheduling (e.g., three Nextcloud pods on one node, no anti-affinity rules).
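For the anti-affinity part, a pod-spec fragment along these lines keeps replicas of the same app off the same node. It's a sketch: `print_anti_affinity` is a hypothetical helper that just prints YAML, and it assumes your pods carry an `app` label.

```shell
# print_anti_affinity APP_LABEL: emit an affinity fragment for a pod spec.
# Assumption: pods are labelled app=<APP_LABEL>.
# topologyKey kubernetes.io/hostname means "at most one matching pod per node".
print_anti_affinity() {
  cat <<EOF
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: $1
        topologyKey: kubernetes.io/hostname
EOF
}
```

The output belongs under `spec.template.spec` in a Deployment. Because the rule is a hard requirement, a three-node cluster can place at most three replicas; the `preferredDuringSchedulingIgnoredDuringExecution` form is the softer option if you'd rather the scheduler bend than leave a pod Pending.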
When You’re Actually Outgrowing Pi 5
Spot these warning signs:
Sign 1: I/O Walls
    Prometheus scrape taking 15+ seconds
    Database queries at 50ms+
    Logs dropping on the floor (too much volume)
→ You’ve hit the NVMe/CPU bus limit. Adding more nodes doesn’t help (it’s per-node).
Sign 2: RAM Is Tight
    Swap file is hot
    kubelet evicting pods daily (not just under stress)
    OOMKills on pods with normal settings
→ 8GB per Pi is the ceiling for these boards (RAM is soldered; the 16GB model is a board swap, not an upgrade). You can’t fix this with more nodes.
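A quick way to see how close a node is to that ceiling is to read `/proc/meminfo` directly. A minimal sketch; `mem_used_pct` is a hypothetical helper that takes meminfo-format text on stdin and treats `MemTotal - MemAvailable` as "in use".

```shell
# mem_used_pct: read /proc/meminfo-format text on stdin and print the
# percentage of RAM in use, truncated to a whole number.
mem_used_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END              { printf "%d\n", (total - avail) * 100 / total }'
}
```

Run it as `mem_used_pct < /proc/meminfo` on each node. Sitting in the high 80s with nothing unusual running is the "7GB of 8GB" territory where evictions start.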
Sign 3: CPU Is Actually Maxed
    One workload (e.g., Paperless) running at 100% for hours
    Affecting other pods' latency
    Can't add replicas (CPU would go higher)
→ You need bigger cores. Pi is hitting the wall.
Sign 4: The Cluster Is Babysitting Your App
    You're constantly tuning resource limits
    Pod affinity rules to keep things apart
    Disabling HA because any node is "critical"
    Manual restarts when something wedges
→ You’ve outgrown “hobby cluster” into “production headache on toy hardware.”
Pi 5 vs. Mini-PC Inflection Point
When do you actually need to upgrade?
| Metric | Pi 5 Cluster | Mini-PC Upgrade | Jump Reason |
|---|---|---|---|
| Total compute budget | £150–200 (3 nodes) | £600–900 (3× Intel N100) | 10–15W vs 5–8W per node, but 2–3× perf |
| Node failure impact | Pod eviction (manageable) | Quicker recovery, less jitter | Faster I/O = less time on eviction |
| Max concurrent workloads | 3–5 (light-medium) | 15–20+ (heavier) | Real CPU + RAM per node |
| “I can leave it alone” time | 2–3 days (needs monitoring) | 2–3 weeks | Headroom = fewer surprises |
| Storage backing | NVMe (250MB/s) | NVMe RAID 1 or SSD (1GB+/s) | Roughly 4× faster |
| Cost per added node | £45 (Pi + NVMe) | £200+ (mini-PC + storage) | Smaller relative cost increases |
Honest inflection point: When you’re running more than one “real” app (Nextcloud + Postgres + Paperless, not just Adguard + Wiki), or you’re tired of monitoring it.
What to buy instead: Three used mini-PCs (Intel N100 or similar), 16GB RAM, 500GB NVMe each. £250 per box landed. Similar footprint, 3–4× performance, zero Pi drama. Used enterprise mini-PCs (Lenovo ThinkCentre M75q Gen 2, etc.) are even cheaper and shrug off these workloads.
The Real Verdict
k3s on Pi 5 is NOT a toy. It’s a legitimate platform for home lab clusters, and I’ll run it again. You get:
- Real Kubernetes
- HA control plane (3 nodes)
- Enough CPU for most hobbies
- 8GB RAM is workable (barely)
- PCIe NVMe makes it snappy
It IS a constrained platform. You’ll hit walls that you can’t architect around:
- Four cores per node means one CPU-bound task can starve everything else on that node
- I/O bottleneck (per node) means stateful apps are slow
- No room to add more resources to a single node (RAM is soldered: 8GB on these boards, 16GB on the biggest model)
Pick a Pi cluster if:
- You’re running 3–5 lightweight services (Adguard, Vaultwarden, Wiki, Jellyfin front-end)
- You accept that Postgres is one node only, backed to S3
- You’re okay with checking on the cluster every 48 hours or so
- Cost matters more than headroom
Buy a mini-PC instead if:
- You’re running Nextcloud + Postgres + monitoring + app-of-the-week
- You want to walk away for a month
- “Troubleshooting the cluster” isn’t a hobby you enjoy
- You have the budget (£700–900 for a solid 3-node setup)
The Pi 5 isn’t holding you back from real Kubernetes. It’s holding you back from scale. That’s not nothing, but it’s also not a reason to skip it if the workload fits.
Your 2 AM self will decide which one was right.