k3s on Pi 5 Cluster: Real or Toy?

By SumGuy 10 min read

You’ve Got Four Pis and a Dream

It’s 2026, and someone on your homelab Discord just posted a picture of five Raspberry Pi 5 boards stacked like a production rack. They’ve got k3s running. They’re talking about “distributed storage” and “ha control plane.” Your first thought: “Is this actually a Kubernetes cluster, or just a very expensive toy?”

Here’s the thing—it’s both, and neither. k3s on Pi 5 works. I’ve built them. They’re snappy for their size, dirt cheap compared to mini-PCs, and they’ll run actual workloads. But they’ll also eat your lunch in ways you won’t see coming until you’re at 2 AM wondering why your Vaultwarden pod keeps OOMing.

This is the real talk: what actually sticks on Pi clusters, what falls apart, and when you should stop Tetris-ing into smaller hardware and just buy the mini-PC.


Pi 5 Specs: The Good News and the Catch

Let’s get baseline expectations straight:

| Spec | Pi 5 (8GB) | What This Means |
| --- | --- | --- |
| CPU | 2.4 GHz quad-core ARM64 | ~40% Xeon E-2388G per-core perf (roughly) |
| RAM | 8 GB (default) | Swap kills k3s; you need 8GB minimum |
| Storage (onboard) | microSD (slow) | Pure liability for k3s; you’ll thrash |
| Storage (NVMe HAT) | PCIe 2.0 (250 MB/s writes) | Night-and-day difference; basically mandatory |
| Network | Gigabit + PoE support | Good; no bottleneck at cluster scale |
| Power draw | ~5–8W (idle), ~15W (load) | Keep going; still one outlet per Pi |
| Thermal headroom | 85°C throttle point | Passive heatsink + airflow = you’re fine at home |

The catch: That quad-core is still ARM, and it’s still a single core handling your I/O on a Pi board. microSD is not the bottleneck—it’s the absolute show-stopper. If you skip the NVMe HAT, you’re not running k3s. You’re watching a slideshow.


Workloads That Actually Work

Here’s what you can realistically run on a 3-node k3s Pi cluster:

Lightweight & Stable

| Workload | CPU Pressure | RAM Pressure | Storage I/O | Verdict |
| --- | --- | --- | --- | --- |
| DNS/AdGuard | Idle | Very light | Minimal | Rock solid ✓ |
| Jellyfin (1 transcode) | 60–80% one core | 1.2GB per node | Moderate | ✓ Limited |
| Nextcloud (small) | 20–30% | 2GB | High | ⚠ NVMe needed |
| Paperless OCR | 90% (batched) | 1.5GB | Very high | ⚠ Single job |
| Vaultwarden | 5–10% | 400MB | Minimal | Rock solid ✓ |
| Prometheus scrape | 10–15% | 800MB (small) | Moderate | ⚠ Retention limits |
| Loki (logs only) | 20% | 1GB | Very high | ✗ Don’t try |

Workloads you should not run:


The NVMe HAT Game-Changer

This deserves its own section because it will make or break your cluster.

Without NVMe HAT:

microSD → k3s datastore (etcd) thrashing
→ Any stateful workload (Postgres, Redis) → 50ms latencies
→ Logs pile up on root, you hit the 90%-full disk panic

With NVMe HAT:

NVMe (Samsung 970 EVO or equivalent)
├─ /var/lib/rancher/k3s → datastore + node state (actual storage)
├─ /var/log → logs don't fill your root
└─ PVC backing (local-path-provisioner) → workload data

The HAT sits on the PCIe connector. Real throughput? ~250 MB/s writes, ~500 MB/s reads (PCIe 2.0 ×1 lane bottleneck, not the drive). That’s plenty for k3s—you’re not NAS-ing, you’re just keeping etcd happy.
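Where does that ~500 MB/s read ceiling come from? A back-of-the-envelope sketch, assuming the standard PCIe 2.0 figures (5 GT/s per lane, 8b/10b encoding). The function name is made up for illustration:

```shell
# Theoretical payload bandwidth of a PCIe 2.0 link, in MB/s.
# Assumes 5 GT/s per lane and 8b/10b encoding (8 data bits per 10 line bits).
pcie2_mb_per_s() {
  lanes=$1
  # 5000 megatransfers/s * 8/10 data bits per transfer / 8 bits per byte
  echo $(( lanes * 5000 * 8 / 10 / 8 ))
}

pcie2_mb_per_s 1   # single lane, as on the Pi 5 connector: prints 500
```

Protocol overhead pulls real-world numbers a little below that, which is why the link, not the drive, sets the limit.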

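Setup (a few commands, per node):

sudo systemctl stop k3s   # k3s-agent on agent nodes; stop it so nothing writes mid-move
sudo mkdir -p /mnt/nvme
sudo mount /dev/nvme0n1p1 /mnt/nvme   # add an /etc/fstab entry so it survives reboots
sudo mv /var/lib/rancher /mnt/nvme/rancher
sudo ln -s /mnt/nvme/rancher /var/lib/rancher
sudo systemctl start k3s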
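After the move, it’s worth confirming that the k3s data directory really lives on the NVMe and not on the SD card. A minimal sketch, assuming the drive shows up as /dev/nvme* as in the commands above; the helper name is hypothetical:

```shell
# Succeeds if a block device path looks like an NVMe namespace or partition.
is_nvme_device() {
  case "$1" in
    /dev/nvme*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Follow the symlink, then ask df which device backs that directory.
data_dir=$(readlink -f /var/lib/rancher 2>/dev/null)
device=$(df -P "${data_dir:-/var/lib/rancher}" 2>/dev/null | awk 'NR==2 {print $1}')

if is_nvme_device "$device"; then
  echo "k3s data is on $device"
else
  echo "k3s data is NOT on NVMe (found: ${device:-unknown})"
fi
```

If it reports an mmcblk device, the symlink or the mount didn’t take and etcd is still hammering the SD card.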

Cost? £35–50 per Pi for a 256GB 970 EVO Plus. Do it.


Cluster Architecture That Doesn’t Fall Over

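Node 1 (k3s server) ← all 3 nodes are servers; HA control plane via embedded etcd
├─ etcd member (replicated with nodes 2 & 3)
├─ API server
├─ Controller manager + Scheduler
└─ kubelet + kube-proxy (servers run workloads too)
Node 2 (k3s server)
├─ etcd member + control plane
└─ kubelet + kube-proxy
Node 3 (k3s server)
├─ etcd member + control plane
└─ kubelet + kube-proxy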
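Embedded etcd only runs on k3s server nodes, so getting etcd quorum across three Pis means bringing up all three as servers. Here’s a rough sketch of that bootstrap, going by the k3s HA docs as I understand them. The IP address and token are placeholders, and the script only prints the commands rather than running them:

```shell
# Sketch: print the k3s install command for each server in a 3-node HA setup.
# FIRST_IP and TOKEN are placeholder values, not taken from a real cluster.
FIRST_IP="192.168.1.11"
TOKEN="change-me"

server_cmd() {
  node=$1
  if [ "$node" -eq 1 ]; then
    # The first server initialises the embedded etcd cluster.
    echo "curl -sfL https://get.k3s.io | K3S_TOKEN=$TOKEN sh -s - server --cluster-init"
  else
    # Later servers join the first one and become etcd members too.
    echo "curl -sfL https://get.k3s.io | K3S_TOKEN=$TOKEN sh -s - server --server https://$FIRST_IP:6443"
  fi
}

for n in 1 2 3; do
  echo "node $n: $(server_cmd "$n")"
done
```

Run the printed command on each Pi in order, and wait for node 1 to be up before joining the others.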

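Why three? etcd needs a majority of its members (quorum) to accept writes. Two nodes = no HA: lose either one and quorum is gone, so the cluster stops accepting writes. Three = lose one, still alive. Five = lose two, still alive, but that’s two more Pis for resilience a homelab rarely needs.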
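The quorum arithmetic behind those numbers, as a small sketch. It uses the standard majority rule for Raft-based stores like etcd (quorum = floor(n/2) + 1); the function names are illustrative only:

```shell
# Majority needed for an etcd cluster of N members to accept writes.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many members can fail before quorum is lost.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
```

Two members tolerate zero failures, same as one, which is why even member counts buy you nothing.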

Storage architecture:

# local-path-provisioner (built into k3s)
StorageClass: local-path
├─ Uses node's local NVMe
├─ No replication (single-node failure = data loss for that PVC)
└─ Fine for: app configs, cache, temp logs
   NOT for: databases you care about, backups
# If you need resilient storage:
# Option A: Longhorn (lightweight, ARM-native)
# - 3 replicas across 3 nodes = 1 node can die
# - Costs ~600MB RAM cluster-wide for metadata
# - Adds 15–20% I/O overhead (replication writes)
# Option B: just use Postgres on node 1, back it to R2/S3
# - Simpler, faster, but couples your data to one node

Real Performance: What You’ll Actually See

I built a three-node cluster (Pi 5, 8GB each, Samsung 970 EVO Plus per node, PoE). Here’s what it feels like:

Pod startup time:

Database queries (SQLite on NVMe):

Local machine: 5ms
Pi cluster query: 12–18ms
(Network + Pi I/O, within reason)

Image pulls (first time):

100MB image, gigabit network: ~5–8 seconds
(Pi's CPU + etcd contention: not instant)

Sustained workload (Paperless OCR):

One node @ 100% CPU for 45 sec per PDF
(10-page doc, balanced across cluster with Pod limits)
Other nodes: unaffected, 5–10% load

Memory pressure (hitting 7GB on a node):

kubelet starts evicting non-critical pods gracefully
→ 10–15 sec later, a replacement Pod is running on another node
→ Very civilized; you won't notice

You won’t hit anything that breaks unless you’re stupid about scheduling (e.g., three Nextcloud pods on one node, no anti-affinity rules).
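For the scheduling part, this is roughly what “not stupid” looks like: a hard anti-affinity rule plus memory requests and limits. A minimal sketch that writes a manifest to a temp file. The app name, image tag, replica count, and resource numbers are illustrative assumptions, not measured values:

```shell
# Sketch: write a Deployment manifest that keeps replicas on separate nodes.
# App name, image tag, and resource numbers are illustrative assumptions.
manifest="${TMPDIR:-/tmp}/nextcloud-spread.yaml"

cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nextcloud
  template:
    metadata:
      labels:
        app: nextcloud
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place two app=nextcloud pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: nextcloud
              topologyKey: kubernetes.io/hostname
      containers:
        - name: nextcloud
          image: nextcloud:stable
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
EOF

echo "wrote $manifest"
# Apply it on a real cluster with: kubectl apply -f "$manifest"
```

With a required rule and three nodes, a fourth replica would sit Pending rather than double up. Swap in preferredDuringSchedulingIgnoredDuringExecution if you’d rather have a soft preference.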


When You’re Actually Outgrowing Pi 5

Spot these warning signs:

Sign 1: I/O Walls

Prometheus scrape taking 15+ seconds
Database queries at 50ms+
Logs dropping on the floor (too much volume)

→ You’ve hit the NVMe/CPU bus limit. Adding more nodes doesn’t help (it’s per-node).

Sign 2: RAM Is Tight

Swap file is hot
kubelet evicting pods daily (not just under stress)
OOMKills on pods with normal settings

→ 8GB per node is the ceiling on these boards. More nodes spread pods out, but they can’t give a single pod more memory than one node has.

Sign 3: CPU Is Actually Maxed

One workload (e.g., Paperless) running at 100% for hours
Affecting other pods' latency
Can't add replicas (CPU would go higher)

→ You need bigger cores. Pi is hitting the wall.

Sign 4: The Cluster Is Babysitting Your App

You're constantly tuning resource limits
Pod affinity rules to keep things apart
Disabling HA because any node is "critical"
Manual restarts when something wedges

→ You’ve outgrown “hobby cluster” into “production headache on toy hardware.”


Pi 5 vs. Mini-PC Inflection Point

When do you actually need to upgrade?

| Metric | Pi 5 Cluster | Mini-PC Upgrade | Jump Reason |
| --- | --- | --- | --- |
| Total compute budget | £150–200 (3 nodes) | £600–900 (3× Intel N100) | 10–15W vs 5–8W per node, but 2–3× perf |
| Node failure impact | Pod eviction (manageable) | Quicker recovery, less jitter | Faster I/O = less time on eviction |
| Max concurrent workloads | 3–5 (light-medium) | 15–20+ (heavier) | Real CPU + RAM per node |
| “I can leave it alone” time | 2–3 days (needs monitoring) | 2–3 weeks | Headroom = fewer surprises |
| Storage backing | NVMe (250MB/s) | NVMe RAID 1 or SSD (1GB+/s) | 4× or more faster |
| Cost per added node | £45 (Pi + NVMe) | £200+ (mini-PC + storage) | Smaller relative cost increases |

Honest inflection point: When you’re running more than one “real” app (Nextcloud + Postgres + Paperless, not just Adguard + Wiki), or you’re tired of monitoring it.

What to buy instead: Three used Intel NUC boxes (N100 or similar), 16GB RAM, 500GB NVMe each. £250 per box landed. Same form factor, 3–4× performance, zero Pi drama. Used enterprise mini-PCs (Lenovo ThinkCentre M75q Gen 2, etc.) are even cheaper and chew through these workloads.


The Real Verdict

k3s on Pi 5 is NOT a toy. It’s a legitimate platform for home lab clusters, and I’ll run it again. You get:

It IS a constrained platform. You’ll hit walls that you can’t architect around:

Pick a Pi cluster if:

Buy a mini-PC instead if:

The Pi 5 isn’t holding you back from real Kubernetes. It’s holding you back from scale. That’s not nothing, but it’s also not a reason to skip it if the workload fits.

Your 2 AM self will decide which one was right.

