k3s on Pi 5 Cluster: Real or Toy?

You’ve Got Four Pis and a Dream

It’s 2026, and someone on your homelab Discord just posted a picture of five Raspberry Pi 5 boards stacked like a production rack. They’ve got k3s running. They’re talking about “distributed storage” and an “HA control plane.” Your first thought: “Is this actually a Kubernetes cluster, or just a very expensive toy?”

It’s both, and neither. k3s on Pi 5 works. I’ve built them. They’re snappy for their size, dirt cheap compared to mini-PCs, and they’ll run actual workloads. But they’ll also eat your lunch in ways you won’t see coming until you’re at 2 AM wondering why your Vaultwarden pod keeps OOMing.

This is the real talk: what actually sticks on Pi clusters, what falls apart, and when you should stop Tetris-ing into smaller hardware and just buy the mini-PC.

Pi 5 Specs: The Good News and the Catch

Let’s get baseline expectations straight:

Spec	Pi 5 (8GB)	What This Means
CPU	2.4 GHz quad-core ARM64	Roughly 40% of a Xeon E-2388G’s per-core performance
RAM	8 GB (this model)	Swap kills k3s; treat 8GB as the minimum
Storage (onboard)	microSD (slow)	Pure liability for k3s; you’ll thrash
Storage (NVMe HAT)	PCIe 2.0 ×1 (~250 MB/s writes)	Night-and-day difference; basically mandatory
Network	Gigabit + PoE (with a PoE HAT)	Good; no bottleneck at cluster scale
Power draw	~5-8W (idle), ~15W (load)	Low, but it’s still one supply (or PoE port) per Pi
Thermal headroom	Throttles from 80°C, harder at 85°C	Passive heatsink + airflow = you’re fine at home

The catch: That quad-core is still ARM, and it’s still a single core handling your I/O on a Pi board. microSD is not the bottleneck. It’s the absolute show-stopper. If you skip the NVMe HAT, you’re not running k3s. You’re watching a slideshow.

Workloads That Actually Work

Here’s what you can realistically run on a 3-node k3s Pi cluster:

Lightweight & Stable

AdGuard Home: DNS blocking, zero drama, uses ~200MB RAM per node
Jellyfin (streaming front-end + one transcode worker): Works for 1-2 concurrent users, 1080p only
Pi-hole: Overkill for k3s, but it works
Home Assistant (small setup): Runs fine with local DB; don’t send telemetry to HA Cloud
Paperless-ngx: OCR is CPU-bound but manageable; run one task worker (PAPERLESS_TASK_WORKERS=1)
WireGuard: Perfectly suited; expose its UDP port through a LoadBalancer or NodePort Service (Ingress only handles HTTP/S) in front of a persistent Pod
Nextcloud (small-ish, with NVMe): 5 to 10 users, moderate sharing; use object storage for large uploads
Vaultwarden: 2-3 vaults with org sharing; nothing fancy
Uptime Kuma (dashboard, not full monitoring): ~150MB cluster-wide
Wiki.js: Front-end plus SQLite stored on NVMe; reads are snappy

Workload	CPU Pressure	RAM Pressure	Storage I/O	Verdict
DNS/AdGuard	Idle	Very light	Minimal	Rock solid ✓
Jellyfin (1 transcode)	60 to 80% one core	1.2GB per node	Moderate	✓ Limited
Nextcloud (small)	20 to 30%	2GB	High	⚠ NVMe needed
Paperless OCR	90% (batched)	1.5GB	Very high	⚠ Single job
Vaultwarden	5 to 10%	400MB	Minimal	Rock solid ✓
Prometheus scrape	10 to 15%	800MB (small)	Moderate	⚠ Retention limits
Loki (logs only)	20%	1GB	Very high	✗ Don’t try
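
To see how fast the RAM column adds up, here’s a rough sketch in Python. It uses the table’s own figures and treats ~7GB as the budget, since that is the point where this post’s test cluster starts evicting pods. The names ram_gb, PRESSURE_POINT_GB and headroom_gb are mine, and the sum ignores k3s and OS overhead entirely.

```python
# RAM figures copied from the table above, in GB. AdGuard is left out because
# the table only says "very light", so there is no number to add.
ram_gb = {
    "Jellyfin (1 transcode)": 1.2,
    "Nextcloud (small)": 2.0,
    "Paperless OCR": 1.5,
    "Vaultwarden": 0.4,
    "Prometheus scrape": 0.8,
    "Loki (logs only)": 1.0,
}

# Assumption: roughly 7GB used on an 8GB node is where evictions begin.
PRESSURE_POINT_GB = 7.0


def headroom_gb(workloads, budget_gb=PRESSURE_POINT_GB):
    """Return (total RAM, remaining headroom) if all workloads share one node."""
    total = sum(ram_gb[name] for name in workloads)
    return round(total, 1), round(budget_gb - total, 1)


total, spare = headroom_gb(ram_gb)
print(f"all six on one node: {total} GB used, {spare} GB spare")
```

That sliver of headroom, before k3s itself is counted, is the whole argument for spreading these workloads across nodes instead of stacking them.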

Workloads you should not run:

Elasticsearch / OpenSearch: You need 2GB+ heap per replica; you’ve got 8GB per node, shared with k3s and everything else. Do the math.
Anything needing Postgres on microSD or a USB hard drive (see the NVMe section below).
CI/CD runners (GitHub Actions, Gitea): Buildah + arm64 builds take forever; expect 15 to 45 min per job.
Kafka / Message queues: Memory and GC pauses will destroy you.
Any ML inference at scale: Training is a joke; inference on small quantized models (a 4B like Gemma 4 or Qwen 3.6) is slow but viable.

The NVMe HAT Game-Changer

This deserves its own section because it will make or break your cluster.

Without NVMe HAT:

microSD → k3s datastore (etcd) thrashing
         → Any stateful workload (Postgres, Redis) → 50ms latencies
         → Logs pile up on root, you hit 90% disk usage and panic

With NVMe HAT:

NVMe (Samsung 970 EVO or equivalent)
  ├─ /var/lib/rancher/k3s → datastore + node state (actual storage)
  ├─ /var/log → logs don't fill your root
  └─ PVC backing (local-path-provisioner) → workload data

The HAT sits on the PCIe connector. Real throughput? ~250 MB/s writes, ~500 MB/s reads. The single PCIe 2.0 lane is the bottleneck, not the drive: the link’s theoretical ceiling is 500 MB/s. That’s plenty for k3s. You’re not NAS-ing; you’re just keeping etcd happy.
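
The link ceiling is easy to sanity-check. A minimal sketch in Python, assuming the standard PCIe 2.0 figures (5 GT/s per lane, 8b/10b encoding, so 8 payload bits for every 10 on the wire); the function name is mine, not part of any Pi tooling.

```python
def pcie_bandwidth_mb_s(gigatransfers_per_s, payload_bits, encoded_bits, lanes=1):
    """Theoretical one-direction bandwidth in decimal MB/s."""
    raw_bits_per_s = gigatransfers_per_s * 1e9 * lanes
    # Line encoding overhead: only payload_bits of every encoded_bits carry data.
    payload_bits_per_s = raw_bits_per_s * payload_bits / encoded_bits
    return payload_bits_per_s / 8 / 1e6


gen2_x1 = pcie_bandwidth_mb_s(5.0, 8, 10)
print(f"PCIe 2.0 x1 ceiling: {gen2_x1:.0f} MB/s")
```

Protocol overhead eats into that number in practice, which is why a drive rated for several GB/s still lands well under it on this board.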

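Setup (a handful of commands per node; assumes the NVMe partition is already formatted):

sudo systemctl stop k3s                    # don't move the datastore while k3s is writing to it
sudo mkdir -p /mnt/nvme                    # create the mount point
sudo mount /dev/nvme0n1p1 /mnt/nvme        # add an /etc/fstab entry so this survives a reboot
sudo mv /var/lib/rancher /mnt/nvme/rancher
sudo ln -s /mnt/nvme/rancher /var/lib/rancher
sudo systemctl start k3s                   # the unit is k3s-agent on agent-only nodes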

Cost? £35 to 50 per Pi for a 250GB 970 EVO Plus. Do it.

Cluster Architecture That Doesn’t Fall Over

Three-Node HA (recommended minimum)

Node 1 (k3s server)     ← control plane + embedded etcd
├─ etcd (quorum member 1 of 3)
├─ API server
└─ Controller manager + Scheduler

Node 2 (k3s server)     ← control plane + embedded etcd
├─ etcd (quorum member 2 of 3)
├─ API server
└─ Controller manager + Scheduler

Node 3 (k3s server)     ← control plane + embedded etcd
├─ etcd (quorum member 3 of 3)
├─ API server
└─ Controller manager + Scheduler

Why three? For embedded-etcd HA, all three need to be k3s servers (not agents), because etcd quorum lives on the server nodes. You want an odd number for quorum: two nodes = no HA (lose either one and quorum is gone, so the cluster stops accepting writes), three = lose one and still write, five = lose two and still write, if you’re willing to pay for two more boards. (You can still join pure agent nodes on top for more worker capacity. They just don’t vote.)
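
The quorum arithmetic behind that paragraph, as a small Python sketch. etcd needs a strict majority of members to accept writes; the function names are illustrative and not part of any k3s or etcd API.

```python
def etcd_quorum(members):
    """Smallest strict majority of the member count."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority is still reachable."""
    return members - etcd_quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} server(s): quorum {etcd_quorum(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

Two servers tolerate no more failures than one, and four tolerate no more than three, which is why the even counts buy you nothing.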

Storage architecture:

# local-path-provisioner (built into k3s)
StorageClass: local-path
├─ Uses node's local NVMe
├─ No replication (single-node failure = data loss for that PVC)
└─ Fine for: app configs, cache, temp logs
   NOT for: databases you care about, backups

# If you need resilient storage:
# Option A: Longhorn (lightweight, ARM-native)
#   - 3 replicas across 3 nodes = 1 node can die
#   - Costs ~600MB RAM cluster-wide for metadata
#   - Adds 15–20% I/O overhead (replication writes)
# Option B: just use Postgres on node 1, back it to R2/S3
#   - Simpler, faster, but couples your data to one node

Real Performance: What You’ll Actually See

I built a three-node cluster (Pi 5, 8GB each, Samsung 970 EVO Plus per node, PoE). Here’s what it feels like:

Pod startup time:

Simple app (nginx, Vaultwarden): 2 to 3 seconds
Larger container (Nextcloud): 8 to 12 seconds
(This is fine; compare to VM boot)

Database queries (SQLite on NVMe):

Local machine: 5ms
Pi cluster query: 12–18ms
(Network + Pi I/O, within reason)

Image pulls (first time):

100MB image, gigabit network: ~5–8 seconds
(Pi's CPU decompressing and unpacking layers: not instant)

Sustained workload (Paperless OCR):

One node @ 100% CPU for 45 sec per PDF
(10-page doc, balanced across cluster with Pod limits)
Other nodes: unaffected, 5–10% load

Memory pressure (hitting 7GB on a node):

kubelet starts evicting non-critical pods gracefully
→ 10–15 sec later, a replacement Pod is running on another node
→ Very civilized; you won't notice

You won’t hit anything that breaks unless you’re stupid about scheduling (e.g., three Nextcloud pods on one node, no anti-affinity rules).
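
Anti-affinity is the guard rail here. A minimal sketch, written in Python so it can print JSON that kubectl accepts, of a Pod-spec fragment that forbids two pods with the same label from sharing a node. The app: nextcloud label and the helper name are assumptions for illustration; the field names are standard Kubernetes.

```python
import json


def anti_affinity(app_label):
    """Pod-spec fragment: never schedule two pods with this app label on one node."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        # One pod per distinct value of this node label, i.e. per host.
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        }
    }


fragment = anti_affinity("nextcloud")  # hypothetical label
print(json.dumps(fragment, indent=2))
```

The fragment belongs under spec.template.spec of a Deployment. With the required form and three nodes, a fourth replica stays Pending rather than doubling up; the preferred variant trades that strictness for flexibility.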

When You’re Actually Outgrowing Pi 5

Spot these warning signs:

Sign 1: I/O Walls

Prometheus scrape taking 15+ seconds
Database queries at 50ms+
Logs dropping on the floor (too much volume)

→ You’ve hit the NVMe/CPU bus limit. Adding more nodes doesn’t help (it’s per-node).

Sign 2: RAM Is Tight

Swap file is hot
kubelet evicting pods daily (not just under stress)
OOMKills on pods with normal settings

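→ 8GB per board is fixed. More nodes add cluster RAM, but they can’t help a single pod that needs more than one node has; the only per-node fix is swapping in a 16GB Pi 5.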

Sign 3: CPU Is Actually Maxed

One workload (e.g., Paperless) running at 100% for hours
Affecting other pods' latency
Can't add replicas (CPU would go higher)

→ You need bigger cores. Pi is hitting the wall.

Sign 4: The Cluster Is Babysitting Your App

You're constantly tuning resource limits
Pod affinity rules to keep things apart
Disabling HA because any node is "critical"
Manual restarts when something wedges

→ You’ve outgrown “hobby cluster” into “production headache on toy hardware.”

Pi 5 vs. Mini-PC Inflection Point

When do you actually need to upgrade?

Metric	Pi 5 Cluster	Mini-PC Upgrade	Jump Reason
Total compute budget	£150 to 200 (3 nodes)	£600 to 900 (3× Intel N100)	10 to 15W vs 5 to 8W per node, but 2 to 3× perf
Node failure impact	Pod eviction (manageable)	Quicker recovery, less jitter	Faster I/O = less time on eviction
Max concurrent workloads	3 to 5 (light-medium)	15 to 20+ (heavier)	Real CPU + RAM per node
“I can leave it alone” time	2 to 3 days (needs monitoring)	2 to 3 weeks	Headroom = fewer surprises
Storage backing	NVMe (250MB/s)	NVMe RAID 1 or SSD (1GB+/s)	Roughly 4× faster or better
Cost per added node	Pi 5 board + £35 to 50 NVMe	£200+ (mini-PC + storage)	Smaller increments per added node

Honest inflection point: When you’re running more than one “real” app (Nextcloud + Postgres + Paperless, not just AdGuard + Wiki), or you’re tired of monitoring it.

What to buy instead: Three used NUC-style mini-PCs (N100 or similar), 16GB RAM, 500GB NVMe each. £250 per box landed. Similar desk footprint, 2 to 3× the performance, zero Pi drama. Used enterprise mini-PCs (Lenovo ThinkCentre M75q Gen 2, etc.) are even cheaper and shrug off these workloads.

The Real Verdict

k3s on Pi 5 is NOT a toy. It’s a legitimate platform for home lab clusters, and I’ll run it again. You get:

Real Kubernetes
HA control plane (3 nodes)
Enough CPU for most hobbies
8GB RAM is workable (barely)
PCIe NVMe makes it snappy

It IS a constrained platform. You’ll hit walls that you can’t architect around:

One quad-core per node means a heavy CPU-bound job can starve everything else scheduled there
I/O bottleneck (per node) means stateful apps are slow
No room to add resources to a node (RAM is soldered; going past 8GB means buying a 16GB board)

Pick a Pi cluster if:

You’re running 3 to 5 lightweight services (AdGuard, Vaultwarden, Wiki, Jellyfin front-end)
You accept that Postgres is one node only, backed to S3
You’re okay with checking on the cluster every 48 hours or so
Cost matters more than headroom

Buy a mini-PC instead if:

You’re running Nextcloud + Postgres + monitoring + app-of-the-week
You want to walk away for a month
“Troubleshooting the cluster” isn’t a hobby you enjoy
You have the budget (£700 to 900 for a solid 3-node setup)

The Pi 5 isn’t holding you back from real Kubernetes. It’s holding you back from scale. That’s not nothing, but it’s also not a reason to skip it if the workload fits.

Your 2 AM self will decide which one was right.
