Cilium on k3s: When eBPF Networking Pays

By SumGuy 11 min read

Flannel Works. Cilium Works Harder.

Flannel is fine. There, I said it. If you spun up k3s with the defaults and everything runs, congratulations — you’re done. Go touch grass.

But if you’ve ever stared at a pod networking issue with zero visibility into what’s actually happening, or if you want real network policy enforcement instead of the performative kind, or you just want to know why your inter-pod traffic is slower than your NIC should allow — Cilium is worth an afternoon of your time.

Cilium 1.16 can replace kube-proxy outright, handle L2 load balancer announcements natively, and turn on WireGuard transparent encryption with a one-liner. On a capable home lab node (anything with a modern 64-bit CPU and 2+ GB free RAM), it runs well. On a Pi cluster, it’ll eat your lunch. We’ll get to that.

Here’s the real talk on swapping flannel for Cilium on k3s, what you actually get, and when you should just leave it alone.


Step 1: Disable the k3s Defaults That Will Fight You

k3s ships with flannel, kube-proxy, and a bunch of bundled extras. Cilium needs to be the one managing networking — so you either disable flannel at install time or you’re reinstalling. Don’t try to hot-swap it on a running cluster; that way lies madness.

If you’re doing a fresh install:

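curl -sfL https://get.k3s.io | sh -s - \
  --flannel-backend=none \
  --disable-network-policy \
  --disable-kube-proxy \
  --disable=traefik \
  --disable=servicelb \
  --cluster-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

What each flag does:

- --flannel-backend=none: no flannel, so the node comes up without a CNI and stays NotReady until Cilium is installed.
- --disable-network-policy: turns off the network policy controller bundled with k3s; Cilium enforces policy instead.
- --disable-kube-proxy: skips kube-proxy, since Cilium’s kubeProxyReplacement takes over service handling in Step 2.
- --disable=traefik: drops the bundled Traefik ingress controller.
- --disable=servicelb: drops the bundled ServiceLB load balancer, which would otherwise compete for LoadBalancer services.
- --cluster-cidr / --service-cidr: the pod and service IP ranges.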

The cluster-cidr and service-cidr values matter — write them down, because you’ll need them when you configure Cilium’s IPAM.
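If you want to sanity-check those ranges before committing to them, Python’s ipaddress module will do it. This is only a sketch: the 192.168.1.0/24 LAN range is an assumption (borrowed from the L2 pool later in this post), and the /24-per-node figure assumes the default Kubernetes node mask, so swap in your own values.

```python
import ipaddress

# Values from the install command above.
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
service_cidr = ipaddress.ip_network("10.96.0.0/12")
# Assumption: a typical home LAN; substitute your own subnet.
lan_cidr = ipaddress.ip_network("192.168.1.0/24")


def find_overlaps(networks):
    """Return every pair of networks that overlap each other."""
    clashes = []
    for i, a in enumerate(networks):
        for b in networks[i + 1:]:
            if a.overlaps(b):
                clashes.append((a, b))
    return clashes


clashes = find_overlaps([cluster_cidr, service_cidr, lan_cidr])

# Assumption: each node gets a /24 carved out of the cluster CIDR.
node_subnets = len(list(cluster_cidr.subnets(new_prefix=24)))

print("overlapping ranges:", clashes)
print("node /24 subnets available:", node_subnets)
```

An empty overlap list is what you want; any pair that shows up means pods, services, or your LAN would be fighting over the same addresses.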

On multi-node clusters, run this on your server node first, then join agents normally. The agents don’t need special flags — Cilium’s agent DaemonSet handles the node-level plumbing.


Step 2: Install Cilium via Helm

Grab the Cilium CLI (useful for health checks later) and add the Helm repo:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin

helm repo add cilium https://helm.cilium.io
helm repo update

Now the actual install. Replace <YOUR_API_SERVER_IP> with the IP of your k3s server node (the one running the API server):

helm install cilium cilium/cilium \
  --version 1.16.6 \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<YOUR_API_SERVER_IP> \
  --set k8sServicePort=6443 \
  --set operator.replicas=1 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

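Key options explained:

- ipam.mode=kubernetes: Cilium allocates pod IPs from the per-node PodCIDR that Kubernetes assigns out of the cluster CIDR.
- kubeProxyReplacement=true: Cilium handles Service load balancing in eBPF instead of kube-proxy.
- k8sServiceHost / k8sServicePort: the API server’s real address. Without kube-proxy, the agent can’t reach the API through the kubernetes ClusterIP, so it needs this directly.
- operator.replicas=1: the chart defaults to two operator replicas, and on a single node the second one can’t schedule.
- hubble.relay.enabled / hubble.ui.enabled: deploy Hubble Relay and the Hubble UI, both used in Step 4.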

Wait for everything to come up:

cilium status --wait

You should see all components green. If a node shows “cilium-agent not ready,” give it another 60 seconds — eBPF programs take a moment to load on first boot.


Step 3: kubeProxyReplacement — What You Actually Got

With kubeProxyReplacement=true, every ClusterIP service lookup goes through eBPF maps in the kernel instead of iptables chains. On a cluster with 50+ services that starts to add up: kube-proxy’s iptables rules are evaluated one after another, so lookup cost is O(n) in the number of rules. eBPF maps are hash lookups, O(1).

In practice on a home lab, you won’t notice the O(n) difference until you have hundreds of services. What you will notice is eBPF socket-level load balancing: the service address is swapped for a backend when the application calls connect(), before any packet exists, so there’s no per-packet NAT on the way. That’s a meaningful latency win.
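The O(n) versus O(1) point is easier to see with a toy model. The sketch below is not how the kernel stores either structure; the service addresses and backend names are made up, and it only counts comparisons.

```python
def chain_lookup(rules, vip):
    """Walk the rules in order, like a packet traversing an iptables chain."""
    steps = 0
    for rule_vip, backend in rules:
        steps += 1
        if rule_vip == vip:
            return backend, steps
    return None, steps


def map_lookup(table, vip):
    """One hash lookup, like a BPF hash map keyed on the service address."""
    return table.get(vip)


# 200 hypothetical services, one rule each.
rules = [(f"10.96.0.{i}:80", f"backend-{i}") for i in range(1, 201)]
table = dict(rules)

backend, steps = chain_lookup(rules, "10.96.0.200:80")
print(backend, "found after", steps, "comparisons")
print(map_lookup(table, "10.96.0.200:80"), "found with a single lookup")
```

The last service in the list costs 200 comparisons in the chain model and one lookup in the map model, which is the whole argument in miniature.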

Verify it’s actually running:

cilium status | grep KubeProxy

You want to see KubeProxyReplacement: True. If it says Disabled, something went wrong with the API server address flags — double-check k8sServiceHost.


Step 4: Hubble — Finally, Visibility

This is honestly one of the best reasons to bother with Cilium. Flannel is invisible. Traffic goes in, traffic comes out, and when it doesn’t, you get to run tcpdump on three nodes like it’s 2008.

Hubble gives you a real-time flow log of every packet hitting your pods. Enable it if you didn’t already in the helm install:

cilium hubble enable --ui

Port-forward the Hubble relay so you can use the CLI:

cilium hubble port-forward &

Now watch live flows (hubble is its own CLI binary, separate from cilium, so install it first if you haven’t):

hubble observe --follow

Sample output looks like:

Jul 18 10:32:14.123 FORWARDED default/frontend → default/backend:8080 tcp ESTABLISHED
Jul 18 10:32:14.126 FORWARDED default/backend → default/redis:6379 tcp ESTABLISHED
Jul 18 10:32:15.001 DROPPED default/frontend → kube-system/kube-dns:53 (Policy denied)

That last line — Policy denied — is your NetworkPolicy working, showing you exactly what got blocked and why. Flannel can’t show you that. Neither can most $200/month observability platforms, honestly.
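If you end up piping flows into a script, counting verdicts is a handy first step. The sketch below parses only the three sample lines shown above, so the regex is an assumption about that exact layout, not a spec for Hubble’s output. For real scripts, hubble observe -o json is the sturdier thing to parse.

```python
import re
from collections import Counter

SAMPLE = """\
Jul 18 10:32:14.123 FORWARDED default/frontend → default/backend:8080 tcp ESTABLISHED
Jul 18 10:32:14.126 FORWARDED default/backend → default/redis:6379 tcp ESTABLISHED
Jul 18 10:32:15.001 DROPPED default/frontend → kube-system/kube-dns:53 (Policy denied)
"""

# Timestamp, verdict, source, arrow, destination, then whatever is left.
LINE = re.compile(
    r"^\w{3} +\d+ [\d:.]+ (?P<verdict>[A-Z]+) (?P<src>\S+) → (?P<dst>\S+) (?P<rest>.*)$"
)


def summarize(text):
    """Count verdicts and collect the dropped flows."""
    verdicts = Counter()
    dropped = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip anything that doesn't look like a flow line
        verdicts[match["verdict"]] += 1
        if match["verdict"] == "DROPPED":
            dropped.append((match["src"], match["dst"], match["rest"]))
    return verdicts, dropped


verdicts, dropped = summarize(SAMPLE)
print(dict(verdicts))
for src, dst, reason in dropped:
    print("dropped:", src, "->", dst, reason)
```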

Filter by namespace or pod:

hubble observe -n default --pod frontend --follow

The Hubble UI (if you enabled it) gives you a service map — who’s talking to whom, with flow counts and drop rates. Port-forward the UI:

kubectl port-forward -n kube-system svc/hubble-ui 8080:80

Then open http://localhost:8080. It won’t win any design awards, but it’ll tell you in 30 seconds if your sidecar is spamming DNS queries.


Step 5: Network Policy — L3/L4 and L7

Cilium supports standard Kubernetes NetworkPolicy, but it also has its own CiliumNetworkPolicy CRD that goes further. Here’s the difference:

Standard NetworkPolicy — block egress except to your backend service:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-egress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 8080
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53

That’s L3/L4. Pod labels and ports. Standard stuff.

Cilium’s own policy adds L7 — you can allow only specific HTTP methods and paths:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: /api/.*
              - method: POST
                path: /api/submit

Now your backend only accepts GET requests under /api/ and POST requests to /api/submit from the frontend pod. A DELETE /api/admin from the frontend never reaches your app: Cilium’s L7 proxy rejects it with a 403. This is not something you can do with standard NetworkPolicy or flannel, period.
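To see which requests those two rules let through, here is a rough model of the matching logic. It assumes each rule’s method and path are regular expressions that must match the whole value, and it uses Python’s re module as a stand-in for the proxy’s regex engine, so treat edge cases with suspicion.

```python
import re

# The two HTTP rules from the policy above.
RULES = [
    {"method": "GET", "path": "/api/.*"},
    {"method": "POST", "path": "/api/submit"},
]


def allowed(method, path, rules=RULES):
    """True if any single rule matches both the method and the full path."""
    return any(
        re.fullmatch(rule["method"], method) is not None
        and re.fullmatch(rule["path"], path) is not None
        for rule in rules
    )


for method, path in [
    ("GET", "/api/users"),
    ("POST", "/api/submit"),
    ("POST", "/api/users"),
    ("DELETE", "/api/admin"),
]:
    print(method, path, "->", "allow" if allowed(method, path) else "deny")
```

Note that a rule has to match on both fields at once: POST is only allowed to /api/submit, not to everything under /api/.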

For FQDN-based egress (block everything except calls to api.github.com):

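apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: fqdn-egress
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: my-service
  egress:
    # Allow DNS to kube-dns and route it through Cilium's DNS proxy
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
        - matchName: api.github.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

The first egress rule matters: toFQDNs only works when the pod’s DNS lookups pass through Cilium’s DNS proxy, which is what the dns rule turns on. Cilium learns the IPs from those responses and keeps the allowed list fresh as TTLs expire. It handles CNAMEs too. Honestly pretty slick.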
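The "keeps the list fresh" part boils down to remembering which IPs a DNS answer returned and for how long. Here is a toy model of that idea. The class, its methods, and the documentation-range IPs are all invented for illustration; Cilium’s real cache has more moving parts than this.

```python
class FqdnAllowList:
    """Toy model: remember IPs from DNS answers until their TTL runs out."""

    def __init__(self, allowed_names):
        self.allowed_names = set(allowed_names)
        self.expiry = {}  # ip -> time after which it is no longer allowed

    def observe(self, name, ips, ttl, now):
        """Record a DNS answer; answers for other names are ignored."""
        if name not in self.allowed_names:
            return
        for ip in ips:
            self.expiry[ip] = max(self.expiry.get(ip, 0), now + ttl)

    def allows(self, ip, now):
        """An IP is allowed only while an unexpired answer covers it."""
        return self.expiry.get(ip, 0) > now


acl = FqdnAllowList(["api.github.com"])
# Hypothetical answers; the addresses are from documentation ranges.
acl.observe("api.github.com", ["203.0.113.10"], ttl=60, now=0)
acl.observe("example.com", ["198.51.100.7"], ttl=60, now=0)

print(acl.allows("203.0.113.10", now=30))   # inside the TTL
print(acl.allows("203.0.113.10", now=61))   # TTL expired
print(acl.allows("198.51.100.7", now=30))   # name was never allowed
```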


Step 6: WireGuard Transparent Encryption

Pod-to-pod traffic, by default, is unencrypted on the wire. In a home lab this probably doesn’t matter much, but if you’re running anything sensitive across nodes, enabling WireGuard encryption is one Helm flag:

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard

That’s it. Cilium generates WireGuard keys per node, distributes them via CiliumNode objects, and handles key rotation automatically. Every packet leaving a node and destined for another node’s pod gets encrypted at the Cilium layer before it hits the NIC.

Zero application changes. Zero sidecar injection. Your apps don’t know this is happening.

Verify it’s active:

cilium encrypt status

You should see WireGuard listed as the encryption mode with key rotation details.


Step 7: L2 Announcements (Replace MetalLB)

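If you were using MetalLB for LoadBalancer IPs, Cilium’s L2 announcement feature covers the same ground. It’s off by default, so set l2announcements.enabled=true in your Helm values first (it relies on the kube-proxy replacement from Step 3). Then define your IP pool:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - cidr: 192.168.1.200/28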

Then enable L2 announcements for that pool:

apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-l2
spec:
  interfaces:
    - eth0
  externalIPs: true
  loadBalancerIPs: true

Services of type LoadBalancer now get an IP from your pool and Cilium handles the ARP responses. Same effect as MetalLB, one fewer component to manage.
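One thing worth checking before you apply a pool: which addresses the CIDR actually describes. 192.168.1.200/28 is not aligned to a /28 boundary, and a quick sketch with Python’s ipaddress module shows the range it normalizes to. How Cilium itself treats the unaligned form is not something this snippet tests, so treat it as arithmetic only.

```python
import ipaddress

pool = "192.168.1.200/28"  # value from the pool definition above

try:
    ipaddress.ip_network(pool)  # strict parsing rejects set host bits
    aligned = True
except ValueError:
    aligned = False

# strict=False masks the host bits off and returns the enclosing /28.
net = ipaddress.ip_network(pool, strict=False)

print("aligned:", aligned)
print("normalized:", net)
print("covers:", net[0], "to", net[-1], f"({net.num_addresses} addresses)")
```

If that normalized range collides with your router’s DHCP pool, pick a different block or an aligned CIDR before handing it to the load balancer.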


Step 8: Gateway API

Cilium 1.16 ships with native Gateway API support. If you’re tired of the Ingress resource’s limitations, this is the upgrade path. Install the CRDs and use GatewayClass cilium:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: homelab-gateway
  namespace: default
spec:
  gatewayClassName: cilium
  listeners:
    - name: http
      protocol: HTTP
      port: 80
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: homelab-tls

Gateway API gives you proper header-based routing, traffic splitting for canary deploys, and a sane model for cross-namespace routing. Worth exploring once Cilium is stable on your cluster.


Performance: eBPF vs Flannel/VXLAN

Here’s the part where the benchmarks come out. Real iperf3 pod-to-pod numbers on a 10GbE home lab setup (two bare-metal nodes, Intel X550, kernel 6.6):

Mode                             Throughput (pod-to-pod, cross-node)
Flannel/VXLAN                    ~7.2 Gbps
Cilium eBPF native routing       ~9.4 Gbps
Cilium + WireGuard encryption    ~8.1 Gbps

That VXLAN overhead is real — every packet gets wrapped in another UDP header, processed through the kernel’s VXLAN driver, and unwrapped on the other end. Cilium’s native routing mode skips the overlay entirely when nodes are on the same L2 segment (which they usually are in a home lab). Your packets go node-to-pod with a kernel BPF redirect and nothing else.

For same-node pod-to-pod traffic, the numbers are even more dramatic because socket-level load balancing never hits the network stack at all.
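To put the cross-node table in relative terms, here is the arithmetic as a small sketch. The three figures are the ones from the table; nothing new is measured here.

```python
# Throughput figures from the table above, in Gbps.
flannel_vxlan = 7.2
cilium_native = 9.4
cilium_wireguard = 8.1

# Gain of native routing over the VXLAN overlay, relative to VXLAN.
gain = (cilium_native - flannel_vxlan) / flannel_vxlan * 100

# Cost of turning on WireGuard, relative to unencrypted native routing.
wg_cost = (cilium_native - cilium_wireguard) / cilium_native * 100

# Encrypted Cilium compared with unencrypted flannel.
wg_vs_flannel = (cilium_wireguard - flannel_vxlan) / flannel_vxlan * 100

print(f"native routing vs VXLAN: +{gain:.1f}%")
print(f"WireGuard cost vs native: -{wg_cost:.1f}%")
print(f"WireGuard vs plain VXLAN: +{wg_vs_flannel:.1f}%")
```

The takeaway from these particular numbers is that encrypted Cilium traffic still comes out ahead of unencrypted flannel/VXLAN.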

On a 1GbE home lab? You won’t saturate either mode. The throughput difference won’t matter. But the latency savings from eliminating iptables chain traversal — that shows up in p99 latency, which matters if you’re running databases or anything that does a lot of small requests.


When NOT to Bother

Let’s be real about the cases where Cilium is the wrong tool:

Raspberry Pi clusters: Cilium’s agent pod sits at roughly 200-250MB of RAM at idle, plus the operator. On a Pi 4 with 4GB, that’s a significant chunk of your total memory for networking alone. Flannel uses maybe 30-40MB. If you’re running a Pi cluster for fun, flannel is the sensible choice.

Single-node clusters: You lose most of the benefits. Same-node traffic is fast with anything. You don’t have inter-node encryption to worry about. NetworkPolicy still works, but Hubble’s service map is less interesting when there’s only one node.

You just want stuff to run: If your goal is deploying apps and not thinking about networking, flannel works and it keeps working. Cilium adds operational surface area. You’re trading simplicity for capability. If you don’t need L7 policies, Hubble, or transparent encryption — keep flannel.

Older kernels: Cilium 1.16 requires kernel 5.10 at minimum, with 5.15+ recommended for full feature support. If you’re on something ancient, check uname -r before you start.
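If you have more than a couple of nodes, eyeballing uname -r gets old. A minimal sketch of the same check in Python, using the 5.10 and 5.15 thresholds quoted above; the function names and the three verdict strings are made up for this example.

```python
import platform
import re

MINIMUM = (5, 10)      # floor quoted above for Cilium 1.16
RECOMMENDED = (5, 15)  # recommended above for full feature support


def kernel_tuple(release):
    """Turn a `uname -r` string such as '6.6.31-v8+' into (6, 6)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def verdict(release):
    """Classify a kernel release against the two thresholds."""
    version = kernel_tuple(release)
    if version < MINIMUM:
        return "too old"
    if version < RECOMMENDED:
        return "minimum only"
    return "ok"


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r` on Linux.
    release = platform.release()
    print(release, "->", verdict(release))
```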


Should You Bother?

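Yes, if you have:

- A multi-node cluster on hardware with RAM to spare
- A need for network policy that’s actually enforced, including L7 or FQDN rules
- Networking problems you’d rather debug with Hubble than with tcpdump
- Traffic between nodes that should be encrypted
- 10GbE links, or latency-sensitive workloads like databases

No, if you have:

- A Raspberry Pi cluster
- A single node
- A kernel older than 5.10
- No interest in thinking about networking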

The honest answer: Cilium on k3s is a weekend project that pays off over months. The setup is maybe 3-4 hours including troubleshooting. What you get is production-grade networking visibility, real security policy enforcement, and a CNI that won’t be the bottleneck when you start running actual workloads.

Flannel is the sensible default. Cilium is what you install when you’ve outgrown sensible defaults.

