Multi-Arch Docker Builds With QEMU & buildx

Your Pi Pulled the Wrong Image Again

You spent an evening containerizing your app, pushed the image, pulled it on your Raspberry Pi 5, and got this:

exec /usr/local/bin/app: exec format error

Classic. Your laptop is x86_64, your Pi is arm64, and Docker handed the Pi a binary built for the wrong architecture. The container started just fine on your laptop, so obviously it’s the Pi’s fault. It’s not.

In 2026 multi-architecture builds aren’t optional anymore. Apple Silicon Macs run arm64. AWS Graviton instances are arm64 and typically cheaper per vCPU than comparable x86 instances. Your homelab is probably a mix of a Pi, a Rockchip board, maybe an old Intel NUC. If you’re pushing a single-arch image, you’re writing off half your hardware and your CI bills in one shot.

The fix is a multi-arch image: one image reference that resolves to the right binary for whatever pulls it. Docker’s buildx plugin and QEMU make this surprisingly achievable. Let’s do it.

What a Multi-Arch Image Actually Is

Before we build anything, a quick anatomy lesson because the terminology trips people up.

A manifest list (also called an OCI image index) is a pointer. When you push myrepo/myapp:latest as a multi-arch image, the registry stores something like this:

$ docker manifest inspect myrepo/myapp:latest

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:abc123...",
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:def456...",
      "platform": { "architecture": "arm64", "variant": "v8", "os": "linux" }
    }
  ]
}

The tag latest doesn’t point at an image; it points at a list of images. When your Pi runs docker pull, it reads the manifest list, matches its own OS and architecture against the platform entries, and pulls the right one. Your x86 machine does the same. Same tag, right binary, no exec format error.

This is what you’re building. Two (or more) actual images, stitched together under one tag.
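To make the matching step concrete, here is a minimal sketch of how a machine name from uname -m lines up with the platform strings in a manifest list. The docker_platform function is a hypothetical helper written for this post, not part of Docker, and it assumes a Linux host.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker command): translate a `uname -m`
# machine name into the os/arch[/variant] string used by platform entries.
# Assumes the host OS is Linux.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    armv7l)        echo "linux/arm/v7" ;;
    armv6l)        echo "linux/arm/v6" ;;
    *)             echo "unknown machine: $1" >&2; return 1 ;;
  esac
}

# Usage: print the platform string for the machine running this script.
docker_platform "$(uname -m)" || true
```

Run it on the laptop and on the Pi and you get the two platform strings the registry has to serve under one tag.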

Approach 1: QEMU Emulated Builds (Easy, Slow)

QEMU lets your x86 CPU pretend to be an ARM CPU through software translation. Think of it as an ARM interpreter: not fast, but it works and requires exactly zero extra hardware.

Install buildx and QEMU

buildx ships with Docker Desktop. If you’re on Docker Engine (Linux), check:

$ docker buildx version
github.com/docker/buildx v0.21.0 ...

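If it’s missing, install the plugin (the package comes from Docker’s own apt repository, so that repository has to be configured first):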

$ sudo apt install docker-buildx-plugin

Then register QEMU binary formats with the kernel:

$ docker run --privileged --rm tonistiigi/binfmt --install all

This sets up /proc/sys/fs/binfmt_misc so the kernel knows to hand arm64 binaries to QEMU instead of rejecting them with exec format error. The registration doesn’t survive a reboot, so redo it once per boot (or set it up as a systemd unit if you’re running a builder host).
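A quick way to check whether the registration is still in place after a reboot is to read the entry file the kernel exposes. This is a sketch under one assumption: that the arm64 handler is registered under the name qemu-aarch64. The binfmt_enabled function is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a binfmt_misc entry file exists and its
# first line reads "enabled". $1 is the path to the entry file.
binfmt_enabled() {
  [ -r "$1" ] && [ "$(head -n 1 "$1")" = "enabled" ]
}

# Assumption: the arm64 handler is registered as "qemu-aarch64".
entry=/proc/sys/fs/binfmt_misc/qemu-aarch64
if binfmt_enabled "$entry"; then
  echo "arm64 emulation is registered"
else
  echo "arm64 emulation is not registered; rerun the binfmt install"
fi
```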

Create a buildx builder

The default builder uses the docker driver, which can’t build multi-platform images unless the containerd image store is enabled. Create one that can:

$ docker buildx create --name multiarch --use
$ docker buildx inspect --bootstrap

The --use flag makes it the active builder. inspect --bootstrap starts the BuildKit container in the background and confirms available platforms. You should see linux/amd64, linux/arm64, linux/arm/v7, and a bunch more.

Build and push

$ docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myrepo/myapp:latest \
  --push \
  .

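That’s it. --push is effectively required here: with the classic image store, a multi-arch build can’t be loaded into your local Docker daemon (a manifest list has nowhere to land locally), so the image goes straight to the registry. If you’ve enabled the containerd image store, --load works too.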
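If you script your builds, the push-or-load decision can be derived from the platform list. The sketch below assumes the classic image store; output_flag is a hypothetical helper, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: choose an output flag from a --platform value.
# A single platform can be loaded into the local daemon; a comma-separated
# list needs a registry (assuming the classic image store).
output_flag() {
  case "$1" in
    *,*) echo "--push" ;;
    *)   echo "--load" ;;
  esac
}

platforms="linux/amd64,linux/arm64"
# Dry run: print the command that would be executed.
echo docker buildx build --platform "$platforms" \
  "$(output_flag "$platforms")" --tag myrepo/myapp:latest .
```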

BuildKit spins up two builds in parallel: one native amd64 pass, one arm64 pass running through QEMU. The arm64 pass is slow. Like, “make a coffee” slow for non-trivial images. A Rust compile that takes 90 seconds natively might take 12 minutes under QEMU. We’ll fix that in approach 2, but for small images or scripted languages, emulation is perfectly fine.

A Dockerfile that plays nice with multi-arch

Most Dockerfiles just work. A few things to watch:

# Use a multi-arch base image — alpine is fine here
FROM alpine:3.21

# This is fine — apk resolves the right arch automatically
RUN apk add --no-cache curl

# Copy your binary — but make sure you're building it for the right arch
# in your build stage (see below)
COPY app /usr/local/bin/app
RUN chmod +x /usr/local/bin/app

CMD ["app"]

If you’re compiling a Go binary:

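FROM --platform=$BUILDPLATFORM golang:1.24-alpine AS builder

WORKDIR /app
COPY . .

# BuildKit injects TARGETOS and TARGETARCH for each requested platform;
# we hand them to Go as GOOS and GOARCH
ARG TARGETOS
ARG TARGETARCH
RUN GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o app .

FROM alpine:3.21
COPY --from=builder /app/app /usr/local/bin/app
CMD ["app"]

TARGETOS and TARGETARCH are automatically injected by BuildKit when building for a specific platform. You don’t set them, you just declare them with ARG and they’re there. The --platform=$BUILDPLATFORM on the builder stage pins that stage to the build machine’s native architecture, so the Go compiler runs natively and cross-compiles for the target. Without it, the whole builder stage would run under QEMU for arm64. Go’s cross-compilation is then trivial: set the env vars, build, done. No QEMU needed for the compile step itself, which is why Go multi-arch images build fast even with emulation.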
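To see which values end up in those build args, here is a minimal sketch that splits a platform string the same way. The split_platform function is a hypothetical helper for illustration; inside a real build, BuildKit fills in TARGETOS, TARGETARCH, and TARGETVARIANT for you.

```shell
#!/bin/sh
# Hypothetical helper: split an os/arch[/variant] platform string into the
# pieces that correspond to TARGETOS, TARGETARCH, and TARGETVARIANT.
split_platform() {
  os=${1%%/*}          # text before the first slash
  rest=${1#*/}         # text after the first slash
  arch=${rest%%/*}     # architecture, without any variant
  variant=
  case "$rest" in
    */*) variant=${rest#*/} ;;   # only set when a variant is present
  esac
  echo "TARGETOS=$os TARGETARCH=$arch TARGETVARIANT=$variant"
}

for p in linux/amd64 linux/arm64 linux/arm/v7; do
  split_platform "$p"
done
```

The variant only matters for 32-bit ARM, where Go wants it as GOARM (v7 becomes GOARM=7).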

For Rust or C/C++, cross-compilation is more involved (you need the right linker and sysroot). In those cases, native builders (approach 2) save your sanity.

Approach 2: Native Multi-Node Builders (Fast)

Instead of emulating ARM on your laptop, you point BuildKit at actual ARM hardware. Each platform builds natively, with no emulation penalty. If you have an arm64 machine anywhere in your lab (a Pi, an Oracle Cloud free tier ARM VM, an AWS Graviton instance), you can do this.

Set up the builder nodes

On your x86 build host, create a builder that spans both machines:

# Add the local (amd64) node
$ docker buildx create \
  --name multiarch-native \
  --node native-amd64 \
  --platform linux/amd64 \
  --use

# Add the remote arm64 node (over SSH)
$ docker buildx create \
  --name multiarch-native \
  --append \
  --node native-arm64 \
  --platform linux/arm64 \
  --driver-opt env.BUILDKIT_STEP_LOG_MAX_SIZE=10485760 \
  ssh://user@your-arm-host

buildx connects to the Docker daemon on the remote host over SSH and runs the arm64 build there natively, so that host needs Docker installed and key-based SSH access for your user. Your amd64 build runs locally. Both run in parallel. Fast.

Inspect to confirm both nodes are up:

$ docker buildx inspect multiarch-native
Name:          multiarch-native
Driver:        docker-container

Nodes:
Name:      native-amd64
Endpoint:  unix:///var/run/docker.sock
Platforms: linux/amd64

Name:      native-arm64
Endpoint:  ssh://user@your-arm-host
Platforms: linux/arm64
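In a script, you can gate the build on the inspect output instead of eyeballing it. This is a sketch that assumes the Platforms: line format shown above; has_platform is a hypothetical helper, and the sample text stands in for real docker buildx inspect output.

```shell
#!/bin/sh
# Hypothetical helper: read `docker buildx inspect` output on stdin and
# succeed if any Platforms: line lists the platform given as $1.
has_platform() {
  grep -E '^[[:space:]]*Platforms:' |
    tr ',' '\n' |
    sed -e 's/^[[:space:]]*Platforms:[[:space:]]*//' \
        -e 's/[[:space:]*]//g' |
    grep -Fxq "$1"
}

# Sample text standing in for: docker buildx inspect multiarch-native
sample='Name:      native-arm64
Endpoint:  ssh://user@your-arm-host
Platforms: linux/arm64'

if printf '%s\n' "$sample" | has_platform linux/arm64; then
  echo "arm64 node present"
else
  echo "arm64 node missing" >&2
fi
```

Against a live builder, pipe the real command in: docker buildx inspect multiarch-native | has_platform linux/arm64.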

The build command is identical:

$ docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myrepo/myapp:latest \
  --push \
  .

BuildKit dispatches the amd64 layers to the local node and the arm64 layers to the SSH node. Results merge into a single manifest list at the registry. Your Rust compile that took 12 minutes under QEMU now takes 90 seconds because it’s actually running on ARM silicon.

CI Patterns

GitHub Actions

name: Build Multi-Arch Image

on:
  push:
    branches: [main]
  pull_request:

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

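The setup-qemu-action handles the binfmt registration. setup-buildx-action creates the multi-arch builder. The cache lines are the important part: without them, every run re-downloads base layers and re-compiles from scratch. One caveat: image references must be lowercase, so if your repository owner or name contains capitals, lowercase IMAGE_NAME before using it in a tag.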

GitLab CI / Forgejo Actions

The pattern is nearly identical. Forgejo Actions is GitHub Actions-compatible, so the YAML above works with minor tweaks for registry auth. GitLab’s equivalent:

build:
  image: docker:28
  services:
    - docker:28-dind
  variables:
    DOCKER_BUILDKIT: "1"
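  before_script:
    - docker run --privileged --rm tonistiigi/binfmt --install all
    # dind uses TLS by default, and buildx won't create a builder from TLS
    # settings in the environment, so wrap them in a context first
    # ("ci" is just a name chosen for this example)
    - docker context create ci
    - docker buildx create --use ci
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"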
  script:
    - docker buildx build
        --platform linux/amd64,linux/arm64
        --tag "$CI_REGISTRY_IMAGE:latest"
        --push
        --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:buildcache
        --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:buildcache,mode=max
        .

Cache Strategy

Cache misses are the number-one reason multi-arch CI builds are slow. The two main strategies:

GitHub Actions cache (type=gha): BuildKit exports layer data to the GHA cache API. Fast to write, free within GHA limits, wiped if unused for 7 days. Good default for public repos.

Registry cache (type=registry): BuildKit pushes a cache manifest to your registry alongside the real image. Survives across runners, works in self-hosted CI, and you control the retention. The example below uses myrepo/myapp:buildcache as the cache tag. That’s a real tag in your registry, but it holds cache data rather than a runnable image.

# Manual build with registry cache
$ docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myrepo/myapp:latest \
  --cache-from type=registry,ref=myrepo/myapp:buildcache \
  --cache-to type=registry,ref=myrepo/myapp:buildcache,mode=max \
  --push \
  .

mode=max exports all intermediate layers, not just those of the final image. More storage, much better cache hit rate on changed layers deep in the build. If you’re watching your registry storage, mode=min (the default) exports only the layers of the final image: cheaper, but fewer hits.
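If several build scripts share the same cache tag, it can help to generate the two flags from one place so the ref and mode never drift apart. A minimal sketch, where cache_flags is a hypothetical helper and the script only prints the resulting command:

```shell
#!/bin/sh
# Hypothetical helper: emit matching --cache-from/--cache-to flags for a
# registry cache. $1 is the cache ref, $2 the export mode (default: min).
cache_flags() {
  ref=$1
  mode=${2:-min}
  echo "--cache-from type=registry,ref=$ref --cache-to type=registry,ref=$ref,mode=$mode"
}

# Dry run: print the build command with the cache flags filled in.
echo docker buildx build --platform linux/amd64,linux/arm64 \
  "$(cache_flags myrepo/myapp:buildcache max)" \
  --tag myrepo/myapp:latest --push .
```

When you run the command for real, leave the $(cache_flags ...) substitution unquoted so the shell splits it into separate arguments.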

Multi-Arch Base Image Traps

Not every base image has an arm64 build. Check before you commit to a base.

Alpine vs glibc: Alpine uses musl libc. Most Go and static binaries don’t care. But if you’re using a C library or binaries built against glibc, Alpine can produce subtle runtime failures, and on an arm64 target they may only surface once the image runs on the Pi. debian:bookworm-slim is heavier but consistent. When in doubt, test on actual hardware.

Missing arm builds: Some images on Docker Hub only publish amd64. tonistiigi/xx, ghcr.io/linuxserver/*, and official language images (golang, python, node, rust) have solid multi-arch support. Third-party images are a grab bag, so always check the Tags tab on Docker Hub and look for the architecture badges.

Platform-specific packages: Some apt or apk packages have different names or aren’t available on arm64. If your build fails only on arm64, check whether you’re hitting a package availability gap.

A quick sanity check before committing to a base:

$ docker manifest inspect python:3.12-slim | grep architecture

If you see both amd64 and arm64 in the output, you’re good.

Verify Your Manifest List

After pushing, confirm the manifest list is actually there:

$ docker buildx imagetools inspect myrepo/myapp:latest

Name:      docker.io/myrepo/myapp:latest
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:aaabbb...

Manifests:
  Name:      docker.io/myrepo/myapp:latest@sha256:abc123...
  MediaType: application/vnd.oci.image.manifest.v1+json
  Platform:  linux/amd64

  Name:      docker.io/myrepo/myapp:latest@sha256:def456...
  MediaType: application/vnd.oci.image.manifest.v1+json
  Platform:  linux/arm64

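Two manifests under one tag. Your Pi will pull sha256:def456, your NUC will pull sha256:abc123. Both call it myrepo/myapp:latest. That’s the whole thing. (If you also see entries with platform unknown/unknown, those are the provenance attestations recent buildx versions attach by default, not extra images.)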
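For CI, the same check can be turned into a gate that fails when a platform is missing. This sketch assumes the Platform: line format shown in the output above; missing_platforms is a hypothetical helper, and the sample text stands in for real imagetools output.

```shell
#!/bin/sh
# Hypothetical helper: read `docker buildx imagetools inspect` output on
# stdin and print every required platform (given as arguments) that has no
# matching "Platform:" line.
missing_platforms() {
  report=$(cat)
  for want in "$@"; do
    printf '%s\n' "$report" |
      grep -Eq "^[[:space:]]*Platform:[[:space:]]+${want}[[:space:]]*\$" ||
      echo "$want"
  done
}

# Sample text standing in for: docker buildx imagetools inspect myrepo/myapp:latest
sample='Manifests:
  Platform:  linux/amd64
  Platform:  linux/arm64'

missing=$(printf '%s\n' "$sample" | missing_platforms linux/amd64 linux/arm64)
if [ -n "$missing" ]; then
  echo "missing platforms: $missing" >&2
else
  echo "all required platforms present"
fi
```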

Which Approach Should You Use

QEMU emulation is the right call when:

You’re on a laptop or a single build host with no ARM hardware
Your image uses a scripted language (Python, Node, Ruby) or cross-compiles cleanly (Go, Rust with cross-rs)
Build time is acceptable: under 10-15 minutes for your image

Native multi-node is worth the setup when:

You’re compiling C, C++, or Rust with no cross-compilation setup
Your CI builds are hitting the 30-minute mark under emulation
You already have an arm64 machine sitting idle in your lab

For most homelab images, QEMU + registry cache gets you there without maintaining SSH access to a second build node. For anything compile-heavy going into production, the native approach pays for itself the first week.

Your Pi deserves better than exec format error. It took you 30 seconds to set up QEMU. No excuses.
