Argo Rollouts vs Flagger: Progressive Delivery on Kubernetes (pick the right forklift)

Progressive delivery is the engineering equivalent of easing a forklift into a crowded warehouse aisle: you don’t slam the throttle and pray the boxes rearrange themselves politely. It’s deploying changes gradually, shifting a little traffic, checking metrics, and only then committing to the full switch, so your 2 AM pager stays mercifully quiet.

Below: a pragmatic, slightly snarky face-off between Argo Rollouts and Flagger. Both get the job done, but they drive different rigs. Code examples, metric queries, and real-world tradeoffs included.

What is progressive delivery? (quick hook)

Progressive delivery = staged, observable, reversible releases. Instead of “deploy-and-hope,” you can:

Route 5% of traffic to a new revision,
Watch latency/error metrics,
Increase to 25% if healthy,
Roll back automatically if a metric trips.

It’s like test-driving a car on a neighborhood street before taking it onto the interstate: fewer witnesses to your mistakes.

Argo Rollouts: The Replacement Model

Control model: Rollout CRD replaces Deployment

Argo Rollouts is opinionated: you replace your Deployment with a Rollout resource, backed by the Rollout CRD. That Rollout becomes the authoritative object: replicas, strategy (canary / blue-green), and traffic routing all live on it, and the Argo Rollouts controller manages the ReplicaSets directly. That “replacement” model gives tight control, but it means you must migrate workloads to the Rollout type.

Pros: single source of truth, first-class rollout semantics, tight UI and CLI. Cons: you no longer operate a Deployment, which can confuse some GitOps flows if your pipeline expects Deployments.
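
Migration does not have to mean copying the pod template on day one. As a hedged sketch: Argo Rollouts documents a spec.workloadRef field that lets a Rollout point at an existing Deployment and reuse its pod template. The resource names below are assumptions chosen to line up with the canary example later in this post; the Rollout still creates and manages its own ReplicaSets, so you would normally scale the referenced Deployment down once the Rollout's pods are serving.

```yaml
# Sketch only: a Rollout that borrows the pod template from an existing Deployment.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: canary-demo
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: canary-demo        # assumed name of the Deployment you already run
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 1m}
```

This keeps the Deployment manifest in Git for tools that expect it, while the Rollout takes over the actual release mechanics.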

Example YAML

Blue-green (active + preview Service):

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: bluegreen-demo
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bluegreen-demo
  template:
    metadata:
      labels:
        app: bluegreen-demo
    spec:
      containers:
      - name: web
        image: ghcr.io/stefanprodan/podinfo:6.0.1
        ports:
        - containerPort: 9898
  strategy:
    blueGreen:
      activeService: bluegreen-demo
      previewService: bluegreen-demo-preview
      autoPromotionEnabled: false   # manual promotion for verification
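
The two Services named in the blueGreen block have to exist before the Rollout can become healthy. A minimal sketch of them follows, assuming port 80 in front of podinfo's 9898 (the port choice is an assumption, not something the Rollout dictates). Both start with the same selector; the controller then adds its own pod-template-hash selector to each so that the active Service points at the current version and the preview Service at the new one.

```yaml
# Assumed companion Services for the bluegreen-demo Rollout.
apiVersion: v1
kind: Service
metadata:
  name: bluegreen-demo            # referenced by activeService
  namespace: default
spec:
  selector:
    app: bluegreen-demo
  ports:
  - name: http
    port: 80                      # assumption: any free Service port works
    targetPort: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: bluegreen-demo-preview    # referenced by previewService
  namespace: default
spec:
  selector:
    app: bluegreen-demo
  ports:
  - name: http
    port: 80
    targetPort: 9898
```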

Canary with Istio traffic shifting and analysis steps:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
      - name: web
        image: ghcr.io/stefanprodan/podinfo:6.0.1
        ports:
        - containerPort: 9898
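  strategy:
    canary:
      # Host-level Istio splitting needs one Service per side; these names are assumptions
      canaryService: canary-demo-canary
      stableService: canary-demo-stable
      steps:
      - setWeight: 20
      - pause: {duration: 1m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 50
      - pause: {duration: 2m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 100
      trafficRouting:
        istio:
          virtualService:
            name: canary-demo-vs
            routes:
            - http   # route names are plain strings matching http[].name in the VirtualService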

Notes:

The Rollout owns traffic routing when you configure a traffic provider (Istio, NGINX integrations, Gateway API, ALB, SMI adapters, etc.).
The analysis step hooks into Argo’s AnalysisTemplate machinery (below).
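
The canary Rollout above points at a VirtualService that you create yourself; Argo Rollouts only rewrites the weights on it. Here is a rough sketch of the supporting objects for host-level splitting. It assumes the Rollout's canary strategy sets canaryService: canary-demo-canary and stableService: canary-demo-stable (hypothetical names), and the host canary-demo.example.com is a placeholder.

```yaml
# Assumed supporting objects for the canary-demo Rollout (names are illustrative).
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-stable
  namespace: default
spec:
  selector:
    app: canary-demo
  ports:
  - name: http
    port: 80
    targetPort: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-canary
  namespace: default
spec:
  selector:
    app: canary-demo
  ports:
  - name: http
    port: 80
    targetPort: 9898
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: canary-demo-vs
  namespace: default
spec:
  hosts:
  - canary-demo.example.com      # hypothetical host; add gateways for ingress traffic
  http:
  - name: http                   # the route name the Rollout refers to
    route:
    - destination:
        host: canary-demo-stable
      weight: 100                # the controller adjusts these two weights
    - destination:
        host: canary-demo-canary
      weight: 0
```

Start with all the weight on stable; each setWeight step then moves the numbers for you.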

Traffic shifting providers (Istio, NGINX, etc.)

Argo Rollouts supports traffic shifting through a set of providers: Istio, NGINX Ingress, AWS ALB, SMI implementations, and others, with Gateway API support delivered through a traffic-router plugin. That makes it flexible whether you’re on a service mesh or using plain Ingress.

If your infra is Istio-based, Argo’s native Istio integration is very smooth; Linkerd is handled through the SMI path rather than a dedicated provider. If you’re using Ingress controllers, check that your controller (NGINX, ALB) supports weight-based routing.
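
For comparison with the Istio example, here is roughly what the same strategy looks like on the NGINX Ingress provider. This is a fragment of a Rollout's spec, not a full manifest, and the Service and Ingress names are assumptions. The controller is expected to create a second, canary Ingress from the one named in stableIngress and set the canary weight through annotations.

```yaml
# Fragment of a Rollout spec (sketch): NGINX Ingress instead of Istio.
strategy:
  canary:
    canaryService: canary-demo-canary     # assumed Service names
    stableService: canary-demo-stable
    trafficRouting:
      nginx:
        stableIngress: canary-demo-ingress  # assumed existing Ingress routing to the stable Service
    steps:
    - setWeight: 20
    - pause: {duration: 1m}
    - setWeight: 50
    - pause: {duration: 2m}
```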

AnalysisTemplate & AnalysisRun

Argo Rollouts ships with AnalysisTemplate (templated checks) and AnalysisRun (runtime execution). You author reusable AnalysisTemplates that run Prometheus queries, web (HTTP) checks, or Kubernetes Jobs for anything you need to script.

Example AnalysisTemplate that checks request success-rate via Prometheus:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: default
spec:
  metrics:
  - name: request-success-rate
    interval: 1m
    count: 3
    # The Prometheus provider returns a vector, so index the first sample
    successCondition: result[0] >= 99
    failureCondition: result[0] < 99
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc:9090
        query: |
          sum(rate(istio_requests_total{destination_service=~"canary-demo.*",response_code!~"5.."}[5m]))
            / sum(rate(istio_requests_total{destination_service=~"canary-demo.*"}[5m])) * 100

Argo runs AnalysisRuns automatically as part of the Rollout analysis step. Failing AnalysisRuns trigger rollback behavior (or halt promotion) depending on your config.
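
The template above hard-codes the service name, which limits reuse. One way to make it shareable, sketched here under assumptions, is to declare an argument on the template and pass a value from each Rollout. The template name success-rate-by-service and the service hostname are hypothetical; spec.args and the {{args.<name>}} placeholder are the documented mechanism.

```yaml
# Sketch: a parameterized AnalysisTemplate reused across Rollouts.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-by-service     # hypothetical name
  namespace: default
spec:
  args:
  - name: service-name              # supplied by the Rollout at run time
  metrics:
  - name: request-success-rate
    interval: 1m
    count: 3
    failureLimit: 0                 # a single failed measurement fails the run
    successCondition: result[0] >= 99
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc:9090
        query: |
          sum(rate(istio_requests_total{destination_service=~"{{args.service-name}}",response_code!~"5.."}[5m]))
            / sum(rate(istio_requests_total{destination_service=~"{{args.service-name}}"}[5m])) * 100
---
# Fragment of spec.strategy.canary.steps in a Rollout: pass the argument per step.
- analysis:
    templates:
    - templateName: success-rate-by-service
    args:
    - name: service-name
      value: canary-demo-canary.default.svc.cluster.local   # assumed canary Service host
```

Pointing the argument at the canary Service rather than a wildcard also means the check measures the new version only, instead of a blend of stable and canary traffic.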

Promotion & rollback behavior

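Promotion: Argo Rollouts advances through the defined steps (setWeight / pause / analysis). Promotion can be automatic (timed pauses and passing analysis) or manual: a pause step with no duration in a canary, or autoPromotionEnabled: false in blue-green, holds the rollout until someone promotes it.
Rollback: If an AnalysisRun fails, the Rollout aborts: traffic shifts back to the previous stable ReplicaSet and the Rollout is marked Degraded until you retry or ship a fix. Analyses are first-class objects, so you get clear failure reasons. The controller reconciles ReplicaSets and routing weights together rather than leaving that sequencing to you.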
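
A manual gate in a canary is just a pause without a duration. The fragment below is a sketch of a Rollout's strategy with one human checkpoint followed by a timed step; the weights are arbitrary choices for illustration.

```yaml
# Fragment of a Rollout spec (sketch): one manual gate, then a timed step.
strategy:
  canary:
    steps:
    - setWeight: 20
    - pause: {}                 # no duration: waits until someone promotes
    - setWeight: 50
    - pause: {duration: 2m}     # timed: continues on its own
    - analysis:
        templates:
        - templateName: success-rate
```

With the kubectl plugin installed, kubectl argo rollouts promote canary-demo releases the indefinite pause, and kubectl argo rollouts abort canary-demo sends traffic back to stable.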

Argo Rollouts dashboard UI win

Argo Rollouts includes a slick kubectl argo rollouts dashboard web UI (and kubectl argo rollouts get rollout -n ns <name> CLI) that visualizes steps, weights, AnalysisRuns, and ReplicaSet history. It’s a strong operator UX, clicky, visual, and the kind of thing you want when a human is deciding if a release is safe.

Flagger: The Sidecar Model

Control model: Canary CRD watches Deployment

Flagger follows a different philosophy: it doesn’t replace your Deployment. Instead, Flagger watches your existing Deployment and orchestrates traffic shifting, analysis, and promotion externally. The Canary CRD points to a Deployment (targetRef) and the Flagger controller performs the progressive delivery.

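Pros: non-invasive (keeps Deployment), works well with Flux/GitOps patterns where Deployments are expected. Cons: the split model (Deployment vs Canary) takes getting used to. Flagger creates a podinfo-primary copy of your workload and scales the original Deployment to zero between releases, which surprises people the first time they run kubectl get deploy.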

Example Canary CRD YAML

This is a typical Flagger Canary that uses Istio traffic shifting and an inline Prometheus metric:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
  provider: istio   # provider is a plain string on the Canary spec
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
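    - name: request-success-rate
      # thresholdRange is a min/max object, not a list
      thresholdRange:
        min: 99
        max: 100
      interval: 1m
      # Inline query is the older style; newer Flagger releases favor templateRef
      query: |
        sum(rate(istio_requests_total{destination_service=~"podinfo.*",response_code!~"5.."}[5m]))
          / sum(rate(istio_requests_total{destination_service=~"podinfo.*"}[5m])) * 100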

Key points:

Flagger targets an existing Deployment via targetRef.
Flagger creates a primary copy of the workload (podinfo-primary) plus the Services it needs, then shifts traffic between primary and canary using the configured provider (Istio, Linkerd, NGINX, Gateway API, etc.).
Inline metrics can be replaced by MetricTemplate references (below).

Traffic provider support (Istio, Linkerd, NGINX, Gateway API)

Flagger supports a wide set of providers: Istio, Linkerd, NGINX Ingress Controller, Gateway API implementations (such as Contour or Envoy-based gateways), and more. It talks to the data plane through provider-specific routers and updates virtual services or ingress objects to change weights. AWS ALB is not among Flagger's documented providers as far as the project docs show, so verify support before planning an ALB-only setup around it.

This makes Flagger a good choice if you already have Deployments and want an external controller to manage canaries without switching resource types.
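
Switching providers in Flagger is mostly a matter of changing spec.provider and adding whatever reference that provider needs. As a sketch for the NGINX Ingress Controller, the Canary points at an existing Ingress through ingressRef. The Ingress name and Service port here are assumptions, and the metric uses Flagger's built-in request-success-rate check rather than a custom query.

```yaml
# Sketch: the podinfo Canary on NGINX Ingress instead of Istio.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: nginx
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  ingressRef:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: podinfo              # assumed existing Ingress for the app
  service:
    port: 80                   # assumed Service port
    targetPort: 9898
  analysis:
    interval: 1m
    threshold: 10              # failed checks tolerated before rollback
    maxWeight: 50
    stepWeight: 5
    metrics:
    - name: request-success-rate   # built-in metric, no query needed
      thresholdRange:
        min: 99
      interval: 1m
```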

MetricTemplate in Flagger

Flagger lets you define MetricTemplate CRs that centralize Prometheus (or other) queries. Reuse them across canaries. A MetricTemplate decouples query details from Canary objects.

Example MetricTemplate:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-success-rate
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring.svc:9090
  query: |
    sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="{{ namespace }}",destination_workload="{{ target }}",response_code!~"5.."}[{{ interval }}]))
      / sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="{{ namespace }}",destination_workload="{{ target }}"}[{{ interval }}])) * 100

Note: Flagger fills in placeholders such as {{ namespace }}, {{ target }}, and {{ interval }} at runtime (plus your own templateVariables, referenced as {{ variables.<key> }}). The names are lowercase with no leading dot. The MetricTemplate spec itself only holds the provider and query; the thresholdRange lives on the metric reference inside the Canary's analysis.metrics block, not here.
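
To close the loop, this is roughly how a Canary would reference that MetricTemplate. It is a fragment of the Canary's spec.analysis, and the metric name custom-success-rate is a made-up label chosen to avoid clashing with Flagger's built-in metric names.

```yaml
# Fragment of a Canary spec (sketch): reference the MetricTemplate by name.
analysis:
  interval: 1m
  threshold: 10
  maxWeight: 50
  stepWeight: 5
  metrics:
  - name: custom-success-rate        # hypothetical label for this check
    templateRef:
      name: request-success-rate     # the MetricTemplate defined above
      namespace: flagger
    thresholdRange:
      min: 99                        # the threshold lives here, on the Canary
    interval: 1m
```

Keeping the threshold on the Canary lets two services share one query while holding them to different bars.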

Prometheus metric queries

Examples you’ll actually paste into templates:

Error rate (example for Istio):

sum(rate(istio_requests_total{destination_service=~"podinfo.*",response_code=~"5.."}[5m]))
  / sum(rate(istio_requests_total{destination_service=~"podinfo.*"}[5m])) * 100

Request success rate (the inverse, kept in percent so it lines up with a threshold of 99):

100 * (1 - sum(rate(istio_requests_total{destination_service=~"podinfo.*",response_code=~"5.."}[5m]))
  / sum(rate(istio_requests_total{destination_service=~"podinfo.*"}[5m])))

Latency P95 for service (Istio's standard request-duration histogram, reported in milliseconds):

histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination",destination_service=~"podinfo.*"}[5m])) by (le))

Always test queries in Prometheus or Grafana before wiring to analysis; a bad query = false positives at 3 AM.
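
Once a latency query behaves in Prometheus, it can be wrapped in a MetricTemplate like any other check. The sketch below assumes Istio's standard istio_request_duration_milliseconds_bucket histogram and uses a hypothetical template name, latency-p95.

```yaml
# Sketch: P95 latency as a reusable Flagger MetricTemplate.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency-p95                # hypothetical name
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring.svc:9090
  query: |
    histogram_quantile(0.95,
      sum(rate(istio_request_duration_milliseconds_bucket{
        reporter="destination",
        destination_workload_namespace="{{ namespace }}",
        destination_workload="{{ target }}"
      }[{{ interval }}])) by (le)
    )
```

On the Canary side you would reference it with a thresholdRange such as max: 500, meaning the check fails once P95 goes above 500 ms; the right number depends on your service.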

Manual vs automatic promotion

Automatic: Flagger checks metrics every interval, raises the canary weight by stepWeight while they stay within thresholds, and once maxWeight is reached with checks still passing it promotes the canary spec to the primary and routes all traffic back there.
Manual: rather than a pause step, you add gate webhooks to the analysis (types such as confirm-rollout and confirm-promotion). Flagger calls the webhook and holds the rollout or the promotion until it returns success, so a human or a pipeline approves by opening the gate.

Practically: Flagger is built to run automatically in most setups (automatic promotion by default), but gates are easy to add, either as webhooks or as manual approvals in your GitOps flow before the new image ever reaches the cluster.
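
A manual gate in Flagger can be sketched with webhooks on the analysis block. This fragment assumes the flagger-loadtester service is deployed in the test namespace, since its /gate endpoints are a convenient ready-made gate; any HTTP endpoint that returns 200 to approve would do the same job.

```yaml
# Fragment of a Canary spec (sketch): hold the rollout and the promotion for approval.
analysis:
  interval: 1m
  threshold: 10
  maxWeight: 50
  stepWeight: 5
  webhooks:
  - name: "start gate"
    type: confirm-rollout          # checked before the canary analysis begins
    url: http://flagger-loadtester.test/gate/check
  - name: "promotion gate"
    type: confirm-promotion        # checked before the canary is promoted
    url: http://flagger-loadtester.test/gate/check
```

The load tester keeps a per-canary open/closed flag that /gate/check reads; posting the canary's name and namespace to its /gate/open or /gate/close endpoint flips it.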

Rollback story

Flagger will automatically roll back when an analysis fails: it sets the canary weight back to 0, routes traffic back to the stable revision, and marks the Canary as failed. It also exposes status conditions for automation or alerting.

Rollback is the core of what Flagger does, and it is commonly run as an automated safety net in production clusters.
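
For the alerting side, Flagger has an AlertProvider resource that a Canary can reference so that failed analyses reach a human. The following is a sketch under assumptions: the Slack channel, the Secret name, and the provider name on-call are all placeholders.

```yaml
# Sketch: route Canary failures to Slack through an AlertProvider.
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: on-call                    # hypothetical name
  namespace: flagger
spec:
  type: slack
  channel: on-call-alerts          # placeholder channel
  username: flagger
  secretRef:
    name: on-call-url              # assumed Secret holding the webhook address
---
# Fragment of the Canary's spec.analysis: attach the provider.
alerts:
- name: "on-call Slack"
  severity: error                  # only failures, not every weight change
  providerRef:
    name: on-call
    namespace: flagger
```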

Key Differences Table/Matrix

Area	Argo Rollouts	Flagger
Control model	Replaces Deployment with Rollout CRD (single source of truth)	Watches existing Deployment with Canary CRD (non-invasive)
UI / UX	Strong dashboard + CLI plugin (Argo Rollouts UI), Argo wins	No native visual dashboard; relies on logs/Prom/Grafana
Integration	First-class with Argo CD, kubectl plugin	Built to play with Flux and conventional Deployments; works with Argo CD too
Metric sources	AnalysisTemplate supports Prometheus, web (HTTP) checks, Datadog, and more via providers	MetricTemplate supports Prometheus (and other providers via adapters)
Traffic providers	Istio, NGINX, SMI, Gateway API, ALB, etc.	Istio, Linkerd, NGINX Ingress, Gateway API, etc. (verify ALB support before relying on it)
Complexity	Slightly higher because you change resource type; richer UI	Lower friction for teams that want to keep Deployments, easier GitOps compatibility
Best for	Teams that want tight control, visual ops, Argo CD shop	Teams that want minimal intrusion, Flux/Deployment-first workflows

Decision Framework

Do you actually need this, or is rolling update fine?

Ask the real questions:

Is this service customer-facing or critical? Are you fine if a bad release affects hundreds/thousands of users?
Can you afford “blast radius” risk during a deploy?
Do you already have Prometheus (or metrics) and alerting to validate releases?
Does your team have the operational muscle to interpret analysis failures and perform rollbacks?

If you’re running a small internal cron job, a simple rolling update is fine. If you’re deploying high-traffic frontends, APIs, payment or auth services, progressive delivery is worth the extra configuration. The rule of thumb: if a failed deploy costs more than a coffee or an apology email, add a progressive gate.
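
For the “rolling update is fine” case, the built-in Deployment strategy already gives you a gradual pod replacement gated on readiness. It just has no traffic weighting or metric analysis. A minimal sketch follows, with a hypothetical service name and conservative surge settings.

```yaml
# Sketch: plain rolling update for a low-risk internal service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: internal-tool              # hypothetical service
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: internal-tool
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # add one new pod at a time
      maxUnavailable: 0            # never drop below desired capacity
  template:
    metadata:
      labels:
        app: internal-tool
    spec:
      containers:
      - name: web
        image: ghcr.io/stefanprodan/podinfo:6.0.1
        ports:
        - containerPort: 9898
        readinessProbe:            # a new pod only counts once this passes
          httpGet:
            path: /readyz
            port: 9898
```

If a readiness probe is enough to tell you a release is healthy, this is all the machinery you need.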

Which fits your team?

Use Argo Rollouts if:
You are already on Argo CD or like the idea of replacing Deployments with a richer primitive.
You want a UI for operators, visual rollout history, and integrated AnalysisTemplates.
You prefer explicit rollout objects that own traffic routing.
Use Flagger if:
You want to keep Deployments as-is (non-invasive).
You’re using Flux or Deployment-first GitOps models.
You need wide provider support and prefer metric templates attached to canaries rather than replacing a core resource type.

Other factors:

Mesh-first teams (Istio/Linkerd): both work well.
Ingress/ALB-heavy setups: Argo Rollouts has a documented AWS ALB integration, while Flagger's AWS support has historically centered on App Mesh rather than ALB. Test both against your exact ingress controller before committing.

Real commands (handy)

# Argo Rollouts CLI: inspect rollout
kubectl argo rollouts get rollout -n default canary-demo
# Open the dashboard
kubectl argo rollouts dashboard -n default

# Flagger: check canary status
kubectl -n test get canary podinfo
kubectl -n test describe canary podinfo

Conclusion (SumGuy voice, stop guessing, pick one)

If you’re running a fleet of critical services and want a polished cockpit with step-by-step controls, Argo Rollouts is the nicer driving experience. Swap in the Rollout CRD, hook up AnalysisTemplates, point it at Istio or Gateway API, and the dashboard makes you feel like a capable airline pilot. It’s less like a forklift and more like a precision crane.

If you like your Deployments and want a “set-it-and-forget-it” safety rail that watches your existing objects, Flagger is the faithful mechanic riding shotgun. It’s less intrusive, works with Flux, and will quietly nudge traffic back when metrics go sideways.

Pick Argo Rollouts if your team values UI, tight control, and Argo CD integration. Pick Flagger if you want minimal churn, Deployment-first workflows, and a controller that tucks neatly into an existing mesh or ingress setup.

Either way: add good Prometheus queries, test your analysis in a staging environment, and don’t let the fancy automation lull you into skipping canary observability. Progressive delivery is a seatbelt, not a guarantee, but it’s a damn good seatbelt for production.

Now go pick a forklift, back slowly, and keep the lights on.
