Argo Rollouts vs Flagger — Progressive delivery on Kubernetes (pick the right forklift)
Progressive delivery is the engineering equivalent of easing a forklift into a crowded warehouse aisle: you don’t slam the throttle and pray the boxes rearrange themselves politely. It’s deploying changes gradually, shifting a little traffic, checking metrics, and only then committing to the full switch — so your 2 AM pager stays mercifully quiet.
Below: a pragmatic, slightly snarky face-off between Argo Rollouts and Flagger. Both get the job done, but they drive different rigs. Code examples, metric queries, and real-world tradeoffs included.
What is progressive delivery? (quick hook)
Progressive delivery = staged, observable, reversible releases. Instead of “deploy-and-hope,” you can:
- Route 5% of traffic to a new revision,
- Watch latency/error metrics,
- Increase to 25% if healthy,
- Roll back automatically if a metric trips.
It’s like test driving a car on a neighborhood street before taking it onto the interstate — fewer witnesses to your mistakes.
Argo Rollouts: The Replacement Model
Control model: Rollout CRD replaces Deployment
Argo Rollouts is opinionated: you replace your Deployment with a Rollout custom resource. The Rollout becomes the authoritative workload object: replicas, pod template, strategy (canary / blue-green), and traffic routing all live on it. That “replacement” model gives tight control, but it means migrating each workload to the Rollout type.
Pros: single source of truth, first-class rollout semantics, tight UI and CLI. Cons: you no longer operate a Deployment, which can confuse some GitOps flows if your pipeline expects Deployments.
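If rewriting every manifest is the sticking point, one migration path worth knowing about is workloadRef, which lets a Rollout borrow the pod template from an existing Deployment instead of duplicating it. A minimal sketch, assuming a Deployment named canary-demo already exists in the same namespace (the name and step values are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: canary-demo
  # Reuse the pod template of an existing Deployment (assumed name)
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: canary-demo
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 1m}
```

The Rollout still manages its own ReplicaSets, so the referenced Deployment is typically scaled to zero once the Rollout is healthy. Check how your GitOps tool reacts to that before relying on it.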
Example YAML
Blue–green (preview + stable service):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: bluegreen-demo
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bluegreen-demo
  template:
    metadata:
      labels:
        app: bluegreen-demo
    spec:
      containers:
        - name: web
          image: ghcr.io/stefanprodan/podinfo:6.0.1
          ports:
            - containerPort: 9898
  strategy:
    blueGreen:
      activeService: bluegreen-demo
      previewService: bluegreen-demo-preview
      autoPromotionEnabled: false # manual promotion for verification
```
Canary with Istio traffic shifting and analysis steps:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
        - name: web
          image: ghcr.io/stefanprodan/podinfo:6.0.1
          ports:
            - containerPort: 9898
  strategy:
    canary:
      # Istio traffic routing needs two Services to split between
      # (these Service names are assumed; create them yourself)
      canaryService: canary-demo-canary
      stableService: canary-demo-stable
      steps:
        - setWeight: 20
        - pause: {duration: 1m}
        - analysis:
            templates:
              - templateName: success-rate
        - setWeight: 50
        - pause: {duration: 2m}
        - analysis:
            templates:
              - templateName: success-rate
        - setWeight: 100
      trafficRouting:
        istio:
          virtualService:
            name: canary-demo-vs
            routes:
              - http # route names are plain strings
```
Notes:
- The Rollout owns traffic routing when you configure a traffic provider (Istio, NGINX integrations, Gateway API, ALB, SMI adapters, etc.).
- The analysis step hooks into Argo’s AnalysisTemplate machinery (below).
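For context, the Istio VirtualService that the canary Rollout points at might look like the sketch below. The host and Service names (canary-demo-stable, canary-demo-canary) are assumptions for illustration. The Rollout controller rewrites the two weight fields at each setWeight step, so you only declare the starting split.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: canary-demo-vs
  namespace: default
spec:
  hosts:
    - canary-demo # hypothetical in-mesh host
  http:
    - name: http # must match the route name listed on the Rollout
      route:
        - destination:
            host: canary-demo-stable # assumed stable Service
          weight: 100
        - destination:
            host: canary-demo-canary # assumed canary Service
          weight: 0
```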
Traffic shifting providers (Istio, NGINX, etc.)
Argo Rollouts supports traffic shifting through a set of providers: Istio, NGINX (via ingress/annotations or Ingress controllers), Gateway API (Envoy/Contour), SMI implementations, AWS ALB/ELBv2, and others via provider adapters. That makes it flexible whether you’re on a service mesh or using plain Ingress.
If your infra is mesh-first (Istio/Linkerd), Argo’s native Istio integration is very smooth. If you’re using Ingress controllers, check controller support (NGINX, ALB) for weight-based routing.
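On the Ingress side, the same canary strategy can be pointed at NGINX instead of a mesh. A hedged fragment, assuming an existing Ingress named canary-demo-ingress and two Services you have created yourself (all three names are hypothetical):

```yaml
# Fragment of a Rollout spec
strategy:
  canary:
    canaryService: canary-demo-canary # assumed Service for the new ReplicaSet
    stableService: canary-demo-stable # assumed Service for the stable ReplicaSet
    trafficRouting:
      nginx:
        stableIngress: canary-demo-ingress # assumed existing Ingress
    steps:
      - setWeight: 20
      - pause: {duration: 1m}
```

With this provider the controller manages a companion canary Ingress and adjusts its weight annotation at each step, so the stable Ingress you wrote stays untouched.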
AnalysisTemplate & AnalysisRun
Argo Rollouts ships with AnalysisTemplate (a reusable definition of checks) and AnalysisRun (one execution of those checks). You author reusable AnalysisTemplates that run Prometheus queries, web (HTTP) checks, or Kubernetes Jobs for custom scripts.
Example AnalysisTemplate that checks request success-rate via Prometheus:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: default
spec:
  metrics:
    - name: request-success-rate
      interval: 1m
      count: 3
      # The Prometheus provider returns a vector, so index the first sample
      successCondition: result[0] >= 99
      failureCondition: result[0] < 99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(istio_requests_total{destination_service=~"canary-demo.*",response_code!~"5.."}[5m]))
            /
            sum(rate(istio_requests_total{destination_service=~"canary-demo.*"}[5m]))
            * 100
```
Argo creates an AnalysisRun automatically for each analysis step in the Rollout. A failed AnalysisRun aborts the rollout and shifts traffic back to the stable version; an inconclusive one pauses the rollout for a human decision.
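To make one template reusable across services, the service name can be passed in as an argument rather than hard-coded. A sketch under the assumption that every service exposes the same Istio metric; the argument name service-name is made up for this example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: default
spec:
  args:
    - name: service-name # hypothetical argument, supplied by each Rollout
  metrics:
    - name: request-success-rate
      interval: 1m
      count: 3
      successCondition: result[0] >= 99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(istio_requests_total{destination_service=~"{{args.service-name}}.*",response_code!~"5.."}[5m]))
            /
            sum(rate(istio_requests_total{destination_service=~"{{args.service-name}}.*"}[5m]))
            * 100
---
# Fragment: the matching step inside a Rollout's canary steps
- analysis:
    templates:
      - templateName: success-rate
    args:
      - name: service-name
        value: canary-demo
```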
Promotion & rollback behavior
- Promotion: Argo Rollouts advances through the defined steps (setWeight / pause / analysis). Promotion is automatic when every step passes, or manual if you add an indefinite pause step to a canary (or set autoPromotionEnabled: false for blue-green) and promote from the CLI or dashboard.
- Rollback: If an AnalysisRun fails, the Rollout aborts and traffic returns to the previous stable ReplicaSet. Analyses are first-class objects, so you get clear failure reasons. The controller reconciles ReplicaSet sizes and traffic weights together, though the change is not a single atomic operation.
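The two behaviors can be combined in one strategy: a background analysis that can abort the rollout at any point, plus an indefinite pause that waits for a human. A minimal sketch, assuming an AnalysisTemplate named success-rate exists in the same namespace:

```yaml
# Fragment of a Rollout spec
strategy:
  canary:
    # Background analysis: runs alongside the steps, starting at step index 1
    analysis:
      templates:
        - templateName: success-rate
      startingStep: 1
    steps:
      - setWeight: 20
      - pause: {duration: 1m}
      - setWeight: 50
      - pause: {} # no duration: waits until someone promotes
      - setWeight: 100
```

While the rollout sits at the indefinite pause, kubectl argo rollouts promote canary-demo moves it forward and kubectl argo rollouts abort canary-demo sends traffic back to stable.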
Argo Rollouts dashboard UI win
Argo Rollouts includes a slick kubectl argo rollouts dashboard web UI (and kubectl argo rollouts get rollout -n ns <name> CLI) that visualizes steps, weights, AnalysisRuns, and ReplicaSet history. It’s a strong operator UX — clicky, visual, and the kind of thing you want when a human is deciding if a release is safe.
Flagger: The Sidecar Model
Control model: Canary CRD watches Deployment
Flagger follows a different philosophy: it doesn’t replace your Deployment. Instead, Flagger watches your existing Deployment and orchestrates traffic shifting, analysis, and promotion externally. The Canary CRD points to a Deployment (targetRef) and the Flagger controller performs the progressive delivery. Under the hood it creates a <name>-primary copy that serves stable traffic and treats your original Deployment as the canary.
Pros: non-invasive (keeps Deployment), works well with Flux/GitOps patterns where Deployments are expected. Cons: model split (Deployment vs Canary) can feel like a companion app that occasionally does weird things.
Example Canary CRD YAML
This is a typical Flagger Canary that uses Istio traffic shifting and an inline Prometheus metric:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
  provider: istio # provider is a plain string
  analysis:
    interval: 1m
    threshold: 10 # failed checks tolerated before rollback
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99 # percent; thresholdRange takes min/max keys
        interval: 1m
        # Inline queries are the legacy form; newer Flagger releases prefer templateRef
        query: |
          sum(rate(istio_requests_total{destination_service=~"podinfo.*",response_code!~"5.."}[5m]))
          /
          sum(rate(istio_requests_total{destination_service=~"podinfo.*"}[5m]))
          * 100
```
Key points:
- Flagger targets an existing Deployment via targetRef.
- Flagger creates a primary (stable) copy of the workload and shifts traffic between primary and canary using the configured provider (Istio, Linkerd, NGINX, Gateway API, App Mesh, etc.).
- Inline metrics can be replaced by MetricTemplate references (below).
Traffic provider support (Istio, Linkerd, NGINX, Gateway API, App Mesh)
Flagger supports a wide set of providers: Istio, Linkerd, NGINX Ingress Controller, Gateway API (Contour/Envoy), AWS App Mesh, and more. Plain AWS ALB is not among its documented providers, so check the current list if you depend on it. Flagger talks to the data plane through provider adapters and updates virtual services or ingress objects to change weights.
This makes Flagger a good choice if you already have Deployments and want an external controller to manage canaries without switching resource types.
MetricTemplate in Flagger
Flagger lets you define MetricTemplate CRs that centralize Prometheus (or other) queries. Reuse them across canaries. A MetricTemplate decouples query details from Canary objects.
Example MetricTemplate:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-success-rate
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring.svc:9090
  # Adjust the label matchers to whatever your mesh actually emits
  query: |
    sum(rate(istio_requests_total{destination_service=~"{{ namespace }}-podinfo.*",response_code!~"5.."}[{{ interval }}]))
    /
    sum(rate(istio_requests_total{destination_service=~"{{ namespace }}-podinfo.*"}[{{ interval }}]))
    * 100
```
Note: Flagger injects values such as namespace and interval through {{ namespace }}-style placeholders (no leading $). The threshold is not part of the template: thresholdRange is set where the MetricTemplate is referenced, inside the Canary analysis.metrics block.
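Referencing the template from a Canary might look like the fragment below. It assumes the MetricTemplate above lives in the flagger namespace; the 99 percent floor is carried over from the earlier inline example.

```yaml
# Fragment of a Canary spec
analysis:
  metrics:
    - name: request-success-rate
      templateRef:
        name: request-success-rate
        namespace: flagger
      thresholdRange:
        min: 99 # fail the check when the query result drops below 99
      interval: 1m
```

Keeping the threshold on the Canary means two services can share one query while tolerating different error budgets.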
Prometheus metric queries
Examples you’ll actually paste into templates:
Error rate (example for Istio):
```promql
sum(rate(istio_requests_total{destination_service=~"podinfo.*",response_code=~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_service=~"podinfo.*"}[5m]))
* 100
```
Request success rate (inverse, as a 0 to 1 ratio; multiply by 100 before comparing it with a percentage threshold):
```promql
1 - sum(rate(istio_requests_total{destination_service=~"podinfo.*",response_code=~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_service=~"podinfo.*"}[5m]))
```
Latency P95 for service (Istio's request duration histogram, reported in milliseconds):
```promql
histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket{destination_service=~"podinfo.*"}[5m])) by (le))
```
Always test queries in Prometheus or Grafana before wiring to analysis; a bad query = false positives at 3 AM.
Manual vs automatic promotion
- Automatic: Flagger watches metrics at intervals, increments canary weights, and promotes to 100% once thresholds are satisfied (or a maxWeight progression completes).
- Manual: You can require human sign-off by adding gating webhooks to the analysis (types confirm-rollout and confirm-promotion). Flagger holds the canary at that point until the webhook endpoint returns success.
Practically: Flagger is built to run automatically in most setups (automatic promotion by default), but you can add gates where you need them: manual approvals in your GitOps pull-request flow, or the confirmation webhooks described above.
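A manual gate in Flagger can be sketched with a confirm-promotion webhook. The fragment below assumes the flagger-loadtester helper is deployed in the test namespace and uses its gate endpoint. The URL is an assumption about your cluster, so point it at whatever approval service you actually run.

```yaml
# Fragment of a Canary spec
analysis:
  interval: 1m
  threshold: 10
  maxWeight: 50
  stepWeight: 5
  webhooks:
    - name: promotion-gate # hypothetical name
      type: confirm-promotion
      # Flagger calls this URL each interval and promotes only on HTTP 200
      url: http://flagger-loadtester.test/gate/check
```

Metric checks keep running while the gate is closed, so a canary that degrades during the wait can still be rolled back.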
Rollback story
Flagger rolls back automatically when the number of failed checks reaches analysis.threshold: it sets the canary weight back to 0, routes all traffic to the primary (stable) workload, scales the canary down, and marks the Canary as failed. It also exposes status conditions for automation or alerting.
Rollback in Flagger is battle-tested — many teams use Flagger as an automated safety net in production clusters.
Key Differences Table/Matrix
| Area | Argo Rollouts | Flagger |
|---|---|---|
| Control model | Replaces Deployment with Rollout CRD (single source of truth) | Watches existing Deployment with Canary CRD (non-invasive) |
| UI / UX | Strong dashboard + CLI plugin (Argo Rollouts UI) — Argo wins | No native visual dashboard; relies on logs/Prom/Grafana |
| Integration | First-class with Argo CD, kubectl plugin | Built to play with Flux and conventional Deployments; works with Argo CD too |
| Metric sources | AnalysisTemplate supports Prometheus, Datadog, web (HTTP) checks, Kubernetes Jobs, and other providers | MetricTemplate supports Prometheus plus other providers such as Datadog and CloudWatch |
| Traffic providers | Istio, NGINX, SMI, Gateway API, ALB, etc. | Istio, Linkerd, NGINX Ingress, Gateway API, App Mesh, etc. |
| Complexity | Slightly higher because you change resource type; richer UI | Lower friction for teams that want to keep Deployments, easier GitOps compatibility |
| Best for | Teams that want tight control, visual ops, Argo CD shop | Teams that want minimal intrusion, Flux/Deployment-first workflows |
Decision Framework
Do you actually need this, or is rolling update fine?
Ask the real questions:
- Is this service customer-facing or critical? Are you fine if a bad release affects hundreds/thousands of users?
- Can you afford “blast radius” risk during a deploy?
- Do you already have Prometheus (or metrics) and alerting to validate releases?
- Does your team have the operational muscle to interpret analysis failures and perform rollbacks?
If you’re running a small internal cron job, a simple rolling update is fine. If you’re deploying high-traffic frontends, APIs, payment or auth services — progressive delivery is worth the extra configuration. The rule of thumb: if a failed deploy costs more than a coffee or an apology email, add a progressive gate.
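For the "rolling update is fine" case, a plain Deployment with a conservative strategy and a readiness probe already gives you a basic safety net. A sketch with a hypothetical workload name; the podinfo image and its /readyz endpoint stand in for your own app:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: internal-job-runner # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: internal-job-runner
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # add one new pod at a time
      maxUnavailable: 0 # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: internal-job-runner
    spec:
      containers:
        - name: app
          image: ghcr.io/stefanprodan/podinfo:6.0.1
          readinessProbe: # a pod that never becomes ready stalls the rollout
            httpGet:
              path: /readyz
              port: 9898
```

What this does not give you is metric-based analysis or weighted traffic: a release that starts cleanly but serves errors will still roll out to every pod.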
Which fits your team?
- Use Argo Rollouts if:
  - You are already on Argo CD or like the idea of replacing Deployments with a richer primitive.
  - You want a UI for operators, visual rollout history, and integrated AnalysisTemplates.
  - You prefer explicit rollout objects that own traffic routing.
- Use Flagger if:
  - You want to keep Deployments as-is (non-invasive).
  - You’re using Flux or Deployment-first GitOps models.
  - You need wide provider support and prefer metric templates attached to canaries rather than replacing a core resource type.
Other factors:
- Mesh-first teams (Istio/Linkerd): both work well.
- Ingress/ALB-heavy setups: Argo Rollouts has a native AWS ALB integration (via the AWS Load Balancer Controller), while Flagger's AWS support has centered on App Mesh rather than ALB. Test both against your exact ingress controller before committing.
Real commands (handy)
```shell
# Argo Rollouts CLI: inspect rollout
kubectl argo rollouts get rollout -n default canary-demo
# Open the dashboard
kubectl argo rollouts dashboard -n default
```
```shell
# Flagger: check canary status
kubectl -n test get canary podinfo
kubectl -n test describe canary podinfo
```
Conclusion (SumGuy voice — stop guessing, pick one)
If you’re running a fleet of critical services and want a polished cockpit with step-by-step controls, Argo Rollouts is the nicer driving experience. Swap in the Rollout CRD, hook up AnalysisTemplates, point it at Istio or Gateway API, and the dashboard makes you feel like a capable airline pilot. It’s less like a forklift and more like a precision crane.
If you like your Deployments and want a “set-it-and-forget-it” safety rail that watches your existing objects, Flagger is the faithful mechanic riding shotgun. It’s less intrusive, works with Flux, and will quietly nudge traffic back when metrics go sideways.
Pick Argo Rollouts if your team values UI, tight control, and Argo CD integration. Pick Flagger if you want minimal churn, Deployment-first workflows, and a controller that tucks neatly into an existing mesh or ingress setup.
Either way: add good Prometheus queries, test your analysis in a staging environment, and don’t let the fancy automation lull you into skipping canary observability. Progressive delivery is a seatbelt, not a guarantee — but it’s a damn good seatbelt for production.
Now go pick a forklift, back slowly, and keep the lights on.