
Postgres HA: Patroni + etcd + HAProxy

By SumGuy 12 min read

Streaming Replication Won’t Save You at 2 AM

Here’s the thing about vanilla Postgres streaming replication: it’s great until it isn’t. You’ve got a primary, two standbys, data flowing in real time — and then the primary dies. Now what? You’re SSH-ing into a standby at 2 AM, running pg_promote, manually updating your app’s connection string, and praying you didn’t just promote a lagging replica with 30 seconds of missing transactions.

That’s the gap Patroni fills. It provides leader election using etcd as a Distributed Configuration Store (DCS), automatic promotion of the best replica, and a REST API that HAProxy uses for health checks so it knows exactly where to route traffic. No manual intervention. No 2 AM heroics.

This is a full end-to-end walkthrough. Real version numbers, real commands, real trade-offs.


Architecture Overview

Three layers, seven VMs (or containers, or LXC, pick your poison):

┌─────────────────────────────────────────┐
│            HAProxy (1 node)             │
│    :5000 → primary only (read/write)    │
│    :5001 → replicas only (read-only)    │
└────────────┬───────────────┬────────────┘
             │               │
    ┌────────▼──────┐ ┌──────▼────────┐
    │   pg-node-1   │ │   pg-node-2   │ ... pg-node-3
    │  Patroni 4.x  │ │  Patroni 4.x  │
    │  Postgres 17  │ │  Postgres 17  │
    └───────┬───────┘ └───────┬───────┘
            │                 │
    ┌───────▼─────────────────▼───────┐
    │     etcd cluster (3 nodes)      │
    │    etcd-1 / etcd-2 / etcd-3     │
    └─────────────────────────────────┘

etcd gives you quorum-based leader election. Patroni holds a lease in etcd. If the primary can’t renew its lease (network partition, OOM kill, whatever), Patroni on a replica picks up the lease and promotes itself. HAProxy’s health check hits Patroni’s REST API: /primary returns 200 on the current primary, /replica returns 200 on standbys. Clean, deterministic routing.
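The routing decision HAProxy makes from those health checks can be sketched in a few lines of Python. This is only an illustration of the logic, not HAProxy's or Patroni's code: the function names and the two-second timeout are assumptions made for the example, while the REST paths and port 8008 are the ones used in this guide.

```python
import urllib.error
import urllib.request


def probe(host: str, path: str, port: int = 8008, timeout: float = 2.0) -> int:
    """Return the HTTP status Patroni's REST API gives for `path` (0 if unreachable)."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # Patroni answers 503 when the node's role does not match
    except OSError:
        return 0  # connection refused, timeout, unreachable host


def backend_for(primary_status: int, replica_status: int) -> str:
    """Map the two health-check results to the listener that would use the node."""
    if primary_status == 200:
        return "postgres_rw"  # port 5000
    if replica_status == 200:
        return "postgres_ro"  # port 5001
    return "down"
```

Calling backend_for(probe("10.0.0.21", "/primary"), probe("10.0.0.21", "/replica")) against a live node shows which HAProxy listener would currently send it traffic.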

Node IPs for this guide:

Host      IP          Role
etcd-1    10.0.0.11   etcd
etcd-2    10.0.0.12   etcd
etcd-3    10.0.0.13   etcd
pg-1      10.0.0.21   Patroni + Postgres
pg-2      10.0.0.22   Patroni + Postgres
pg-3      10.0.0.23   Patroni + Postgres
haproxy   10.0.0.30   HAProxy

Step 1: etcd 3.5 Cluster

Install on all three etcd nodes:

Terminal window
ETCD_VER=v3.5.14
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
| tar xz -C /usr/local/bin --strip-components=1 etcd-${ETCD_VER}-linux-amd64/etcd \
etcd-${ETCD_VER}-linux-amd64/etcdctl

Create the data dir and systemd unit on each node. Replace etcd-1, 10.0.0.11, and the --initial-cluster values per host:

Terminal window
mkdir -p /var/lib/etcd

# /etc/systemd/system/etcd.service (on etcd-1)
[Unit]
Description=etcd
After=network.target

[Service]
Type=notify
User=root
ExecStart=/usr/local/bin/etcd \
  --name etcd-1 \
  --data-dir /var/lib/etcd \
  --listen-peer-urls http://10.0.0.11:2380 \
  --listen-client-urls http://10.0.0.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.11:2379 \
  --initial-advertise-peer-urls http://10.0.0.11:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-1=http://10.0.0.11:2380,etcd-2=http://10.0.0.12:2380,etcd-3=http://10.0.0.13:2380 \
  --initial-cluster-state new
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

On etcd-2: same but --name etcd-2, --listen-peer-urls http://10.0.0.12:2380, etc. On etcd-3: same pattern with 10.0.0.13.
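Since only the name and IP differ between the three unit files, the per-node flags can be generated rather than hand-edited. The sketch below is one way to do that; the ETCD_NODES mapping and both function names are made up for this example, and the flag values simply mirror the unit file above.

```python
ETCD_NODES = {"etcd-1": "10.0.0.11", "etcd-2": "10.0.0.12", "etcd-3": "10.0.0.13"}


def initial_cluster(nodes: dict[str, str]) -> str:
    """Build the --initial-cluster value, which is identical on every node."""
    return ",".join(f"{name}=http://{ip}:2380" for name, ip in nodes.items())


def etcd_flags(name: str, nodes: dict[str, str]) -> list[str]:
    """Return the etcd arguments for one node, matching the unit file in this guide."""
    ip = nodes[name]
    return [
        "--name", name,
        "--data-dir", "/var/lib/etcd",
        "--listen-peer-urls", f"http://{ip}:2380",
        "--listen-client-urls", f"http://{ip}:2379,http://127.0.0.1:2379",
        "--advertise-client-urls", f"http://{ip}:2379",
        "--initial-advertise-peer-urls", f"http://{ip}:2380",
        "--initial-cluster-token", "etcd-cluster-1",
        "--initial-cluster", initial_cluster(nodes),
        "--initial-cluster-state", "new",
    ]
```

Printing " ".join(etcd_flags("etcd-2", ETCD_NODES)) gives the argument string for the second node's ExecStart line.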

Terminal window
systemctl daemon-reload
systemctl enable --now etcd

Verify all three nodes see each other:

Terminal window
etcdctl --endpoints=http://10.0.0.11:2379,http://10.0.0.12:2379,http://10.0.0.13:2379 endpoint health

You want three lines, each ending in “is healthy”. If you get quorum errors or timeouts, check firewall rules on ports 2379 (client) and 2380 (peer).
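The same check can be scripted against etcd's HTTP /health endpoint on the client port, which is handy for monitoring. This is a minimal sketch: the function names and timeout are invented for the example, and it assumes the response body is a small JSON object whose "health" field is the string "true" (a JSON boolean is accepted too, in case that differs on your version).

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Interpret the JSON body returned by etcd's /health endpoint."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return False
    value = data.get("health")
    return value is True or str(value).lower() == "true"


def endpoint_healthy(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if `endpoint` (e.g. http://10.0.0.11:2379) reports healthy."""
    try:
        with urllib.request.urlopen(f"{endpoint}/health", timeout=timeout) as resp:
            return parse_health(resp.read().decode())
    except (OSError, ValueError):
        return False  # unreachable, HTTP error status, or unparseable body
```

Running endpoint_healthy over the three client URLs from this guide should return True three times on a working cluster.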


Step 2: Postgres 17 + Patroni 4.x

On all three Postgres nodes:

Terminal window
# Postgres 17 from PGDG
apt install -y curl ca-certificates
install -d /usr/share/postgresql-common/pgdg
curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
https://www.postgresql.org/media/keys/ACCC4CF8.asc
echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
> /etc/apt/sources.list.d/pgdg.list
apt update && apt install -y postgresql-17
# Stop and disable the default service — Patroni manages the lifecycle
systemctl stop postgresql
systemctl disable postgresql
# Patroni 4.x
apt install -y python3-pip python3-psycopg2
pip3 install 'patroni[etcd3]' --break-system-packages   # quoted so the shell does not glob the brackets

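Patroni needs its etcd3 extra to talk to etcd’s v3 API. The [etcd3] install target pulls in python-etcd, which Patroni uses to reach the v3 API through etcd’s gRPC gateway. If you’re on a distro that screams about --break-system-packages, use a venv: python3 -m venv /opt/patroni && /opt/patroni/bin/pip install 'patroni[etcd3]' psycopg2-binary. The venv can’t see the apt-installed psycopg2, hence the extra package, and ExecStart in the systemd unit then has to point at /opt/patroni/bin/patroni.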


Step 3: Patroni Configuration

The patroni.yml below goes on each node. Only name, connect_address, and listen change per node.

# /etc/patroni/patroni.yml (on pg-1)
scope: postgres-ha
namespace: /service/
name: pg-1

restapi:
  listen: 10.0.0.21:8008
  connect_address: 10.0.0.21:8008

etcd3:
  hosts:
    - 10.0.0.11:2379
    - 10.0.0.12:2379
    - 10.0.0.13:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1 MB: don't promote a badly lagging replica
    synchronous_mode: on
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 512MB
        wal_level: replica
        max_wal_senders: 10
        max_replication_slots: 10
        hot_standby: on
        synchronous_commit: on
        wal_log_hints: on  # lets pg_rewind work
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.0.0/24 scram-sha-256
    - host all all 10.0.0.0/24 scram-sha-256
  # Patroni 4.x may ignore bootstrap.users (support was dropped in 4.0).
  # If the admin role is missing after bootstrap, create it with CREATE ROLE
  # or a post_bootstrap script.
  users:
    admin:
      password: "changeme_admin"
      options:
        - createrole
        - createdb
    replicator:
      password: "changeme_repl"
      options:
        - replication

postgresql:
  listen: 10.0.0.21:5432
  connect_address: 10.0.0.21:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: "changeme_repl"
    superuser:
      username: postgres
      password: "changeme_super"
    rewind:
      username: rewind_user
      password: "changeme_rewind"
  parameters:
    archive_mode: on
    archive_command: >-
      pgbackrest --stanza=main archive-push %p

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

On pg-2 and pg-3, change name to pg-2 / pg-3 and update the IP in all four listen and connect_address values (two under restapi, two under postgresql).
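To make the per-node differences explicit, here is a small sketch that produces exactly the keys that vary between hosts. The PG_NODES mapping and the node_overrides name are invented for this example; everything else in patroni.yml stays the same on all three nodes.

```python
import json

PG_NODES = {"pg-1": "10.0.0.21", "pg-2": "10.0.0.22", "pg-3": "10.0.0.23"}


def node_overrides(name: str, nodes: dict[str, str]) -> dict:
    """Return only the patroni.yml keys that differ from node to node."""
    ip = nodes[name]
    return {
        "name": name,
        "restapi": {"listen": f"{ip}:8008", "connect_address": f"{ip}:8008"},
        "postgresql": {"listen": f"{ip}:5432", "connect_address": f"{ip}:5432"},
    }


def render(name: str, nodes: dict[str, str]) -> str:
    """Pretty-print the overrides as JSON, which any YAML parser also accepts."""
    return json.dumps(node_overrides(name, nodes), indent=2)
```

Comparing render("pg-2", PG_NODES) with the file on pg-2 is a quick way to catch a copy-paste IP left over from pg-1.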

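The watchdog block is important. With mode: required, a node will not become leader unless Patroni can open /dev/watchdog. That’s intentional: if Patroni hangs on a primary and stops renewing its lease, the kernel watchdog resets the machine, so a stuck node fences itself instead of lingering as a split-brain primary behind HAProxy. Load the kernel module: modprobe softdog && echo 'softdog' >> /etc/modules. Patroni runs as postgres here, so that user also needs write access to the device, for example chown postgres /dev/watchdog (use a udev rule if it has to survive reboots).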

Create the systemd service for Patroni:

/etc/systemd/system/patroni.service
[Unit]
Description=Patroni Cluster Manager
After=network.target
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target

Terminal window
mkdir -p /etc/patroni                     # create this first, then save patroni.yml into it
chown -R postgres:postgres /etc/patroni
chmod 600 /etc/patroni/patroni.yml        # contains passwords; readable by postgres only
systemctl daemon-reload
systemctl enable --now patroni

Start pg-1 first. It will initialize the cluster and bootstrap. Then start pg-2 and pg-3 — they’ll clone from pg-1 automatically.

Check cluster state:

Terminal window
patronictl -c /etc/patroni/patroni.yml list

Expected output:

+ Cluster: postgres-ha (7123456789012345678) -+----+-----------+
| Member | Host           | Role    | State   | TL | Lag in MB |
+--------+----------------+---------+---------+----+-----------+
| pg-1   | 10.0.0.21:5432 | Leader  | running |  1 |           |
| pg-2   | 10.0.0.22:5432 | Replica | running |  1 |         0 |
| pg-3   | 10.0.0.23:5432 | Replica | running |  1 |         0 |
+--------+----------------+---------+---------+----+-----------+

Step 4: HAProxy 2.9

On the haproxy node:

Terminal window
apt install -y haproxy=2.9.*

The HAProxy config uses Patroni’s REST API for health checks. /primary returns HTTP 200 only on the current primary. /replica returns 200 only on standbys. HAProxy routes accordingly — no manual intervention, no custom scripts.

/etc/haproxy/haproxy.cfg
global
    maxconn 100
    log /dev/log local0

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

#---------------------------------------------------------------------
# Read/Write: primary only
#---------------------------------------------------------------------
listen postgres_rw
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-1 10.0.0.21:5432 maxconn 100 check port 8008
    server pg-2 10.0.0.22:5432 maxconn 100 check port 8008
    server pg-3 10.0.0.23:5432 maxconn 100 check port 8008

#---------------------------------------------------------------------
# Read-Only: replicas only
#---------------------------------------------------------------------
listen postgres_ro
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-1 10.0.0.21:5432 maxconn 100 check port 8008
    server pg-2 10.0.0.22:5432 maxconn 100 check port 8008
    server pg-3 10.0.0.23:5432 maxconn 100 check port 8008

#---------------------------------------------------------------------
# Stats page
#---------------------------------------------------------------------
listen stats
    bind *:7000
    mode http
    stats enable
    stats uri /
    stats refresh 10s
    stats show-node

Two details matter here. option httpchk without a method and path sends OPTIONS / by default, so each listener names its endpoint explicitly: GET /primary for postgres_rw and GET /replica for postgres_ro. And the checks go to port 8008 over plain HTTP, because the restapi section in patroni.yml has no TLS configured. Add check-ssl verify none to the server lines only if you enable TLS on Patroni’s REST API; with a plain-HTTP API it makes every check fail.

Updated: As of Patroni 4.0, the /master endpoint was removed. Use /primary for the primary and /replica for standbys.

Terminal window
haproxy -c -f /etc/haproxy/haproxy.cfg   # validate the config before loading it
systemctl enable haproxy
systemctl restart haproxy                # the package may already be running with its default config

Test connectivity:

Terminal window
psql -h 10.0.0.30 -p 5000 -U admin -d postgres -c "SELECT pg_is_in_recovery();"
# Returns: f (false) — you're on the primary
psql -h 10.0.0.30 -p 5001 -U admin -d postgres -c "SELECT pg_is_in_recovery();"
# Returns: t (true) — you're on a replica

Step 5: The Trade-Off You’re Signing Up For

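Honestly, this is the part most guides skip over. With synchronous_mode: on and synchronous_commit: on, the primary won’t acknowledge a commit until at least one synchronous standby has flushed it to its WAL. A failover to that standby loses no acknowledged transactions. The cost is latency on every commit, plus a stall whenever the sync standby disappears: commits hang until Patroni on the primary notices and either picks another standby or, if none is left, drops back to asynchronous replication. If you would rather block writes than run without a synchronous copy, add synchronous_mode_strict: on; then the primary stops accepting writes until a standby returns.

That’s the deal. You pick one with synchronous_commit, per cluster or per transaction:

local: the commit returns once the primary has flushed WAL locally. Fast, but a failover can lose the most recent commits.
on: the commit waits for the synchronous standby to flush WAL. No acknowledged commit is lost on failover, at the price of the latency and stalls described above.

For a homelab database, local is probably fine. For anything financial, use on and accept the stall risk. Patroni’s synchronous_node_count parameter lets you tune how many sync standbys are required; the default is 1.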
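Because synchronous_commit can be changed per transaction, an application can keep the cluster default strict and relax it only for writes it can afford to lose. The following is a minimal sketch of that idea for any DB-API connection such as psycopg2 (installed earlier in this guide); the helper names are invented for the example, and it assumes the connection is not in autocommit mode so that SET LOCAL applies to the transaction that follows.

```python
ALLOWED_LEVELS = ("off", "local", "remote_write", "on", "remote_apply")


def durability_statement(level: str) -> str:
    """Build the SET LOCAL statement for one of PostgreSQL's synchronous_commit levels."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported synchronous_commit level: {level!r}")
    return f"SET LOCAL synchronous_commit TO '{level}'"


def execute_with_durability(conn, level: str, sql: str, params=()):
    """Run one statement in its own transaction at the requested durability level."""
    cur = conn.cursor()
    try:
        cur.execute(durability_statement(level))  # affects this transaction only
        cur.execute(sql, params)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

With this, a page-view counter could be written with level "local" while an order insert on the same connection uses "on".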


Step 6: Failover Test

This is the fun part. Kill the primary hard:

Terminal window
# On pg-1 (the current primary)
kill -9 $(head -1 /var/lib/postgresql/17/main/postmaster.pid)

Watch what happens on any other node:

Terminal window
watch -n 1 patronictl -c /etc/patroni/patroni.yml list

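Within roughly ttl seconds (30 in our config) of the leader key no longer being renewed, pg-2 or pg-3 acquires the leader lease and promotes. One caveat: if Patroni itself is still running on pg-1, it will first try to restart Postgres in place, and primary_start_timeout (default 300 seconds) controls how long it keeps trying before giving up the lease. To simulate a dead node rather than a crashed process, power off pg-1 or cut its network. After the failover the list looks like this: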

+ Cluster: postgres-ha (7123456789012345678) -+----+-----------+
| Member | Host           | Role    | State   | TL | Lag in MB |
+--------+----------------+---------+---------+----+-----------+
| pg-1   | 10.0.0.21:5432 | Replica | stopped |    |   unknown |
| pg-2   | 10.0.0.22:5432 | Leader  | running |  2 |           |
| pg-3   | 10.0.0.23:5432 | Replica | running |  2 |         0 |
+--------+----------------+---------+---------+----+-----------+

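HAProxy’s health checks pick this up within a few check intervals: with inter 3s, fall 3 and rise 2, the old primary is marked down after about 9 seconds of failed checks and the new one is marked up after about 6 seconds of passing ones. Port 5000 now routes to pg-2. Port 5001 routes to pg-3 (and eventually pg-1 once it rejoins).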
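The arithmetic behind HAProxy's reaction time is simple enough to write down. The sketch below is a rough estimate under stated assumptions, not a guarantee: it ignores check timeouts and network delay, and it treats the write outage as the lease TTL plus the time the new primary needs to pass its rise checks. The function name is made up for the example.

```python
def failover_budget(ttl: float, inter: float, fall: int, rise: int) -> dict[str, float]:
    """Estimate HAProxy detection times and a rough upper bound on write downtime."""
    mark_down = inter * fall  # consecutive failed checks before a server is marked DOWN
    mark_up = inter * rise    # consecutive passing checks before a server is marked UP
    return {
        "mark_down_s": mark_down,
        "mark_up_s": mark_up,
        # Assumption: promotion happens within ttl, then the new primary must pass `rise` checks.
        "worst_case_write_outage_s": ttl + mark_up,
    }
```

With this guide's values, failover_budget(30, 3, 3, 2) gives 9 seconds to mark a server down, 6 seconds to mark one up, and roughly 36 seconds as the pessimistic bound on write downtime.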

When pg-1 comes back, Patroni uses pg_rewind to reconcile its WAL with the new primary’s timeline, then rejoins as a replica. No manual steps.

Terminal window
# Planned role change without killing anything (useful for maintenance).
# Set --leader to whichever node currently holds the lease.
patronictl -c /etc/patroni/patroni.yml switchover postgres-ha --leader pg-1 --candidate pg-2 --force

Step 7: pgBackRest Integration

Patroni manages the cluster; pgBackRest handles backups. They play nicely together — configure archive_command in Patroni’s postgresql parameters block (as shown in patroni.yml above) so WAL archiving works on whichever node is currently the primary.

Terminal window
# On every Postgres node (archive_command runs wherever the primary is)
# and on any dedicated backup host
apt install -y pgbackrest

/etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
log-level-console=info
log-level-file=detail

[main]
pg1-path=/var/lib/postgresql/17/main
pg1-port=5432
pg1-user=postgres

Terminal window
# Initialize the stanza (run once)
pgbackrest --stanza=main stanza-create
# Full backup
pgbackrest --stanza=main backup --type=full
# Verify
pgbackrest --stanza=main info

The archive_command in patroni.yml calls pgBackRest for each WAL segment. Combined with a nightly full backup and continuous WAL archiving, you’ve got point-in-time recovery on top of your HA cluster.


Gotchas Worth Knowing Before You Start

Clock skew. etcd’s lease TTL is wall-clock time. If your nodes have drifted clocks, Patroni’s heartbeat math gets weird and you’ll see spurious failovers. Install chrony on every node:

Terminal window
apt install -y chrony
systemctl enable --now chrony
chronyc tracking # verify offset < 1s

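Watchdog timeout vs. TTL. Patroni expects loop_wait + 2 × retry_timeout <= ttl; the values here (10 + 2 × 10 = 30) sit exactly on that limit. When it activates the watchdog, Patroni sets the device timeout to ttl minus safety_margin, 25 s in this config, so a stalled node is reset about 5 s before its leader key would expire. That leaves the HA loop 15 s (ttl minus safety_margin minus loop_wait) to finish each cycle. softdog’s own 60 s default is overridden, so you don’t need to tune it.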
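A quick way to sanity-check timing values before changing them is to compute the relationships directly. This sketch assumes the rule from Patroni's documentation as I read it (loop_wait + 2 × retry_timeout must not exceed ttl, and the watchdog is armed at ttl minus safety_margin); the function name and result keys are invented for the example.

```python
def check_timings(ttl: int, loop_wait: int, retry_timeout: int, safety_margin: int = 5) -> dict:
    """Check Patroni's timing rule and derive the watchdog and HA-loop budgets in seconds."""
    watchdog_timeout = ttl - safety_margin
    return {
        "rule_ok": loop_wait + 2 * retry_timeout <= ttl,
        "watchdog_timeout_s": watchdog_timeout,
        # Time one HA loop iteration may take before the watchdog would fire.
        "ha_loop_budget_s": watchdog_timeout - loop_wait,
    }
```

For this guide's config, check_timings(30, 10, 10) reports the rule as satisfied with a 25-second watchdog timeout and a 15-second loop budget; raising retry_timeout to 15 without raising ttl would break the rule.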

etcd quorum loss. If two of your three etcd nodes go down, etcd loses quorum and stops accepting writes. Patroni can’t renew the leader key and can’t elect a new leader. By default the primary then demotes itself to read-only rather than risk split brain, so writes stop until quorum returns; Patroni’s failsafe_mode setting relaxes this when the primary can still reach every other member over the REST API. Three etcd nodes tolerate one failure; five nodes tolerate two. Plan accordingly.
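The "three tolerate one, five tolerate two" rule follows from majority quorum, which a couple of lines make concrete. The function names here are just for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return members - quorum(members)
```

Note that tolerated_failures(4) is still 1, which is why etcd clusters are normally sized with an odd number of members.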

pg_rewind and wal_log_hints. pg_rewind only works if the cluster has wal_log_hints: on or data checksums enabled; without either, rejoining a demoted primary requires a full re-clone. This config sets both at bootstrap. Enable them now, not after your first messy failover.

maximum_lag_on_failover. The 1 MB setting means Patroni won’t promote a replica that’s more than 1 MB behind the primary’s WAL. That’s usually fine on a LAN, but if you have a heavily loaded primary and slow replicas, tune this up or you’ll find no eligible candidate for promotion.
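To see what "1 MB behind" means in terms of the LSNs that pg_current_wal_lsn() and friends report, the comparison can be sketched as follows. This is a simplification for illustration: Patroni compares against the leader position it last recorded in etcd rather than a live value, and both function names are invented here.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def eligible_for_promotion(leader_lsn: str, replica_lsn: str, maximum_lag: int = 1048576) -> bool:
    """True if the replica is no more than `maximum_lag` bytes behind the leader."""
    lag = lsn_to_int(leader_lsn) - lsn_to_int(replica_lsn)
    return lag <= maximum_lag
```

A replica at 0/3000000 is exactly 1 MB behind a leader at 0/3100000 and still qualifies; one more byte of lag and it would be skipped.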


Should You Bother?

Honestly? For a homelab personal project where 10 minutes of downtime is fine — probably not. This setup has real operational weight: seven nodes minimum, etcd to babysit, TLS certificates if you care about security, and watchdog kernel modules. It’s not “install and forget.”

But for anything that actually matters — a side project people depend on, a small business app, a home automation database that controls your HVAC — Patroni + etcd + HAProxy is the right answer. It’s what production teams at scale use, and for good reason. Automatic failover in under 30 seconds, zero data loss with synchronous mode, clean read/write splitting, and enough observability (REST API, patronictl, HAProxy stats) to know what’s happening without logging into every node.

The 2 AM difference between “Postgres is down, paging on-call” and “Postgres failed over automatically, I’ll review the logs in the morning” is worth the setup cost.

Start with the etcd cluster, validate it’s healthy, then add Patroni one node at a time. Kill things deliberately. Build muscle memory for what failover looks like before production traffic depends on it.

