You Need a Source of Truth
Your cluster needs to agree on something. Who’s the leader? Where’s the database? What’s the current config? Is this service healthy?
Pick the wrong answer, and your 2 AM self will be staring at a split-brain incident where half your cluster thinks node A is the leader and the other half thinks it’s node B. Fun times.
Three systems have spent the last decade solving this exact problem. Let’s see which one is right for you.
The Lay of the Land
Before we get into the weeds: all three are distributed key-value stores designed to solve consensus and coordination problems. They all use a form of write-ahead logging to survive failures. They all want an odd number of voting nodes (usually 3 or 5), because a majority quorum gains no extra fault tolerance from an even count. And they all store data that must be consistent across the cluster.
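If the odd-number rule feels like folklore, the arithmetic is short. Here is a minimal sketch of majority-quorum math; the function names are made up for illustration and nothing here talks to a real cluster.

```shell
#!/usr/bin/env bash
# Majority quorum for a cluster of n voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority still survives.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

# Compare cluster sizes: going from 3 to 4 nodes raises the quorum
# without raising the number of failures you can survive.
for n in 3 4 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A fourth node costs you a machine and a bigger quorum, and buys you nothing in fault tolerance. That is the whole argument for 3 or 5.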
That’s where the similarities end.
etcd is the minimalist. Raft-based, HTTP and gRPC APIs, powers Kubernetes, made by CoreOS (now Red Hat). It does one thing obsessively well: be a reliable, fast key-value store.
Consul is the Swiss army knife. Built by HashiCorp, it’s a service mesh + service discovery engine + key-value store all bolted together. It uses Serf for gossip and Raft for consensus. If you need service discovery, DNS, health checks, and coordination all from one binary, Consul is your answer.
ZooKeeper is the grizzled veteran. Born at Yahoo to coordinate Hadoop jobs, it powers Kafka, Storm, and HBase. It uses its own Zab (ZooKeeper Atomic Broadcast) protocol. If your data pipeline lives in the Hadoop ecosystem, ZooKeeper is already there. If you’re building new infrastructure, honestly? You probably don’t want this.
Consistency: Raft vs Zab vs “Who Cares, My Config Is Static”
Let’s talk about how these systems keep data consistent across the cluster.
etcd uses Raft. Raft is elegant. A leader is elected, writes go to the leader, the leader replicates to followers, and you get strong consistency. If you read from etcd, you’re getting the latest committed state. Your 2 AM self appreciates this.
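Consul also uses Raft for its key-value layer. Gossip (Serf) handles cluster membership and node discovery; Raft handles the authoritative data store. Writes get the same Raft-backed consistency as etcd, just with more complexity bolted on. One caveat: reads use Consul’s `default` consistency mode unless you ask for `consistent`, and the default can return stale data in a rare window around a leader change.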
ZooKeeper uses Zab. It’s similar to Raft in spirit (a leader broadcasts writes to followers), but the details differ. Both order writes through a single leader, so for practical purposes you won’t feel the difference in a well-tuned cluster. One thing to know: ZooKeeper serves reads from whichever server you are connected to, so a read can lag behind the latest write unless the client issues a sync first. You will feel the difference in the operational headaches. ZK’s leader election can be slower, and the protocol is harder to reason about if something goes wrong at 2 AM.
The real difference: Raft is simpler. ZooKeeper’s Zab is older and more battle-tested in large Kafka deployments, but Raft has become the de facto standard. When you’re hiring, Raft knowledge is more common.
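To make "the leader replicates to followers" a bit more concrete, here is a toy sketch of the majority-commit rule that Raft-style systems rely on. It is not code from etcd or Consul: `commit_index` is a hypothetical helper that takes the highest log index stored on each voting member (leader included) and prints the highest index a majority has. Real Raft adds a further condition, omitted here, that the leader only commits entries from its own term this way.

```shell
#!/usr/bin/env bash
# commit_index MATCH_INDEX...
# One argument per voting member: the highest log index that member has
# stored. Prints the highest index that a majority of members have stored.
commit_index() {
  local n=$#
  local majority=$(( n / 2 + 1 ))
  # Sort indexes from highest to lowest. The value in position `majority`
  # is present on at least `majority` members.
  printf '%s\n' "$@" | sort -rn | sed -n "${majority}p"
}

# Leader and one follower have entry 7, the other follower is behind.
commit_index 7 7 4

# Only the leader has anything past entry 4.
commit_index 7 4 3
```

In the second call the leader has written entries 5 through 7 locally, but nothing past 4 counts as committed until a second node has it.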
APIs: Pick Your Flavor
etcd gives you HTTP/2 with gRPC. Modern, clean, language-agnostic. Put a key, get a key, watch for changes:
    # Set a key
    etcdctl put /myapp/config '{"debug": true}'

    # Get it back
    etcdctl get /myapp/config

    # Watch for changes (blocks until something changes)
    etcdctl watch /myapp/config

    # List all keys under a prefix
    etcdctl get /myapp --prefix

The watch is huge. You can tell etcd “notify me when this config changes” and it’ll push the update to you. No polling, no waste.
Consul gives you HTTP + DNS. You can interact with the KV store using RESTful APIs:
    # Set a value
    curl -X PUT -d 'value' http://localhost:8500/v1/kv/myapp/config

    # Get it
    curl http://localhost:8500/v1/kv/myapp/config

    # Delete it
    curl -X DELETE http://localhost:8500/v1/kv/myapp/config

    # Query services by DNS (the agent serves DNS on port 8600 by default)
    dig @127.0.0.1 -p 8600 web.service.consul

Consul’s party trick is the DNS interface. Your app doesn’t need a special client library; it just does a DNS lookup. That’s powerful if you’re retrofitting service discovery into existing infrastructure. But it means you’re limited to what DNS can express, which is why Consul also exposes the HTTP API for richer operations.
ZooKeeper uses a binary protocol. You need a client library (zkCli.sh for humans, usually). It’s less ergonomic:
    # Connect
    zkCli.sh -server localhost:2181

    # Inside the client (a parent znode must exist before its children):
    create /myapp ""
    create /myapp/config "value"
    get /myapp/config
    ls /myapp
    delete /myapp/config

ZK feels old because it is old. The binary protocol is efficient, but it means tooling is less straightforward. Want to inspect ZK state from your laptop? You need a Java client or zkCli. Want to do it with etcd? curl works fine.
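"curl works fine" comes with one wrinkle: etcd’s JSON gateway expects keys and values to be base64-encoded. The sketch below builds the request body for a put. The helper names (`b64`, `etcd_put_body`, `etcd_put`) are invented for this example, and the `/v3/kv/put` path assumes etcd 3.4 or newer listening on localhost:2379; older releases used a different path prefix.

```shell
#!/usr/bin/env bash
# Base64-encode a string without line wrapping.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# etcd_put_body KEY VALUE
# Prints the JSON body the gateway expects for a put.
etcd_put_body() {
  printf '{"key": "%s", "value": "%s"}\n' "$(b64 "$1")" "$(b64 "$2")"
}

# etcd_put KEY VALUE
# Sends the put. Needs a reachable etcd member, so it is defined here
# but not called.
etcd_put() {
  curl -s -X POST http://localhost:2379/v3/kv/put -d "$(etcd_put_body "$1" "$2")"
}

# Show the body that would be sent.
etcd_put_body /myapp/config on
```

Responses come back base64-encoded too, so for day-to-day poking around etcdctl is still the friendlier tool.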
Feature Depth: Minimalist vs Swiss Army Knife vs Kitchen Sink
etcd is minimal.
- Key-value store: yes
- Transactions: yes (conditional updates)
- Leases: yes (keys that expire after a TTL)
- Leader election: built from leases and transactions (the client concurrency helpers and `etcdctl elect` wrap the pattern for you)
- Service discovery: no built-in
- Health checks: not its job
- DNS: not its job
etcd gives you the primitives, and you build the rest. This is by design. It’s Kubernetes’ backbone because Kubernetes wants control over how it uses the data.
Consul is feature-complete.
- Key-value store: yes
- Service discovery: yes (built-in)
- Health checks: yes (HTTP, TCP, script-based)
- DNS: yes (queries like web.service.consul)
- Service mesh (Consul Connect): yes
- Multi-datacenter: yes (replication across DCs)
Consul is like hiring a forklift to move a couch. Technically it works, and if you’re moving a lot of couches (running a large service mesh), you need it. But if you just have one couch, you’re spending engineering effort on features you’ll never use.
ZooKeeper is a heavy toolkit.
- Key-value store: yes
- Watches: yes (notified when data changes)
- Ephemeral nodes: yes (like leases)
- ACLs: yes
- Service discovery: no (Kafka uses ZK to store broker metadata, not for general discovery)
- Health checks: no
- DNS: no
ZooKeeper gives you enough to coordinate Kafka or Hadoop. It doesn’t pretend to be a service mesh. You appreciate this if you’re in the Hadoop ecosystem; you resent it if you’re trying to use ZK for something it wasn’t built for.
Running Three Nodes: The Operational Reality
Let’s talk about what happens when you actually need to run these clusters.
etcd: Straightforward, Fast
A 3-node etcd cluster is dead simple. You run three etcd instances pointing at each other:
    ETCD_NAME=node1
    ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
    ETCD_ADVERTISE_CLIENT_URLS=http://node1:2379
    ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
    ETCD_INITIAL_ADVERTISE_PEER_URLS=http://node1:2380
    ETCD_INITIAL_CLUSTER=node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380
    ETCD_INITIAL_CLUSTER_STATE=new

Repeat for node2 and node3 (changing the name and the two advertised URLs), start the daemons, and you’re done. Leader election happens automatically. Writes are fast (Raft pipelining). Failures are handled gracefully.
Snapshots are automatic. For backups, run `etcdctl snapshot save` against a live member rather than copying the data directory out from under a running process.
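The only fiddly part of an etcd config is keeping the cluster list identical on every node while the per-node values change. A small sketch of generating both, assuming each node’s name doubles as its hostname and peers use port 2380; `initial_cluster` is a hypothetical helper, not an etcd tool.

```shell
#!/usr/bin/env bash
# initial_cluster NAME...
# Prints name=http://name:2380 pairs joined by commas.
initial_cluster() {
  local out="" name
  for name in "$@"; do
    # ${out:+,} adds a comma only when something is already in the list.
    out="${out}${out:+,}${name}=http://${name}:2380"
  done
  echo "$out"
}

# Print the settings that differ per node, plus the shared cluster list.
for node in node1 node2 node3; do
  echo "ETCD_NAME=$node"
  echo "ETCD_ADVERTISE_CLIENT_URLS=http://$node:2379"
  echo "ETCD_INITIAL_ADVERTISE_PEER_URLS=http://$node:2380"
  echo "ETCD_INITIAL_CLUSTER=$(initial_cluster node1 node2 node3)"
  echo
done
```

Generating the list from one place avoids the classic typo where one member’s list disagrees with the others and the cluster refuses to form.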
Consul: More Operators, More Features
A 3-node Consul cluster is similarly sized, but you’re managing more moving parts:
    # Start server 1 (bootstrap). -data-dir is required outside dev mode;
    # the path here is only an example.
    consul agent -server -ui \
      -bootstrap-expect=3 \
      -data-dir=/opt/consul \
      -node=consul1 \
      -bind=10.0.0.1

    # Start server 2
    consul agent -server \
      -data-dir=/opt/consul \
      -node=consul2 \
      -bind=10.0.0.2 \
      -retry-join=10.0.0.1

    # Start server 3
    consul agent -server \
      -data-dir=/opt/consul \
      -node=consul3 \
      -bind=10.0.0.3 \
      -retry-join=10.0.0.1

Consul wants you to think about gossip rings, datacenter replication, and ACLs. It’s more powerful, but you’re paying for it in operational overhead. A 3-node cluster works fine, but you’re managing a more complex system.
ZooKeeper: Heavyweight, Requires Java
A 3-node ZooKeeper ensemble requires a config file and Java:
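    tickTime=2000
    initLimit=5
    syncLimit=2
    dataDir=/var/lib/zookeeper
    clientPort=2181

    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888

Plus a myid file in the dataDir on each server (containing 1, 2, or 3). A multi-server ensemble also needs initLimit and syncLimit, which cap how many ticks followers get to connect to and sync with the leader. Start the daemons and wait for leader election (can be slow). ZooKeeper is solid once running, but the setup is more finicky, and you’re maintaining Java processes.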
Snapshots are automatic. Backups require careful handling because ZK’s transaction logs can get large.
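The myid file is where ZooKeeper setups most often go wrong, because the number in it has to match that host’s server.N line. A rough sketch of deriving both from one ordered host list; the helper names are hypothetical and the ports are the conventional 2888 and 3888.

```shell
#!/usr/bin/env bash
# zk_server_lines HOST...
# Prints one server.N line per host, numbering from 1.
zk_server_lines() {
  local id=1 host
  for host in "$@"; do
    echo "server.${id}=${host}:2888:3888"
    id=$(( id + 1 ))
  done
}

# zk_myid THIS_HOST HOST...
# Prints the 1-based position of THIS_HOST in the list, which is the number
# that belongs in that server's myid file. Returns 1 if the host is absent.
zk_myid() {
  local me=$1 id=1 host
  shift
  for host in "$@"; do
    if [ "$host" = "$me" ]; then
      echo "$id"
      return 0
    fi
    id=$(( id + 1 ))
  done
  return 1
}

zk_server_lines zk1 zk2 zk3
echo "myid for zk2: $(zk_myid zk2 zk1 zk2 zk3)"
```

On each server you would write the `zk_myid` result into the myid file under dataDir. Keep the host order fixed, since reordering the list renumbers everyone.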
Leader Election and Distributed Locks
All three systems can be used to elect a leader and coordinate locks. Here’s the pattern:
etcd: Leases + Compare-and-Swap
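    # Create a lease (30-second TTL)
    LEASE=$(etcdctl lease grant 30 | awk '{print $2}')

    # Try to acquire leadership. A plain put would overwrite the current
    # leader, so write only if the key does not exist yet (create revision 0).
    # On failure, print who holds the key instead.
    etcdctl txn <<EOF
    create("/cluster/leader") = "0"

    put /cluster/leader "$HOSTNAME" --lease=$LEASE

    get /cluster/leader

    EOF

    # While you are leader, keep the lease alive (runs until interrupted)
    etcdctl lease keep-alive $LEASE

    # Everyone else watches the leader key
    etcdctl watch /cluster/leader

    # If you die, the lease expires and the key is gone
    # Competitors race to claim it with the same transaction

Clean, efficient, built for this use case.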
Consul: Sessions + KV Locks
    # Create a session with a TTL (invalidated if you stop renewing it)
    SESSION=$(curl -s -X PUT -d '{"TTL": "30s"}' http://localhost:8500/v1/session/create | jq -r '.ID')

    # Acquire the lock/leader key (the response body is true or false)
    curl -s -X PUT -d "$HOSTNAME" \
      "http://localhost:8500/v1/kv/cluster/leader?acquire=$SESSION"

    # If the response was true, you're the leader
    # Renew the session before the TTL runs out
    curl -s -X PUT "http://localhost:8500/v1/session/renew/$SESSION"

    # If you stop renewing, Consul invalidates the session and the key is released

Powerful, but requires careful session management.
ZooKeeper: Ephemeral Sequential Nodes
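    # Inside zkCli (the parent znode /cluster must already exist).
    # Create an ephemeral sequential node. zkCli does not expand shell
    # variables, so pass the hostname literally.
    create -e -s /cluster/leader node1
    # Result: /cluster/leader0000000001

    # The node with the lowest sequence number is the leader
    # Everyone else watches the node just ahead of their own
    # When that node is deleted, re-check whether you are now the lowest
    # If your session dies, your node is auto-deleted

The ephemeral node pattern is elegant. Your node stays alive as long as your session does; if you crash, it’s gone once the session times out. There is no stampede for the lock either: each candidate watches only its predecessor, and the next lowest number takes over.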
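The decision each candidate makes after listing the children is simple enough to sketch without a ZooKeeper server. This assumes every candidate node shares one name prefix followed by ZooKeeper’s zero-padded sequence number, so a plain sort puts them in creation order. `election_role` is a hypothetical helper for illustration.

```shell
#!/usr/bin/env bash
# election_role MY_NODE CHILD...
# Prints "leader" if MY_NODE has the lowest sequence number. Otherwise
# prints "watch <node>", naming the child just ahead of MY_NODE.
# Returns 1 if MY_NODE is not among the children.
election_role() {
  local me=$1 prev="" child
  shift
  # Zero-padded suffixes with a shared prefix sort correctly as text.
  for child in $(printf '%s\n' "$@" | sort); do
    if [ "$child" = "$me" ]; then
      if [ -z "$prev" ]; then
        echo "leader"
      else
        echo "watch $prev"
      fi
      return 0
    fi
    prev=$child
  done
  return 1
}

# Children as a listing might return them, in no particular order.
election_role leader0000000001 leader0000000003 leader0000000001 leader0000000002
election_role leader0000000003 leader0000000003 leader0000000001 leader0000000002
```

Watching only your immediate predecessor is the point of the design: when the leader dies, exactly one candidate gets woken up instead of all of them.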
Snapshots and Disaster Recovery
It’s 2 AM. Your cluster is split. Two nodes are up, one is down. You lose the two nodes. Now you only have one node left, and it doesn’t have quorum. Your 2 AM self is not happy.
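etcd disaster recovery: A lone survivor cannot accept writes, and copying its data directory onto new machines will not produce a healthy cluster, because the directory also carries that member’s identity. Instead, restart the surviving member with --force-new-cluster, which turns it into a one-member cluster that keeps its data. Then grow back to three with etcdctl member add, starting each new node with ETCD_INITIAL_CLUSTER_STATE=existing. If you have a snapshot file, the alternative is to restore it onto fresh nodes with the snapshot restore command. Either way, recovery leans on etcd’s point-in-time snapshots, with the write-ahead log (wal/) replaying recent writes.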
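Consul disaster recovery: Similar in spirit to etcd, but more involved because there is more state to restore: the KV store plus the service catalog, sessions, and ACLs. `consul snapshot save` and `consul snapshot restore` cover backups. Recovering from lost quorum without a snapshot means hand-editing raft/peers.json on the surviving server, which is as fun as it sounds.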
ZooKeeper disaster recovery: ZK keeps a transaction log and snapshots. Recovery is similar, but you need to be careful about the myid files and leader election. ZK’s recovery can be slower because the leader election process is more involved.
The pragmatic take: all three are recoverable. etcd’s recovery is the simplest. ZooKeeper’s is the most error-prone if you’re not familiar with the internals.
Consul’s Killer Feature: Service Discovery
Here’s where Consul separates itself.
You have a web service running on three nodes. Each instance registers with its local Consul agent, along with a health check. Your app client queries Consul (via DNS or HTTP) to get a list of healthy instances. No load balancer, no manual configuration. Consul handles the discovery.
    # Register a service
    curl -X PUT -d '{
      "ID": "web-1",
      "Name": "web",
      "Port": 8080,
      "Check": {
        "HTTP": "http://localhost:8080/health",
        "Interval": "10s"
      }
    }' http://localhost:8500/v1/agent/service/register

    # Query for healthy instances (the catalog endpoint lists unhealthy ones too)
    curl 'http://localhost:8500/v1/health/service/web?passing'

    # Or just use DNS (the agent serves DNS on port 8600 by default)
    dig @127.0.0.1 -p 8600 web.service.consul

This is powerful. Your infrastructure becomes self-describing. New instances register themselves, unhealthy ones are removed automatically, and clients find them via DNS.
etcd doesn’t do this. ZooKeeper doesn’t either (though Kafka uses ZK to store broker metadata, which is similar). If you need service discovery, Consul is the play.
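"Instances register themselves" usually means a few lines in a startup script. A minimal sketch, assuming every instance exposes /health on its own port and talks to a local agent on port 8500; `service_payload` and `register_service` are hypothetical helpers built around the registration endpoint shown above.

```shell
#!/usr/bin/env bash
# service_payload ID NAME PORT
# Prints a registration body with an HTTP health check on the same port.
service_payload() {
  printf '{"ID": "%s", "Name": "%s", "Port": %s, "Check": {"HTTP": "http://localhost:%s/health", "Interval": "10s"}}\n' \
    "$1" "$2" "$3" "$3"
}

# register_service ID NAME PORT
# Sends the registration to the local agent. Needs a running Consul agent,
# so it is defined here but not called.
register_service() {
  curl -s -X PUT -d "$(service_payload "$1" "$2" "$3")" \
    http://localhost:8500/v1/agent/service/register
}

# Show the body a second web instance would send.
service_payload web-2 web 8081
```

Each instance needs a unique ID but shares the Name, which is what lets one lookup for web return all of them.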
etcd’s Killer Feature: Being Kubernetes
Kubernetes uses etcd for everything. Pods, services, config maps, secrets, nodes, controllers’ state. Kubernetes owns the semantics; etcd just stores and replicates.
If you’re running Kubernetes, etcd is non-negotiable. And honestly? Once you’re in the Kubernetes ecosystem, etcd becomes your go-to for app coordination too. Your config store, your leader election, your distributed locks—etcd.
This is why etcd has become the de facto standard for new infrastructure. It’s simple, fast, and the whole cloud-native world is built on it.
ZooKeeper’s Killer Feature: Kafka, Still
ZooKeeper is essential if you’re running a ZooKeeper-based Kafka cluster, which means any deployment that hasn’t migrated to KRaft. In those clusters, Kafka uses ZK to store broker metadata, partition assignments, and controller election.
The catch: Kafka is moving away from ZooKeeper. KRaft (Kafka Raft) is the new consensus layer, and it has been production-ready since Kafka 3.3; Kafka 4.0 drops ZooKeeper mode entirely. But if you’re running older Kafka deployments, ZooKeeper is your reality. And honestly, it works fine. It’s boring, which is why it’s been the default for a decade.
If you’re building a new data pipeline today, push for KRaft. If you’re inheriting a ZK-based system, don’t panic—it’s a familiar, stable tool.
The Decision Matrix
Here’s the honest take:
| Use Case | Pick | Why |
|---|---|---|
| Kubernetes cluster state | etcd | Non-negotiable. K8s owns the design. |
| App leader election | etcd | Simpler than the alternatives. Leases are clean. |
| Service discovery + mesh | Consul | Built for this. DNS + HTTP APIs. |
| Distributed config store | etcd | Fast, simple, gRPC support. |
| Existing Kafka cluster | ZooKeeper | Already there. Running fine. Leave it alone. |
| New data pipeline coordination | etcd | Simpler than ZK. Easier to hire for. |
| Multi-datacenter service mesh | Consul | Replication, health checks, DNS. It’s designed for this. |
| You already know ZK | ZooKeeper | Don’t fight your experience. |
| Legacy Hadoop/Storm ecosystem | ZooKeeper | This is what it was built for. |
The Honest Take
etcd is the default for new work. It’s simple, fast, and the cloud-native world is built on it. Use it for your coordination problems.
Consul is the right answer if you need service discovery and you’re willing to operate more infrastructure. It’s more complex than etcd, but it buys you observability and health checks that are hard to replicate yourself.
ZooKeeper is the reliable old friend who shows up when you need him. If you’re in the Kafka/Hadoop ecosystem, ZK is the pragmatic choice. If you’re building new infrastructure and someone suggests ZooKeeper, ask them why. The answer is usually “because I know it,” which is fair, but not a reason to pick it over etcd.
Your 2 AM self will thank you for running whichever one you actually understand. Pick the simplest tool that solves your problem, run a solid 3-node cluster, and sleep well knowing your cluster has a source of truth.