Why do distributed systems need consensus algorithms like Raft or Paxos?

Independent nodes that can crash or be partitioned still need to agree on one value — who the leader is, what the next log entry is, whether a transaction committed. Raft and Paxos let a majority quorum agree safely even when a minority of nodes fail, which underpins leader election, replicated logs, and strongly consistent stores.

What is the difference between two-phase commit and a saga?

Two-phase commit (2PC) coordinates an all-or-nothing transaction across services: a coordinator asks everyone to prepare, then commits only if all vote yes — atomic but blocking if the coordinator fails. A saga splits the work into local transactions, each with a compensating action to undo it; it is non-blocking and eventually consistent but you must write the rollbacks yourself.

What is idempotency and why does it matter?

An idempotent operation has the same effect whether applied once or many times. Because networks retry, the same request can arrive twice; idempotency — usually via a unique idempotency key the server deduplicates on — ensures a retried payment or message isn't processed twice, giving exactly-once effects on top of at-least-once delivery.

Why does consensus need a majority (quorum)?

Requiring a strict majority guarantees that any two quorums overlap on at least one node, so two conflicting decisions (such as two leaders) can never both gain a majority — even during a network partition. This quorum intersection is what makes consensus safe despite failures.

System Design Fundamentals / Consensus & Coordination

Fundamental 04~14 min readAdvanced

Deep Dive

Nodes crash.
The cluster still agrees.

Machines die, messages vanish, clocks drift — and yet a distributed system must still settle on exactly one answer: one leader, one next log entry, one commit decision. That's the consensus problem, the hardest in the field, and the algorithms that solve it underpin almost everything reliable.

Why agreement is hard

Consensus means getting a group of nodes to agree on a single value, even when some fail. It sounds trivial until you add reality: messages arrive late or never, nodes crash and recover, and you can't tell a slow node from a dead one. The FLP impossibility result proves that in a fully asynchronous network, no algorithm can guarantee consensus if even one node may fail — so real systems use timeouts and randomization to make progress in practice while staying safe always.

Many everyday problems are consensus in disguise: electing a leader, agreeing the next entry in a replicated log, deciding whether a distributed transaction committed, or maintaining a distributed lock. Solve consensus once and you get all of them.

Quorums: agreement by majority

The core trick is the majority quorum. If every decision needs votes from a strict majority, then any two decisions must share at least one node — because two majorities of the same set always overlap. That single overlapping node can't vote for two conflicting values, so two leaders or two conflicting commits can never both succeed, even during a partition.

3 of 5 is a majority — and any two majorities overlap

✓

A 5-node cluster tolerates 2 failures: 3 survivors still form a majority and keep deciding. This is why clusters are sized at odd numbers.

It's all counting. A majority is ⌊N/2⌋+1, a cluster survives N−majority failures, and because two majorities sum to more than N they must share a node — so at most one leader per term:

majority(5)=3, tolerates 2 · majority(4)=3, tolerates 1 → 4 nodes = same tolerance as 3: size odd

And 3+3 > 5, so two candidates can't both reach a majority — Raft's single-leader guarantee, from pure arithmetic. The runnable version below runs an election and the fault-tolerance table.

Raft & Paxos

The two famous consensus algorithms. Same guarantee, different ergonomics.

Raft was designed to be understandable. It elects a single leader for a term; the leader takes all writes, appends them to a replicated log, and an entry is committed once a majority has stored it. If the leader goes silent, followers time out and a new election starts. It powers etcd, Consul, and CockroachDB.

Raft — elect, then replicate

Follower
times out

→

Candidate
requests votes

→

Majority?
→ Leader

→

Replicate log
to followers

A higher term always wins, so a recovered old leader steps down — no split brain.

Paxos reaches the same safety with proposers, acceptors, and learners, but is famously hard to reason about and implement; Multi-Paxos adds a stable leader for efficiency. In interviews you rarely need the internals — what matters is knowing that a majority-quorum algorithm gives you safe agreement, and reaching for an existing implementation rather than writing your own.

Coordination services

You almost never implement consensus yourself. Instead you lean on a coordination service — ZooKeeper, etcd, or a cloud equivalent — which runs the consensus protocol internally and exposes simple primitives: leader election, distributed locks, configuration storage, and membership/failure detection. The rule of thumb: keep the coordination service for the small, critical state that must be globally agreed, and keep your bulk data out of it.

→ Mental model

The senior move is knowing where you genuinely need consensus (one leader, one source of truth) versus where you can avoid it entirely with idempotency and eventual consistency. Consensus is correct but expensive — every decision costs a majority round-trip.

Distributed transactions: 2PC vs saga

When one logical operation spans multiple services or shards, you need them to agree on commit-or-abort.

Two-Phase Commit (2PC)

Coordinator: "prepare?" → all vote
All yes → "commit"; any no → "abort"
Strong atomicity across services
Blocks if coordinator dies mid-flight
Holds locks for the whole round-trip

Saga

A sequence of local transactions
Each step has a compensating undo
Failure → run compensations backwards
Non-blocking, eventually consistent
You design the rollbacks

2PC buys you atomicity at the cost of availability — a stalled coordinator leaves participants holding locks. Sagas trade atomicity for availability: each step commits independently and a failure triggers compensating actions (refund the charge, release the seat). Most modern microservice systems prefer sagas precisely because they don't block.

Idempotency: the exactly-once illusion

Networks deliver at least once — retries mean the same request can arrive twice. True "exactly-once delivery" is impossible, but you can get exactly-once effects by making operations idempotent: applying them twice has the same result as applying them once.

The standard mechanism is an idempotency key — a unique ID the client attaches to a request. The server records processed keys and, on a duplicate, returns the original result instead of doing the work again. This is how a retried payment charges the card once, and how a redelivered queue message isn't processed twice.

Idempotency key deduplicates the retry

Pay req
key=ab12

→

Charged ✓
store ab12

→

Retry
key=ab12

→

Seen ab12 →
return original

→ Key insight

Idempotency is how you avoid needing consensus. If every operation is safe to retry, much of a system can run on cheap at-least-once delivery and eventual consistency — reserving expensive agreement for the few places that truly need one source of truth. Pair it with the fencing tokens from concurrency control to stay correct under retries and pauses.

RUN IT YOURSELF

Count a quorum and run an election

Consensus is majority arithmetic. This computes the fault-tolerance table — and shows why clusters are odd-sized, because a 4-node cluster survives no more failures than a 3-node one, just wasting a vote. Then it runs a Raft leader election: a 3-2 split elects the candidate with the majority, but a 2-2 split (one node crashed) elects no one, forcing a new term. And because two majorities of five sum to six, two candidates can never both win. Change the cluster size or the vote split.

CPython · WebAssembly

See consensus & coordination in real designs

Message Queue — ordering & replication Job Scheduler — leader election Key-Value Store — quorum writes Payment System — 2PC / saga Notifications — idempotent delivery

Frequently asked

Quick answers

Why do we need Raft or Paxos?

Nodes that can crash or be partitioned still need to agree on one value — the leader, the next log entry, a commit decision. These algorithms let a majority quorum agree safely even when a minority fails.

2PC vs saga?

2PC is an atomic prepare-then-commit across services, but blocks if the coordinator fails. A saga is a chain of local transactions with compensating undos — non-blocking and eventually consistent, but you write the rollbacks.

What is idempotency?

An operation that has the same effect applied once or many times. Using an idempotency key the server deduplicates on, retries give exactly-once effects on top of at-least-once delivery.

Why does consensus need a majority?

Any two majorities of the same set overlap on at least one node, so two conflicting decisions can't both win — even during a partition. That quorum intersection is what keeps consensus safe.

Finished this one? 0 / 177 Handbooks done

Explore the topic

See this alongside everything else on the same subject — handbooks, system designs, challenges and tools, in one place.

Distributed Systems

Nodes crash.
The cluster still agrees.

Why agreement is hard

Quorums: agreement by majority

Raft & Paxos

Coordination services

Distributed transactions: 2PC vs saga

Two-Phase Commit (2PC)

Saga

Idempotency: the exactly-once illusion

Count a quorum and run an election

Quick answers

Why do we need Raft or Paxos?

2PC vs saga?

What is idempotency?

Why does consensus need a majority?

Explore the topic

More Handbooks

Explore more from Vibe Engines

Nodes crash.The cluster still agrees.

Why agreement is hard

Quorums: agreement by majority

Raft & Paxos

Coordination services

Distributed transactions: 2PC vs saga

Two-Phase Commit (2PC)

Saga

Idempotency: the exactly-once illusion

Count a quorum and run an election

Quick answers

Why do we need Raft or Paxos?

2PC vs saga?

What is idempotency?

Why does consensus need a majority?

The other fundamentals

System Design Fundamentals →

CAP & Consistency →

Partitioning & Replication →

Explore the topic

More Handbooks

Explore more from Vibe Engines

Get the next one in your inbox.

Nodes crash.
The cluster still agrees.