System Design Fundamentals  /  Concurrency & Locking
Fundamental 02~13 min readIntermediate
Deep Dive

Two writers,
one truth. Locking keeps it honest.

The instant two operations touch the same data at the same time, correctness is on the line — lost updates, dirty reads, double-spends. Concurrency control is the set of tools that decide which interleavings are allowed. Here's the full toolkit, from row locks to fencing tokens.

01

The races you're preventing

A race condition is when the result depends on the timing of operations that should have been independent. The classic is the lost update: two transactions read a balance of 100, each add 10, each write 110 — one update vanishes. Concurrency control exists to make these impossible.

AnomalyWhat happens
Dirty readYou read data another transaction wrote but hasn't committed — and it may roll back.
Non-repeatable readYou read the same row twice in one transaction and get different values.
Phantom readYou re-run a range query and new rows have appeared.
Lost updateTwo read-modify-write cycles overlap; one overwrites the other.
Write skewTwo transactions read overlapping data, then write disjoint rows that together break an invariant.
02

Pessimistic vs optimistic locking

The strategy you choose comes down to one question: how often do operations actually collide?

Two philosophies of conflict

Pessimistic — "assume conflict"

  • Lock before reading/writing
  • SELECT … FOR UPDATE
  • Others block and wait their turn
  • No wasted work, but risks deadlocks
  • Best for hot rows, high contention

Optimistic — "assume no conflict"

  • Read a version number, do the work
  • Compare-and-swap at commit time
  • No blocking; retry if version changed
  • Wasted work only when conflict is real
  • Best for read-heavy, low contention

Pessimistic locks come in shared (read) and exclusive (write) flavours; the danger is the deadlock — two transactions each holding a lock the other needs. Databases detect cycles and abort a victim, but you avoid them by acquiring locks in a consistent order. Optimistic concurrency avoids deadlocks entirely by never holding locks — at the price of a retry path you must actually write.

→ Interview tip

"How do you stop two people booking the same seat?" Name the trade-off out loud: a pessimistic row lock is simplest at low scale; an optimistic version check or a short-lived distributed lock scales better but needs a retry. The wrong answer is to never mention the race.

03

Isolation levels — the anomaly dial

SQL isolation levels are a contract: each one promises to prevent certain anomalies, in exchange for less concurrency. Higher is safer and slower.

Isolation levelDirty readNon-repeatablePhantom
Read UncommittedPossiblePossiblePossible
Read CommittedPreventedPossiblePossible
Repeatable ReadPreventedPreventedPossible*
SerializablePreventedPreventedPrevented

* In the ANSI standard. PostgreSQL's Repeatable Read (snapshot isolation) actually prevents phantoms but allows write skew; true Serializable closes that too.

04

MVCC — readers and writers stop fighting

Multi-Version Concurrency Control is how modern databases (PostgreSQL, MySQL InnoDB, Oracle) avoid the read-write blocking that naïve locking causes. Instead of overwriting a row, a write creates a new version tagged with the transaction that made it. Each transaction reads the version that was valid as of its start — a consistent snapshot.

Snapshot isolation via versions
Txn A starts
(sees v1)
Txn B writes
v2
Txn A reads
still v1
A commits
no block
Readers never block writers and vice-versa; old versions are garbage-collected once no transaction needs them.

The result: readers don't block writers, writers don't block readers. The cost is version bloat (the VACUUM problem in Postgres) and the subtle write-skew anomaly that only true serializable isolation removes.

05

Distributed locks & fencing tokens

When the contended resource lives across services, you reach for a distributed lock (Redis, ZooKeeper, etcd). These are far trickier than in-process locks, because a holder can pause — a long GC, a network stall — past the lock's expiry, then wake up still believing it holds the lock while someone else has taken it.

The fix is a fencing token: a monotonically increasing number issued with each lock grant. The protected resource records the highest token it has seen and rejects any write carrying a lower one — so a stale holder's writes bounce, even if its lock looks valid.

Fencing token rejects the stale holder
Client 1
token 33
pauses (GC)
Client 2
token 34
Storage: 33 < 34
reject Client 1
→ Key insight

A lock with a timeout is a lease, not a guarantee. Without a fencing token, any distributed lock can be held by two clients at once. Idempotency (see consensus & coordination) is the other half of staying correct under retries.

Frequently asked

Quick answers

Optimistic vs pessimistic locking?

Pessimistic locks before touching data (best for high contention). Optimistic takes no lock, checks a version at commit, and retries on conflict (best for read-heavy, low-contention workloads).

What are the four SQL isolation levels?

Read Uncommitted, Read Committed, Repeatable Read, and Serializable — each preventing more anomalies (dirty reads → non-repeatable reads → phantoms) at the cost of concurrency.

What is MVCC?

Multi-Version Concurrency Control keeps multiple row versions so readers see a consistent snapshot without blocking writers. Each transaction reads the version valid as of its start.

Why are distributed locks dangerous?

A holder can pause past the lock's expiry and resume while another process holds it. A fencing token — a monotonic number checked by downstream storage — rejects the stale writer.