System Design Fundamentals / Partitioning & Replication

Fundamental 03 · ~13 min read · Intermediate

Deep Dive

Split the data to scale.
Copy it to survive.

One machine can't hold the internet, and one disk failure shouldn't erase your product. Partitioning answers "where does this key live?" and replication answers "how many copies survive a failure?" — two different questions you almost always answer together.

Two questions, not one

Partitioning (sharding) splits one dataset across many machines so each holds a slice — it scales capacity and throughput. Replication copies the same data onto several machines — it provides fault tolerance and read scaling. They're orthogonal, and production systems combine them: shard the data into N partitions, then keep R replicas of each partition.

	Partitioning (sharding)	Replication
Goal	Scale capacity & write throughput	Fault tolerance & read scaling
Each node holds	A different slice of the data	A full copy of its data
Lose a node →	Lose that slice (unless replicated)	Lose nothing; a copy survives
Key decision	The shard key	Topology & sync vs async
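
To make the "N partitions, R replicas each" idea concrete, here is a minimal sketch. The function names, the fixed partition count, and the "next nodes in the list" placement rule are all illustrative assumptions, not a description of any particular database.

```python
import zlib

def partition_of(key: str, num_partitions: int) -> int:
    """Partitioning: map a key to one partition with a stable hash (CRC32)."""
    return zlib.crc32(key.encode()) % num_partitions

def replicas_of(partition: int, nodes: list[str], r: int) -> list[str]:
    """Replication: place r copies of a partition on r distinct nodes.

    Assumed placement rule: start at the partition's 'home' node and take
    the next r - 1 nodes in list order, wrapping around.
    """
    if r > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(r)]

nodes = ["node-0", "node-1", "node-2", "node-3"]
p = partition_of("user:42", num_partitions=8)
print(f"user:42 -> partition {p}, stored on {replicas_of(p, nodes, r=3)}")
```

The two functions answer the two questions separately: the first decides where a key lives, the second decides how many copies of that slice exist and where.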

How to partition

The partitioning strategy decides which node owns a given key.

Strategy	How	Strength	Weakness
Range	Contiguous key ranges per node	Efficient range scans	Hotspots on sequential keys
Hash	hash(key) → node	Even spread	Range scans hit every node
Directory	A lookup service maps key → node	Flexible, easy rebalancing	The directory is a dependency
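
The three strategies in the table can be sketched as three small lookup functions. The node names, range boundaries, and directory contents below are made up for illustration.

```python
import bisect
import zlib

NODES = ["node-0", "node-1", "node-2"]

# Range: each node owns a contiguous span of keys (hypothetical boundaries).
# node-0 owns keys < "h", node-1 owns "h" <= key < "q", node-2 owns the rest.
RANGE_BOUNDARIES = ["h", "q"]

def range_owner(key: str) -> str:
    return NODES[bisect.bisect_right(RANGE_BOUNDARIES, key)]

# Hash: a stable hash of the key picks the node, so neighbours scatter.
def hash_owner(key: str) -> str:
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

# Directory: an explicit lookup table, standing in for a lookup service.
DIRECTORY = {"alice": "node-2", "bob": "node-0"}

def directory_owner(key: str) -> str:
    return DIRECTORY[key]

for key in ("alice", "bob"):
    print(key, range_owner(key), hash_owner(key), directory_owner(key))
```

Under range partitioning, adjacent keys such as "alice" and "bob" share a node, which is what makes range scans cheap and sequential keys risky. The directory can place any key anywhere, but every lookup now depends on it.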

Plain hashing — node = hash(key) % N — has a fatal flaw: change N (add or lose a machine) and almost every key remaps, forcing a full reshuffle. That's what consistent hashing fixes.
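
A short calculation shows why "almost every key" is not an exaggeration. Assuming hash values are uniformly distributed, a key keeps its owner only when `h % n` and `h % (n + 1)` happen to agree, and the pattern of agreement repeats every `n * (n + 1)` hash values. The helper name below is illustrative.

```python
def fraction_moved(n: int) -> float:
    """Share of hash values whose owner changes when n nodes become n + 1."""
    period = n * (n + 1)  # the pair (h % n, h % (n + 1)) repeats with this period
    moved = sum(1 for h in range(period) if h % n != h % (n + 1))
    return moved / period

for n in (4, 10, 100):
    print(f"{n} -> {n + 1} nodes: {fraction_moved(n):.0%} of keys move")
```

Only n out of every n * (n + 1) hash values stay put, so the moved share is n / (n + 1): 80% when growing from four nodes to five, and it gets worse as the cluster grows.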

Consistent hashing

Add a node and move a few keys — not all of them.

Map both keys and nodes onto a circular hash space (a ring). A key is owned by the first node found clockwise from its position. When a node joins, it takes over only the keys between it and its predecessor; everyone else is untouched. When a node leaves, only its keys move to the next node.

Keys land on the ring, owned by the next node clockwise

Node A → Node B → Node C ↺ (wraps back to Node A)

Adding Node D between B and C moves only the keys that fall between B and D, which C previously owned, onto D. That is roughly 1/N of the data, not all of it.

One refinement matters in practice: virtual nodes. Each physical machine is placed at many points on the ring, so load is even and removing a node spreads its keys across all survivors rather than dumping them on one neighbour.
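
The effect of virtual nodes on balance is easy to measure. This is a minimal sketch, assuming CRC32 as the ring hash and a `"node#index"` naming scheme for virtual nodes; both are illustrative choices. It counts how many of 5,000 keys each node owns with one ring position per node versus sixty.

```python
import bisect
import zlib
from collections import Counter

def h(s):
    return zlib.crc32(str(s).encode())

def build_ring(nodes, vnodes):
    # each physical node is hashed onto the ring at `vnodes` positions
    return sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def owner(key, ring):
    # first ring position at or after the key's hash, wrapping to the start
    i = bisect.bisect_left(ring, (h(key),))
    return ring[i % len(ring)][1]

def load(nodes, vnodes, keys):
    ring = build_ring(nodes, vnodes)
    return Counter(owner(k, ring) for k in keys)

keys = [f"key-{i}" for i in range(5000)]
for vnodes in (1, 60):
    counts = load(["A", "B", "C", "D"], vnodes, keys)
    print(f"vnodes={vnodes:>2}: {dict(sorted(counts.items()))}")
```

With a single position per node, each node's share is just the arc that happens to precede it, so shares can differ widely. More positions per node average those arcs out.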

The savings are dramatic and measurable. The naive scheme hash(key) mod N remaps almost every key the instant N changes; consistent hashing moves only the new node's share:

add a node to a 4-cluster (5,000 keys): naive ~80% of keys move · consistent hashing ~24% (≈ 1/N)

That gap is the difference between a resharding storm and a quiet rebalance. The runnable version below counts the moved keys both ways.

Hot partitions & the shard key

Even spread depends entirely on the shard key. A bad key creates a hot partition — one shard drowning in traffic while others idle. A celebrity user on a "shard by user_id" scheme is the canonical example.

Bad shard key

Low cardinality (e.g. country)
Skewed access (celebrities)
Sequential (timestamps)
→ hotspots, uneven load

Good shard key

High cardinality
Even access distribution
Matches your query pattern
→ balanced, scalable

When a single key is unavoidably hot, the fixes are: add a random suffix to fan it across sub-partitions, cache it separately, or give it dedicated capacity. The shard key is the one decision that most often makes or breaks the design.
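
The random-suffix fix can be sketched as follows. The shard count, the fan-out of 8, and the `key#i` naming are hypothetical, and the sketch assumes the application already knows which keys are hot.

```python
import random
import zlib

NUM_SHARDS = 16
FANOUT = 8  # hypothetical number of sub-keys a hot key is split into

def shard_of(key: str) -> int:
    return zlib.crc32(key.encode()) % NUM_SHARDS

def write_key(key: str, hot: bool) -> str:
    """Hot keys get a random suffix so successive writes land on different sub-keys."""
    return f"{key}#{random.randrange(FANOUT)}" if hot else key

def read_keys(key: str, hot: bool) -> list[str]:
    """Reading a hot key means fetching every sub-key and merging the results."""
    return [f"{key}#{i}" for i in range(FANOUT)] if hot else [key]

sub_keys = read_keys("user:celebrity", hot=True)
print("shards touched:", sorted({shard_of(k) for k in sub_keys}))
```

The cost is on the read side: one lookup becomes `FANOUT` lookups plus a merge, which is why this is usually applied only to the handful of keys that are known to be hot.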

Replication topologies

Topology	Writes go to	Trade-off
Single-leader	One leader, streamed to followers	Simple, no write conflicts; leader is a bottleneck & SPOF
Multi-leader	Several leaders (e.g. per region)	Low-latency local writes; must resolve write conflicts
Leaderless	Any replica; client uses quorums	Highly available; app handles read repair & conflicts

Single-leader replication

Client write → Leader → Follower 1, Follower 2

Synchronous = durable but slower (wait for followers). Asynchronous = fast but the un-replicated tail is lost if the leader dies.
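
A toy model makes the "un-replicated tail" visible. The `Leader` class below is a hypothetical in-memory stand-in with a single follower, not a real replication protocol: synchronous writes reach the follower before they return, while asynchronous writes wait for a later shipping step.

```python
class Leader:
    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.log = []           # writes the leader has accepted
        self.follower_log = []  # writes the follower has received

    def write(self, value):
        self.log.append(value)
        if self.synchronous:
            # wait for the follower before acknowledging the client
            self.follower_log.append(value)

    def ship(self):
        """Background replication: the follower catches up to the leader."""
        self.follower_log = list(self.log)

    def lost_if_leader_dies(self):
        """Writes acknowledged by the leader that the follower never saw."""
        return self.log[len(self.follower_log):]

async_leader = Leader(synchronous=False)
async_leader.write("a")
async_leader.write("b")
async_leader.ship()
async_leader.write("c")  # accepted, but not yet shipped
print(async_leader.lost_if_leader_dies())  # ['c']
```

Running the same sequence with `synchronous=True` leaves nothing to lose, at the price of every write waiting on the follower.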

Asynchronous replication introduces replication lag: a follower can be behind the leader, so a read from it may be stale. That silently breaks read-your-writes — a user edits their profile, then sees the old value. The fix is to route a user's reads to the leader (or a caught-up follower) for a short window after they write.
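
One way to sketch that routing rule: remember when each user last wrote, and send their reads to the leader until a pinning window has passed. The window length, the function names, and the in-memory dictionary are assumptions for illustration; a real system would pick the window from its observed replication lag.

```python
import time

PIN_SECONDS = 5.0  # hypothetical window; should exceed the expected replication lag
_last_write_at = {}  # user_id -> time of that user's most recent write

def record_write(user_id, now=None):
    _last_write_at[user_id] = time.monotonic() if now is None else now

def choose_replica(user_id, now=None):
    """Route recent writers to the leader; everyone else may read a follower."""
    now = time.monotonic() if now is None else now
    wrote_at = _last_write_at.get(user_id)
    if wrote_at is not None and now - wrote_at < PIN_SECONDS:
        return "leader"
    return "follower"

record_write("alice", now=100.0)
print(choose_replica("alice", now=101.0))  # leader: wrote one second ago
print(choose_replica("alice", now=110.0))  # follower: window has passed
print(choose_replica("bob", now=101.0))    # follower: never wrote
```

Only the writer is pinned, so other users keep reading from followers and the leader absorbs just a small slice of extra read traffic.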

Quorums tie it together

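Leaderless systems make the consistency knob explicit. With N replicas, require W acknowledgements per write and read from R replicas. If W + R > N, every read set overlaps every write set on at least one replica, so a read always reaches a copy holding the latest acknowledged write. That gives strongly consistent reads in the common case, although concurrent writes still need conflict resolution. Drop below that threshold and you trade freshness for latency and availability.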
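
The overlap rule can be checked by brute force for small clusters. This sketch enumerates every possible write set of size W and read set of size R and reports whether all pairs share a replica; the function name is illustrative.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    print(f"N={n} W={w} R={r}: W+R>N is {w + r > n}, "
          f"always overlap is {quorums_always_overlap(n, w, r)}")
```

The two columns agree in every case. N=5, W=2, R=3 is the instructive one: W + R equals N exactly, so a write to replicas {0, 1} and a read from {2, 3, 4} can miss each other.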

→ Key insight

Partitioning and replication are different axes: shard for scale, replicate for survival, and use quorums to choose how consistent each operation must be. When you need one agreed value rather than a tunable one — a single leader, a committed transaction — you've crossed into consensus. The consistency you're tuning here is defined in CAP & consistency models.

RUN IT YOURSELF

Count the keys that move when a node joins

Why bother with a hash ring instead of hash(key) % N? Because when N changes, the modulo scheme reshuffles almost everything. This assigns 5,000 keys both ways, adds a fifth node to a four-node cluster, and counts the moves: the naive scheme relocates ~80% of the keys — a resharding storm — while consistent hashing moves only the new node's ~24% share. It also shows that removing a node leaves every other node's keys untouched. Change the key count, nodes, or virtual-node factor.

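import bisect
import zlib

def h(s):
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings
    return zlib.crc32(str(s).encode())

# naive: hash(key) % number_of_nodes
def naive_owner(key, nodes):
    return nodes[h(key) % len(nodes)]

def naive_moves(keys, nodes, new_nodes):
    return sum(1 for k in keys if naive_owner(k, nodes) != naive_owner(k, new_nodes))

# consistent hashing: nodes on a ring, virtual nodes for balance
def build_ring(nodes, vnodes=60):
    # each node appears at `vnodes` ring positions; sorting orders them clockwise
    return sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def ch_owner(key, ring):
    # first ring position at or after the key's hash; the modulo wraps around
    i = bisect.bisect_left(ring, (h(key),))
    return ring[i % len(ring)][1]

def ch_moves(keys, ring, new_ring):
    return sum(1 for k in keys if ch_owner(k, ring) != ch_owner(k, new_ring))

K = 5000
keys = [f"key-{i}" for i in range(K)]
N4, N5 = ["A", "B", "C", "D"], ["A", "B", "C", "D", "E"]

nm = naive_moves(keys, N4, N5)
cm = ch_moves(keys, build_ring(N4), build_ring(N5))
print(f"add a 5th node ({K:,} keys):")
print(f"  naive hash % N : {nm:>5,} move ({nm/K:.0%})  -> resharding storm")
print(f"  consistent hash: {cm:>5,} move ({cm/K:.0%})  -> only the new node's share")
print(f"  consistent hashing moves {nm/cm:.0f}x fewer keys.")

# removing a node: every key that changes owner should have belonged to that node
ring5 = build_ring(N5)
ring_without_c = build_ring([n for n in N5 if n != "C"])
moved = [k for k in keys if ch_owner(k, ring5) != ch_owner(k, ring_without_c)]
stray = sum(1 for k in moved if ch_owner(k, ring5) != "C")
print(f"remove node C ({K:,} keys):")
print(f"  consistent hash: {len(moved):>5,} move, {stray} of them from surviving nodes")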

See partitioning & replication in real designs

Twitter: fan-out & sharding
Uber: geo-sharding
Key-Value Store: consistent hashing
ID Generator: partitioned IDs
Search Engine: index shards
Instagram: sharded media

Frequently asked

Quick answers

Partitioning vs replication?

Partitioning splits a dataset across machines to scale capacity. Replication copies the same data onto several machines for fault tolerance and read scaling. Real systems shard for scale and replicate each shard for durability.

What is consistent hashing?

Keys and nodes map onto a ring; a key is owned by the next node clockwise. Adding/removing a node moves only a small fraction of keys instead of remapping everything. Virtual nodes even out the load.

What is a hot partition?

A shard receiving disproportionate traffic due to a skewed shard key (e.g. a celebrity). Fixes: a higher-cardinality key, a random suffix to spread the hot key, or separate caching.

What is replication lag?

The delay between a write committing on the leader and appearing on a follower. With async replication, reads from a lagging follower can be stale, breaking read-your-writes unless you route reads to a caught-up replica.
