Softmax
The function at the end of every classifier and language model: turn a vector of raw scores (logits) into a probability distribution. The catch every AI engineer learns the hard way — do it the numerically stable way so big logits do not overflow.
The problem
Given a list of logits (raw real-valued scores), return the softmax: a list of the same length where each value is exp(logit) / Σ exp(logits). The result is a probability distribution — every value in [0, 1] and the whole thing sums to 1. Subtract the max logit before exponentiating so large values do not overflow (this does not change the result).
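Why the shift is harmless can be written out in one line. The symbols here are notational assumptions, not taken from the exercise: z_i is the i-th logit and m is the largest logit.

```latex
\[
\mathrm{softmax}(z)_i
  = \frac{e^{z_i}}{\sum_j e^{z_j}}
  = \frac{e^{-m}\, e^{z_i}}{e^{-m} \sum_j e^{z_j}}
  = \frac{e^{z_i - m}}{\sum_j e^{z_j - m}},
\qquad m = \max_j z_j .
\]
```

With m chosen as the maximum, every exponent z_i - m is at most 0, so each exponential lies in (0, 1] and the denominator is at least 1.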
logits = [0, 0] → [0.5, 0.5]
logits = [2, 1, 0] → [0.665, 0.245, 0.090]
logits = [1000, 1000] → [0.5, 0.5]
- 1 ≤ len(logits)
- The output must sum to 1 (within floating-point tolerance).
- Subtract max(logits) before exp (the numerically stable softmax).
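To see why the last constraint matters, here is a small sketch of what can go wrong without the shift. The name naive_softmax is hypothetical and used only for illustration; it is a direct translation of the formula using Python's standard math module.

```python
import math

def naive_softmax(logits):
    # Direct translation of exp(logit) / sum(exp(logits)), with no max-subtraction.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Small logits are fine.
print(naive_softmax([2, 1, 0]))

# Large logits are not: math.exp(1000) is outside the range of a float.
try:
    naive_softmax([1000, 1000])
except OverflowError as err:
    print("overflow:", err)
```

In pure Python the failure is an OverflowError. Array libraries typically return inf instead, and inf / inf then yields nan, so the symptom can differ by environment.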
Your turn — write it
Implement softmax(logits) → a probability distribution over the logits. Subtract the max first for numerical stability, exponentiate, then divide by the sum.
- Probabilities must be positive and sum to 1 — exponentiate, then normalize by the total.
- Find m = max(logits) and exponentiate logit - m. Shifting by a constant cancels out in the ratio but stops exp from overflowing.
- Sum the exponentials once, then divide each by that sum.
- Sanity check: all-equal logits give a uniform distribution.
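The hints above can be put together as follows. This is one possible sketch in plain Python, assuming logits is a non-empty list of finite numbers; it is not necessarily the solution the exercise's hidden tests were written against.

```python
import math

def softmax(logits):
    # Shift by the largest logit so the biggest exponent is exp(0) = 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    # The sum is at least 1.0, because the max logit contributes exp(0).
    total = sum(exps)
    return [e / total for e in exps]

print([round(p, 3) for p in softmax([2, 1, 0])])  # [0.665, 0.245, 0.09]
print(softmax([1000, 1000]))                      # [0.5, 0.5]
print(softmax([0, 0]))                            # [0.5, 0.5]
```

Logits far below the maximum simply underflow to 0.0 in math.exp, which is harmless here: those entries get probability 0 and the rest still sum to 1.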