CODING CHALLENGE · N°06

Top-K Retrieval

Medium AI EngineeringRAGRetrievalEmbeddings

The core of the "R" in RAG: given a query embedding and a set of document embeddings, return the indices of the k most similar documents. Score every doc by cosine similarity, then take the top k — with a stable tie-break so results are deterministic.

The problem

Given a query vector, a list of docs vectors (all the same length as the query), and an integer k, return the indices of the k documents most similar to the query by cosine similarity, ordered from most to least similar. Break ties by smaller index first. (cosine_similarity is already provided.)

EXAMPLE 1

Input

query = [1, 0]
docs = [[1, 0], [0, 1], [1, 1]]
k = 2

Output [0, 2]

doc0 is identical (1.0), doc2 is 0.707, doc1 is 0

EXAMPLE 2

Input k = 1, same docs

Output [0]

just the best match

EXAMPLE 3

Input k ≥ len(docs)

Output all indices, ranked

never return more than exists

CONSTRAINTS

0 ≤ k; if k exceeds the number of docs, return all of them ranked.
Deterministic: ties broken by ascending index.
Reuse the provided cosine_similarity — do not re-derive it.

SOLVE IT YOURSELF

Your turn — write it

Edit the stub, hit Run (or ⌘/Ctrl + Enter), and watch the hidden tests. Stuck? the hints are right above and Reveal solution is one click away.

YOUR TASK

Implement top_k(query, docs, k): score each doc against the query with the provided cosine_similarity, then return the indices of the top k, most-similar first, ties broken by smaller index.

HINTS — 4 IDEAS

Score every document: build a list of (similarity, index) pairs.
Sort by similarity descending. For ties, the smaller index should come first.
A single sort key of (-similarity, index) gives exactly that ordering.
Take the first k indices — slicing past the end is fine, it just returns what exists.

CPython · WebAssembly