CODING CHALLENGE · N°06

Top-K Retrieval

Medium AI EngineeringRAGRetrievalEmbeddings

The core of the "R" in RAG: given a query embedding and a set of document embeddings, return the indices of the k most similar documents. Score every doc by cosine similarity, then take the top k — with a stable tie-break so results are deterministic.

The problem

Given a query vector, a list of docs vectors (all the same length as the query), and an integer k, return the indices of the k documents most similar to the query by cosine similarity, ordered from most to least similar. Break ties by smaller index first. (cosine_similarity is already provided.)

EXAMPLE 1
Input query = [1, 0] docs = [[1, 0], [0, 1], [1, 1]] k = 2
Output [0, 2]
doc0 is identical (1.0), doc2 is 0.707, doc1 is 0
EXAMPLE 2
Input k = 1, same docs
Output [0]
just the best match
EXAMPLE 3
Input k ≥ len(docs)
Output all indices, ranked
never return more than exists
CONSTRAINTS
  • 0 ≤ k; if k exceeds the number of docs, return all of them ranked.
  • Deterministic: ties broken by ascending index.
  • Reuse the provided cosine_similarity — do not re-derive it.
SOLVE IT YOURSELF

Your turn — write it

Edit the stub, hit Run (or ⌘/Ctrl + Enter), and watch the hidden tests. Stuck? the hints are right above and Reveal solution is one click away.

YOUR TASK

Implement top_k(query, docs, k): score each doc against the query with the provided cosine_similarity, then return the indices of the top k, most-similar first, ties broken by smaller index.

HINTS — 4 IDEAS
  1. Score every document: build a list of (similarity, index) pairs.
  2. Sort by similarity descending. For ties, the smaller index should come first.
  3. A single sort key of (-similarity, index) gives exactly that ordering.
  4. Take the first k indices — slicing past the end is fine, it just returns what exists.
CPython · WebAssembly