Top-K Retrieval
The core of the "R" in RAG: given a query embedding and a set of document embeddings, return the indices of the k most similar documents. Score every doc by cosine similarity, then take the top k — with a stable tie-break so results are deterministic.
The problem
Given a query vector, a list of docs vectors (all the same length as the query), and an integer k, return the indices of the k documents most similar to the query by cosine similarity, ordered from most to least similar. Break ties by smaller index first. (cosine_similarity is already provided.)
query = [1, 0]
docs = [[1, 0], [0, 1], [1, 1]]
k = 2[0, 2]k = 1, same docs[0]k ≥ len(docs)all indices, ranked- 0 ≤ k; if k exceeds the number of docs, return all of them ranked.
- Deterministic: ties broken by ascending index.
- Reuse the provided
cosine_similarity— do not re-derive it.
Your turn — write it
Edit the stub, hit Run (or ⌘/Ctrl + Enter), and watch the hidden tests. Stuck? the hints are right above and Reveal solution is one click away.
Implement top_k(query, docs, k): score each doc against the query with the provided cosine_similarity, then return the indices of the top k, most-similar first, ties broken by smaller index.
- Score every document: build a list of
(similarity, index)pairs. - Sort by similarity descending. For ties, the smaller index should come first.
- A single sort key of
(-similarity, index)gives exactly that ordering. - Take the first
kindices — slicing past the end is fine, it just returns what exists.