System Design · step by step
Design a RAG Pipeline
Step 1 / 9
RUN IT YOURSELF

Retrieval by cosine similarity

RAG finds the chunks most relevant to a query by comparing embedding vectors with cosine similarity, then feeds the k highest-scoring chunks (the top-k) to the LLM. Here is that retrieval core in real Python, running live. Read the comments, edit the vectors, and hit Run.

HOW TO READ THE CODE — 4 IDEAS
  1. Text becomes a vector (embedding); similar meaning → similar direction.
  2. Cosine similarity is the cosine of the angle between two vectors, so it compares direction and ignores length (steps 1–2).
  3. Score every document against the query, then take the top-k (step 3).
  4. Those k chunks are what actually get stuffed into the LLM prompt.
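
The four ideas above can be sketched in plain Python. This is a minimal illustration, not necessarily the code the page runs: the three-dimensional vectors are made-up stand-ins for real embeddings, the helper names `cosine_similarity` and `top_k` and the prompt wording are assumptions, and the step numbers in the comments are chosen to match the references in the list.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    # Step 1: the dot product grows when the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    # Step 2: dividing by both lengths removes the effect of magnitude.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """Step 3: score every document against the query, keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy "embeddings": hand-written vectors, not output from a real model.
docs = [
    ("Refunds are issued within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Return items with the original receipt.", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded question

results = top_k(query_vec, docs, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")

# Idea 4: the retrieved chunks become the context section of the prompt.
context = "\n".join(text for _, text in results)
prompt = (
    "Answer using only this context:\n"
    f"{context}\n\n"
    "Question: How do refunds work?"
)
print(prompt)
```

Try scaling `query_vec` by 10: the scores should not change, because cosine similarity ignores length. A production pipeline would typically replace the linear scan in `top_k` with a vector index, but the scoring idea is the same.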
CPython · WebAssembly