RAG & Retrieval
Retrieval-augmented generation end to end — chunking, embeddings, similarity search and ranking — across a system design, a hands-on tool, and runnable challenges.
System Designs 1
AI System Designs 2
Design a Conversational AI
Build a production conversational AI system (think ChatGPT) step by step. See how the request path splits an inference gateway from the model servers, how the context window is assembled and token-budgeted, how conversation memory is stored and recalled, how tokens stream back over a persistent connection, and how guardrails gate every prompt and response — through an interactive diagram that grows with each concept.
Design a RAG Pipeline
Build a retrieval-augmented generation pipeline step by step. See how documents are chunked and embedded, how a vector store answers semantic search, how two-stage retrieval with reranking finds the best passages, how the prompt is grounded to stop hallucination, and how evals keep a quietly-drifting index honest — through an interactive diagram that grows with each concept.
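As a rough illustration of the chunking step mentioned above, here is a minimal sketch of fixed-size chunking with overlap. The design itself may use a different strategy (sentence-aware or token-based splitting, for example). The function name `chunk_words`, the whitespace word split, and the default sizes are all assumptions made for this example.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to `size` words, where consecutive
    chunks share `overlap` words. Hypothetical helper for illustration."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    words = text.split()      # assumption: whitespace-separated words stand in for tokens
    step = size - overlap     # how far the window advances each time
    chunks: List[str] = []

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break             # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.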
Coding Challenges 2
Cosine Similarity
The measure behind every embedding search and RAG system: how aligned are two vectors, ignoring their length? It is the dot product divided by the product of the magnitudes: 1 for the same direction, 0 for orthogonal, -1 for opposite directions. Solve it in Python or TypeScript.
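One possible shape for a solution, as a sketch rather than the challenge's reference answer. The function name `cosine_similarity` and the choice to return 0.0 for a zero-length vector (where the measure is mathematically undefined) are assumptions; the challenge may specify different behavior.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as "no similarity"
    return dot / (norm_a * norm_b)
```

Because the magnitudes are divided out, `[1, 2]` and `[2, 4]` score (up to floating-point rounding) the same as two identical vectors: only direction matters.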
Top-K Retrieval
The core of the "R" in RAG: given a query embedding and a set of document embeddings, return the indices of the k most similar docs by cosine similarity, with a stable tie-break. Solve it in Python or TypeScript.
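A brute-force sketch of the idea, not the challenge's reference solution. It assumes the tie-break is "lower index first" (the blurb only says the tie-break is stable), and the names `top_k` and `_cosine` are hypothetical. A production system would typically replace the full scan with a vector index.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k docs most similar to the query, best first."""
    scores = [_cosine(query, d) for d in docs]
    # Sort by descending score; equal scores fall back to ascending index
    # (assumed tie-break rule).
    order = sorted(range(len(docs)), key=lambda i: (-scores[i], i))
    return order[:max(k, 0)]
```

For example, with the query `[1, 0]` and documents `[[0, 1], [1, 0], [2, 0], [-1, 0]]`, documents 1 and 2 both score 1.0, so `top_k(..., 2)` returns them in index order.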