Design the systems that run the models.
Build real AI architectures step by step — inference serving, context windows, RAG retrieval, agent loops and guardrails — through interactive diagrams that grow one concept at a time. The LLM-era counterpart to classic system design.
The ideas behind every AI system
Every design below recombines the same LLM-era building blocks: tokens and context windows, retrieval, tool-calling and evals. Get oriented with our handbooks and interactive AI tools, then spot the patterns in each design.
RAG Chunking Playground
Tune chunk size & overlap and see retrieval quality move.
Context & Cost Planner
Budget tokens across prompt, history and output.
Tool Schema Designer
Shape the JSON schemas an agent calls tools with.
LLM Judge Builder
Build an eval rubric to score non-deterministic output.
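To make the chunking trade-off concrete, here is a minimal Python sketch of fixed-size chunking with overlap, the two knobs the RAG Chunking Playground exposes. It assumes whitespace-separated words stand in for tokenizer tokens; `chunk_tokens` is a hypothetical helper, not the playground's own code.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` where neighbours share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return chunks


# Assumption: whitespace splitting stands in for a real tokenizer.
words = "the quick brown fox jumps over the lazy dog again and again".split()
for chunk in chunk_tokens(words, size=5, overlap=2):
    print(" ".join(chunk))
```

With `size=5` and `overlap=2`, each window starts three words after the previous one, so this sentence yields four chunks and every boundary phrase appears in two of them. More overlap protects context across chunk boundaries at the cost of more chunks to embed and store.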
Browse all AI designs
5 designs
Design a Conversational AI
Build a production conversational AI system (think ChatGPT) step by step. See how the request path splits an inference gateway from the model servers, how the context window is assembled and token-budgeted, how conversation memory is stored and recalled, how tokens stream back over a persistent connection, and how guardrails gate every prompt and response — through an interactive diagram that grows with each concept.
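A minimal sketch of context assembly under a token budget, assuming a word count approximates the tokenizer and that the oldest turns are dropped first. `Message`, `count_tokens` and `assemble_context` are illustrative names, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # "system", "user" or "assistant"
    content: str


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word. A real system
    # would count with the serving model's own tokenizer.
    return len(text.split())


def assemble_context(system: Message, history: list[Message], user: Message,
                     context_window: int, reserved_output: int) -> list[Message]:
    """Always keep the system prompt and the new user turn, reserve room for
    the reply, then add the most recent history turns that still fit."""
    budget = context_window - reserved_output
    budget -= count_tokens(system.content) + count_tokens(user.content)
    if budget < 0:
        raise ValueError("system prompt and user turn alone exceed the budget")
    kept: list[Message] = []
    for msg in reversed(history):    # walk from the newest turn backwards
        cost = count_tokens(msg.content)
        if cost > budget:
            break                    # drop this turn and everything older
        kept.append(msg)
        budget -= cost
    return [system, *reversed(kept), user]


history = [
    Message("user", "summarise the design doc for me"),
    Message("assistant", "it proposes a queue in front of the model servers"),
    Message("user", "why a queue"),
    Message("assistant", "to absorb bursts of traffic"),
]
context = assemble_context(Message("system", "you are a concise assistant"), history,
                           Message("user", "what are the risks"),
                           context_window=30, reserved_output=10)
for m in context:
    print(m.role, "|", m.content)
```

Stopping at the first turn that does not fit keeps the retained history contiguous; a production design might instead summarise older turns or recall them from long-term memory.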
Design a RAG Pipeline
Build a retrieval-augmented generation pipeline step by step. See how documents are chunked and embedded, how a vector store answers semantic search, how two-stage retrieval with reranking finds the best passages, how the prompt is grounded to stop hallucination, and how evals keep a quietly drifting index honest, all through an interactive diagram that grows with each concept.
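The two-stage retrieval and grounding steps can be sketched in Python. This is a toy under loud assumptions: term-count vectors stand in for a real embedding model, a linear scan stands in for the vector store's ANN search, and a query-term overlap score stands in for a cross-encoder reranker. All function names are hypothetical.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy embedding: term counts over a fixed vocabulary (assumption;
    # a real pipeline would call an embedding model).
    words = tokenize(text)
    return [float(words.count(term)) for term in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_score(query: str, chunk: str) -> float:
    # Stand-in for a cross-encoder reranker: share of query terms in the chunk.
    q = set(tokenize(query))
    return len(q & set(tokenize(chunk))) / len(q) if q else 0.0


def retrieve(query: str, chunks: list[str], k_candidates: int = 3, k_final: int = 2) -> list[str]:
    vocab = sorted({w for c in chunks for w in tokenize(c)})
    # Built per call here; a real index is built once at ingest time.
    index = [(c, embed(c, vocab)) for c in chunks]
    qv = embed(query, vocab)
    # Stage 1: cheap vector similarity over the whole index.
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    candidates = [c for c, _ in ranked[:k_candidates]]
    # Stage 2: a costlier scorer reorders only the short candidate list.
    return sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)[:k_final]


def grounded_prompt(query: str, passages: list[str]) -> str:
    # Number each passage so the answer can cite it, and forbid outside knowledge.
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return ("Answer using only the numbered sources and cite them. "
            "If they do not contain the answer, say you do not know.\n\n"
            f"{sources}\n\nQuestion: {query}")


docs = [
    "refunds are issued within 14 days of a return",
    "shipping is free on orders over 50 dollars",
    "returns must include the original receipt",
    "our office is closed on public holidays",
]
question = "how many days until refunds are issued"
print(grounded_prompt(question, retrieve(question, docs)))
```

The design point is the shape, not the scorers: stage 1 is cheap enough to run over the whole index, while stage 2 can afford a slower, more accurate model because it only sees `k_candidates` passages.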
Design an AI Agent System
Build an autonomous AI agent step by step. See how the plan-act-observe loop turns a goal into action, how the model emits typed tool calls, how a sandboxed executor runs them safely, how working and long-term memory fit together, and how budgets and approval gates keep a multi-step agent from running away — through an interactive diagram that grows with each concept.
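A minimal plan-act-observe loop, sketched under the assumption that the model is any callable returning either a typed tool call or a final answer string. `ToolCall`, `Tool`, `run_agent` and the scripted model are hypothetical; the argument-name check is a deliberately thin stand-in for full JSON-schema validation, and direct function calls stand in for a sandboxed executor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Tool:
    fn: Callable[..., str]
    params: frozenset          # required argument names: a minimal typed schema
    needs_approval: bool = False


def run_agent(goal: str, model: Callable, tools: dict, approve: Callable, max_steps: int = 5) -> str:
    """Plan-act-observe loop. `model(goal, observations)` returns either a
    ToolCall or a plain string, which is treated as the final answer."""
    observations: list[str] = []
    for _ in range(max_steps):                 # hard step budget
        action = model(goal, observations)     # plan
        if isinstance(action, str):
            return action                      # the model chose to finish
        tool = tools.get(action.name)
        if tool is None or set(action.args) != tool.params:
            observations.append(f"error: invalid call to {action.name}")
            continue                           # feed the error back, do not execute
        if tool.needs_approval and not approve(action):
            observations.append(f"denied: {action.name} needs approval")
            continue
        observations.append(tool.fn(**action.args))  # act, then observe the result
    return "stopped: step budget exhausted"


FACTS = {"queue_depth": "42 requests waiting"}   # hypothetical data source


def scripted_model(goal, observations):
    # Hypothetical stand-in for the LLM: call one tool, then answer.
    if not observations:
        return ToolCall("lookup", {"key": "queue_depth"})
    return f"Answer: {observations[-1]}"


tools = {
    "lookup": Tool(lambda key: FACTS.get(key, "unknown"), frozenset({"key"})),
    "restart_server": Tool(lambda host: f"restarted {host}", frozenset({"host"}), needs_approval=True),
}
print(run_agent("How deep is the queue?", scripted_model, tools, approve=lambda call: False))
```

The loop feeds invalid calls and denials back as observations rather than crashing, and `max_steps` is the simplest possible runaway budget; real systems usually also cap tokens, cost and wall-clock time.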
Design an LLM Inference Server
Build an LLM inference serving system step by step. See how a request queue absorbs spiky traffic, how the prefill/decode split and continuous batching keep GPUs full, how the KV cache and paged attention make each token cheap, how tensor sharding fits a giant model, and how autoscaling rides demand. Every step trades latency against throughput, and the interactive diagram grows with each concept.
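Why continuous batching keeps GPUs busier can be shown with a toy simulation that counts decode steps only. It assumes each request's output length is known up front and that every step costs the same, and it ignores prefill and KV-cache memory limits; both functions are illustrative, not a serving framework's scheduler.

```python
from collections import deque


def continuous_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Decode steps when a finished sequence frees its slot immediately and a
    queued request joins the batch on the very next step."""
    queue = deque(lengths)
    active: list[int] = []                  # tokens still to generate per running request
    steps = 0
    while queue or active:
        while queue and len(active) < max_batch:
            active.append(queue.popleft())  # admit a waiting request into a free slot
        active = [r - 1 for r in active]    # one decode step for every running sequence
        active = [r for r in active if r > 0]  # finished sequences leave the batch
        steps += 1
    return steps


def static_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Decode steps when each fixed batch must finish completely before the next starts."""
    return sum(max(lengths[i:i + max_batch]) for i in range(0, len(lengths), max_batch))


lengths = [10, 2, 2, 2, 10, 2]   # output tokens each request will generate
print("static:", static_batching_steps(lengths, max_batch=2))
print("continuous:", continuous_batching_steps(lengths, max_batch=2))
```

With two slots, static batching needs 22 steps for these requests because each batch waits for its longest member, while continuous batching needs 16, close to the lower bound of 14 (28 tokens over 2 slots).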
Design a Recommendation System
Build a large-scale recommendation system step by step. See how a two-stage retrieve-and-rank funnel picks the best few from millions, how two-tower embeddings and ANN generate candidates fast, how a heavy ranking model scores engagement, how a feature store stays consistent between training and serving, and how the feedback loop keeps recommendations fresh — through an interactive diagram that grows with each concept.
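The retrieve-and-rank funnel can be sketched in a few lines of Python. The towers below are fixed projections standing in for learned networks, a brute-force `heapq.nlargest` stands in for an ANN index, and the ranker's features and weights are made up for illustration; every name here is hypothetical.

```python
import heapq
import math


def user_tower(user: dict) -> list[float]:
    # Hypothetical towers: learned networks in practice, fixed projections here.
    return [user["sports"], user["music"], user["news"]]


def item_tower(item: dict) -> list[float]:
    return [item["sports"], item["music"], item["news"]]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(user: dict, items: list[dict], k: int) -> list[dict]:
    u = user_tower(user)
    # Brute-force top-k stands in for an ANN index over precomputed item vectors.
    return heapq.nlargest(k, items, key=lambda it: dot(u, item_tower(it)))


def p_engage(user: dict, item: dict) -> float:
    # Stand-in for the heavy ranker: a logistic score over richer features
    # (made-up weights), read as a predicted engagement probability.
    z = 2.0 * dot(user_tower(user), item_tower(item)) + item["freshness"] - 0.5 * item["seen_before"]
    return 1.0 / (1.0 + math.exp(-z))


def recommend(user: dict, items: list[dict], k: int = 3, n: int = 2) -> list[str]:
    candidates = retrieve_candidates(user, items, k)   # stage 1: cheap and wide
    ranked = sorted(candidates, key=lambda it: p_engage(user, it), reverse=True)  # stage 2: costly and narrow
    return [it["id"] for it in ranked[:n]]


user = {"sports": 1.0, "music": 0.0, "news": 0.5}
items = [
    {"id": "A", "sports": 1.0, "music": 0.0, "news": 0.0, "freshness": 0.1, "seen_before": 1},
    {"id": "B", "sports": 0.8, "music": 0.0, "news": 0.2, "freshness": 0.9, "seen_before": 0},
    {"id": "C", "sports": 0.0, "music": 1.0, "news": 0.0, "freshness": 1.0, "seen_before": 0},
    {"id": "D", "sports": 0.0, "music": 0.0, "news": 1.0, "freshness": 0.5, "seen_before": 0},
]
print(recommend(user, items))
```

Item A wins the cheap dot-product stage, but the ranker promotes B because it is fresher and unseen, which is the kind of signal the heavier second stage exists to capture.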
AI system design — frequently asked questions
What is AI system design?
AI system design is the practice of architecting production systems built around large language models and other AI models: deciding how to serve inference, assemble and budget the context window, retrieve knowledge (RAG), store conversation memory, orchestrate tool-calling agents, and add guardrails and evals. It shares the rigor of classic system design but foregrounds LLM-specific concerns like tokens, latency-vs-cost, and non-determinism.
How is this different from classic system design?
Classic system design is about distributing data and traffic — caching, sharding, fan-out, consistency. AI system design adds a new axis: token budgets and context windows, vector retrieval quality, streaming token delivery, GPU inference serving, prompt and output guardrails, and evaluating non-deterministic model output. Many building blocks carry over (queues, caches, replicas); the trade-offs are new.
Are the AI system-design guides free?
Yes. Every guide is free, self-contained, and runs in your browser with no sign-up. You build each system step by step through an interactive diagram.
Which AI system design should I start with?
Start with the Conversational AI design. It establishes the core LLM serving path (inference gateway, context assembly and token budgeting, conversation memory, streaming, and guardrails) that the other LLM designs build on.
Will these help with AI engineering interviews?
Yes. As teams ship LLM features, interviews increasingly probe how you would design a chat system, a RAG pipeline, or an agent — how you manage context, control cost and latency, retrieve reliably, and keep output safe. These guides foreground exactly those trade-offs.
Pick a system. Start building.
Every guide is interactive and self-contained — no setup, no sign-up. Start with the Conversational AI build, or explore the classic system-design library.