AI System Design · 61 Interactive Builds

Design the systems that run the models.

Build real AI architectures step by step — inference serving, context windows, RAG retrieval, agent loops and guardrails — through interactive diagrams that grow one concept at a time. The LLM-era counterpart to classic system design.

Before you build

The ideas behind every AI system

Every design below recombines the same LLM-era building blocks — tokens and context windows, retrieval, tool-calling and evals. Get oriented with our handbooks and AI interactive tools, then spot the patterns in each design.

Start here

RAG Chunking Playground

Tune chunk size & overlap and see retrieval quality move.

Context & Cost Planner

Budget tokens across prompt, history and output.

Tool Schema Designer

Shape the JSON schemas an agent calls tools with.

LLM Judge Builder

Build an eval rubric to score non-deterministic output.

Browse all AI designs

61 designs

01 Intermediate

★ Start here

Design a Conversational AI

Build a production conversational AI system (think ChatGPT). See how the request path splits an inference gateway from the model servers, how the context window is assembled and token-budgeted, how conversation memory is stored and recalled, how tokens stream back over a persistent connection, and how guardrails gate every prompt and response.

LLM · Inference · Streaming

9 steps →
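
The token-budgeted context assembly described above can be sketched roughly as follows. This is a minimal illustration, not the design's actual implementation: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words (a real system would use the model's tokenizer), and `assemble_context` and its parameter names are assumptions made for the example.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per word.
    return len(text.split())


def assemble_context(system_prompt: str, history: list[str], user_message: str,
                     max_tokens: int, reserve_for_output: int) -> list[str]:
    """Always keep the system prompt and the new user message, then add
    history turns newest-first until the remaining budget runs out."""
    budget = max_tokens - reserve_for_output
    budget -= count_tokens(system_prompt) + count_tokens(user_message)
    if budget < 0:
        raise ValueError("system prompt and user message exceed the token budget")

    kept: list[str] = []
    for turn in reversed(history):      # newest turns are considered first
        cost = count_tokens(turn)
        if cost > budget:
            break                       # older turns are dropped
        kept.append(turn)
        budget -= cost

    kept.reverse()                      # restore chronological order
    return [system_prompt, *kept, user_message]
```

Dropping the oldest turns first is only one possible policy; summarizing or recalling older turns from stored memory are alternatives the design may use instead.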

02 Intermediate

Design a RAG Pipeline

Build a retrieval-augmented generation pipeline. See how documents are chunked and embedded, how a vector store answers semantic search, how two-stage retrieval with reranking finds the best passages, how the prompt is grounded to stop hallucination, and how evals keep a quietly-drifting index honest.

RAG · Retrieval · Embeddings

9 steps →
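
As a rough illustration of the chunking step, here is a minimal sketch of fixed-size chunking with overlap. It assumes word-based chunks for simplicity (production pipelines often chunk by tokens or by document structure), and `chunk_words` is a hypothetical helper name, not something the design defines.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of `chunk_size` words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                       # the last chunk reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

Larger overlap reduces the chance that a sentence is split across a chunk boundary, at the cost of embedding and storing more near-duplicate text.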

03 Advanced

Design an AI Agent System

Build an autonomous AI agent. See how the plan-act-observe loop turns a goal into action, how the model emits typed tool calls, how a sandboxed executor runs them safely, how working and long-term memory fit together, and how budgets and approval gates keep a multi-step agent from running away.

Agents · Tool Calling · Orchestration

9 steps →
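
The plan-act-observe loop with a step budget can be sketched as below. This is an illustrative skeleton under stated assumptions: `policy` is a scripted stand-in for the model, the `{"tool": ..., "args": ...}` action shape and the special `"finish"` tool are invented for the example, and no real sandboxing is shown.

```python
from typing import Any, Callable

Action = dict[str, Any]  # assumed shape: {"tool": name, "args": {...}}


def run_agent(goal: str,
              policy: Callable[[str, list[str]], Action],
              tools: dict[str, Callable[..., Any]],
              max_steps: int) -> dict[str, Any]:
    """Loop plan -> act -> observe until the policy finishes or the budget is spent."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = policy(goal, observations)                 # plan
        if action["tool"] == "finish":
            return {"status": "done", "answer": action["args"]["answer"],
                    "steps": len(observations)}
        tool = tools.get(action["tool"])
        if tool is None:                                    # reject unknown tool names
            observations.append(f"error: unknown tool {action['tool']}")
            continue
        observations.append(str(tool(**action["args"])))    # act, then observe
    return {"status": "budget_exhausted", "answer": None, "steps": len(observations)}


def scripted_policy(goal: str, observations: list[str]) -> Action:
    # Stand-in for the model: call `add` once, then finish with its result.
    if not observations:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"tool": "finish", "args": {"answer": observations[-1]}}


result = run_agent("add 2 and 3", scripted_policy, {"add": lambda a, b: a + b}, max_steps=5)
print(result)  # {'status': 'done', 'answer': '5', 'steps': 1}
```

The hard `max_steps` cap is what keeps a policy that never emits `finish` from looping forever; an approval gate would sit between the plan and act steps.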

04 Advanced

Design an LLM Inference Server

Build an LLM inference serving system. See how a request queue absorbs spiky traffic, how the prefill/decode split and continuous batching keep GPUs full, how the KV cache and paged attention make each token cheap, how tensor sharding fits a giant model, and how autoscaling rides demand, all while balancing latency against throughput.

Inference · GPU · Scalability

9 steps →

05 Advanced

Design a Recommendation System

Build a large-scale recommendation system. See how a two-stage retrieve-and-rank funnel picks the best few from millions, how two-tower embeddings and ANN generate candidates fast, how a heavy ranking model scores engagement, how a feature store stays consistent between training and serving, and how the feedback loop keeps recommendations fresh.

Recommenders · Ranking · Embeddings

9 steps →

06 Advanced

Design a Vector Database

Build a vector database. See why "k nearest of a billion vectors" needs its own index, how a distance metric ranks similarity, how IVF cells and an HNSW graph make search sub-linear, how product quantization fits billions in RAM, how metadata filtering and sharding hold up — and how ANN fails silently when you starve the search.

Retrieval · Embeddings · ANN

9 steps →
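
To make "a distance metric ranks similarity" concrete, here is a minimal sketch of the exact, brute-force baseline that ANN indexes such as IVF and HNSW approximate. It assumes cosine similarity as the metric and plain Python lists as vectors; `brute_force_knn` is a hypothetical name for the example.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Score every stored vector against the query and return the k most similar ids.
    This is O(n) per query, which is the cost an ANN index exists to avoid."""
    scored = ((cosine_similarity(query, vec), key) for key, vec in vectors.items())
    return [key for _, key in heapq.nlargest(k, scored)]


store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(brute_force_knn([1.0, 0.1], store, k=2))  # ['a', 'c']
```

A baseline like this is also useful for measuring recall: comparing an ANN index's results against the exact top-k is one way to notice when search parameters are set too low.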

07 Intermediate

Design Semantic Search

Build a semantic search engine. See why keyword search misses meaning, how a single shared embedding model puts documents and queries in one space, how the chunk→embed→index ingest path is built, how hybrid BM25 + vector fusion catches exact terms, how a cross-encoder reranks the shortlist — and how a model-version upgrade silently randomizes results.

Retrieval · Embeddings · Search

9 steps →
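
One common way to fuse a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how the hybrid step could work, not necessarily the method this design uses; the constant `k=60` is the conventional default, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combine several ranked id lists. Each list contributes 1 / (k + rank)
    to a document's score, so no score normalization across retrievers is needed."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["d1", "d2", "d3"]      # hypothetical keyword results
vector_hits = ["d3", "d1", "d4"]    # hypothetical embedding results
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, an exact-term match that BM25 ranks highly still surfaces even when its vector score is unremarkable. The fused shortlist is what a cross-encoder would then rerank.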

08 Intermediate

Design an LLM Gateway

Build an LLM gateway and model router. See why apps should call one provider-agnostic API instead of vendor SDKs, how adapters normalize every provider, how a capability-first router picks a model, how retries and failover survive a provider outage, how per-tenant limits and budgets isolate a shared quota, how caching cuts cost and latency — and how cost-only routing silently wrecks quality.

LLM · Routing · Infrastructure

9 steps →
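
The retry-and-failover behavior can be sketched as follows. This is a simplified illustration: `ProviderError`, `call_with_failover`, and the `(name, callable)` provider shape are all hypothetical, and real gateways typically also distinguish retryable from non-retryable errors and add jitter to the backoff.

```python
import time
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error type that every provider adapter raises on failure."""


def call_with_failover(providers: list[tuple[str, Callable[[str], str]]],
                       prompt: str,
                       retries_per_provider: int = 2,
                       backoff_seconds: float = 0.0) -> tuple[str, str]:
    """Try providers in priority order. Retry each with exponential backoff,
    then fall through to the next provider when its retries are exhausted."""
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                time.sleep(backoff_seconds * (2 ** attempt))
    raise ProviderError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    raise ProviderError("503 from upstream")   # simulated outage


def backup(prompt: str) -> str:
    return prompt.upper()                      # stand-in for a real completion


print(call_with_failover([("primary", primary), ("backup", backup)], "hi"))
# ('backup', 'HI')
```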

09 Intermediate

Design an LLM Cache

Build a prompt & response caching layer for LLMs. See why repeated and near-duplicate prompts should skip the model, how an exact-match cache keys on the normalized request, how a semantic cache reuses answers above a tuned similarity threshold, how prompt-prefix (KV) reuse cheapens even misses, how TTL and invalidation keep it fresh — and how a loose threshold silently serves wrong answers.

LLM · Caching · Infrastructure

7 steps →
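
An exact-match cache keyed on the normalized request might look like the sketch below. The normalization rules here (lowercasing and collapsing whitespace) are assumptions chosen for illustration and can be too aggressive for case-sensitive prompts; `cache_key` and `ExactMatchCache` are hypothetical names, and TTL and invalidation are omitted.

```python
import hashlib
import json
from typing import Callable


def cache_key(model: str, prompt: str, temperature: float) -> str:
    """Hash the fields that affect the answer, after normalizing the prompt."""
    normalized = {
        "model": model,
        "prompt": " ".join(prompt.lower().split()),   # assumed normalization
        "temperature": temperature,
    }
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExactMatchCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_or_compute(self, model: str, prompt: str, temperature: float,
                       compute: Callable[[str], str]) -> tuple[str, bool]:
        """Return (answer, was_hit). Only call the model on a miss."""
        key = cache_key(model, prompt, temperature)
        if key in self._store:
            return self._store[key], True
        answer = compute(prompt)
        self._store[key] = answer
        return answer, False
```

Including the model name and sampling parameters in the key matters: the same prompt sent to a different model, or at a different temperature, should not be served the same cached answer.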

10 Advanced

Design a Fine-Tuning Pipeline

Build an LLM fine-tuning and training pipeline through an interactive diagram. See how datasets are curated (the real work), how a pretrained base is adapted with LoRA/PEFT, how the distributed training loop is monitored, how a held-out eval gate decides promotion, how a versioned model registry enables rollback, how canary deploys ship safely, how the production data flywheel compounds, and why eval-set contamination silently inflates metrics.

LLM · Training · MLOps

9 steps →

11 Advanced

Design a Feature Store

Build a feature store for ML. See how one feature definition kills training/serving skew, how a shared pipeline computes features once, how the offline store serves point-in-time-correct training data, how the online store serves millisecond lookups, how a versioned registry makes features reusable assets, how batch + streaming keep them fresh, and how drift monitoring catches silent decay.

ML · Feature Store · Recommenders

8 steps →
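
Point-in-time correctness means a training row may only see feature values that were known at the row's timestamp. A minimal sketch of that lookup, assuming each feature's history is a list of `(timestamp, value)` pairs sorted by timestamp (the function name and data shape are hypothetical):

```python
from bisect import bisect_right
from typing import Optional


def point_in_time_value(history: list[tuple[int, float]], as_of: int) -> Optional[float]:
    """Return the latest value whose timestamp is <= as_of, or None if the
    feature did not exist yet. Values written later are never returned."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


# Hypothetical history of one feature for one entity.
purchases_7d = [(10, 1.0), (20, 2.0), (30, 3.0)]
print(point_in_time_value(purchases_7d, as_of=25))  # 2.0
print(point_in_time_value(purchases_7d, as_of=5))   # None
```

Joining on the newest value instead of the value as of the label's timestamp leaks future information into training, which is one source of training/serving skew.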

12 Advanced

Design Content Moderation

Build a content-moderation pipeline (text + image) through an interactive diagram. See how a staged funnel hash-matches known-bad content, how text and multimodal classifiers emit per-category scores, how OCR closes the text-in-image loophole, how a policy engine maps scores to graduated actions, how a human review queue handles the uncertain middle, how appeals and a retraining loop fight adversarial evasion, and why fully trusting the classifier fails.

Safety · Moderation · LLM

9 steps →

13 Advanced

Design an AI Coding Assistant

Build an AI coding assistant (like Copilot or Cursor) through an interactive diagram. See how inline completion (tab prediction) meets a sub-second latency budget, how context is assembled with fill-in-the-middle, how repo-aware retrieval grounds completions in the codebase, how a code-specialized model streams suggestions, how debounce, cancel and cache win the milliseconds, how acceptance-rate telemetry measures quality, how an agentic chat mode handles multi-file edits, and why starving the context yields confident wrong code.

LLM · Retrieval · Developer Tools

9 steps →

14 Advanced

Design Multi-Agent Orchestration

Build a multi-agent orchestration and workflow engine through an interactive diagram. See how an orchestrator decomposes a goal across specialist agents, when NOT to go multi-agent, how shared state coordinates them, how a durable workflow engine checkpoints and resumes, how typed handoffs stop telephone-game degradation, how fan-out/fan-in parallelizes independent work, how budgets and termination prevent runaways, and why uncapped agents loop and burn unbounded cost.

Agents · Orchestration · LLM

9 steps →

15 Advanced

Design LLM Eval & Observability

Build an LLM evaluation and observability pipeline through an interactive diagram. See how offline evals and online observability form two loops, how tracing every call is the foundation, how versioned golden datasets become your real benchmark, how programmatic, LLM-judge and human scorers combine, how a CI gate blocks regressions, how production monitoring catches drift, how failures feed back as fixtures, and why an unvalidated LLM judge silently corrupts every metric.

Evals · LLM · Observability

9 steps →

16 Advanced

Design an LLM Guardrails System

Build an LLM guardrails and safety-filter system through an interactive diagram. See how a safety pipeline wraps every call, how input guardrails screen for PII, prompt injection and scope, why injection needs defense-in-depth, how output guardrails check toxicity, PII, schema and groundedness, how violations are handled gracefully, how cheap-first layering keeps it affordable, why both over- and under-blocking fail, and how treating untrusted input as trusted lets injection through.

Safety · Guardrails · LLM

9 steps →

17 Advanced

Design a Text-to-Image Service

Build a text-to-image generation service like Midjourney, DALL·E or hosted Stable Diffusion. See why synchronous generation fails, how the async job pattern accepts fast and works later, how a queue absorbs bursts against fixed GPU capacity, what the diffusion denoising loop actually does, how batching and autoscaling keep the GPU fleet cost-effective, how object storage + CDN deliver images, and why moderation is required on both the prompt and the image.

Browse all AI designs

Design a Conversational AI

Design a RAG Pipeline

Design an AI Agent System

Design an LLM Inference Server

Design a Recommendation System

Design a Vector Database

Design Semantic Search

Design an LLM Gateway

Design an LLM Cache

Design a Fine-Tuning Pipeline

Design a Feature Store

Design Content Moderation

Design an AI Coding Assistant

Design Multi-Agent Orchestration

Design LLM Eval & Observability

Design an LLM Guardrails System

Design a Text-to-Image Service

Design a Speech-to-Text Service

Design a Text-to-Speech Service

Design a Realtime Voice Agent

Design an AI Answer Engine

Design an Embeddings Service

Design a Reranking Service

Design a GraphRAG System

Design a Multimodal RAG System

Design an Agent Memory System

Design a Code Execution Sandbox

Design an LLM Router

Design a Batch Inference System

Design a Synthetic Data Pipeline

Design an RLHF Pipeline

Design a Document AI Pipeline

Design ChatGPT

Design a Computer-Use Agent

Design a Deep Research Agent

Design an AI Meeting Notetaker

Design a Text-to-Video System

Design an Adaptive AI Tutor

Design an AI Code Review Bot

Design an Agent Control Plane

Design a Model Registry (MLOps)

Design a GPU Cluster Scheduler

Design a Data Labeling Platform

Design a Prompt Management & A/B Testing Platform

Design Realtime Speech Translation

Design an AI Ad Creative Platform

Design a Multi-Tenant AI SaaS Platform

Design an Autonomous Email Agent

Design a Sales CRM Agent

Design a Support Resolution Agent

Design a CI Test-Generation Agent

Design a Personal Knowledge Assistant

Design an Edge Inference Fleet

Design an AI Video Dubbing System

Deploy an LLM in a Customer’s Environment

Design Secure Document Ingestion + RAG

Design an Agent Payment Gateway

Design an MCP Security Gateway

Design an Agentic Browser Security Gateway

Design a Hybrid Edge-Cloud Agent

Design an RL Environment Farm

AI system design — frequently asked questions

Pick a system. Start building.