LEARNING PATH · AI Engineering
Master LLM Evaluation
For engineers who need to prove their AI actually works.
Shipping an LLM feature is easy; knowing whether it is good is the hard part. Build the vocabulary of evals, design a judge rubric, implement a metric by hand, then evaluate agents and a real production system.
- Speak precisely about offline vs online evals, judges and rubrics
- Design an LLM-as-judge rubric that resists gaming
- Implement eval metrics like token-level F1 from scratch
- Evaluate agents and conversational systems, not just single calls
51 LLM Evals Interview Questions
The vocabulary of evals — start here.
LLM-as-Judge Rubric Builder
Design an LLM-as-judge rubric interactively.
Token-Level F1
Implement a real metric by hand, with tests.
The Agent Evaluations Handbook
Evaluate multi-step, tool-using agents.
Design an AI Agent System
The agent system your evals must judge.
The Senior AI Engineer Interview Handbook
Tie evals into senior-scope AI engineering.
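
As a taste of the Token-Level F1 step, here is a minimal sketch of the metric in Python. It assumes a simple tokenizer (lowercase, split on non-word characters) and SQuAD-style bag-of-tokens overlap; the names `tokenize` and `token_f1` are illustrative and are not taken from the course material, which may normalize text differently.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep runs of word characters as tokens.
    return re.findall(r"\w+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    pred = tokenize(prediction)
    ref = tokenize(reference)

    # If either side has no tokens, score 1.0 only when both are empty.
    if not pred or not ref:
        return float(pred == ref)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred)  # share of predicted tokens that match
    recall = overlap / len(ref)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)
```

For example, scoring the prediction "the cat" against the reference "the cat sat on the mat" gives precision 1.0 and recall 2/6, so the F1 is 0.5. A short but fully correct answer is still penalized for what it leaves out.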