Question 1

How is LLM API cost calculated?

Accepted Answer

Cost per call = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price), where each price is quoted per million tokens. Input tokens include everything you send each turn (system prompt, tool definitions, conversation history, and any retrieved context), which is why long agent loops get expensive fast.
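As a rough illustration, the formula above can be written as a small Python function. The function name, parameter names, and the prices in the example are hypothetical placeholders, not any provider's actual rates.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of one API call, with both prices quoted per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = call_cost(12_000, 800, input_price_per_mtok=3.00, output_price_per_mtok=15.00)
print(f"{cost:.4f}")  # prints 0.0480
```

In this example the 12,000 input tokens cost three times as much as the 800 output tokens, even though the output price per token is five times higher.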

Question 2

Why does conversation history dominate agent cost?

Accepted Answer

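On every turn you resend the entire history as input. At turn 10, a 10-turn agent reprocesses turns 1–9 again, so the input for any single turn grows roughly linearly with the number of turns, and the total input tokens billed across the whole conversation grow roughly with the square of its length. Trimming or summarizing history is often the biggest single cost lever.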
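A minimal sketch of that growth, under simplifying assumptions: a fixed per-turn overhead (system prompt plus tool definitions) and a history that grows by the same number of tokens every turn. The function name and the numbers are illustrative only.

```python
def cumulative_input_tokens(turns: int, overhead_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens billed over a conversation with no trimming or caching.

    Assumes the history grows by `tokens_per_turn` on every turn and that
    `overhead_tokens` (system prompt, tool definitions) is resent each time.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn          # this turn's messages join the history
        total += overhead_tokens + history  # the whole history is sent again
    return total


for n in (5, 10, 20):
    print(n, cumulative_input_tokens(n, overhead_tokens=2_000, tokens_per_turn=500))
# prints:
# 5 17500
# 10 47500
# 20 145000
```

Doubling the conversation from 10 to 20 turns roughly triples the total here. The fixed overhead only doubles (20,000 to 40,000 tokens), while the history portion nearly quadruples (27,500 to 105,000 tokens).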

Question 3

What counts toward the context window?

Accepted Answer

The system prompt, every tool/function definition, the full message history, and any RAG or document context — all of it shares one window. This planner shows each contributor separately so you can see what is crowding the budget.
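One way to picture the per-contributor view is the sketch below. It is not the planner's actual implementation; the function name, the contributor labels, the token counts, and the 128,000-token window are all assumptions chosen for illustration.

```python
def budget_breakdown(contributors: dict[str, int],
                     window_tokens: int) -> tuple[list[tuple[str, int, float]], int]:
    """Return (rows, remaining): rows are (name, tokens, share of window), largest first."""
    used = sum(contributors.values())
    rows = [
        (name, tokens, tokens / window_tokens)
        for name, tokens in sorted(contributors.items(), key=lambda kv: kv[1], reverse=True)
    ]
    return rows, window_tokens - used


# Hypothetical token counts and window size.
rows, remaining = budget_breakdown(
    {
        "system_prompt": 1_500,
        "tool_definitions": 4_000,
        "message_history": 60_000,
        "retrieved_context": 20_000,
    },
    window_tokens=128_000,
)
for name, tokens, share in rows:
    print(f"{name:<20}{tokens:>8}{share:>8.1%}")
print(f"{'remaining':<20}{remaining:>8}")
```

Sorting by size puts the largest contributor first, which in this example is the message history at just under half of the window.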

Question 4

Are the token counts exact?

Accepted Answer

They are estimates (roughly 4 characters per token for typical English text), which is accurate enough for planning and comparing options. Code, non-English text, and unusual formatting can tokenize quite differently. For exact billing-grade counts, use the official tokenizer or token-counting endpoint of your specific model.
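The character-based heuristic can be sketched in a few lines of Python. The function name and the rounding choice (always round up) are assumptions; the result is a planning estimate, not a real tokenizer's count.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; not a substitute for a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


prompt = "You are a helpful assistant. Answer concisely."
print(estimate_tokens(prompt))  # 46 characters, so this prints 12
```

Rounding up keeps the estimate slightly conservative, which is usually the safer direction when checking whether content fits a budget.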

Context Budget & Cost Planner

The context window is a budget

Where the tokens go

Levers that actually move cost

How it works

Frequently asked questions

