Context Budget & Cost Planner
Add a system prompt, tool definitions, conversation history and retrieved context, then see your context window fill up and the cost per call — plus the bill at 1k and 1M requests — across model price tiers. An architecture planner, not a toy token counter.
How it works
- Each turn re-sends system prompt + tools + full history + context as input.
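- Cost/call = input tokens × input price + output tokens × output price, with prices expressed per token (list prices are usually quoted per 1M tokens).
- History grows every turn, so input per call rises with each turn and the total cost of a long agent loop scales super-linearly.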
- Token estimates use ~4 chars/token; set your own model price tier.
Frequently asked questions
How is LLM API cost calculated?
Cost per call = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Input tokens include everything you send each turn — system prompt, tool definitions, conversation history and any retrieved context — which is why long agent loops get expensive fast.
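The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the planner's actual code: the function name `cost_per_call` and the $3 / $15 per-million prices are hypothetical placeholders, so substitute your own model's price tier.

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one call; both prices are per 1,000,000 tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# Hypothetical price tier: $3 per 1M input tokens, $15 per 1M output tokens.
per_call = cost_per_call(8_000, 500, 3.0, 15.0)

print(f"per call:    ${per_call:.4f}")
print(f"1k requests: ${per_call * 1_000:,.2f}")
print(f"1M requests: ${per_call * 1_000_000:,.2f}")
```

With these assumed numbers, one call costs about $0.0315, which is small on its own but adds up to roughly $31.50 at 1k requests and $31,500 at 1M.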
Why does conversation history dominate agent cost?
On every turn you resend the entire history as input. A 10-turn agent reprocesses turns 1–9 again at turn 10, so the input per call grows roughly linearly with the turn number, and the cumulative input tokens billed over the whole conversation grow roughly with the square of its length. Trimming or summarizing history is often the biggest single cost lever.
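To make the growth concrete, here is a rough sketch that adds up the input tokens billed across a whole conversation. The function name and the token sizes are assumptions chosen for illustration. It also assumes every turn adds the same number of tokens to the history and, for simplicity, ignores the new message sent on the current turn.

```python
def cumulative_input_tokens(turns: int, fixed_tokens: int, history_per_turn: int) -> int:
    """Total input tokens billed over a conversation that resends its full history.

    fixed_tokens: system prompt + tool definitions, sent on every turn.
    history_per_turn: tokens each completed turn adds to the history
                      (assumed constant; real conversations vary).
    """
    total = 0
    for turn in range(1, turns + 1):
        history = (turn - 1) * history_per_turn  # all earlier turns are resent
        total += fixed_tokens + history
    return total


# Hypothetical sizes: 1,000 fixed tokens, 500 tokens added per turn.
for n in (10, 20, 40):
    print(n, cumulative_input_tokens(n, fixed_tokens=1_000, history_per_turn=500))
```

Under these assumptions, doubling the conversation from 10 to 20 turns raises the total from 32,500 to 115,000 input tokens, about 3.5 times as many, and the history portion alone more than quadruples.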
What counts toward the context window?
The system prompt, every tool/function definition, the full message history, and any RAG or document context — all of it shares one window. This planner shows each contributor separately so you can see what is crowding the budget.
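A per-contributor breakdown like the one described above might be computed as follows. This is only a sketch: the `context_budget` helper, the contributor names, the token counts and the 128,000-token window are all hypothetical, and the planner's own implementation may differ.

```python
def context_budget(parts: dict[str, int], window: int) -> dict[str, float]:
    """Return each contributor's share of the window in percent, plus free space."""
    used = sum(parts.values())
    if used > window:
        raise ValueError(f"context overflow: {used} tokens > {window}-token window")
    shares = {name: 100 * tokens / window for name, tokens in parts.items()}
    shares["free"] = 100 * (window - used) / window
    return shares


# Hypothetical token counts for one call against an assumed 128k window.
parts = {
    "system_prompt": 2_000,
    "tools": 6_000,
    "history": 40_000,
    "retrieved_context": 16_000,
}
shares = context_budget(parts, window=128_000)

for name, pct in shares.items():
    print(f"{name:>18}: {pct:5.1f}%")
```

Raising an error on overflow is a design choice for this sketch; a planner could just as reasonably report a negative free share so you can see how far over budget you are.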
Are the token counts exact?
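They are estimates (roughly 4 characters per token, a rule of thumb that fits typical English text), which is accurate enough for planning and comparing options. Code, non-English text and unusual formatting can tokenize quite differently. For exact billing-grade counts, use the official tokenizer or token-counting endpoint of your specific model.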
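The 4-characters-per-token heuristic is simple enough to sketch directly. The function below is an illustrative assumption about how such an estimate could be made, including the choice to round up. It is not a real tokenizer and will drift from actual counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: character count divided by chars_per_token, rounded up."""
    return math.ceil(len(text) / chars_per_token)


prompt = "You are a helpful assistant. Answer concisely and cite your sources."
print(estimate_tokens(prompt))
```

Exposing `chars_per_token` as a parameter lets you tune the ratio if you have measured a different average for your own content.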