What is a token in AI? Prompt examples
What is a token in AI? – A piece of text, usually a word (or part of a word), that we send to the LLM. The same goes for the response (roughly, number of words = number of tokens). How to use them? Since we pay for them, rather cautiously: we want to send the minimum and get the maximum out of every request. Pretty much the basics of economics.
Below are a couple of prompts I used to analyze my token usage. Some of the answers are obviously hallucinations, but on the other hand we get a pretty decent breakdown of all the data I sent during that coding session.
Tokens in AI and how to optimize them? TL;DR
- Context files (52k tokens) = biggest impact on quality (95%)
- Cost is always linear, quality is never linear
- First 10k tokens provide 80% of value
- Specificity > Length for prompts
- MCP tools add minimal cost but huge value
- Reset sessions every 20-30 exchanges for efficiency (the "lost in the middle" problem)
- 3-5 key files > entire project dump
- Output costs 5× more than input
- Pareto principle: 20% of best context provides 80% of quality
Average cost per prompt
Remember that a more complex prompt gets more work done with higher precision and should save you time in the long run.
┌──────────────────────────┬────────┬─────────────────────┐
│ Prompt Type              │ Tokens │ Cost                │
├──────────────────────────┼────────┼─────────────────────┤
│ Simple („fix this bug”)  │ ~1,500 │ ~$0.016 (~0.06 PLN) │
│ Medium (code + analysis) │ ~5,000 │ ~$0.039 (~0.16 PLN) │
│ Complex (your avg)       │ ~6,500 │ ~$0.050 (~0.20 PLN) │
│ Very complex (detailed)  │ ~7,000 │ ~$0.081 (~0.33 PLN) │
└──────────────────────────┴────────┴─────────────────────┘
Remember not to burn tokens 🙂 Most LLM providers are starting to charge per request. It will push all of us to be very precise and to put more effort into the quality of our „discussions” with LLMs.

Token Usage & Breakdown
Full Prompt:
“Can you show me current token usage and how many tokens are used by the context window, instructions, my prompts and MCP servers”
Summary:
- Context files consume 52% of tokens
- Output costs 5× more than input
- Average cost per exchange: $0.020
Tip: Attach only 3–5 key files, not the entire project.
Token Pricing & Average Cost
Full Prompt:
“What is the price of the tokens, what would be the average price for a prompt?”
┌────────────────────┬──────────────┬────────────────┐
│ Model              │ Input ($/1M) │ Output ($/1M)  │
├────────────────────┼──────────────┼────────────────┤
│ Claude 3.5 Sonnet  │ $3.00        │ $15.00 ← Used  │
│ Claude 3 Opus      │ $15.00       │ $75.00         │
│ GPT-4 Turbo        │ $10.00       │ $30.00         │
│ GPT-3.5 Turbo      │ $0.50       │ $1.50          │
└────────────────────┴──────────────┴────────────────┘
Input tokens: ~70,000 × $0.000003 = $0.21
Output tokens: ~24,000 × $0.000015 = $0.36
Summary:
- Claude 3.5 Sonnet: $3 input / $15 output per 1M tokens
- Your session: $0.57 (~2.30 PLN)
- ROI: 7,000–21,000% compared to developer time saved.
LOL LOL – Claude went overboard on this one 😀
Rule: Starbucks coffee costs 9× more than your AI session.
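The session arithmetic above can be reproduced in a few lines of Python. The rates come from the pricing table; the ~70,000 input and ~24,000 output token counts are the estimates Claude reported for my session:

```python
# Session-cost check using Claude 3.5 Sonnet rates ($ per 1M tokens).
INPUT_RATE = 3.00 / 1_000_000    # $3 per 1M input tokens
OUTPUT_RATE = 15.00 / 1_000_000  # $15 per 1M output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of a session in dollars."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

in_cost = 70_000 * INPUT_RATE    # input side:  $0.21
out_cost = 24_000 * OUTPUT_RATE  # output side: $0.36
total = session_cost(70_000, 24_000)
print(f"input: ${in_cost:.2f}, output: ${out_cost:.2f}, total: ${total:.2f}")
# input: $0.21, output: $0.36, total: $0.57
```

Note how the output side dominates even though it is a third of the token count – that is the 5× output premium at work.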
Tokenization Example: “Hello W”
Full Prompt:
“Can you show me how you would tokenize the prompt 'Hello W’”
Summary:
- “Hello W” → [9906, 468] (2 tokens) – Cost: $0.000006
Insight: Tokenizers combine frequent patterns like “ W”.
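Here is a toy sketch of the idea. Real tokenizers (BPE) learn their vocabulary from data; this hypothetical vocabulary just illustrates why “Hello W” collapses to 2 tokens – frequent strings, including the leading space in “ W”, get their own IDs. The two multi-character IDs match the ones reported above; the single-character IDs are made up for the example:

```python
# Toy longest-match tokenizer over a hypothetical vocabulary.
# Frequent patterns ("Hello", " W") have their own IDs, so greedy
# longest-match never falls back to the single-character entries here.
VOCAB = {"Hello": 9906, " W": 468,
         "H": 1, "e": 2, "l": 3, "o": 4, " ": 5, "W": 6}  # filler IDs

def tokenize(text: str) -> list[int]:
    ids = []
    i = 0
    while i < len(text):
        # Greedily take the longest vocabulary entry matching at position i.
        match = max((s for s in VOCAB if text.startswith(s, i)), key=len)
        ids.append(VOCAB[match])
        i += len(match)
    return ids

print(tokenize("Hello W"))  # [9906, 468]
```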

Top 5 Predictions: “Hello …”
Full Prompt:
“Show me the most probable returns (top 5) for the prompt 'Hello …’. Tell me what would be instead of '…’, if you can estimate the probability for each response”
I will add that any diacritic signs cost an additional token, but they do help anchor the answer in the language we are using. Remember, it is all based on probability.
Most likely completions:
- Hello World (45%)
- Hello there (25%)
- Hello! (15%)
- Hello, how are you? (10%)
- Hello [Name] (5%)
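The "it is all based on probability" point can be sketched directly: greedy decoding always takes the most probable completion, while sampling picks proportionally to probability. A minimal sketch using the distribution Claude estimated above (the percentages themselves are the model's guess, not measured data):

```python
import random

# Claude's estimated next-completion distribution for "Hello ..."
COMPLETIONS = {
    "Hello World": 0.45,
    "Hello there": 0.25,
    "Hello!": 0.15,
    "Hello, how are you?": 0.10,
    "Hello [Name]": 0.05,
}

def top_k(dist: dict[str, float], k: int) -> list[str]:
    """Greedy view: the k most probable completions."""
    return sorted(dist, key=dist.get, reverse=True)[:k]

def sample(dist: dict[str, float]) -> str:
    """Sampling view: pick a completion proportionally to its probability."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(top_k(COMPLETIONS, 1))  # ['Hello World'] – greedy always takes the mode
print(sample(COMPLETIONS))    # varies from run to run
```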

Context Impact Analysis
Full Prompt:
“Can you show me a table what makes the biggest impact on the prompt for my current setup. Is it linearly dependent of the amount of input from prompt / instructions / MCP servers or tools?”
This should show us where we 'leak' tokens and which part of our prompting / vibing we should focus on.
Formula: quality = 100 × (1 – e^(-context/k))
Impact
   ^
100% | ████████████████████
 90% | ██████
 80% | █████
 70% | ████
 60% | ███
 50% | ██
 40% | █
 30% | █
 20% |
 10% |
  0% +────────────────────────────────────> Files
       0  1  2  3  4  5  6  7  8  9  10 11 12
- Cost = linear, Quality = non-linear
- First 10k tokens → 80% value
- Next 36k → +15% only
Pareto: 20% of best context gives 80% quality.
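The saturation curve can be evaluated directly. The scale constant k in the formula is a free parameter; below it is chosen so that 10k tokens lands at the 80% mark (an assumption to match the numbers above, not a measured constant):

```python
import math

# Assumed scale: k chosen so that quality(10_000) == 80%.
K = 10_000 / math.log(5)

def quality(context_tokens: float) -> float:
    """Saturating quality curve: 100 * (1 - e^(-context/k))."""
    return 100 * (1 - math.exp(-context_tokens / K))

def cost(context_tokens: float, rate_per_token: float = 3e-6) -> float:
    """Cost, by contrast, is simply linear in tokens."""
    return context_tokens * rate_per_token

for c in (5_000, 10_000, 20_000, 46_000):
    print(f"{c:>6} tokens -> quality {quality(c):5.1f}%, cost ${cost(c):.3f}")
```

Doubling context from 10k to 20k tokens adds far less quality than the first 10k did, while the cost column keeps climbing at the same linear rate – which is exactly the Pareto argument above.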
Summary
Context files are essential for quality, more than anything else. In this example, around 52,000 tokens of context delivered 95% of the solution. While cost scales linearly, quality does not: the first 10,000 tokens give you 80% of the value, and after that returns diminish quickly. Remember that when you are crafting prompts, specificity beats length every time. MCP tools are a secret weapon (Context7). Reset your sessions every 20 to 30 exchanges, or whenever a new topic comes up. Focus on attaching three to five key files instead of dumping the whole project. And the most universal rule of them all: the Pareto principle.
Remember that no matter what, common sense is the best approach 🙂


