
A token is the smallest unit of text that a Large Language Model processes. Tokenizers split text into subword pieces — roughly 4 characters or 0.75 words in English, though the ratio varies across languages and character sets. The word "understanding" might become two tokens ("under" + "standing"), while common words like "the" are a single token. Every interaction with an LLM is measured in tokens: the input prompt, the generated output, and the total context window all have token-based limits and pricing. Understanding tokens is fundamental to working with any LLM because they are the unit of both cost and capacity.
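The "roughly 4 characters per token" rule of thumb can be turned into a quick back-of-envelope estimator. This is a sketch, not a real tokenizer — exact counts require the model's own tokenizer (e.g. a BPE vocabulary), and the ratio drifts for code, numbers, and non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose using ~4 characters per token.

    Only for quick sizing; real tokenizers give exact, model-specific counts.
    """
    return max(1, round(len(text) / 4))

prompt = "Understanding tokens is fundamental to working with any LLM."
print(estimate_tokens(prompt))  # 60 characters -> roughly 15 tokens
```

The heuristic deliberately floors at 1, since even a single character costs at least one token.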
Why it matters
Tokens are the primary cost driver for LLM usage. API providers charge per token — for example, a few dollars per million input tokens and more per million output tokens. A seemingly small prompt optimization that reduces token count by 30% translates directly to 30% lower costs at scale. Tokens also determine what fits in a model's context window: a 200K-token window sounds enormous until you realize a single technical manual might consume 80K tokens, leaving limited room for instructions and conversation history. For any AI application handling significant volume, token management is the difference between a viable product and an unsustainable cost structure.
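The claim that a 30% token reduction yields a 30% cost reduction follows directly from linear per-token pricing. A minimal sketch, using purely illustrative rates (not any provider's actual pricing):

```python
# Hypothetical pricing, for illustration only — real rates vary by provider and model.
INPUT_PRICE_PER_M = 3.00    # dollars per million input tokens
OUTPUT_PRICE_PER_M = 15.00  # dollars per million output tokens

def monthly_cost(input_tokens: int, output_tokens: int, requests_per_day: int) -> float:
    """API cost over a 30-day month for a fixed per-request token profile."""
    per_request = (input_tokens * INPUT_PRICE_PER_M
                   + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return per_request * requests_per_day * 30

base = monthly_cost(2_000, 500, 10_000)
trimmed = monthly_cost(1_400, 350, 10_000)  # 30% fewer tokens on both sides
print(f"${base:,.0f} -> ${trimmed:,.0f} ({1 - trimmed / base:.0%} saved)")
# $4,050 -> $2,835 (30% saved)
```

Because cost is linear in token count, the savings percentage always equals the reduction percentage, regardless of the actual rates.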
How it works
LLMs use tokenizers — algorithms that break text into a vocabulary of subword pieces. The most common approach is Byte Pair Encoding (BPE), which starts from individual characters or bytes and iteratively merges the most frequent adjacent pair into a new token, building a vocabulary of typically 30,000 to 100,000 tokens. Common words become single tokens, while rare words are split into multiple subword pieces. Numbers, code, and non-English text often tokenize less efficiently, using more tokens per character. The tokenizer converts text to a sequence of token IDs (integers), which become the actual input to the neural network. Each token ID maps to an embedding vector that the model processes. Understanding tokenization explains why the same content in different languages can have very different token counts — and therefore different costs.
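The BPE merge loop can be sketched in a few lines. This is a toy version: real tokenizers train merge rules on huge corpora, operate on bytes, and respect word boundaries, none of which this sketch does:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def bpe_merges(text, num_merges):
    """Toy BPE: start from characters, then repeatedly fuse the most
    frequent adjacent pair into a single new token."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        merges.append(pair)
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_merges("low lower lowest", 3)
print(merges)  # first merges fuse 'l'+'o', then 'lo'+'w'
print(tokens)  # the shared stem "low" has become a single token
```

After a few merges the frequent substring "low" is one token while the rarer suffixes remain split — exactly the common-word/rare-word behavior described above.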
Example
A SaaS company building an AI customer support agent discovers their average conversation uses 4,200 tokens (1,800 input + 2,400 output). At 10,000 conversations per day, that is 42 million tokens daily. By restructuring their system prompt from a verbose 800-token instruction set to a concise 350-token version, switching from full conversation history to a summarized 5-message sliding window, and implementing response length guidelines, they reduce average token usage to 2,600 per conversation — a 38% reduction that saves over €15,000 per month on API costs while maintaining the same response quality.
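The arithmetic in this example is easy to verify. A short sketch that reproduces the figures above (conversation volumes and token counts are the scenario's, not real data):

```python
def daily_tokens(tokens_per_conversation: int, conversations_per_day: int) -> int:
    """Total tokens consumed per day at a fixed per-conversation average."""
    return tokens_per_conversation * conversations_per_day

before = 1_800 + 2_400  # average input + output tokens per conversation
after = 2_600           # after prompt trimming, history summarization, length limits

print(f"{daily_tokens(before, 10_000):,} tokens/day")   # 42,000,000 tokens/day
print(f"{1 - after / before:.0%} reduction")            # 38% reduction
```

Because API pricing is linear in tokens, that 38% token reduction flows straight through to a 38% cut in the API bill.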