
Token economics refers to the pricing model and cost structure of Large Language Model usage, where costs are determined primarily by the number of tokens processed (input) and generated (output). Every commercial LLM API charges per token, with prices varying dramatically by model capability: from well under a dollar per million tokens for lightweight models to tens of dollars per million for frontier models. Token economics also distinguishes between input tokens (the prompt, context, and system instructions) and output tokens (the generated response), with output tokens typically costing 2-5× more because they must be generated sequentially, one at a time. Understanding token economics is essential for budgeting AI deployments, optimizing costs, and making informed build-versus-buy decisions.
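To make the pricing concrete, here is a minimal cost sketch in Python. The rates are hypothetical, chosen only to illustrate the input/output spread; no real provider's price card is implied.

```python
# Minimal sketch of pay-per-token pricing. Rates below are assumptions,
# chosen only to show the input/output asymmetry described above.
INPUT_RATE = 3.00    # $ per 1M input tokens (assumed)
OUTPUT_RATE = 15.00  # $ per 1M output tokens (assumed 5x the input rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: each token class is billed at its own rate."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# A typical chat request: a large prompt and context, a shorter generated reply.
print(f"${request_cost(3_000, 500):.4f}")  # -> $0.0165
```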
Why it matters
Token economics determines whether an AI application is financially viable at scale. A proof of concept that costs €5 per day might scale to €50,000 per month when deployed to all customers, and the difference between success and failure often comes down to token optimization. Input tokens are cheaper per unit but accumulate fastest, because system prompts, RAG context, and conversation history are resent with every request. Output tokens cost more per unit but are typically fewer. Cache hits (when the API provider has already processed the same prefix tokens) can reduce input costs by 50-90%. Understanding these dynamics enables three types of optimization: prompt optimization (reducing token count while maintaining quality), model tiering (using cheaper models for simple tasks), and architectural choices (batching, caching, context compression). For finance teams evaluating AI investments, token economics provides the cost model needed for accurate ROI calculations.
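The effect of caching can be sketched in a few lines. The 90% discount on cached tokens below is an assumption at the top of the 50-90% range just quoted; real provider discounts and cache eligibility rules differ.

```python
# Sketch of how prompt caching reduces input spend. The 90% discount on
# cached tokens is an assumption; actual discounts vary by provider.
def effective_input_cost(input_tokens: int, rate_per_m: float,
                         cached_fraction: float,
                         cache_discount: float = 0.90) -> float:
    cached = input_tokens * cached_fraction  # prefix tokens already processed
    fresh = input_tokens - cached            # tokens billed at the full rate
    return (fresh + cached * (1 - cache_discount)) / 1e6 * rate_per_m

RATE = 15.00  # $ per 1M input tokens, the hypothetical frontier rate below
no_cache = effective_input_cost(10_000, RATE, cached_fraction=0.0)
cached = effective_input_cost(10_000, RATE, cached_fraction=0.8)
print(f"no cache ${no_cache:.4f}, 80% cached ${cached:.4f}")  # $0.1500 vs $0.0420
```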
How it works
LLM API pricing follows a pay-per-token model with several tiers. Providers typically publish prices per million tokens, split into input and output rates. For example, a frontier model might charge $15 per million input tokens and $75 per million output tokens, while a smaller model from the same provider charges $0.25 and $1.25 respectively: a 60× price difference for tasks where the smaller model is sufficient. Additional economic factors include prompt caching (repeated prompt prefixes cached server-side and billed at reduced rates), batch processing (submitting requests in bulk at a discount, typically 50%, for non-time-sensitive tasks), and fine-tuned model pricing (training costs plus elevated inference costs). The total cost of an AI feature is then: (average input tokens × input rate + average output tokens × output rate) per request, summed across all model calls in the pipeline and multiplied by requests per day. Multi-model architectures reduce costs by routing different subtasks to appropriately sized models.
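The formula translates directly into a small cost calculator. A sketch follows, using this section's hypothetical rate cards; the two-step pipeline at the end is invented purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class ModelRate:
    input_per_m: float   # $ per 1M input tokens
    output_per_m: float  # $ per 1M output tokens

@dataclass
class PipelineStep:
    rate: ModelRate
    input_tokens: int    # average input tokens for this call
    output_tokens: int   # average output tokens for this call

def daily_cost(steps: list[PipelineStep], requests_per_day: int) -> float:
    """Per-request cost summed across all model calls, scaled by daily volume."""
    per_request = sum(
        s.input_tokens / 1e6 * s.rate.input_per_m
        + s.output_tokens / 1e6 * s.rate.output_per_m
        for s in steps
    )
    return per_request * requests_per_day

# The two rate cards from this section: frontier $15/$75, small model $0.25/$1.25.
frontier = ModelRate(15.00, 75.00)
small = ModelRate(0.25, 1.25)

# An invented two-step pipeline: a cheap model routes, the frontier model answers.
pipeline = [
    PipelineStep(small, input_tokens=800, output_tokens=10),
    PipelineStep(frontier, input_tokens=3_000, output_tokens=600),
]
print(f"${daily_cost(pipeline, 10_000):,.2f}/day")  # ~$900/day at 10k requests
```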
Example
A B2B SaaS company is building an AI feature that generates custom reports from customer data. The initial implementation uses a frontier model for everything: parsing the customer query (500 input + 50 output tokens), retrieving and analyzing relevant data through 3 RAG queries (4,500 input + 600 output tokens per query), and generating the final report (2,000 input + 3,000 output tokens). Total per report: 16,000 input + 4,850 output tokens. At frontier pricing, each report costs approximately €0.45. With 2,000 reports per day, that is €900/day or €27,000/month. The optimization: route the three RAG queries to a mid-tier model (sufficient quality at 10% of the cost), cache the system prompt prefix (saving 60% on repeated input tokens), and implement response length limits for the report generation. Optimized cost: €0.09 per report, €180/day. That 80% reduction makes the feature viable at their €29/month subscription price: a customer would now have to run over 300 reports per month before the AI cost alone exceeded the subscription fee.
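For readers who want to check the arithmetic, the sketch below reconstructs the baseline and optimized pipelines. It reuses the dollar rate card from the previous section plus an assumed mid-tier model at 10% of frontier cost, so its absolute figures differ from the euro figures above; the shape of the savings is the point.

```python
# Reconstruction of the report-pipeline arithmetic using the rate card from
# "How it works" ($15/$75 frontier, with an assumed mid-tier at 10% of that).
# The prose quotes euro figures under its own rates, so absolute numbers here
# differ; the structure of the savings is what matters.
M = 1_000_000
FRONTIER_IN, FRONTIER_OUT = 15.00, 75.00  # $ per 1M tokens
MID_IN, MID_OUT = 1.50, 7.50              # assumed: ~10% of frontier cost

def cost(tok_in: int, tok_out: int, rate_in: float, rate_out: float) -> float:
    return tok_in / M * rate_in + tok_out / M * rate_out

# Baseline: every step on the frontier model.
parse = cost(500, 50, FRONTIER_IN, FRONTIER_OUT)
rag = 3 * cost(4_500, 600, FRONTIER_IN, FRONTIER_OUT)
report = cost(2_000, 3_000, FRONTIER_IN, FRONTIER_OUT)
baseline = parse + rag + report  # ~$0.60 per report

# Optimized: RAG on the mid-tier model; the cached prefix modeled crudely as a
# 60% discount on the report step's input; output capped at half its length.
rag_opt = 3 * cost(4_500, 600, MID_IN, MID_OUT)
report_opt = cost(int(2_000 * 0.4), 1_500, FRONTIER_IN, FRONTIER_OUT)
optimized = parse + rag_opt + report_opt

print(f"baseline ${baseline:.2f}, optimized ${optimized:.2f}, "
      f"saving {1 - optimized / baseline:.0%}")  # roughly 70-80% under these assumptions
```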