
What is Training vs Inference?
Training and inference are the two fundamental phases of a machine learning model's lifecycle:
- Training is the process of teaching a model by adjusting its parameters on data.
- Inference is the process of using the trained model to make predictions on new data.
Why It Matters
The distinction between training and inference affects cost, speed, hardware requirements, and deployment strategy. Training a frontier LLM can cost tens of millions of dollars or more and take weeks to months; inference (running the model for a user) typically costs between a fraction of a cent and a few cents per query and takes seconds. Per-query costs are small, but they accumulate across very large request volumes. Understanding this distinction is essential for evaluating AI costs and capabilities.
How It Works
Training:
- Goal: Learn patterns from data by adjusting model weights
- Process: Forward pass → compute loss → backpropagation → weight update (repeated for many thousands to millions of steps, each on a batch of data)
- Compute: Extremely intensive. Frontier models use thousands of GPUs for months
- Cost: GPT-4's training reportedly cost more than $100M; Gemini Ultra's is presumed to be of a similar order
- Happens: Once (or periodically for retraining)
- Hardware: GPU/TPU clusters optimized for throughput
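The training process listed above can be sketched on a toy problem. The following is a minimal illustration, not how frontier models are actually trained: it fits a two-parameter model y = w * x + b with plain gradient descent, and the data, learning rate, and step count are all assumptions chosen for the example.

```python
# Minimal sketch of a training loop: fit y = w * x + b by gradient descent.
# The data, learning rate, and step count are illustrative assumptions.

def train(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0  # model parameters (the "weights"), initialized to zero
    n = len(xs)
    loss = 0.0
    for _ in range(steps):
        # Forward pass: compute predictions with the current weights.
        preds = [w * x + b for x in xs]
        # Compute loss: mean squared error between predictions and targets.
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n
        # Backward pass: gradients of the loss with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Weight update: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b, loss = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

On this data the learned parameters should end up close to w = 2 and b = 1. A real training run follows the same four steps, but with billions of parameters, automatic differentiation, and batches sharded across many accelerators.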
Inference:
- Goal: Generate predictions or outputs from a trained, frozen model
- Process: Forward pass only — no weight updates, no backpropagation
- Compute: Much lighter than training, but still significant for large models
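- Cost: Typically between a fraction of a cent and a few cents per query, depending on length (e.g., GPT-4 launched at about $0.03 per 1K input tokens and $0.06 per 1K output tokens)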
- Happens: Millions of times per day in production
- Hardware: GPUs or other accelerators tuned for low latency (fast response to each request) and low cost per request
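By contrast, inference can be sketched as a forward pass over frozen parameters. The example below is a toy illustration: the weight values and the names `FROZEN_WEIGHTS`, `predict`, and `predict_batch` are hypothetical, standing in for a model y = w * x + b that is assumed to have been trained earlier.

```python
# Frozen parameters, assumed to come from an earlier training run
# (hypothetical values for a toy model y = w * x + b).
FROZEN_WEIGHTS = {"w": 2.0, "b": 1.0}

def predict(x, weights=FROZEN_WEIGHTS):
    # Forward pass only: no loss, no gradients, no weight update.
    return weights["w"] * x + weights["b"]

def predict_batch(xs, weights=FROZEN_WEIGHTS):
    # Apply the same forward pass to several inputs in one call.
    return [predict(x, weights) for x in xs]

snapshot = dict(FROZEN_WEIGHTS)  # copy of the weights before serving
outputs = predict_batch([10.0, 20.0, 30.0])
print(outputs)  # [21.0, 41.0, 61.0]
print(FROZEN_WEIGHTS == snapshot)  # True: inference left the weights untouched
```

Because nothing is written back to the weights, the same model can serve any number of requests, and identical copies can run on many machines at once.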
Optimization techniques differ:
- Training optimizations: Mixed precision, gradient checkpointing, data parallelism, ZeRO
- Inference optimizations: Quantization, KV-cache, speculative decoding, batching, distillation (training a smaller model to mimic a larger one so that serving is cheaper)
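As one concrete instance of an inference optimization, quantization stores weights at lower precision to save memory and bandwidth. The sketch below shows symmetric per-tensor int8 quantization on a small list of floats; the function names and weight values are illustrative assumptions, and production libraries use more elaborate schemes (per-channel scales, calibration data, and so on).

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: map floats to integers in [-127, 127]
    # using a single scale derived from the largest absolute weight.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the stored integers.
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical float weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)          # integers that fit in one byte each
print(restored)   # approximate reconstruction of the original weights
print(max_error)  # rounding error, at most about half of `scale`
```

Each weight now fits in one byte instead of four (for float32), at the price of a small rounding error. Note that the smallest weight here rounds to zero, which is the kind of precision loss quantization trades for cheaper inference.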
Example
When OpenAI trained GPT-4, it ran for months on a massive GPU cluster. That was training. Every time you ask ChatGPT a question, the trained model processes your input and generates a response in seconds. That's inference. OpenAI trains rarely but serves inference billions of times.