
What is Pre-training?
Pre-training is the initial phase of training an AI model on a large, general-purpose dataset before it is adapted for specific tasks. During pre-training, the model learns broad patterns (language structure, visual concepts, or domain knowledge) that serve as a foundation for later specialization.
Why It Matters
Pre-training is what makes foundation models possible. Because the heavy compute investment of training on broad data is made only once, the resulting model can be cheaply adapted to thousands of different tasks. Without pre-training, every new application would require training from scratch, an expensive and data-hungry process.
How It Works
For large language models, pre-training typically uses self-supervised learning:
- Data: trillions of tokens from web pages, books, code, and other text sources.
- Objective: predict the next token given the preceding context (autoregressive, as in GPT) or predict masked tokens (masked language modeling, as in BERT).
- Scale: training runs for weeks or months across thousands of GPUs, costing millions of dollars for frontier models.
- Output: a general-purpose model with broad knowledge and capabilities, but not yet aligned to be helpful or safe.
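The autoregressive objective above can be sketched with a toy example. This is a minimal illustration, not a real training loop: the corpus, the bigram "model", and the smoothing choice are all hypothetical stand-ins for a neural network trained on trillions of tokens. What it shows is the core quantity being minimized, the average negative log-likelihood of each next token given its context.

```python
import numpy as np

# Toy corpus and vocabulary (hypothetical example data).
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
ids = [vocab.index(t) for t in tokens]

# A toy "model": bigram counts turned into next-token probabilities.
# A real pre-trained model replaces this table with a neural network.
V = len(vocab)
counts = np.ones((V, V))  # add-one smoothing so no probability is zero
for prev, nxt in zip(ids, ids[1:]):
    counts[prev, nxt] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Pre-training objective: average negative log-likelihood of each
# next token given the preceding context.
nll = -np.mean([np.log(probs[prev, nxt]) for prev, nxt in zip(ids, ids[1:])])
print(f"next-token loss: {nll:.3f}")
```

Lower loss means the model assigns higher probability to the tokens that actually follow; pre-training at scale is, in essence, driving this number down over a vast corpus.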
After pre-training, models typically undergo:
- Supervised fine-tuning (SFT): training on curated instruction-response pairs
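As a rough sketch of what an instruction-response pair looks like in practice: the field names and prompt template below are illustrative assumptions, not a standard format. A common detail is that the loss during SFT is computed only on the response portion, with the prompt tokens masked out.

```python
# A hypothetical instruction-response pair as found in SFT datasets.
example = {
    "instruction": "Summarize: Pre-training teaches broad patterns.",
    "response": "Models learn general patterns before specialization.",
}

# Concatenate into one training sequence. During SFT, the loss is
# typically applied only to the response; here the mask is built
# per character purely for illustration (real pipelines mask tokens).
prompt = f"User: {example['instruction']}\nAssistant: "
full_text = prompt + example["response"]
loss_mask = [0] * len(prompt) + [1] * len(example["response"])
print(full_text)
```

Masking the prompt keeps the model from being trained to reproduce instructions; it learns only to produce good responses given them.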