
What is GPT?
GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that use the transformer architecture to generate human-like text. The GPT lineage — from GPT-1 (2018) through GPT-4 (2023) — established the paradigm of scaling transformer models for general-purpose language capabilities.
Why It Matters
GPT popularized the approach that defines modern AI: take a transformer, pre-train it on massive text data, scale it up, and align it with human preferences. The successive GPT models showed that scaling (more data, more parameters, more compute) tends to produce more capable models. ChatGPT, launched in late 2022 on GPT-3.5 and later extended with GPT-4, brought generative AI to the mainstream.
How It Works
GPT models share these core characteristics:
- Architecture: decoder-only transformer. The model processes text left-to-right, using causal (masked) self-attention so that each token attends only to itself and the tokens before it.
- Pre-training objective: next-token prediction (autoregressive language modeling). Given the preceding tokens, predict the next one.
- Scaling: each generation dramatically increased model size:
  - GPT-1 (2018): 117M parameters
  - GPT-2 (2019): 1.5B parameters
  - GPT-3 (2020): 175B parameters
  - GPT-4 (2023): size not disclosed by OpenAI; rumored to be a mixture-of-experts model with 1T+ parameters
- Alignment: starting with InstructGPT and GPT-3.5, post-training with supervised fine-tuning and RLHF (reinforcement learning from human feedback) to make the model helpful and safe. GPT-1, GPT-2, and the original GPT-3 were released without this step.
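The left-to-right behavior described in the architecture bullet can be illustrated with a small sketch. It is a deliberately simplified single-head attention in plain Python, not how any GPT model is actually implemented: the query, key, and value projections are assumed to be the identity, and the function name `causal_self_attention` and the toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """Single-head attention with identity projections (a simplification).

    x is a list of token vectors. Position i attends only to positions 0..i.
    """
    d = len(x[0])
    out = []
    for i, query in enumerate(x):
        # Scaled dot-product scores against the current and earlier tokens only.
        scores = [sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible token vectors.
        out.append([sum(w * x[j][c] for j, w in enumerate(weights))
                    for c in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed = tokens[:2] + [[5.0, -3.0]]  # alter only the last token

a = causal_self_attention(tokens)
b = causal_self_attention(changed)
print(a[0] == b[0], a[1] == b[1])  # True True: earlier positions are unaffected
print(a[2] == b[2])                # False: only the last position changes
```

Because each position ignores everything to its right, the same forward pass can be used both to train on every prefix of a sequence at once and to generate text one token at a time.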
The "pre-trained" in GPT refers to the model learning general language capabilities before being adapted (via prompting or fine-tuning) for specific tasks.
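To make the pre-training objective concrete, here is a toy sketch that "pre-trains" a bigram model by counting which word follows which, then measures the average next-token loss (cross-entropy) on the same text. This is an assumption-laden stand-in: a real GPT learns a neural network over subword tokens and conditions on the whole preceding context, and the names `train_bigram`, `next_token_probs`, and `average_loss` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    # "Pre-training" here is just counting which token follows which.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    # Normalize the counts for one context into a probability distribution.
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

def average_loss(counts, tokens):
    # Mean negative log-likelihood of each actual next token (cross-entropy).
    losses = [-math.log(next_token_probs(counts, prev)[nxt])
              for prev, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

# Tiny made-up corpus, split on whitespace instead of a real tokenizer.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

probs = next_token_probs(model, "the")
print(probs)                        # distribution over words seen after "the"
print(average_loss(model, corpus))  # lower means better next-token prediction
```

Nothing in the counting step is specific to a task; the learned distribution is general, and asking it what follows "the" is a miniature version of adapting a pre-trained model through prompting.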
Example
ChatGPT is a conversational interface to GPT models. When you ask ChatGPT to explain quantum physics, the underlying model (GPT-4, for example) draws on the patterns it learned during pre-training and on its alignment training to generate an explanation one token at a time, aiming for a response that is both accurate and accessible.
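The generation step behind such a response can be sketched as a simple loop: pick a next token, append it, repeat. The example below uses greedy decoding over a hand-written probability table. `TOY_MODEL`, its tokens, and its probabilities are all invented for illustration; production systems use a neural network conditioned on the full prefix and usually sample rather than always taking the most probable token.

```python
# Hypothetical next-token table standing in for a trained model.
TOY_MODEL = {
    "<s>": {"quantum": 0.6, "physics": 0.4},
    "quantum": {"physics": 0.9, "is": 0.1},
    "physics": {"is": 0.7, "</s>": 0.3},
    "is": {"strange": 0.8, "</s>": 0.2},
    "strange": {"</s>": 1.0},
}

def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # A real GPT conditions on the whole prefix; this toy looks only
        # at the last token.
        dist = TOY_MODEL[tokens[-1]]
        next_token = max(dist, key=dist.get)  # greedy: most probable token
        if next_token == "</s>":
            break  # end-of-sequence marker stops generation
        tokens.append(next_token)
    return tokens

print(generate(["<s>", "quantum"]))
# ['<s>', 'quantum', 'physics', 'is', 'strange']
```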