
What is GPT?
GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that use the transformer architecture to generate human-like text. The GPT lineage, from GPT-1 (2018) through GPT-4 (2023), established the paradigm of scaling transformer models for general-purpose language capabilities.
Why It Matters
GPT popularized the approach that defines modern AI: take a transformer, pre-train it on massive text data, scale it up, and align it with human preferences. The successive GPT models demonstrated that scaling (more data, more parameters, more compute) consistently produces more capable models. ChatGPT, built on GPT-3.5 and GPT-4, brought generative AI to the mainstream.
How It Works
All GPT models share core characteristics:
- Architecture: decoder-only transformer. The model processes text left-to-right, using self-attention to understand context.
- Pre-training objective: next-token prediction (autoregressive language modeling). Given preceding tokens, predict the next one.
- Scaling: each generation dramatically increased model size:
  - GPT-1 (2018): 117M parameters
  - GPT-2 (2019): 1.5B parameters
  - GPT-3 (2020): 175B parameters
  - GPT-4 (2023): rumored mixture-of-experts with 1T+ parameters
- Alignment: post-training with RLHF (reinforcement learning from human feedback) to make the model helpful and safe.
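The first two bullets above can be sketched in code. The following is a minimal NumPy illustration, not OpenAI's implementation: a toy single-head causal self-attention (queries, keys, and values are the raw inputs here; a real transformer applies learned projection matrices and many stacked layers) plus the next-token cross-entropy loss that defines the pre-training objective. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x):
    """Toy single-head attention over a sequence x of shape (n, d).

    The causal (lower-triangular) mask enforces left-to-right
    processing: position i may attend only to positions <= i,
    which is what makes the model autoregressive.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)          # scaled dot-product scores
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)  # hide future positions
    return softmax(scores) @ x             # weighted sum of values

def next_token_nll(logits, targets):
    """Pre-training objective: average negative log-likelihood of the
    correct next token at each position. logits has shape
    (n, vocab_size); targets holds the n next-token ids."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))
```

Because of the causal mask, the output at position 0 depends only on the input at position 0; during training, the loss is computed at every position in parallel, which is what makes next-token prediction so efficient to scale.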