What is GPT?

GPT (Generative Pre-trained Transformer) is a family of large language models developed by OpenAI that use the transformer architecture to generate human-like text. The GPT lineage — from GPT-1 (2018) through GPT-4 (2023) — established the paradigm of scaling transformer models for general-purpose language capabilities.

Why It Matters

GPT popularized the recipe behind most current large language models: take a transformer, pre-train it on massive text data, scale it up, and align it with human preferences. Across the GPT lineage, scaling (more data, more parameters, more compute) repeatedly produced more capable models. ChatGPT, launched on GPT-3.5 in late 2022 and later offered with GPT-4, brought generative AI to the mainstream.

How It Works

All GPT models share core characteristics:

Architecture — decoder-only transformer. The model reads text left to right and uses causal (masked) self-attention, so the representation of each token depends only on that token and the ones before it.
Pre-training objective — next-token prediction (autoregressive language modeling). Given preceding tokens, predict the next one.
Scaling — each generation dramatically increased model size:

GPT-1 (2018): 117M parameters
GPT-2 (2019): 1.5B parameters
GPT-3 (2020): 175B parameters
GPT-4 (2023): parameter count not disclosed by OpenAI; unconfirmed reports describe a mixture-of-experts design with more than 1T parameters

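Alignment — post-training with supervised fine-tuning and RLHF (reinforcement learning from human feedback) to make the model more helpful and safer. This step applies from InstructGPT and GPT-3.5 onward; the original GPT-1, GPT-2, and GPT-3 models were not trained with RLHF.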
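
The architecture and pre-training objective listed above can be made concrete with a small sketch. The code below is not OpenAI's implementation. It is a minimal pure-Python illustration, under the assumption that attention scores and predicted probabilities are already given as plain lists. The function names (causal_attention_weights, next_token_loss) and the toy inputs are hypothetical and chosen only for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Turn a square matrix of attention scores into causal attention weights.

    Row i holds the scores of position i against every position j.
    Positions j > i lie in the "future", so they receive weight 0.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = softmax(row[: i + 1])  # normalize over positions 0..i only
        weights.append(visible + [0.0] * (len(row) - i - 1))
    return weights

def next_token_loss(probs_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    probs_per_position[t] is a distribution over the vocabulary produced
    after reading token_ids[: t + 1].
    """
    losses = [
        -math.log(probs_per_position[t][token_ids[t + 1]])
        for t in range(len(token_ids) - 1)
    ]
    return sum(losses) / len(losses)

# Toy inputs, made up for illustration: three positions with equal scores.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = causal_attention_weights(scores)

# A "model" that guesses uniformly over a 4-token vocabulary.
token_ids = [0, 1, 2]
uniform = [[0.25] * 4 for _ in token_ids]
loss = next_token_loss(uniform, token_ids)

print(weights)  # first row attends only to itself; later rows spread over the past
print(loss)     # close to log(4), the loss of a uniform guess over 4 tokens
```

A real GPT model computes the scores from learned query and key projections and minimizes this loss over billions of tokens, but the masking and the shifted-by-one target are the same idea.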

The "pre-trained" in GPT refers to the model learning general language capabilities before being adapted (via prompting or fine-tuning) for specific tasks.
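
The split between pre-training and later use through prompting can be sketched with a deliberately tiny stand-in. The example below swaps the transformer for a word-level bigram counter, which is an assumption made purely to keep the code short. The corpus, the function names (pretrain_bigram, generate), and the prompts are all hypothetical.

```python
from collections import Counter, defaultdict

def pretrain_bigram(corpus):
    """'Pre-train' by counting which word follows which in raw text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=5):
    """Greedy autoregressive decoding.

    Repeatedly append the most frequent follower of the last word.
    The counts are never updated here: the prompt alone steers the output.
    """
    words = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never seen with a follower
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Made-up general-purpose "training data".
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a mouse",
]
model = pretrain_bigram(corpus)

print(generate(model, "the dog", max_new_tokens=3))     # the dog sat on the
print(generate(model, "my cat sat", max_new_tokens=2))  # my cat sat on the
```

The same frozen counts serve both prompts, which mirrors how one pre-trained GPT model is reused across tasks. Fine-tuning would instead correspond to updating the counts on task-specific text before generating.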

Example

ChatGPT is the conversational interface to GPT models. When you ask ChatGPT to explain quantum physics, the underlying model (for example, GPT-4) draws on the language and physics patterns it learned during pre-training, combined with its alignment training, to generate an explanation that aims to be accurate and accessible.
