
What is an Encoder-Decoder Architecture?
An encoder-decoder architecture is a neural network design with two distinct components: an encoder that reads and compresses input into an internal representation, and a decoder that uses that representation to produce output. The transformer family includes three variants: encoder-only, decoder-only, and full encoder-decoder models.
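The two-stage flow can be sketched in a few lines. This is a toy illustration only: the random embedding table, mean-pooling "encoder," and nearest-embedding "decoder" below are hypothetical stand-ins for the learned transformer layers a real model would use.

```python
import numpy as np

# Toy sketch of the encoder-decoder split (illustrative only; the
# embedding table and greedy "decoder" are stand-ins for learned layers).
rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4
embeddings = rng.normal(size=(vocab_size, d_model))

def encode(token_ids):
    # Compress the input sequence into one internal representation
    # (mean-pooled embeddings standing in for encoder hidden states).
    return embeddings[token_ids].mean(axis=0)

def decode(representation, steps=3):
    # Produce output tokens from the internal representation: at each
    # step, emit the vocabulary item whose embedding best matches the
    # current decoder state, then update the state.
    out, state = [], representation.copy()
    for _ in range(steps):
        scores = embeddings @ state          # similarity to every vocab item
        tok = int(scores.argmax())
        out.append(tok)
        state = 0.5 * state + 0.5 * embeddings[tok]
    return out

rep = encode([1, 2, 3])   # encoder reads and compresses the input
out = decode(rep)         # decoder generates output from that representation
print(rep.shape, out)
```

The key point is the interface: the decoder never sees the raw input, only the representation the encoder hands it.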
Why It Matters
Understanding encoder-decoder architectures explains why different AI models excel at different tasks. BERT (encoder-only) is strong at understanding and classification tasks such as sentiment analysis. GPT (decoder-only) excels at open-ended text generation. T5 (encoder-decoder) handles sequence-to-sequence tasks like translation and summarization. Knowing the architecture helps you choose the right model for a given task.
How It Works
The three transformer variants:
1. Encoder-only (e.g., BERT, RoBERTa):
- Processes the full input bidirectionally (sees all tokens at once)
- Produces rich contextual representations of the input
- Best for: classification, named entity recognition, semantic similarity
- Not good for: generating new text
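The "sees all tokens at once" property comes down to the attention mask. Here is a minimal NumPy sketch (not taken from any specific library) contrasting an encoder's bidirectional mask with the causal mask used by decoders:

```python
import numpy as np

# Attention masks for a sequence of 4 tokens. Entry [i, j] = True means
# token i may attend to token j.
n = 4
bidirectional = np.ones((n, n), dtype=bool)    # encoder: every token sees all tokens
causal = np.tril(np.ones((n, n), dtype=bool))  # decoder: token i sees only tokens <= i

print(bidirectional.astype(int))
print(causal.astype(int))
```

In the bidirectional mask every entry is 1, so each position gets context from both directions; the causal mask zeroes out the upper triangle, so a token can never look ahead.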
2. Decoder-only (e.g., GPT, Claude, LLaMA):
- Processes tokens left-to-right (autoregressive)
- Each token can only attend to previous tokens (causal attention)