
What is an Encoder-Decoder Architecture?
An encoder-decoder architecture is a neural network design with two distinct components: an encoder that reads the input and converts it into an internal representation, and a decoder that uses that representation to produce the output. Transformer models built from these components come in three variants: encoder-only, decoder-only, and full encoder-decoder models.
Why It Matters
Understanding encoder-decoder architectures explains why different AI models excel at different tasks. BERT (encoder-only) is strong at understanding tasks such as classification. GPT (decoder-only) excels at text generation. T5 (encoder-decoder) is well suited to sequence-to-sequence tasks such as translation and summarization. Knowing the architecture helps you choose the right model for a given task.
How It Works
The three transformer variants:
1. Encoder-only (e.g., BERT, RoBERTa):
- Processes the full input bidirectionally (each token attends to every other token, both before and after it)
- Produces rich contextual representations of the input
- Best for: classification, named entity recognition, semantic similarity
- Not good for: generating new text
2. Decoder-only (e.g., GPT, Claude, LLaMA):
- Generates tokens left-to-right, one at a time (autoregressive)
- Each token can only attend to previous tokens (causal attention)
- Best for: text generation, chat, code completion
- The dominant architecture for modern LLMs
3. Encoder-decoder (e.g., T5, BART, mBART):
- Encoder reads the full input bidirectionally
- Decoder generates output autoregressively, attending to both previous output tokens and the encoder's representation
- Best for: translation, summarization, question answering with structured input
- Cross-attention layers in the decoder connect it to the encoder's output
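The practical difference between the three variants comes down to which positions each token may attend to. The sketch below is a minimal, illustrative NumPy version of single-head scaled dot-product attention, assuming no learned query/key/value projections, no multiple heads, and random vectors in place of real token embeddings. The helper names (`attention`, `encoder_mask`, `causal_mask`, `cross_mask`) are hypothetical and chosen for this example only.

```python
import numpy as np


def attention(q, k, v, mask=None):
    """Scaled dot-product attention. mask[i, j] = True means query i may attend to key j."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # blocked positions get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def encoder_mask(n):
    # Encoder (and encoder-only models): every token sees every token, in both directions.
    return np.ones((n, n), dtype=bool)


def causal_mask(n):
    # Decoder self-attention: token i sees only tokens 0..i (no peeking ahead).
    return np.tril(np.ones((n, n), dtype=bool))


def cross_mask(n_tgt, n_src):
    # Cross-attention: every decoder position may read every encoder position.
    return np.ones((n_tgt, n_src), dtype=bool)


rng = np.random.default_rng(0)
d = 8
src = rng.normal(size=(3, d))  # stand-in embeddings for 3 source tokens
tgt = rng.normal(size=(4, d))  # stand-in embeddings for 4 target tokens so far

enc_out, enc_w = attention(src, src, src, encoder_mask(3))
dec_self, dec_w = attention(tgt, tgt, tgt, causal_mask(4))
dec_cross, cross_w = attention(dec_self, enc_out, enc_out, cross_mask(4, 3))

print(np.round(dec_w, 2))  # weights above the diagonal are exactly 0
```

An encoder-only model uses only the first pattern, a decoder-only model uses only the second, and a full encoder-decoder model uses all three.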
The original transformer paper ("Attention Is All You Need", Vaswani et al., 2017) described the full encoder-decoder model for machine translation. Later models showed that each half is powerful on its own: BERT uses only the encoder stack, and GPT uses only the decoder stack.
Example
Neural machine translation systems such as Google Translate have used encoder-decoder models. The encoder reads the English sentence "I love AI" and creates an internal representation of its meaning. The decoder then generates the Dutch translation "Ik hou van AI" from that representation, one token at a time.
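The translation flow in this example can be sketched as a greedy decoding loop: the encoder runs once over the source sentence, and the decoder runs once per output token, each time reading its own previous outputs (causal self-attention) and the encoder's output (cross-attention). This is a toy sketch, not how any production translator is implemented: the vocabularies and function names are hypothetical, the weights are random and untrained (so the printed words are arbitrary; a trained model would ideally emit "Ik hou van AI <eos>"), and residual connections, layer normalization, feed-forward layers, and multiple layers/heads are all omitted.

```python
import numpy as np

# Hypothetical toy vocabularies for the example sentence.
SRC_VOCAB = ["I", "love", "AI"]
TGT_VOCAB = ["<bos>", "<eos>", "Ik", "hou", "van", "AI"]
D = 16  # model width (arbitrary)

# Untrained random parameters: the data flow is the point, not the output.
rng = np.random.default_rng(42)
src_emb = rng.normal(size=(len(SRC_VOCAB), D))
tgt_emb = rng.normal(size=(len(TGT_VOCAB), D))
out_proj = rng.normal(size=(D, len(TGT_VOCAB)))


def attend(q, k, v, mask=None):
    """Single-head scaled dot-product attention without learned projections."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v


def encode(src_ids):
    """Encoder: bidirectional self-attention over the whole source sentence."""
    x = src_emb[src_ids]
    return attend(x, x, x)


def decode_step(prev_ids, memory):
    """Decoder: causal self-attention over outputs so far, then cross-attention to the encoder."""
    y = tgt_emb[prev_ids]
    n = len(prev_ids)
    y = attend(y, y, y, mask=np.tril(np.ones((n, n), dtype=bool)))
    y = attend(y, memory, memory)  # queries from decoder, keys/values from encoder
    logits = y[-1] @ out_proj      # score the next token from the last position
    return int(np.argmax(logits))  # greedy choice


def greedy_translate(src_tokens, max_len=10):
    memory = encode([SRC_VOCAB.index(t) for t in src_tokens])  # encoder runs once
    ids = [TGT_VOCAB.index("<bos>")]
    for _ in range(max_len):  # decoder runs once per generated token
        next_id = decode_step(ids, memory)
        ids.append(next_id)
        if TGT_VOCAB[next_id] == "<eos>":
            break
    return [TGT_VOCAB[i] for i in ids[1:]]


print(greedy_translate(["I", "love", "AI"]))
```

Greedy decoding (always taking the highest-scoring token) is the simplest strategy; real systems often use beam search or sampling instead, but the encode-once, decode-step-by-step structure stays the same.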