
What is Beam Search?
Beam search is a text generation strategy that explores multiple possible output sequences simultaneously, keeping the top-k most promising candidates ("beams") at each step. Unlike greedy decoding (which always picks the most likely next token), beam search considers that the globally best sequence might not start with the locally most likely token.
Why It Matters
The choice of generation strategy significantly affects output quality. Greedy decoding can miss better sequences, pure sampling can produce incoherent text, and beam search offers a deterministic middle ground that finds high-probability sequences. Understanding generation strategies explains why the same model produces different outputs depending on settings.
How It Works
Generation strategies compared:
1. Greedy decoding:
- Always select the highest-probability token at each step
- Fast but can miss globally optimal sequences
- Example: "The" → "cat" → "is" might score higher overall than "The" → "big" → "cat" → "is", but greedy picks "big" if it's locally most likely
2. Beam search:
- Maintain k parallel hypotheses (beams), typically k=4-8
- At each step, expand each beam with all possible next tokens
- Keep only the top-k scoring sequences (by total log-probability)
- Continue until every beam ends with an end-of-sequence token, then return the highest-scoring finished hypothesis
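The two strategies above can be sketched side by side. This is a minimal illustration, not a production decoder: the toy next-token model (the MODEL table, the "<eos>" token, and the vocabulary) is invented for the example, chosen so that the greedy path ("The big ...") is locally tempting but a beam of k=2 recovers the higher-probability sequence ("The cat is ..."), mirroring the example in the greedy section.

```python
import math

# Hypothetical toy language model: maps a prefix to next-token probabilities.
# Values are chosen so greedy decoding misses the globally best sequence.
MODEL = {
    (): {"The": 1.0},
    ("The",): {"big": 0.6, "cat": 0.4},
    ("The", "big"): {"cat": 0.6, "dog": 0.4},
    ("The", "cat"): {"is": 0.95, "<eos>": 0.05},
    ("The", "big", "cat"): {"<eos>": 1.0},
    ("The", "big", "dog"): {"<eos>": 1.0},
    ("The", "cat", "is"): {"<eos>": 1.0},
}

def next_logprobs(prefix):
    """Log-probabilities of each possible next token given a prefix."""
    return {tok: math.log(p) for tok, p in MODEL[prefix].items()}

def greedy_decode(start=()):
    """Always take the single most likely next token."""
    seq, score = start, 0.0
    while not seq or seq[-1] != "<eos>":
        tok, lp = max(next_logprobs(seq).items(), key=lambda kv: kv[1])
        seq, score = seq + (tok,), score + lp
    return seq, score

def beam_search(k=2, start=()):
    """Keep the top-k partial hypotheses, scored by total log-probability."""
    beams = [(start, 0.0)]   # (sequence, cumulative log-prob)
    finished = []
    while beams:
        candidates = []
        for seq, score in beams:
            # Expand each beam with every possible next token.
            for tok, lp in next_logprobs(seq).items():
                hyp = (seq + (tok,), score + lp)
                (finished if tok == "<eos>" else candidates).append(hyp)
        # Prune: keep only the k best unfinished hypotheses.
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = candidates[:k]
    # All beams have ended; return the best finished hypothesis.
    return max(finished, key=lambda b: b[1])

greedy_seq, greedy_score = greedy_decode()
beam_seq, beam_score = beam_search(k=2)
print(greedy_seq)  # ('The', 'big', 'cat', '<eos>')  -- prob 0.36
print(beam_seq)    # ('The', 'cat', 'is', '<eos>')   -- prob 0.38
```

Note how beam search wins here only because it kept the lower-ranked "cat" hypothesis alive at step two; with k=1 it degenerates to greedy decoding.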