
What are Overfitting and Underfitting?
Overfitting and underfitting are the two fundamental ways a machine learning model can fail to generalize from its training data to new, unseen data.
- Overfitting: The model memorizes the training data too precisely, including its noise and quirks, and fails to generalize to new data.
- Underfitting: The model is too simple to capture the underlying patterns in the data, performing poorly on both training and new data.
Why It Matters
The balance between overfitting and underfitting, known as the bias-variance tradeoff, is central to building effective ML systems: an underfit model suffers from high bias, an overfit model from high variance. An overfit model gives false confidence (great on training data, terrible in production). An underfit model performs poorly everywhere, including on the data it was trained on. Every ML practitioner must navigate this tradeoff.
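For squared-error loss, the tradeoff can be written down explicitly. As a sketch, assume notation that is not part of the text above: labels follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a randomly drawn training set. The expected error at a point x then splits into three terms:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Underfitting corresponds to a large bias term and overfitting to a large variance term. The noise term is a floor that no model can remove.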
How It Works
Overfitting happens when:
- The model is too complex (for example, it has too many parameters) for the amount of training data
- Training runs for too many epochs
Signs: training loss is very low but validation loss is much higher, and validation loss often starts rising while training loss keeps falling.
Remedies: more training data, regularization (L1/L2, dropout), early stopping, data augmentation, reducing model complexity.
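Of the remedies listed, early stopping is the easiest to sketch in a few lines. The helper below is a minimal illustration, not any particular library's implementation: the function name and the `patience` and `min_delta` parameters are assumptions chosen for this example. It scans a list of per-epoch validation losses and reports where training would have stopped and which epoch had the best model.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. Epochs are 0-indexed.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    # Never triggered: training ran through every recorded epoch.
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    # Made-up losses: validation improves, then degrades as overfitting sets in.
    losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
    stop, best = early_stopping_epoch(losses, patience=3)
    print(f"stop at epoch {stop}, restore weights from epoch {best}")
```

In practice the weights from `best_epoch` are the ones to keep, since the epochs after it are the ones where the model started fitting noise.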
Underfitting happens when:
- The model is too simple for the data's complexity
- Training stops too early
- Important features are missing from the input
Signs: both training and validation loss are high.
Remedies: use a more complex model, add features, train longer, reduce regularization.
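The two sets of signs above can be turned into a rough diagnostic. The sketch below assumes a single problem-specific threshold, `acceptable_loss`, below which a loss counts as "low"; that threshold and the function name are hypothetical, and real projects usually judge the curves by eye or against a baseline instead.

```python
def diagnose_fit(train_loss, val_loss, acceptable_loss):
    """Classify a model from its final training and validation losses.

    `acceptable_loss` is a hypothetical, problem-specific threshold:
    losses above it count as "high", losses at or below it as "low".
    """
    if train_loss > acceptable_loss:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if val_loss > acceptable_loss:
        # Low training loss but high validation loss.
        return "overfitting"
    return "good fit"


if __name__ == "__main__":
    print(diagnose_fit(0.02, 0.45, acceptable_loss=0.10))
    print(diagnose_fit(0.60, 0.65, acceptable_loss=0.10))
    print(diagnose_fit(0.06, 0.08, acceptable_loss=0.10))
```

The order of the checks matters: a high training loss is treated as underfitting regardless of the validation loss, because a model that cannot fit its own training data has not yet had the chance to overfit it.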
The ideal model sits in the "Goldilocks zone" — complex enough to capture real patterns, simple enough to ignore noise.
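One way to look for that zone is to sweep model complexity and compare training and validation error at each setting. The sketch below is a toy setup with assumptions of its own: a noisy sine curve as the data, polynomial degree as the complexity knob, and NumPy's `Polynomial.fit` as the model. Training error can only go down as the degree grows, while validation error typically falls and then rises again.

```python
import numpy as np
from numpy.polynomial import Polynomial


def target(x):
    """Underlying pattern the models should recover (assumed for this demo)."""
    return np.sin(2 * np.pi * x)


def sweep_degrees(degrees=(1, 3, 9, 15), n_train=16, n_val=200,
                  noise=0.3, seed=0):
    """Fit one polynomial per degree; return (degree, train_mse, val_mse) tuples."""
    rng = np.random.default_rng(seed)

    # Small, noisy training set and a larger held-out validation set.
    x_train = np.linspace(0.0, 1.0, n_train)
    y_train = target(x_train) + rng.normal(0.0, noise, n_train)
    x_val = rng.uniform(0.0, 1.0, n_val)
    y_val = target(x_val) + rng.normal(0.0, noise, n_val)

    results = []
    for degree in degrees:
        # Least-squares polynomial fit on the training data only.
        model = Polynomial.fit(x_train, y_train, degree)
        train_mse = float(np.mean((model(x_train) - y_train) ** 2))
        val_mse = float(np.mean((model(x_val) - y_val) ** 2))
        results.append((degree, train_mse, val_mse))
    return results


if __name__ == "__main__":
    print(f"{'degree':>6} {'train MSE':>10} {'val MSE':>10}")
    for degree, train_mse, val_mse in sweep_degrees():
        print(f"{degree:>6} {train_mse:>10.4f} {val_mse:>10.4f}")
```

With 16 training points, the degree-15 polynomial can pass through every one of them, so its near-zero training error says nothing about how it behaves between the points. The degree to pick is the one with the lowest validation error, not the lowest training error.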
Example
A student who memorizes every exam question but can't answer a slightly reworded question is overfitting. A student who only studied the chapter titles and can't answer any detailed questions is underfitting. The best student understands the underlying concepts and can answer both familiar and novel questions.