
What is a Loss Function?
A loss function (also called a cost function or objective function) is a mathematical function that measures how far a model's predictions are from the actual target values. It provides the error signal that training algorithms use to improve the model.
Why It Matters
The loss function defines what "learning" means for a model. It's the quantity that training minimizes. Choosing the right loss function is crucial: it shapes what the model optimizes for and directly affects model behavior. For LLMs, the cross-entropy loss on next-token prediction is what drives the model to learn language.
How It Works
A loss function takes two inputs:
- Prediction – what the model outputs
- Target – what the correct answer is
It returns a single number (the loss) representing how wrong the prediction is. Training algorithms (gradient descent + backpropagation) then adjust model weights to minimize this number.
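The loop above can be sketched in a few lines of plain Python. This is a minimal illustration (hypothetical names, a single-weight linear model `y_hat = w * x` with squared-error loss), not a production training loop:

```python
# Minimal sketch: gradient descent on one weight, minimizing squared error.

def loss(prediction, target):
    # A single number measuring how wrong the prediction is.
    return (prediction - target) ** 2

def train(x, target, w=0.0, lr=0.1, steps=50):
    for _ in range(steps):
        prediction = w * x
        # Gradient of (w*x - target)^2 with respect to w.
        grad = 2 * (prediction - target) * x
        w -= lr * grad  # adjust the weight to reduce the loss
    return w

w = train(x=2.0, target=6.0)  # the true relationship here is y = 3 * x
```

After training, `w` converges toward 3.0: each step moves the weight in the direction that shrinks the loss, which is all "learning" means in this setting.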
Common loss functions:
- Cross-entropy loss – standard for classification and language modeling. Measures the difference between the predicted probability distribution and the true distribution.
- Mean Squared Error (MSE) – standard for regression. Averages the squared differences between predictions and targets.
- Binary cross-entropy – for binary classification (yes/no, spam/not spam).
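The three losses above can be written in a few lines each. This is an illustrative sketch in plain Python (no batching or numerical-stability tricks that real libraries add):

```python
import math

def mse(predictions, targets):
    # Mean squared error: average of squared differences.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(predicted_probs, target_index):
    # Cross-entropy for one classification example:
    # the negative log-probability the model assigned to the correct class.
    return -math.log(predicted_probs[target_index])

def binary_cross_entropy(p, y):
    # p: predicted probability of the positive class; y: true label, 0 or 1.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

For example, `cross_entropy([0.7, 0.2, 0.1], 0)` is about 0.36: the model put 70% probability on the right class, so the loss is low. Had it put 10% there, the loss would jump to about 2.3, and that larger error signal drives a larger weight update.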