
What are Batch Size and Learning Rate?
Batch size is the number of training examples processed together before updating the model's weights. Learning rate is the step size used to update weights during training: how much the model adjusts its parameters in response to each batch's error signal. Together, they are the two most important hyperparameters controlling how a neural network learns.
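To make both terms concrete, here is a minimal sketch of minibatch gradient descent on a toy linear-regression problem. The data, model, and the specific values of `batch_size` and `learning_rate` are illustrative assumptions, not a prescription:

```python
import numpy as np

# Toy data: y = 3x plus a little noise (illustrative, not from the text).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=1000)

w = 0.0                # single weight to learn
learning_rate = 0.01   # step size for each update
batch_size = 32        # examples processed per weight update

for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        pred = w * X[batch, 0]
        # Gradient of mean squared error with respect to w on this batch
        grad = 2 * np.mean((pred - y[batch]) * X[batch, 0])
        w -= learning_rate * grad  # one update per batch

print(round(w, 2))
```

Note that the weight is updated once per batch, not once per example: batch size sets how much data contributes to each gradient estimate, and learning rate scales how far that estimate moves the weight.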
Why It Matters
These two numbers can make or break model training. A learning rate too high causes the model to diverge (oscillating wildly); too low and it barely learns (or gets stuck). A batch size too small introduces noisy gradients; too large wastes compute and may reduce generalization. Understanding these trade-offs is essential for anyone training or fine-tuning AI models.
How It Works
Learning rate:
- Controls the magnitude of weight updates: new_weight = old_weight - learning_rate × gradient
- Typical range: 1e-5 to 1e-2 (0.00001 to 0.01)
- Too high → training diverges, loss explodes
- Too low → training is very slow, may get stuck in local minima
- Just right → smooth convergence to a good solution
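The three regimes above can be demonstrated on the simplest possible loss, f(w) = w², whose gradient is 2w. The specific rates below are illustrative choices, picked to land in each regime:

```python
# Minimize f(w) = w^2 (gradient = 2w) starting from w = 1.0.
def descend(lr, steps=50, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w  # new_weight = old_weight - learning_rate * gradient
    return w

print(abs(descend(1.1)))    # too high: each step overshoots, |w| explodes
print(abs(descend(0.25)))   # just right: w shrinks smoothly toward the minimum
print(abs(descend(0.001)))  # too low: after 50 steps w has barely moved from 1.0
```

With lr = 1.1 each update multiplies w by (1 - 2.2) = -1.2, so the iterate oscillates in sign while growing in magnitude; with lr = 0.25 the factor is 0.5 and w halves every step; with lr = 0.001 the factor is 0.998 and progress is glacial.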
Learning rate schedules:
- Warmup → start with a very low rate, gradually increase (prevents early instability)