Core Concepts
Intermediate
2026-W17

What is Gradient Descent?

Gradient descent is the optimization algorithm that trains neural networks by iteratively adjusting parameters in the direction that reduces prediction error.

Also known as:
SGD
stochastic gradient descent
gradiëntafdaling (Dutch)

Gradient descent is the optimization algorithm used to train machine learning models. It iteratively adjusts model parameters in the direction that most reduces the loss function — like walking downhill in a landscape of errors to find the lowest point.
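
The definition above can be written as a single update rule. The symbols here are assumptions chosen for illustration and do not come from the original text: \theta for the model parameters, \eta for the learning rate, L for the loss function, and t for the step counter.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
```

The minus sign is the "downhill" part: the gradient \nabla_\theta L points toward increasing loss, so each step moves against it.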

Why It Matters

Gradient descent is the workhorse of neural network training. Virtually every LLM, vision model, and other deep learning system was trained with some variant of it. Understanding it explains why training requires so much compute (each update needs a forward and a backward pass, and training repeats that many thousands of times) and why hyperparameters like learning rate matter.

How It Works

  1. Initialize weights randomly.
  2. Compute the loss — forward pass through the network, then calculate error.
  3. Compute gradients — use backpropagation to find the derivative of the loss with respect to each weight. The gradient points in the direction of steepest increase.
  4. Update weights — move each weight in the opposite direction of its gradient (downhill), scaled by the learning rate.
  5. Repeat until the loss converges.
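
The five steps above can be sketched on a toy problem. This is a minimal illustration, not a neural network: it assumes a one-feature linear model with mean squared error, derives the gradients by hand instead of using backpropagation, and runs a fixed number of steps instead of checking convergence. The function name `train_linear` and the data are hypothetical.

```python
import random

def train_linear(xs, ys, lr=0.05, steps=2000, seed=0):
    """Fit y ≈ w*x + b by plain gradient descent on mean squared error."""
    rng = random.Random(seed)
    # Step 1: initialize weights randomly.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(xs)
    for _ in range(steps):
        # Step 2: forward pass, then the per-example errors.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Step 3: gradients of the mean squared error with respect to w and b.
        # They are derived by hand here; backpropagation automates this
        # for deep networks.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step 4: move against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    # Step 5 is simplified: a fixed step count stands in for a convergence test.
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))  # values close to 2.0 and 1.0
```

In a real network the structure of the loop is the same; only the forward pass and the gradient computation become more involved.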

Variants:

  • Batch gradient descent — compute gradients over the entire dataset. Precise but slow.
  • Stochastic Gradient Descent (SGD) — compute gradients on a single random example. Fast but noisy.
  • Mini-batch SGD — compute gradients on small batches (e.g., 32 or 64 examples). The practical standard.
  • Adam — adaptive learning rate optimizer that's the default for most deep learning. Combines momentum with per-parameter learning rate adjustment.
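
The first three variants differ only in how many examples feed each gradient estimate, so one loop with a `batch_size` parameter can cover all of them. The sketch below is an assumption-laden toy: a one-weight model `y ≈ w*x`, noise-free data, and a hypothetical function name `minibatch_gd`. Adam is not shown.

```python
import random

def minibatch_gd(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y ≈ w*x, estimating each gradient from batch_size examples."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random example order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

xs = [0.25 * i for i in range(1, 9)]  # 8 examples: 0.25, 0.5, ..., 2.0
ys = [3.0 * x for x in xs]            # generated from y = 3x, no noise

results = {
    "batch": minibatch_gd(xs, ys, batch_size=len(xs)),  # whole dataset per update
    "stochastic": minibatch_gd(xs, ys, batch_size=1),   # one example per update
    "mini-batch": minibatch_gd(xs, ys, batch_size=4),   # small batches
}
for name, w in results.items():
    print(name, round(w, 4))  # each value is close to 3.0
```

Because the data here is noise-free, all three settings reach the same weight. On real, noisy data the smaller batch sizes would produce visibly jittery updates, which is the "fast but noisy" trade-off described above.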

The learning rate is critical: too high and the updates overshoot the minimum and can diverge; too low and training converges very slowly and may stall on plateaus or in poor local minima.
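
The effect of the learning rate is easy to see on a one-dimensional toy loss. The sketch below assumes the loss f(x) = x**2 (gradient 2*x, minimum at 0); the function name `descend` and the three learning rates are illustrative choices. This loss has a single minimum, so it shows slow progress and overshooting but not local minima.

```python
def descend(lr, steps=50, x0=5.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # step against the gradient
    return x

too_low = descend(lr=0.001)  # after 50 steps x has barely moved from 5.0
good = descend(lr=0.1)       # x ends very close to the minimum at 0
too_high = descend(lr=1.1)   # every step overshoots 0 and |x| keeps growing

print(too_low, good, too_high)
```

With `lr=1.1` each update multiplies x by -1.2, so the iterate jumps across the minimum and lands farther away each time. That is the overshooting failure mode in its simplest form.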

Example

Imagine you're blindfolded on a hilly terrain trying to reach the lowest valley. You feel the slope under your feet (the gradient) and take a step downhill. Gradient descent does exactly this — but in a space with millions of dimensions (one per model parameter).
