Gradient Descent Simulation 1.0.0
WHAT IS IT?
This model visualizes gradient descent optimization - the fundamental algorithm used to train neural networks and other machine learning models. Agents represent different optimization algorithms searching for the minimum of a loss landscape (the “error surface” that ML models try to minimize during training).
The model demonstrates how different optimizer types (SGD, Momentum with different parameters) behave on various loss landscapes, from simple bowls to the notoriously difficult Rosenbrock “banana valley” function. This helps build intuition about why certain optimization algorithms work better than others for different problem geometries.
HOW IT WORKS
Agents (Optimizers):
- Each agent represents an optimizer instance trying to find the global minimum (located at coordinates 0,0)
- Agents move according to gradient descent rules: they calculate the gradient (slope) of the loss function at their current position and move “downhill”
Three Optimizer Types:
1. SGD (Red) - Standard Stochastic Gradient Descent with no momentum (momentum = 0)
2. Momentum (Yellow) - Uses momentum = 0.9 to accelerate in consistent directions
3. Momentum 0.95 (Green) - Higher momentum (0.95) with 2x learning rate for faster convergence
Movement Rules:
1. Calculate gradient at current position (numerical derivative)
2. Apply gradient clipping to prevent extreme steps
3. Update velocity using momentum: new_velocity = momentum × old_velocity - learning_rate × gradient
4. Add optional stochastic noise (simulating mini-batch effects)
5. Move to new position
6. Check if converged (loss below threshold for 20+ steps)
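The movement rules above can be sketched in Python. This is a minimal translation of the described math, not the model’s NetLogo source; the clipping threshold, noise scale, and helper names are illustrative assumptions:

```python
import random

def numerical_gradient(loss, x, y, h=1e-4):
    """Step 1: central-difference estimate of the gradient."""
    gx = (loss(x + h, y) - loss(x - h, y)) / (2 * h)
    gy = (loss(x, y + h) - loss(x, y - h)) / (2 * h)
    return gx, gy

def step(loss, x, y, vx, vy, lr=0.1, momentum=0.9, clip=5.0, noise=0.0):
    """One optimizer update (steps 1-5 above)."""
    gx, gy = numerical_gradient(loss, x, y)
    # Step 2: clip each gradient component to prevent extreme steps
    gx = max(-clip, min(clip, gx))
    gy = max(-clip, min(clip, gy))
    # Step 3: velocity update, matching
    # new_velocity = momentum * old_velocity - learning_rate * gradient
    vx = momentum * vx - lr * gx
    vy = momentum * vy - lr * gy
    # Step 4: optional stochastic noise (mini-batch effect)
    vx += random.gauss(0, noise)
    vy += random.gauss(0, noise)
    # Step 5: move to the new position
    return x + vx, y + vy, vx, vy

# Plain SGD (momentum = 0) on a simple bowl, starting away from the minimum
bowl = lambda x, y: x * x + y * y
x, y, vx, vy = 3.0, 4.0, 0.0, 0.0
for _ in range(100):
    x, y, vx, vy = step(bowl, x, y, vx, vy, lr=0.1, momentum=0.0)
# (x, y) converges toward the global minimum at (0, 0)
```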
Loss Landscapes:
The patch colors represent the loss function value (darker blue = lower loss). Four landscapes are available:
- Simple Bowl: Smooth quadratic function - easiest to optimize
- Ravine: Elongated valley (10x steeper in one direction) - tests handling of ill-conditioned problems
- Rosenbrock: The famous “banana valley” with a curved, narrow valley - very challenging
- Complex: Multiple local minima created by sinusoidal oscillations overlaid on a bowl
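The four landscapes can be sketched as Python functions matching the descriptions above. The exact coefficients in the model may differ; here the Rosenbrock valley is shifted so its minimum sits at the origin (the classic function has its minimum at (1, 1)), and the ripple amplitude of the complex landscape is illustrative:

```python
import math

def simple_bowl(x, y):
    """Smooth quadratic - the gradient always points at the minimum."""
    return x * x + y * y

def ravine(x, y):
    """Elongated valley: 10x steeper along y than along x."""
    return x * x + 10 * y * y

def rosenbrock(x, y):
    """Banana valley, shifted so the minimum is at (0, 0)."""
    u, v = x + 1, y + 1
    return (1 - u) ** 2 + 100 * (v - u * u) ** 2

def complex_landscape(x, y):
    """Bowl plus sinusoidal ripples that create local minima away
    from the origin (amplitude/frequency chosen for illustration)."""
    return x * x + y * y + 3 * (2 - math.cos(3 * x) - math.cos(3 * y))

# All four evaluate to zero at the global minimum (0, 0)
print(simple_bowl(0, 0), ravine(0, 0), rosenbrock(0, 0), complex_landscape(0, 0))
```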
HOW TO USE IT
Buttons:
- setup - Initializes the model: creates the loss landscape, places optimizers randomly, marks the global minimum (red patch at center)
- go - Runs the simulation continuously until all optimizers converge
Sliders:
- num-optimizers (0-100) - Number of optimizer agents to create
- base-learning-rate (0.01-1) - Step size for gradient descent. Smaller = slower but more stable.
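The trade-off behind base-learning-rate can be seen on a one-dimensional bowl (a hypothetical sketch, independent of the model code): a small rate makes slow but steady progress, a moderate rate converges quickly, and a rate at the top of the range can oscillate without making any progress at all.

```python
def bowl_step(x, lr):
    # gradient of f(x) = x^2 is 2x, so the update is x - lr * 2x
    return x - lr * 2 * x

results = {}
for lr in (0.01, 0.3, 1.0):
    x = 5.0
    for _ in range(50):
        x = bowl_step(x, lr)
    results[lr] = x

# lr = 0.01 -> still far from 0 after 50 steps (slow but stable)
# lr = 0.3  -> essentially at the minimum
# lr = 1.0  -> bounces between +5 and -5 forever (no progress)
```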
Choosers:
- optimizer-type - Select which optimizer to use:
  - “SGD” - All agents use standard gradient descent (red)
  - “Momentum” - All agents use 0.9 momentum (yellow)
  - “Momentum (0.95)” - All agents use 0.95 momentum (green)
  - “Mixed” - Random mix of all three types
- landscape-type - Select the loss function:
  - “Simple Bowl” - Smooth quadratic (easiest)
  - “Ravine” - Elongated valley (tests ill-conditioning)
  - “Rosenbrock” - Curved banana valley (very hard)
  - “Complex” - Multiple local minima (tests exploration)
Switches:
- show-trails - When ON, agents leave colored trails showing their optimization path
- add-noise - When ON, adds stochastic noise to gradients (simulates mini-batch learning)
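The effect of add-noise can be modeled as perturbing each gradient with zero-mean noise, the way mini-batch sampling perturbs the true gradient (the noise scale here is an illustrative assumption, not the model’s value):

```python
import random

def noisy_gradient(x, sigma=0.5):
    """True gradient of f(x) = x^2 plus zero-mean Gaussian noise,
    mimicking a mini-batch gradient estimate."""
    return 2 * x + random.gauss(0, sigma)

random.seed(42)
x = 5.0
for _ in range(200):
    x -= 0.1 * noisy_gradient(x)
# x hovers near, but never settles exactly at, the minimum -
# which is why convergence is judged against a loss threshold
# rather than an exact position
```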
Monitors:
- converged-count - Shows how many optimizers have converged to the minimum
RELATED MODELS
- Distill.pub’s “Momentum” visualization: https://distill.pub/2017/momentum/
- Sebastian Ruder’s optimization overview: https://ruder.io/optimizing-gradient-descent/
CREDITS AND REFERENCES
Model Created By: AZOUANI Ilyes for DSTI School of Engineering - ABM Module
Mathematical References:
- Rosenbrock, H.H. (1960). “An automatic method for finding the greatest or least value of a function.” The Computer Journal, 3(3).
- Cauchy, A.-L. (1847). “Méthode générale pour la résolution des systèmes d’équations simultanées.” Comptes Rendus de l’Académie des Sciences, 25.
- Polyak, B.T. (1964). “Some methods of speeding up the convergence of iteration methods.” USSR Computational Mathematics and Mathematical Physics, 4(5).
Machine Learning Context:
- Ruder, S. (2016). “An overview of gradient descent optimization algorithms.” arXiv:1609.04747
License:
This model is provided for educational purposes. Feel free to modify and extend it for learning about optimization algorithms and machine learning concepts.
Version: 1.0.0
Date: 2025
Release Notes
Initial release.