Machine Learning and Deep Learning
60. Distributed Training: Model Parallelism and Mixed Precision

What is Gradient Accumulation and Why It's Needed

Gradient accumulation simulates a larger batch size when GPU memory is limited: gradients are accumulated over multiple mini-batches, and the optimizer takes a single step only after the last one.
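As a rough illustration of the idea described above, the sketch below uses a toy 1-D linear model in plain Python rather than a real deep learning framework. The function names (`mse_grad`, `accumulated_grad`, `train`) and the data are hypothetical and chosen only for this example. It assumes equally sized micro-batches and a mean-reduced loss, in which case the averaged micro-batch gradients match the gradient of the full batch.

```python
# Minimal sketch: gradient accumulation for a 1-D linear model y ~ w * x
# with a mean-squared-error loss. All names here are illustrative.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-size micro-batches"
    num_steps = len(xs) // micro_batch_size
    grad = 0.0
    for i in range(num_steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Scale each micro-batch gradient by 1/num_steps so the running
        # sum ends up as the mean over the whole effective batch.
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / num_steps
    return grad

def train(xs, ys, micro_batch_size, lr=0.01, epochs=200):
    """One parameter update per effective batch, after all micro-batches."""
    w = 0.0
    for _ in range(epochs):
        w -= lr * accumulated_grad(w, xs, ys, micro_batch_size)
    return w

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]  # data generated with a true slope of 3

full = mse_grad(0.5, xs, ys)                                # one batch of 8
accum = accumulated_grad(0.5, xs, ys, micro_batch_size=2)   # 4 micro-batches of 2
w_fit = train(xs, ys, micro_batch_size=2)
```

Here `full` and `accum` agree up to floating-point rounding, even though the accumulated version only ever processes two samples at a time. In a real framework the same pattern usually means dividing each micro-batch loss by the number of accumulation steps, calling the backward pass on every micro-batch, and stepping the optimizer once at the end.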
