Machine Learning and Deep Learning
Lesson 2734 of 3,538
59. Distributed Training: Data Parallelism

FSDP Backward Pass and Gradient Sharding

How gradients are computed locally and reduced across ranks, with each rank retaining only its local shard.
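
The flow described above can be sketched as a toy, single-process simulation. This is only an illustration under stated assumptions, not FSDP's actual implementation: ranks are modeled as plain Python lists, the reduction is a sum (real setups commonly average across ranks as well), and the gradient length is assumed to divide evenly by the number of ranks. The helper name reduce_scatter and the sample values are hypothetical.

```python
def reduce_scatter(local_grads):
    """Simulate a reduce-scatter over per-rank gradients.

    local_grads[r] is the full (unsharded) gradient that rank r computed
    locally during its backward pass. Returns one reduced shard per rank.
    """
    world_size = len(local_grads)
    numel = len(local_grads[0])
    # Assumption for this sketch: the gradient splits evenly across ranks.
    assert numel % world_size == 0
    shard_len = numel // world_size

    shards = []
    for rank in range(world_size):
        start = rank * shard_len
        end = start + shard_len
        # Sum the matching slice from every rank; this rank keeps only
        # that reduced slice and discards the rest of the gradient.
        shard = [sum(g[i] for g in local_grads) for i in range(start, end)]
        shards.append(shard)
    return shards


# Two simulated ranks, each holding a full local gradient of 4 elements.
local_grads = [
    [1.0, 2.0, 3.0, 4.0],      # rank 0
    [10.0, 20.0, 30.0, 40.0],  # rank 1
]
grad_shards = reduce_scatter(local_grads)
print(grad_shards[0])  # rank 0 keeps the first half of the reduced gradient
print(grad_shards[1])  # rank 1 keeps the second half
```

After the call, each simulated rank holds only 1/world_size of the reduced gradient, which is the memory saving the summary refers to.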
