Trading compute for memory by recomputing attention activations during the backward pass instead of storing them.
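As a rough illustration of the idea, here is a minimal pure-Python sketch of single-query attention with a hand-written backward pass. It is not taken from the lesson: the function names (`attention_forward`, `attention_backward`), the `store_probs` flag, and the toy inputs are all hypothetical, and the usual 1/sqrt(d) scaling is left out for brevity. With `store_probs=False` the forward pass keeps only its inputs, and the backward pass recomputes the softmax probabilities before using them.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_forward(q, K, V, store_probs):
    """Single-query attention: out = softmax(K q) @ V (hypothetical helper)."""
    probs = softmax([dot(q, k) for k in K])
    out = [sum(p * v[d] for p, v in zip(probs, V)) for d in range(len(V[0]))]
    # With store_probs=False, only the inputs are kept for the backward pass.
    saved = {"q": q, "K": K, "V": V, "probs": probs if store_probs else None}
    return out, saved

def attention_backward(grad_out, saved):
    q, K, V = saved["q"], saved["K"], saved["V"]
    probs = saved["probs"]
    if probs is None:
        # Recompute the attention probabilities instead of reading them from memory.
        probs = softmax([dot(q, k) for k in K])
    grad_V = [[p * g for g in grad_out] for p in probs]
    grad_probs = [dot(grad_out, v) for v in V]
    # Softmax backward: ds_i = p_i * (dp_i - sum_j p_j * dp_j)
    inner = dot(probs, grad_probs)
    grad_scores = [p * (gp - inner) for p, gp in zip(probs, grad_probs)]
    grad_q = [sum(gs * k[d] for gs, k in zip(grad_scores, K)) for d in range(len(q))]
    grad_K = [[gs * x for x in q] for gs in grad_scores]
    return grad_q, grad_K, grad_V

# Toy inputs, chosen arbitrarily for the sketch.
q = [0.5, -1.0]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grad_out = [1.0, -1.0]

out_stored, saved_stored = attention_forward(q, K, V, store_probs=True)
out_recomputed, saved_recomputed = attention_forward(q, K, V, store_probs=False)
grads_stored = attention_backward(grad_out, saved_stored)
grads_recomputed = attention_backward(grad_out, saved_recomputed)
```

Both paths produce the same gradients; the recomputing path pays for one extra softmax in the backward pass and in exchange holds no attention probabilities between forward and backward. In a real model that saving grows with the square of the sequence length, since the probability matrix has one entry per query-key pair.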