Machine Learning and Deep Learning
38. Instruction Tuning and Alignment

Mini-Batch Updates and Multiple Epochs

Reusing collected rollouts across multiple gradient steps while respecting trust region constraints.
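
A minimal sketch of one common way to do this, assuming a PPO-style clipped surrogate objective with an approximate-KL early stop standing in for the trust region constraint. The stateless softmax policy, the function name ppo_update, and all hyperparameter values are illustrative assumptions, not taken from the lesson.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_update(logits, rollout, epochs=4, batch_size=8, clip_eps=0.2,
               lr=0.1, target_kl=0.02, seed=0):
    """Run several epochs of mini-batch clipped-surrogate ascent on one rollout.

    rollout: list of (action, advantage, old_log_prob) tuples collected under
    the policy that existed before this update (hypothetical data layout).
    Returns the updated logits and the number of gradient steps taken.
    """
    rng = random.Random(seed)
    logits = list(logits)
    steps = 0
    for _ in range(epochs):
        order = list(range(len(rollout)))
        rng.shuffle(order)  # fresh mini-batch split every epoch
        for start in range(0, len(order), batch_size):
            batch = [rollout[i] for i in order[start:start + batch_size]]
            probs = softmax(logits)
            grad = [0.0] * len(logits)
            for action, adv, old_logp in batch:
                # Probability ratio between the current and the old policy.
                ratio = math.exp(math.log(probs[action]) - old_logp)
                # When the clipped branch of min(...) is active, the sample
                # contributes no gradient, which limits how far we move.
                if (adv > 0 and ratio > 1 + clip_eps) or \
                   (adv < 0 and ratio < 1 - clip_eps):
                    continue
                for j in range(len(logits)):
                    dlogp = (1.0 if j == action else 0.0) - probs[j]
                    grad[j] += adv * ratio * dlogp / len(batch)
            logits = [t + lr * g for t, g in zip(logits, grad)]
            steps += 1
        # Sample estimate of KL(old || new) over the whole rollout; stop
        # reusing the data once the policy has drifted too far.
        probs = softmax(logits)
        approx_kl = sum(old_logp - math.log(probs[a])
                        for a, _, old_logp in rollout) / len(rollout)
        if approx_kl > target_kl:
            break
    return logits, steps


if __name__ == "__main__":
    old_logp = math.log(1.0 / 3.0)  # uniform old policy over 3 actions
    rollout = ([(0, 1.0, old_logp)] * 10 + [(1, -0.5, old_logp)] * 10
               + [(2, -0.5, old_logp)] * 10)
    new_logits, steps = ppo_update([0.0, 0.0, 0.0], rollout)
    print(softmax(new_logits), steps)
```

The same rollout is reshuffled and revisited each epoch, so one batch of experience yields many gradient steps. Clipping zeroes the gradient of samples whose ratio has already moved past 1 ± clip_eps in the favourable direction, and the KL check ends the epoch loop early when the new policy has drifted too far from the one that collected the data.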
