Course contentsShow
Machine Learning and Deep Learning
Lesson 1796 of 3,53838. Instruction Tuning and AlignmentPro lesson

Rollout Generation and Experience Collection

Sampling responses from the current policy and computing rewards for each generation trajectory.

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.