Machine Learning and Deep Learning
38. Instruction Tuning and Alignment

PPO Alternatives and Recent Improvements

Explores alternatives to PPO, such as GRPO, rejection sampling, and online DPO, that simplify the RLHF pipeline.
