DPO Hyperparameters: Beta and Learning Rate

How beta controls deviation from the reference model, and typical learning rate schedules for DPO training.
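As a rough illustration of the role of beta, the per-pair DPO loss can be sketched in plain Python. This is a minimal sketch, not the lesson's own code: the function name `dpo_loss` and its argument names are hypothetical, the inputs are assumed to be summed log-probabilities of each full response, and the default `beta=0.1` is only a placeholder value.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All names here are illustrative assumptions; each argument is the
    summed log-probability of a full response under the given model.
    """
    # How far the policy has moved from the reference on each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # beta scales the implicit reward margin between the two responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Same policy/reference log-probabilities, two different beta values.
    for beta in (0.1, 0.5):
        print(beta, dpo_loss(-4.0, -8.0, -5.0, -7.0, beta=beta))
```

When the policy equals the reference, both log-ratios are zero and the loss is log 2 for any beta. In the DPO objective, beta is the coefficient on the KL penalty toward the reference model, so a larger beta corresponds to a tighter constraint on how far the policy drifts.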
