Machine Learning and Deep Learning
Lesson 1767 of 3,538
38. Instruction Tuning and Alignment

Reward Model Architecture and Training Objective

Building a reward model by adding a scalar head to the base model to predict preference scores, and training it with the pairwise ranking loss.
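The summary above can be sketched in a few lines of framework-free Python. This is a minimal illustration under stated assumptions, not the lesson's own implementation.

- `toy_base_model` is a hypothetical stand-in for a pretrained transformer. A real reward model would reuse the language model's hidden states.
- `ScalarHead` is a single linear layer that maps the hidden state at the final token to one score. Reading the score at the last token is a common choice, assumed here.
- Training minimizes the pairwise ranking loss L = -log σ(r(x, y_w) - r(x, y_l)), where y_w is the preferred (chosen) response and y_l the rejected one.
- Only the head is updated here for simplicity. In practice the whole network is often fine-tuned.

```python
import math
import random
from typing import List, Sequence


def toy_base_model(tokens: Sequence[int], dim: int) -> List[List[float]]:
    """Hypothetical stand-in for a pretrained transformer.

    Returns one deterministic hidden vector of size `dim` per token.
    A real reward model would run the language model here instead.
    """
    return [[math.sin(t * (i + 1)) for i in range(dim)] for t in tokens]


class ScalarHead:
    """Linear layer mapping a hidden vector to a single scalar reward."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.b = 0.0

    def __call__(self, h: Sequence[float]) -> float:
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b


def last_hidden(tokens: Sequence[int], head: ScalarHead) -> List[float]:
    # Hidden state at the final token of the prompt + response sequence.
    return toy_base_model(tokens, dim=len(head.w))[-1]


def reward(tokens: Sequence[int], head: ScalarHead) -> float:
    return head(last_hidden(tokens, head))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    # -log sigmoid(r_chosen - r_rejected), computed as a stable softplus(-d).
    d = r_chosen - r_rejected
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def train_step(head: ScalarHead, chosen: Sequence[int],
               rejected: Sequence[int], lr: float = 0.5) -> float:
    """One gradient-descent step on the head; returns the loss before the update."""
    h_c = last_hidden(chosen, head)
    h_r = last_hidden(rejected, head)
    r_c, r_r = head(h_c), head(h_r)
    loss = pairwise_ranking_loss(r_c, r_r)
    # dL/dd = -sigmoid(-d) with d = w . (h_c - h_r); the bias b cancels out of d.
    g = -sigmoid(-(r_c - r_r))
    for i in range(len(head.w)):
        head.w[i] -= lr * g * (h_c[i] - h_r[i])
    return loss


# Usage: both sequences share the prompt [3, 7]; response 2 was preferred over 5.
head = ScalarHead(dim=4)
chosen, rejected = [3, 7, 2], [3, 7, 5]
losses = [train_step(head, chosen, rejected) for _ in range(50)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("r(chosen) > r(rejected):", reward(chosen, head) > reward(rejected, head))
```

The loss depends only on the difference between the two rewards. As a result, the head's bias receives no gradient, and adding a constant to every reward leaves the loss unchanged. Reward scores are therefore meaningful only relative to one another.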

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.