Machine Learning and Deep Learning
Lesson 1766 of 3,538
38. Instruction Tuning and Alignment

The Role of the SFT Model in RLHF

Why supervised fine-tuning on high-quality demonstrations precedes reward modeling and provides the initialization for the later RLHF stages.
