Course contentsShow
Machine Learning and Deep Learning
Lesson 1795 of 3,53838. Instruction Tuning and AlignmentPro lesson

Value Function Learning in RLHF

Training a critic network to estimate expected rewards for variance reduction in policy gradients.

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.