Course contentsShow
Machine Learning and Deep Learning
Lesson 1786 of 3,53838. Instruction Tuning and AlignmentPro lesson

Multi-Objective Reward Models

Modeling multiple attributes: helpfulness, harmlessness, honesty, and combining preferences into a single reward.

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.