This lesson is for subscribers
You've completed the free preview. Subscribe to unlock every lesson in every course.
When alignment training reduces model performance on tasks outside the reward model's distribution.
You've completed the free preview. Subscribe to unlock every lesson in every course.