How reward models can be exploited by policies that optimize the proxy objective rather than true human preferences.