How KL regularization prevents reward hacking and mode collapse while preserving language model capabilities.