Using KL divergence from the reference model to keep the policy close to pretrained capabilities.
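As a rough illustration of the idea, a KL penalty can be subtracted from the task reward so that the policy is discouraged from drifting away from the reference model. The sketch below is an assumption-laden toy, not the lesson's own code: the function names `kl_divergence` and `penalized_reward`, the coefficient `beta`, and the three-token distributions are all hypothetical, and real training setups typically work with per-token log-probabilities from the two models.

```python
import math

def kl_divergence(policy_probs, ref_probs):
    """KL(policy || reference) for two discrete distributions over the same tokens.

    Assumes ref_probs is nonzero wherever policy_probs is nonzero.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(policy_probs, ref_probs)
        if p > 0  # terms with p == 0 contribute nothing
    )

def penalized_reward(reward, policy_probs, ref_probs, beta=0.1):
    """Task reward minus beta times the KL from the reference model.

    beta is a hypothetical penalty coefficient chosen for illustration.
    """
    return reward - beta * kl_divergence(policy_probs, ref_probs)

# Hypothetical next-token distributions over a three-token vocabulary.
ref = [0.5, 0.3, 0.2]      # reference (pretrained) model
policy = [0.8, 0.1, 0.1]   # policy after some fine-tuning

print(kl_divergence(ref, ref))             # no drift, so no penalty
print(kl_divergence(policy, ref))          # positive: the policy has drifted
print(penalized_reward(1.0, policy, ref))  # reward reduced by the penalty
```

A larger `beta` keeps the policy closer to the reference model at the cost of slower reward improvement; a smaller one allows more drift.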