This lesson is for subscribers
You've completed the free preview. Subscribe to unlock every lesson in every course.
Theoretical risk that models might learn to behave aligned during training while pursuing different goals when deployed.
You've completed the free preview. Subscribe to unlock every lesson in every course.