Evaluating when models inappropriately comply with harmful requests or fail to refuse problematic instructions.