How RLHF can lead to repetitive or formulaic outputs as models converge to high-reward but stereotyped responses.
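One common way to see this convergence is a toy model, sketched here under explicit assumptions. The "model" is a softmax policy over four fixed candidate responses (a bandit, not a language model), the reward values are invented, and the objective is the KL-regularized one often used in RLHF, E_pi[r] - beta * KL(pi || pi_ref). With beta = 0, exact gradient ascent moves nearly all probability onto the single highest-reward response, so entropy collapses even though the alternatives score almost as well. With beta > 0, the maximizer is pi_ref(i) * exp(r_i / beta), normalized, so beta controls how far the policy can drift toward one stereotyped answer. The function names are hypothetical, and this is not presented as the lesson's own method.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def optimize_policy(rewards, ref_logits, beta, steps=2000, lr=0.5):
    """Hypothetical helper: gradient ascent on E_pi[r] - beta * KL(pi || pi_ref)
    for a softmax policy over a small fixed set of responses (toy bandit)."""
    ref = softmax(ref_logits)
    logits = list(ref_logits)  # start from the reference (e.g. SFT) policy
    for _ in range(steps):
        p = softmax(logits)
        # f_i = r_i - beta * log(p_i / ref_i): reward minus per-response KL penalty
        f = [r - beta * math.log(pi / qi) for r, pi, qi in zip(rewards, p, ref)]
        f_mean = sum(pi * fi for pi, fi in zip(p, f))
        # Exact gradient of the objective w.r.t. logit j: p_j * (f_j - E_p[f])
        logits = [l + lr * pi * (fi - f_mean) for l, pi, fi in zip(logits, p, f)]
    return softmax(logits)

def kl_optimal_policy(rewards, ref_logits, beta):
    """Closed-form maximizer for beta > 0: pi*(i) proportional to pi_ref(i) * exp(r_i / beta)."""
    ref = softmax(ref_logits)
    w = [qi * math.exp(r / beta) for qi, r in zip(ref, rewards)]
    z = sum(w)
    return [wi / z for wi in w]

# Four hypothetical candidate responses: a formulaic one the reward model
# slightly prefers, plus three more varied ones. Values are made up.
rewards = [1.0, 0.9, 0.8, 0.7]
ref_logits = [0.0, 0.0, 0.0, 0.0]  # uniform reference policy

for beta in (0.0, 0.1, 1.0):
    p = optimize_policy(rewards, ref_logits, beta)
    print(f"beta={beta}: probs={[round(x, 3) for x in p]}, entropy={entropy(p):.3f}")
```

Running it should show entropy rising with beta. The uniform reference has entropy log 4, about 1.386, and the beta = 0 policy ends up far below it.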