Using Reinforcement Learning from Human Feedback to align model outputs with human preferences and safety guidelines.