Machine Learning and Deep Learning
50. Deep Reinforcement Learning: Advanced Policy Methods

The Clipping Mechanism in Detail

How clip(ratio, 1-ε, 1+ε) creates a pessimistic bound that prevents destructively large policy updates.
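
To make the pessimistic bound concrete, here is a minimal sketch. It assumes the lesson refers to the standard PPO clipped surrogate objective, min(ratio · A, clip(ratio, 1-ε, 1+ε) · A), where A is the advantage estimate. The function names and the default ε = 0.2 are illustrative assumptions, not taken from the lesson.

```python
def clip(x, lo, hi):
    """Constrain x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective (assumed PPO-style form).

    ratio     : new_policy_prob / old_policy_prob for the sampled action
    advantage : advantage estimate for that action
    eps       : clip range (0.2 is an illustrative default)
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum keeps whichever term is more pessimistic,
    # so the result never exceeds the unclipped objective.
    return min(unclipped, clipped)


if __name__ == "__main__":
    for advantage in (1.0, -1.0):
        for ratio in (0.5, 1.0, 1.5):
            value = clipped_surrogate(ratio, advantage)
            print(f"A={advantage:+.1f} ratio={ratio:.1f} objective={value:+.2f}")
```

With a positive advantage, pushing the ratio above 1+ε earns no extra objective, so there is no incentive to move the policy further in that direction. With a negative advantage, the same holds once the ratio drops below 1-ε. In the opposite cases (for example, ratio 1.5 with a negative advantage) the unclipped term is the smaller one and is kept, so a move that makes things worse is still fully penalized.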
