Course contentsShow
Machine Learning and Deep Learning
Lesson 2300 of 3,53850. Deep Reinforcement Learning: Advanced Policy MethodsPro lesson

TRPO Performance Characteristics

Sample efficiency, monotonic improvement guarantees, and when TRPO excels compared to vanilla policy gradients.

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.