This lesson is for subscribers
You've completed the free preview. Subscribe to unlock every lesson in every course.
Sample efficiency, monotonic improvement guarantees, and when TRPO outperforms vanilla policy gradients.