
Episode-Based Gradient Estimation

Collecting full trajectories and using Monte Carlo returns to estimate policy gradients.
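
The summary above can be illustrated with a minimal sketch. It is an assumption-laden example, not the lesson's own code: it assumes a stateless softmax policy parameterised by a list of logits, episodes stored as (actions, rewards) pairs, and a discount factor gamma. The function names discounted_returns, softmax, and policy_gradient_estimate are hypothetical and chosen for illustration only.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_estimate(episodes, logits, gamma=0.99):
    """Average over complete episodes of sum_t G_t * grad log pi(a_t).

    Assumption: the policy ignores state, so grad log pi(a) with respect
    to the logits is one_hot(a) - pi.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for actions, rewards in episodes:
        # Returns are only available once the episode has finished.
        returns = discounted_returns(rewards, gamma)
        for a, g in zip(actions, returns):
            for k in range(len(logits)):
                indicator = 1.0 if k == a else 0.0
                grad[k] += g * (indicator - probs[k])
    return [g / len(episodes) for g in grad]
```

Because each return is computed from the full reward sequence, the estimate needs whole trajectories before any update can be made, which is the defining trait of the episode-based approach named in the title.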
