Why PPO performs multiple gradient steps on the same batch of experience, unlike vanilla policy-gradient methods, which take a single gradient step per batch.