Machine Learning and Deep Learning
Lesson 2198 of 3,538
47. Reinforcement Learning: Temporal Difference Methods

Action-Value Functions in Bandits

This lesson defines Q(a) as the expected reward for taking action a, and explains why these values have to be estimated from observed rewards rather than known in advance.
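As a rough illustration of the idea, the definition can be written as Q(a) = E[R | A = a], and one common way to estimate it is to average the rewards seen each time action a is taken. The sketch below is not taken from the lesson: the function name `estimate_action_values`, the Gaussian reward noise with unit variance, the uniformly random action choice, and the example `true_means` are all assumptions made for illustration. The true means are used only to simulate rewards; the learner sees nothing but the samples.

```python
import random


def estimate_action_values(true_means, steps=10000, seed=0):
    """Estimate Q(a) for each arm by averaging the rewards observed for it.

    true_means is hidden from the estimator; it only drives the simulated
    environment (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    k = len(true_means)
    q_est = [0.0] * k  # running estimate of Q(a) for each action
    counts = [0] * k   # number of times each action has been taken

    for _ in range(steps):
        a = rng.randrange(k)  # pick an action uniformly at random (assumption)
        reward = rng.gauss(true_means[a], 1.0)  # noisy reward, unit std dev
        counts[a] += 1
        # Incremental sample mean: move the estimate toward the new reward
        # by 1/n of the difference.
        q_est[a] += (reward - q_est[a]) / counts[a]

    return q_est, counts


# Hypothetical 3-armed bandit; these means are illustrative only.
true_means = [0.2, 0.8, 0.5]
q_est, counts = estimate_action_values(true_means)
for a, (q, n) in enumerate(zip(q_est, counts)):
    print(f"action {a}: estimated Q = {q:.3f} after {n} pulls")
```

Because every reward is noisy, a single pull says little about an action; only the average over many pulls approaches the expected reward, which is why Q(a) is treated as something to estimate.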
