
UCB Formula and Confidence Intervals

Deriving the UCB formula, in which a logarithmic bonus term is added to each action's value estimate so that selection balances estimated value against uncertainty.
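As a rough illustration of the kind of formula described here, the sketch below assumes the standard UCB1 form, score(a) = Q(a) + sqrt(c * ln(t) / N(a)), where Q(a) is the current value estimate, N(a) is the number of times action a has been tried, and t is the total number of steps so far. The function names `ucb_scores` and `select_action` and the constant `c` (defaulting to 2) are assumptions made for this example, not taken from the lesson, whose derivation may use a different constant.

```python
import math


def ucb_scores(values, counts, t, c=2.0):
    """Return one UCB score per action: value estimate plus exploration bonus.

    values: current value estimates Q(a), one per action
    counts: number of times each action has been tried, N(a)
    t:      total number of steps taken so far (must be >= 1)
    c:      exploration constant (assumed default; larger means more exploration)
    """
    scores = []
    for q, n in zip(values, counts):
        if n == 0:
            # An untried action has unbounded uncertainty, so it is tried first.
            scores.append(math.inf)
        else:
            # The bonus grows slowly with ln(t) and shrinks as N(a) grows.
            bonus = math.sqrt(c * math.log(t) / n)
            scores.append(q + bonus)
    return scores


def select_action(values, counts, t, c=2.0):
    """Pick the index of the action with the highest UCB score."""
    scores = ucb_scores(values, counts, t, c)
    return max(range(len(scores)), key=scores.__getitem__)
```

With `values=[0.5, 0.6]`, `counts=[10, 100]` and `t=110`, the first action is selected even though its estimate is lower, because it has been tried far less often and so carries a larger bonus.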
