Combine policy evaluation and policy improvement in an alternating scheme to find optimal policies iteratively.
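The alternating scheme can be sketched as follows. This is a minimal illustration, not the lesson's own code. It assumes a finite MDP with known tabular dynamics, deterministic policies, and a discount factor `gamma < 1`. The transition format `P[s][a]` (a list of `(probability, next_state, reward)` triples), the function names, and the three-state chain used at the end are all hypothetical. Evaluation runs in-place sweeps until the value change falls below `theta`. Improvement then makes the policy greedy with respect to those values. The loop stops when a full improvement pass leaves every action unchanged.

```python
from typing import List, Tuple

# Hypothetical tabular format: P[s][a] is a list of (probability, next_state, reward).
Transitions = List[List[List[Tuple[float, int, float]]]]


def policy_evaluation(P: Transitions, policy: List[int], V: List[float],
                      gamma: float, theta: float = 1e-10) -> List[float]:
    """Approximate V^pi for a fixed deterministic policy with in-place sweeps."""
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = policy[s]
            # Bellman expectation backup for the action the policy picks in s.
            v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V


def q_values(P: Transitions, V: List[float], s: int, gamma: float) -> List[float]:
    """One-step lookahead: action values in state s under the current V."""
    return [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in range(len(P[s]))]


def policy_improvement(P: Transitions, policy: List[int], V: List[float],
                       gamma: float, tol: float = 1e-12) -> bool:
    """Make the policy greedy w.r.t. V; return True if no action changed."""
    stable = True
    for s in range(len(P)):
        q = q_values(P, V, s, gamma)
        best = max(range(len(q)), key=lambda a: q[a])
        # Only switch if strictly better, so ties cannot cause endless flipping.
        if q[best] > q[policy[s]] + tol:
            policy[s] = best
            stable = False
    return stable


def policy_iteration(P: Transitions, gamma: float = 0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)      # arbitrary initial policy: action 0 everywhere
    V = [0.0] * len(P)         # values are reused as the next evaluation's start
    while True:
        policy_evaluation(P, policy, V, gamma)
        if policy_improvement(P, policy, V, gamma):
            return policy, V


# Hypothetical 3-state chain: action 0 = left, action 1 = right.
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0
    [[(1.0, 0, 0.0)], [(1.0, 2, 1.0)]],  # state 1: moving right reaches the goal, reward 1
    [[(1.0, 2, 0.0)], [(1.0, 2, 0.0)]],  # state 2: absorbing, no further reward
]
policy, V = policy_iteration(P, gamma=0.9)
print(policy)  # [1, 1, 0]
```

On this chain the sketch settles after three evaluation/improvement rounds on "move right" in states 0 and 1, with values of roughly 0.9 and 1.0. The absorbing state keeps action 0 because both of its actions are tied. Keeping the current action unless another is strictly better is one common way to guarantee termination when several actions are equally good.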