Survey major agent benchmarks such as WebShop, ALFWorld, and SWE-bench, along with their evaluation protocols.