AI Engineering
Lesson 1034 of 1,886
25. Model Serving and Inference Optimization

Grouped-Query Attention (GQA)

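GQA is a middle ground between multi-head attention (MHA) and multi-query attention (MQA). Instead of giving every query head its own key/value head (MHA) or making all query heads share one (MQA), GQA splits the query heads into groups, and each group shares a single key/value head. This shrinks the KV cache relative to MHA while preserving more model quality than MQA.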
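To make the memory side of that trade-off concrete, here is a minimal sketch that compares KV-cache sizes for the three attention variants and shows how query heads map onto shared key/value heads. The function names (`kv_cache_bytes`, `kv_head_for_query_head`) and the model shape (32 layers, 32 query heads, head dimension 128, 4096-token context, 2-byte elements) are assumptions chosen for illustration. They are not taken from this lesson or from any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Size of the key/value cache: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Index of the shared K/V head used by a given query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per K/V head
    return q_head // group_size


# Hypothetical model shape, for illustration only.
N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096

# MHA: one K/V head per query head. MQA: a single K/V head. GQA: in between.
cache_sizes = {
    name: kv_cache_bytes(N_LAYERS, n_kv, HEAD_DIM, SEQ_LEN)
    for name, n_kv in [("MHA", N_Q_HEADS), ("GQA", 8), ("MQA", 1)]
}

for name, size in cache_sizes.items():
    print(f"{name}: {size / 2**20:.0f} MiB")
# MHA: 2048 MiB
# GQA: 512 MiB
# MQA: 64 MiB

# With 32 query heads and 8 K/V heads, query heads 0-3 share K/V head 0,
# query heads 4-7 share K/V head 1, and so on.
print([kv_head_for_query_head(q, N_Q_HEADS, 8) for q in range(8)])
# [0, 0, 0, 0, 1, 1, 1, 1]
```

Under these assumed numbers, the cache shrinks in direct proportion to the number of key/value heads, so 8 groups cut it to a quarter of the MHA size per sequence. The quality side of the balance cannot be computed this way and has to be measured empirically.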
