
Multi-Query Attention (MQA)

How MQA reduces KV cache memory by sharing a single key head and a single value head across all query heads.
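As a rough illustration of that saving, the sketch below compares the KV cache size of standard multi-head attention (one key/value head per query head) with MQA (one key/value head in total). The function name `kv_cache_bytes` and the model dimensions are hypothetical values chosen for illustration, not figures from this lesson. It assumes 16-bit cache entries and a single sequence.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 cache entries (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model configuration, for illustration only.
num_layers = 32
num_query_heads = 32
head_dim = 128
seq_len = 4096

# Multi-head attention: every query head has its own key/value head.
mha = kv_cache_bytes(num_layers, num_query_heads, head_dim, seq_len)

# Multi-query attention: all query heads share one key/value head.
mqa = kv_cache_bytes(num_layers, 1, head_dim, seq_len)

print(f"MHA cache: {mha / 2**20:.0f} MiB")  # MHA cache: 2048 MiB
print(f"MQA cache: {mqa / 2**20:.0f} MiB")  # MQA cache: 64 MiB
print(f"Reduction: {mha // mqa}x")          # Reduction: 32x
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads, because only the key/value head count changes while the layer count, head dimension, and sequence length stay fixed.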
