
Flash Attention Algorithm Overview

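The core idea of Flash Attention is to combine tiling with kernel fusion so that exact attention (not an approximation) is computed without ever materializing the full N×N attention matrix in GPU high-bandwidth memory. Queries, keys, and values are processed in blocks small enough to fit in fast on-chip memory, and the softmax is accumulated incrementally across blocks.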
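To make the tiling idea concrete, here is a minimal NumPy sketch of block-wise attention with a running ("online") softmax. It is an illustration under simplifying assumptions, not the actual Flash Attention kernel: it runs on the CPU, tiles only over keys and values (the real algorithm also tiles queries and fuses all steps into a single GPU kernel), and the names `tiled_attention`, `naive_attention`, and `block_size` are hypothetical choices made for this example.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference: builds the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_size=4):
    """Illustrative sketch: same result, but only one key/value block of
    scores exists at a time. `block_size` is an assumed tuning parameter."""
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)

    out = np.zeros((n_q, v.shape[1]))   # unnormalized output accumulator
    row_max = np.full(n_q, -np.inf)     # running max of scores per query row
    row_sum = np.zeros(n_q)             # running softmax denominator per row

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]

        s = (q @ k_blk.T) * scale       # scores for this block only
        new_max = np.maximum(row_max, s.max(axis=1))

        # Rescale what was accumulated under the old max to the new max.
        correction = np.exp(row_max - new_max)
        p = np.exp(s - new_max[:, None])

        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ v_blk
        row_max = new_max

    return out / row_sum[:, None]       # apply the softmax normalization once


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((6, 8))
    k = rng.standard_normal((10, 8))
    v = rng.standard_normal((10, 5))
    diff = np.abs(tiled_attention(q, k, v, block_size=3) - naive_attention(q, k, v))
    print("max abs difference vs. naive attention:", diff.max())
```

The key design point is the `correction` factor: whenever a new block raises the running row maximum, the previously accumulated sum and output are rescaled, which is what lets the result stay exact while only one block of scores is held at a time.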
