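Causal masking prevents the decoder from attending to future positions. The output at position i therefore depends only on positions 0 through i, which matches the conditions of autoregressive generation, where later tokens have not been produced yet.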
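A minimal sketch of the idea, assuming single-head scaled dot-product attention written with NumPy. The helper names `causal_mask` and `masked_attention` are hypothetical and chosen for illustration. The mask keeps the lower triangle (j <= i) and sets every future score to negative infinity, so softmax gives those positions zero weight.

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # Hypothetical helper: True where query i may attend to key j, i.e. j <= i
    # (lower triangle including the diagonal).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Hypothetical single-head attention with a causal mask.

    q, k, v: arrays of shape (seq_len, d). Returns (output, weights).
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (seq_len, seq_len) raw attention scores
    # Future positions (j > i) get -inf so softmax assigns them zero weight.
    scores = np.where(causal_mask(seq_len), scores, -np.inf)
    # Numerically stable softmax over the key axis; the diagonal is never
    # masked, so each row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((4, 8))
    out, w = masked_attention(x, x, x)
    print(np.round(w, 2))  # entries above the diagonal are all zero
```

Because the first row can only attend to itself, its output equals the first value vector. Changing tokens at later positions leaves the outputs at earlier positions unchanged, which is exactly the property autoregressive decoding relies on.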