
Masking Strategy: 80-10-10 Rule

Why BERT, for the tokens it selects for prediction, replaces 80% with [MASK], swaps 10% for random tokens, and leaves 10% unchanged.
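As a rough illustration of the rule, here is a minimal sketch in plain Python. It assumes the 15% selection rate used in the original BERT pre-training setup. The function name `mask_tokens`, its parameters, and the toy vocabulary are hypothetical and are not taken from any library.

```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}  # assumed never to be selected


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Apply 80-10-10 masking to a list of string tokens (illustrative only).

    Returns (masked, labels). labels[i] holds the original token at each
    position chosen for prediction and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Skip special tokens and the positions not selected for prediction.
        if tok in SPECIAL_TOKENS or rng.random() >= select_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return masked, labels


if __name__ == "__main__":
    sentence = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
    toy_vocab = ["dog", "tree", "ran", "blue", "under"]
    masked, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(masked)
    print(labels)
```

Because [MASK] never appears in downstream fine-tuning data, the random and unchanged cases keep the model from relying on seeing [MASK] at every position it has to predict.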
