Course contentsShow
Machine Learning and Deep Learning
Lesson 1177 of 3,53826. Pretrained Language Models: BERT FamilyPro lesson

Learning Rate and Layer-Wise Decay

Apply discriminative learning rates with lower rates for early layers and higher for task-specific heads.

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.