Machine Learning and Deep Learning
16. Activation Functions and Weight Initialization

GELU: Gaussian Error Linear Units

A smooth, deterministic activation function, motivated by stochastic regularization, used in transformers and other modern architectures.
