Course contentsShow
Machine Learning and Deep Learning
Lesson 1375 of 3,53831. Multimodal ModelsPro lesson

Early Vision-Language Models: Visual Question Answering

How early VQA models combined CNNs and RNNs to answer questions about images before unified pretraining.

This lesson is for subscribers

You've completed the free preview. Subscribe to unlock every lesson in every course.