Natural language processing, robotics, video processing, stock market forecasting, and other similar tasks require models that can handle sequence data and capture temporal dependencies. Two major classes of models designed for sequence data are recurrent neural networks (RNNs/LSTMs) and transformer architectures. Designing and understanding these models is an active and diverse area of research, and their applications are widespread. The recent explosion of interest in topics such as language modelling and machine translation builds on advances in these models, including systems such as GPT-3 and DALL-E.

This course focuses on the recurrent neural networks and transformers that led to these breakthroughs. We will first cover the fundamentals of these models, and then read and discuss both seminal and recent research papers on these topics to shed light on the algorithms and challenges in this field.
Semester: WT 2024/25