CS772: Deep Learning for Natural Language Processing

Autumn, 2025-2026

Announcements

Join MS Teams with code: be1n1vl (Login to MS Teams using IITB LDAP)

Access course lecture recordings on the course YouTube channel.

Previous iterations of the course: 2024 | 2023 | 2022


Course Details

CS772: Deep Learning for Natural Language Processing (Autumn, 2025)

Department of Computer Science and Engineering

Indian Institute of Technology Bombay

Time Table and Venue

  • Slot: 11
  • Venue: CC103 (First 4 Classes at FC Kohli)
  • Tuesday: 3:30 PM to 5 PM
  • Friday: 3:30 PM to 5 PM


Motivation

Deep Learning (DL) is a framework for solving AI problems with networks of artificial neurons organized in many layers. DL has found heavy use in Natural Language Processing (NLP), in problems such as machine translation, sentiment and emotion analysis, question answering, and information extraction, substantially improving the performance of automatic systems.

The course CS626 (Speech, NLP, and the Web), taught in the CSE Department at IIT Bombay for the last several years, creates a strong foundation in NLP, covering the whole NLP stack from morphology to part-of-speech tagging, to parsing, discourse, and pragmatics. Students of that course acquire a strong grip on the tasks, techniques, and linguistics of a wide range of NLP problems. Past lectures of CS626 are available on YouTube.

CS772 (Deep Learning for Natural Language Processing) is a natural sequel to CS626. Language tasks are examined through the lens of Deep Learning: foundations and advancements in Deep Learning are taught, integrated with NLP problems. For example, sequence-to-sequence transformers are covered with applications in machine translation, and word-embedding techniques are taught with applications to text classification, information extraction, and so on. Recent breakthroughs involving Large Language Models (LLMs) and Generative AI (GenAI) are highlighted as natural extensions of these core concepts, and references to LLMs and GenAI are made throughout the course.


Course Description

  • Background: History of Neural Nets; History of NLP; Basic Mathematical Machinery - Linear Algebra, Probability, Information Theory, etc.; Basic Linguistic Machinery - Phonology, morphology, syntax, semantics.
  • Introducing Neural Computation: Perceptrons, Feedforward Neural Networks and Backpropagation, Recurrent Neural Nets.
  • Difference between Classical Machine Learning and Deep Learning: Representation - Symbolic Representation, Distributed Representation, Compositionality; Parametric and non-parametric learning.
  • Word Embeddings: Word2Vec (CBOW and Skip-gram), GloVe, FastText (a minimal Skip-gram sketch follows this list).
  • Application of Word Embedding to Shallow Parsing: Morphological Processing, Part of Speech Tagging and Chunking.
  • Sequence to Sequence (seq2seq) Transformation using Deep Learning: LSTMs and Variants, Attention, Transformers.
  • Transformer: Architecture and Functioning, Attention, Positional Encoding, Layer Normalization; Applications in Sentiment Analysis, Image Labelling, etc. (a minimal attention sketch follows this list).
  • Deep Neural Net based Language Modeling: XLM, BERT, GPT-2/3, etc.; Subword Modeling; Transfer Learning and Multilingual Modeling.
  • Small, Medium, and Large LMs: Language Modelling; Statistical Language Modelling and the Curse of Dimensionality; Neural Language Modelling; LMs like BERT, XLM, etc.; Open LLMs like Llama, Mistral, Mixtral.
  • Pretraining and Finetuning: Generative Pretraining and the GPT series; Transfer Learning and Zero-shot Learning; Finetuning and Few-shot Learning; Domain and Language Adaptation; Knowledge Graph Embedding; Knowledge Infusion; Retrieval-Augmented Generation (RAG); Natural Language Generation.
  • Application of seq2seq in Machine Translation: Supervised, Semi-supervised, and Unsupervised MT; Encoder-Decoder and Attention in MT; Memory Networks in MT.
  • Deep Learning and Deep Parsing: Recursive Neural Nets; Neural Constituency Parsing; Neural Dependency Parsing.
  • Deep Learning and Deep Semantics: Word Embeddings and Word Sense Disambiguation; Semantic Role Labeling with Neural Nets.
  • Neural Text Classification: Sentiment and Emotion labeling with Deep Neural Nets (DNN); DNN-based Question Answering.
  • The indispensability of DNN in Multimodal NLP: Advanced Problems like Sarcasm, Metaphor, Humor, and Fake News Detection using multimodality and DNN.
  • Natural Language Generation: Extractive and Abstractive Summarization with Neural Nets.
  • Large Language Models and Generative AI: Llama 3-like systems, ChatGPT, Gemini, Copilot, Comet, etc.
  • Good and Bad LLMs: Bias, Hallucination, Fake News detection and mitigation.
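
To make the word-embedding material above concrete, here is a minimal sketch of Word2Vec's Skip-gram objective with negative sampling, in plain NumPy. The toy corpus, embedding dimension, and all hyperparameters are illustrative choices, not course material.

```python
# Minimal sketch of Word2Vec skip-gram with negative sampling (NumPy only).
# Toy corpus and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 16          # vocabulary size, embedding dimension
window, lr, epochs, k_neg = 2, 0.05, 200, 3

W_in = rng.normal(0.0, 0.1, (V, D))    # center-word vectors (the embeddings)
W_out = rng.normal(0.0, 0.1, (V, D))   # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for pos, word in enumerate(corpus):
        c = idx[word]
        for off in range(-window, window + 1):
            j = pos + off
            if off == 0 or not (0 <= j < len(corpus)):
                continue
            # one observed (positive) context word plus k sampled negatives
            targets = [(idx[corpus[j]], 1.0)] + \
                      [(int(t), 0.0) for t in rng.integers(0, V, k_neg)]
            for t, y in targets:
                g = lr * (sigmoid(W_in[c] @ W_out[t]) - y)  # logistic-loss gradient
                w_out_t = W_out[t].copy()
                W_out[t] -= g * W_in[c]
                W_in[c] -= g * w_out_t

# nearest neighbours of "cat" by cosine similarity over the learned embeddings
v = W_in[idx["cat"]]
sims = (W_in @ v) / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v))
print([vocab[i] for i in np.argsort(-sims)[:3]])
```

The rows of W_in are the word vectors one would feed to a downstream classifier; production systems would use a library such as Gensim or fastText rather than this loop.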
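Likewise, for the Transformer item above, a minimal NumPy sketch of two of its named ingredients: sinusoidal positional encoding and scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Shapes and the random inputs are illustrative only.

```python
# Minimal sketch of sinusoidal positional encoding and scaled dot-product
# attention (NumPy only); single head, no masking, illustrative shapes.
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def scaled_dot_product_attention(Q, K, V):
    """softmax(QK^T / sqrt(d_k)) V, the core operation of a Transformer layer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq, seq) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (5, 8): one contextualised vector per input position
```

A real Transformer stacks this with multiple heads, residual connections, and layer normalization; the sketch isolates only the attention computation itself.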

References

  • Pushpak Bhattacharyya and Aditya Madhav Joshi, Natural Language Processing, Wiley, 2023. Print ISBN: 978-93-5746-283-9; eISBN: 978-93-5746-239-6.
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
  • Dan Jurafsky and James Martin, Speech and Language Processing, 3rd Edition, October 16, 2019.
  • Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, Dive into Deep Learning, e-book, 2020.
  • Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
  • Daniel Graupe, Deep Learning Neural Networks: Design and Case Studies, World Scientific Publishing Co., Inc., 2016.
  • Pushpak Bhattacharyya, Machine Translation, CRC Press, 2017.
  • Journals: Computational Linguistics, Natural Language Engineering, Journal of Machine Learning Research (JMLR), Neural Computation, IEEE Transactions on Neural Networks and Learning Systems.
  • Conferences: Annual Meeting of the Association for Computational Linguistics (ACL), Neural Information Processing Systems (NeurIPS), International Conference on Machine Learning (ICML), Empirical Methods in Natural Language Processing (EMNLP).

Pre-requisites

  • Python (or similar language) Programming skill
  • Basics of Statistics and Probability
  • Basics of Linear Algebra
  • Basics of Machine Learning

Course Instructor

Prof. Pushpak Bhattacharyya


Course Materials

Week 1 (Week of 28th July)
  • Introduction & Course Logistics
  • The Evolution of NLP: From Specialists to Giants by Prof. Mitesh M. Khapra, IIT Madras
  Slides: Week 1 (T), Week 1 (F) | Videos: Lecture 1, Lecture 2

Week 2 (Week of 4th Aug)
  • Programming Test
  • Whose English is it anyway?: Enhancing language models for text classification tasks by Dr. Aditya Joshi, UNSW
  Slides: Week 2 (F) | Videos: Lecture 3

Week 3 (Week of 11th Aug)
  • Perceptron, Sigmoid, Softmax, POS
  • National Holiday: Independence Day
  Slides: Week 3 (T) | Videos: Lecture 4

Week 4 (Week of 18th Aug)
  • Backpropagation
  • Enough of Scaling Laws! Let's focus on downscaling by Prof. Tanmoy Chakraborty, IIT Delhi
  Slides: Week 4 (T), Week 4 (F) | Videos: Lecture 5, Lecture 6

Week 5 (Week of 25th Aug)
  • Word Embeddings
  • Advances in Neural Information Retrieval by Dr. Rudra Murthy, IBM
  Slides: Week 5 (T), Week 5 (F) | Videos: Lecture 7, Lecture 8

Week 6 (Week of 1st Sept)
  • Part 1: Language Modeling and Seq-to-Seq Modeling by Dr. Anoop Kunchukuttan, Microsoft AI Core
  • Summarization by Mr. Srikanth Tamilselvam, IBM
  Slides: Week 6 (T), Week 6 (F) | Videos: Lecture 9, Lecture 10

Week 7 (Week of 8th Sept)
  • Part 2: The Transformer Model by Dr. Anoop Kunchukuttan, Microsoft AI Core
  • Quiz 1
  Slides: Week 7 (T) | Videos: Lecture 11

Week 8 (Week of 15th Sept)
  • Midsem Week

Week 9 (Week of 22nd Sept)
  • Convolutional Neural Networks by Mr. Jimut Bahan Pal, IIT Bombay
  • Turing Test 2.0: The Possibility, Usefulness and Challenges of Imitating a Specific User through Generative AI by Prof. Monojit Choudhury, MBZUAI, Abu Dhabi
  Slides: Week 9 (T) (with code accompanying the slides and code for the tutorial), Week 9 (F) | Videos: Lecture 12, Lecture 13

Week 10 (Week of 29th Sept)
  • Responsible Language Models: Navigating Bias, Stereotypes, and Vulnerabilities by Nihar Ranjan Sahoo and Narjis Asad, IIT Bombay
  • Federated Communication-Efficient Multi-Objective Optimization by Prof. Pranay Sharma, C-MInDS, IIT Bombay
  Slides: Week 10 (T), Week 10 (F) | Videos: Lecture 14, Lecture 15

Week 11 (Week of 6th Oct)
  • No Class
  • Misinformation and Transformers as Language Models by Prof. Raksha Sharma, IIT Roorkee
  Slides: Week 11 (F) | Videos: Lecture 16

Week 12 (Week of 13th Oct)
  • Diffusion Models by Tejomay Kishor Padole, IIT Bombay
  • Enabling Robots to Understand Language by Prof. Rohan Paul, IIT Delhi
  Slides: Week 12 (T), Week 12 (F) | Videos: Lecture 17, Lecture 18

Week 13 (Week of 20th Oct)
  • Financial Information Extraction using Knowledge Graphs by Saiful Haq, Hyperbots Inc. and IIT Bombay
  • Vision Language Models by Settaluri Lakshmi Sravanthi, IIT Bombay
  Slides: Week 13 (T), Week 13 (F) | Videos: Lecture 19, Lecture 20

Week 14 (Week of 27th Oct)
  • Retrieval-Augmented Generation (RAG) Part 1 by Prof. Soumen Chakrabarti, IIT Bombay
  • Retrieval-Augmented Generation (RAG) Part 2 by Prof. Soumen Chakrabarti, IIT Bombay
  Slides: Week 14 (T) and (F) | Videos: Lecture 21, Lecture 22

Week 15 (Week of 3rd Nov)
  • In-Context Learning by Prof. Sunita Sarawagi, IIT Bombay
  • Guest Lecture by Dr. Partha Talukdar, Principal Research Scientist and Director, Google DeepMind
  Slides: Week 15 (T), Week 15 (F) | Videos: Lecture 23, Lecture 24

Contact Us

  • CFILT Lab
  • Room Number: 401, 4th Floor, new CC building
  • Department of Computer Science and Engineering
  • Indian Institute of Technology Bombay
  • Mumbai 400076, India