CS626: Speech, Natural Language Processing and the Web

Autumn, 2024

Announcements

Join the MS Teams team using the code (use your IITB ID): 74m0vuk

Previous iterations of the course: 2023 | 2022 | 2021


Course Details:

CS626: Speech, Natural Language Processing and the Web (Autumn, 2024)

Department of Computer Science and Engineering

Indian Institute of Technology Bombay

Time Table and Venue:

  • Slot: 12
  • Venue: LA001
  • Monday: 17:30 to 19:00
  • Thursday: 17:30 to 19:00

Course Description:

The general approach of the course is to cover the following:

  1. A language phenomenon.
  2. The corresponding language processing task.
  3. Techniques based on deep learning, classical machine learning, and knowledge bases.

On the one hand, we will study each language processing task in detail, drawing on linguistics, cognitive science, utility, etc.; on the other hand, we will delve deep into techniques for solving the problem. In keeping with current trends, the lectures will make heavy reference to Large Language Models. The topics are given below:


  • Sound: Biology of Speech Processing; Place and Manner of Articulation; Peculiarities of Vowels and Consonants; Word Boundary Detection; Argmax based computations; Hidden Markov Model and Speech Recognition; deep neural nets for speech processing.
  • Morphology: Morphology fundamentals; Isolating, Inflectional, Agglutinative morphology; Infix, Prefix and Postfix Morphemes; Morphological Diversity of Indian Languages; Morphology Paradigms; Rule Based Morphological Analysis: Finite State Machine Based Morphology; Automatic Morphology Learning; Deep Learning based morphology analysis.
  • Shallow Parsing: Part of Speech (POS) Tagging; HMM based POS tagging; Maximum Entropy Models and POS; Random Fields and POS; DNN for POS.
  • Parsing: Constituency and Dependency Parsing; Theories of Parsing; Scope Ambiguity and Attachment Ambiguity Resolution; Rule Based Parsing Algorithms; Probabilistic Parsing; Neural Parsing.
  • Meaning: Lexical Knowledge Networks, Wordnet Theory and Indian Language Wordnets; Semantic Roles; Word Sense Disambiguation; Metaphors.
  • Discourse and Pragmatics: Coreference Resolution; Cohesion and Coherence.
  • Applications: Machine Translation; Sentiment and Emotion Analysis; Text Entailment; Question Answering; Code Mixing; Analytics and Social Networks; Information Retrieval and Cross Lingual Information Retrieval (IR and CLIR).

References:

  • Pushpak Bhattacharyya and Aditya Madhav Joshi, Natural Language Processing, Print ISBN: 978-93-5746-283-9 eISBN: 978-93-5746-239-6, Wiley India, 2023.
  • Allen, James, Natural Language Understanding, Second Edition, Benjamin/Cummings, 1995.
  • Charniak, Eugene, Statistical Language Learning, MIT Press, 1993.
  • Jurafsky, Dan and Martin, James, Speech and Language Processing (3rd ed. draft), draft chapters in progress, October 16, 2019.
  • Manning, Christopher and Schütze, Hinrich, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
  • Jacob Eisenstein, Introduction to Natural Language Processing, MIT Press, 2019.
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
  • Radford, Andrew et al., Linguistics: An Introduction, Cambridge University Press, 1999.
  • Pushpak Bhattacharyya, Machine Translation, CRC Press, 2017.
  • Journals: Computational Linguistics, Natural Language Engineering, Machine Learning, Machine Translation, Artificial Intelligence.
  • Conferences: Annual Meeting of the Association for Computational Linguistics (ACL), International Conference on Computational Linguistics (COLING), European ACL (EACL), Empirical Methods in NLP (EMNLP), Annual Meeting of the Special Interest Group on Information Retrieval (SIGIR), Human Language Technology (HLT).

Pre-requisites

  • Data Structures and Algorithms
  • Programming skills in Python (or a similar language)

Course Instructor

Prof. Pushpak Bhattacharyya


Office Hours Schedule

Professor Pushpak Bhattacharyya: Tuesday, 15:00 - 16:00, New CSE 510

TA Office Hours:

  • Tousin & Harsh: Monday, 14:00 - 15:00, CFILT
  • Manishit & Hemanth: Friday, 15:45 - 16:45, CFILT
  • Anshul & Nihar: Friday, 13:00 - 14:00, CFILT
  • Kishan & Yash: Thursday, 14:00 - 15:00, CFILT
  • Sravanthi & Abishek: Thursday, 15:00 - 16:00, CFILT
  • Spandan & Shrey: Friday, 11:30 - 12:30, CFILT
  • Sneha & Aditya: Wednesday, 16:00 - 17:00, CFILT
  • Aashay & Anuj: Friday, 14:30 - 15:30, CFILT
  • Satyam & Arnav: Friday, 14:30 - 15:30, CFILT
CFILT LAB: Room Number 401, 4th Floor, New CC Building

Course Materials

Week 1 (Week of 29th July)
  • Introduction & Course Logistics
  • Language Modelling
  Slide links: Week 1
  Video links: Lecture 1, Lecture 2

Week 2 (Week of 5th Aug)
  • POS Tagging (PART-1)
  • POS Tagging (PART-2) | Hidden Markov Model
  Slide links: Week 2
  Video links: Lecture 3, Lecture 4

Week 3 (Week of 12th Aug)
  • POS Tagging (PART-3) | Viterbi Algorithm
  Slide links: Week 3
  Video links: Lecture 5

Week 4 (Week of 19th Aug)
  • Discriminative POS Tagging
  • POS Tagging using Conditional Random Fields (CRF)
  Slide links: Week 4
  Video links: Lecture 6, Lecture 7

Week 5 (Week of 26th Aug)
  • Parsing | Constituency & Dependency Parsing
  Slide links: Week 5
  Video links: Lecture 8

Week 6 (Week of 2nd Sept)
  • Constituency & Dependency Parsing | CYK Algo
  • Grammar and Parsing Algorithms | Probabilistic Parsing
  Slide links: Week 6
  Video links: Lecture 9, Lecture 10

Week 7 (Week of 9th Sept)
  • Probabilistic Parsing Cont'd | Expectation Maximization
  • Probabilistic Parsing Cont'd | Inside-Outside Algorithm
  Slide links: Week 7
  Video links: Lecture 11, Lecture 12

Week 8 (Week of 23rd Sept)
  • Introduction to Machine Translation by Sourabh Deoghare
  • Machine Translation and Evaluation
  Slide links: Week 8
  Video links: Lecture 13, Lecture 14

Week 9 (Week of 30th Sept)
  • Machine Translation - Language Divergence and Evaluation
  • Machine Translation - Evaluation
  Slide links: Week 9
  Video links: Lecture 15, Lecture 16

Week 10 (Week of 7th Oct)
  • Bias Detection, Mitigation & Metrics by Nihar Ranjan Sahoo
  • Bias & Hypothesis Testing
  Slide links: Week 10
  Video links: Lecture 17, Lecture 18

Week 11 (Week of 14th Oct)
  • Hypothesis Testing
  • Hypothesis Testing (PART 2)
  Slide links: Week 11
  Video links: Lecture 19, Lecture 20

Week 12 (Week of 21st Oct)
  • PROMPT, Central Limit Theorem
  Slide links: Week 12
  Video links: Lecture 21

Week 13 (Week of 28th Oct)
  • Parsing Cont'd | Dependency Tree | Projectivity
  • Universal Dependency Tree | Named-Entity Recognition (NER)
  • Named-Entity Recognition (NER) | Lexical Knowledge Network
  • Wordnet | Course Recap
  Slide links: Week 13
  Video links: Lecture 22, Lecture 23, Lecture 24, Lecture 25

Contact Us

  • CFILT Lab
  • Room Number: 401, 4th Floor, New CC Building
  • Department of Computer Science and Engineering
  • Indian Institute of Technology Bombay
  • Mumbai 400076, India