CS 626-sem1-2012: Natural Language Processing
Autumn 2012
Instructor:
Prof. Pushpak Bhattacharyya (pb[AT]cse.iitb.ac.in)
TAs:
Lecture Schedule and Venue
Lecture Notes
- Lecture 1-3, July 19,22,26: Introduction, POS tagging [PDF]
- Lecture 4-5, July 30,Aug 2: HMM, POS tagging [PDF]
- Lecture 6-9, Aug 6,8,9,13: Viterbi; forward and backward; Baum Welch; IL POS tags [PDF]
- Lecture 10-11, Aug 16,20: EM [PDF]
- Lecture 12-13, Aug 23: parsing [PDF]
- Lecture 14, Aug 30: parsing [PDF]
- Lecture 15, Sep 17: parsing-ambiguity-prob [PDF]
- Lecture 16, Sep 20: UNL [PDF]
- Lecture 17-18, Sep 24,27: prob-parsing-parser-comparison [PDF]
- Lecture 19 and 24, Oct 1,8,11: PCFG algos [PDF]
- Lecture 20,21, Oct 2,4: Astar extra lecture [PDF]
- Lecture 23, Oct 8: Binding Theory [PDF]
- Lecture 25-26, Oct 10,15,18: Wordnet WSD [PDF]
- Lecture 27, Oct 25: Wordnet relations and WSD [PDF]
- Lecture 28,29, Oct 28,29: Speech Phonetics Phonology [PDF]
- Lecture 30, Nov 1: Phonology Transliteration [PDF]
- Lecture 31,32, Nov 5,6: Expectation Maximization [PDF]
- Lecture 33, Nov 8,11: Transliteration [PDF]
- Lecture 34, Nov 11: Maximum likelihood estimation [PDF]
Assignments
- HMM based POS tagger
- Train the system on the corpus supplied
- Design an HMM-based POS tagger implementing the Viterbi algorithm
- Perform 5-fold cross validation and report overall precision, recall and F1 score as well as tag wise precision, recall and F1 score
- Generate a confusion matrix (matrix where Aij denotes the number of times tag i is classified as tag j)
- Comparison of Language Models
- Prove that a language model based on POS tagged text is better than one developed from raw text
- Choose a suitable NLP application for comparing the models (e.g., autocompletion)
- Comparison of Discriminative vs Generative Models
- Design a POS tagger based on a discriminative model, i.e. learn P(tag sequence | sentence) directly. Use the bigram assumption.
- Compare the above model with the generative model based POS tagger designed earlier in terms of overall precision, tag-wise precision, recall and F1 score.
- Show the differences in the confusion matrix in the two cases.
- Parser projection
- You are provided with parallel corpora in English and various Indian languages
- Obtain the parse tree for the sentence in English using Stanford parser.
- Project the parse tree into the chosen Indian language with the help of the equivalent sentence in the Indian language.
- A* search based POS tagger
- The starter code for A* is given
- Design an A* search based POS tagger
- Perform 5-fold cross validation and report overall precision, recall and F1 score as well as tag wise precision, recall and F1 score
- Generate a confusion matrix (matrix where Aij denotes the number of times tag i is classified as tag j)
- Compare the above results with HMM-based POS tagger
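As a rough illustration of the tagging and evaluation steps named in the assignments above, the following is a minimal sketch of a bigram HMM tagger decoded with the Viterbi algorithm, together with a confusion matrix and tag-wise precision, recall and F1. It is not the course's reference solution. The function names, the `<s>` start symbol, the toy data format (lists of `(word, tag)` pairs) and the add-one smoothing are all assumptions made for the example, and 5-fold cross validation is left out.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence symbol


def train_hmm(tagged_sentences):
    """Collect bigram transition and emission counts from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), len(vocab)


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab_size):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    n_words = vocab_size + 1  # one extra outcome reserved for unseen words
    n_tags = len(tags)
    # best[i][t]: best log probability of tagging words[:i + 1] with last tag t
    best = [{t: log_prob(trans[START], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_prob(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit[t], words[i], n_words)
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def confusion_matrix(gold, predicted):
    """matrix[i][j] = number of times gold tag i was classified as tag j."""
    matrix = defaultdict(Counter)
    for g, p in zip(gold, predicted):
        matrix[g][p] += 1
    return matrix


def per_tag_scores(matrix):
    """Return {tag: (precision, recall, f1)} computed from the confusion matrix."""
    tags = set(matrix) | {p for row in matrix.values() for p in row}
    scores = {}
    for t in tags:
        tp = matrix[t][t]
        gold_total = sum(matrix[t].values())
        pred_total = sum(matrix[g][t] for g in tags)
        precision = tp / pred_total if pred_total else 0.0
        recall = tp / gold_total if gold_total else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[t] = (precision, recall, f1)
    return scores
```

The same `confusion_matrix` and `per_tag_scores` helpers could be applied unchanged to the output of the discriminative and A* taggers, which keeps the comparisons asked for above on an equal footing.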
Resources
- Link to Last Semester's webpage
- Link to Spring 2011 webpage
- Link to Autumn 2009 webpage
- Link to 2008 webpage
- Link to 2006 webpage
- Suggested Reading Material:
- Text Books:
- Allen, James, Natural Language Understanding, Second Edition, Benjamin/Cummings, 1995.
- Charniak, Eugene, Statistical Language Learning, MIT Press, 1993.
- Jurafsky, Dan and Martin, James, Speech and Language Processing, Second Edition, Prentice Hall, 2008.
- Manning, Christopher and Schütze, Hinrich, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
- Radford, Andrew et al., Linguistics, An Introduction, Cambridge University Press, 1999.
- Journals: Computational Linguistics, Natural Language Engineering, Machine Learning, Machine Translation, Artificial Intelligence
- Conferences: Annual Meeting of the Association for Computational Linguistics (ACL), International Conference on Computational Linguistics (COLING), European Chapter of the ACL (EACL), Empirical Methods in NLP (EMNLP), Annual Meeting of the Special Interest Group on Information Retrieval (SIGIR), Human Language Technology (HLT).
Student Seminars
- Gender Classification by Speech Analysis [PPT]
- Neurolinguistics [ODP]
- Question Answering System and Watson [PPT]
- Sentiment Analysis with Multi-Modality [ODP][PPT][PDF]
- Smoothing Techniques - A Primer [PPT]
- Document Summarization [PPT]
- Automatic Detection of Spamming and Phishing [PPT]
- Recognizing Textual Entailment [PDF][PPT]
- Semantic Web - Making the Web more readable for Machines [PDF]
- Neurolinguistics [PPT]
- Where the Computers and Arts meet [PPT][PDF]
Assignment Final Presentation
Groups
- Gr-3 Sanober, Soumyajit, Naveen [PPT]
- Gr-4 Raksha, Aditya, Anamay [PDF]
- Gr-5 Geetanjali, Sachin, Deepak [PPT]
- Gr-6 Mandar, Piyush, Abhirut [ZIP]
- Gr-7 Shubham, Kallol, Rahul [TAR]
- Gr-8 Akansha, Maunik, Hemant [PPT]
- Gr-9 Nikhil, Subhash, Jayaprakash [GZ]
- Gr-10 Kritika, Vinita, Rucha [TAR]
- Gr-11 Biplab, Amit, Ravi [ZIP]
Marks and Grades
- Will be uploaded as the course progresses