CS 626-sem1-2012: Speech, Natural Language Processing and the Web

CS 626-sem1-2012: Natural Language Processing
Autumn 2012

Lecture Notes

Assignments

Resources

Student Seminars

NLP Assignment Presentations

Marks/Grades

Instructor: Prof. Pushpak Bhattacharyya (pb[AT]cse.iitb.ac.in)

TAs:

Bibek Behera (bibek[AT]cse.iitb.ac.in)
Avishek Dan (avishekdan[AT]cse.iitb.ac.in)

Lecture Schedule and Venue

Timings: Monday 2:00PM-3:30PM, Thursday 2:00PM-3:30PM
Venue: SIC 201, Kanwal Rekhi Building (KReSIT)

Lecture Notes

Lecture 1-3, July 19, 22, 26: Introduction, POS tagging [PDF]
Lecture 4-5, July 30, Aug 2: HMM, POS tagging [PDF]
Lecture 6-9, Aug 6, 8, 9, 13: Viterbi; forward and backward; Baum-Welch; IL POS tags [PDF]
Lecture 10-11, Aug 16, 20: EM [PDF]
Lecture 12-13, Aug 23: Parsing [PDF]
Lecture 14, Aug 30: Parsing [PDF]
Lecture 15, Sep 17: Parsing-ambiguity-prob [PDF]
Lecture 16, Sep 20: UNL [PDF]
Lecture 17-18, Sep 24, 27: Prob-parsing-parser-comparison [PDF]
Lecture 19 and 24, Oct 1, 8, 11: PCFG algorithms [PDF]
Lecture 20-21, Oct 2, 4: A* (extra lecture) [PDF]
Lecture 23, Oct 8: Binding Theory [PDF]
Lecture 25-26, Oct 10, 15, 18: WordNet, WSD [PDF]
Lecture 27, Oct 25: WordNet relations and WSD [PDF]
Lecture 28-29, Oct 28, 29: Speech, phonetics, phonology [PDF]
Lecture 30, Nov 1: Phonology, transliteration [PDF]
Lecture 31-32, Nov 5, 6: Expectation Maximization [PDF]
Lecture 33, Nov 8, 11: Transliteration [PDF]
Lecture 34, Nov 11: Maximum likelihood estimation [PDF]


Assignments

HMM-based POS tagger
1. Train the system on the supplied corpus.
2. Design an HMM-based POS tagger implementing the Viterbi algorithm.
3. Perform 5-fold cross-validation and report overall precision, recall and F1 score as well as tag-wise precision, recall and F1 score.
4. Generate a confusion matrix (a matrix where A_ij denotes the number of times tag i is classified as tag j).
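A minimal Python sketch of items 1 to 4, assuming the corpus has already been read into a list of sentences, each a list of (word, tag) pairs (the corpus format is not specified here). Probabilities are maximum likelihood counts with add-one smoothing, which is an assumption of this sketch rather than a requirement of the assignment; the names train_hmm, viterbi, tag_scores and k_folds are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sents):
    """Count transitions (prev_tag -> tag) and emissions (tag -> word)."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sent in tagged_sents:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    # One extra vocabulary slot reserves smoothed mass for unknown words
    return trans, emit, sorted(emit), len(vocab) + 1


def log_prob(table, ctx, item, n_outcomes):
    """Add-one smoothed log P(item | ctx); the smoothing choice is an assumption."""
    counts = table.get(ctx, Counter())
    return math.log((counts[item] + 1) / (sum(counts.values()) + n_outcomes))


def viterbi(model, words):
    """Return the most probable tag sequence for `words` under the HMM."""
    trans, emit, tags, vocab_size = model
    # delta maps each tag to (best log-prob of a path ending in it, that path)
    delta = {"<s>": (0.0, [])}
    for w in words:
        new_delta = {}
        for t in tags:
            e = log_prob(emit, t, w, vocab_size)
            score, path = max(
                ((s + log_prob(trans, p, t, len(tags)) + e, pth)
                 for p, (s, pth) in delta.items()),
                key=lambda x: x[0],
            )
            new_delta[t] = (score, path + [t])
        delta = new_delta
    return max(delta.values(), key=lambda x: x[0])[1]


def tag_scores(gold, pred):
    """Confusion matrix A[i][j] (gold tag i predicted as j) and per-tag P/R/F1."""
    conf = defaultdict(Counter)
    for g_seq, p_seq in zip(gold, pred):
        for g, p in zip(g_seq, p_seq):
            conf[g][p] += 1
    all_tags = set(conf) | {p for row in conf.values() for p in row}
    scores = {}
    for t in all_tags:
        row = conf.get(t, Counter())
        tp = row[t]
        fn = sum(row.values()) - tp                 # gold t, predicted other
        fp = sum(r[t] for r in conf.values()) - tp  # predicted t, gold other
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[t] = (prec, rec, f1)
    return conf, scores


def k_folds(data, k=5):
    """Yield (train, test) splits; fold i tests on every k-th item from i."""
    for i in range(k):
        yield [x for j, x in enumerate(data) if j % k != i], data[i::k]
```

Because every token receives exactly one predicted tag, overall (micro-averaged) precision, recall and F1 all equal token accuracy; the tag-wise figures are where the three measures differ.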
Comparison of Language Models
1. Prove that a language model built on POS-tagged text is better than one built from raw text.
2. Choose a suitable NLP application for comparing the models (e.g., autocompletion).
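One possible setup for the autocompletion comparison, sketched under assumptions: a bigram next-word suggester that can be trained either on raw tokens or on tokens with the POS tag attached (a hypothetical "word/TAG" format), scored by how often the true next word appears among the top k suggestions on held-out sentences. In the tagged setting the tags of the history must be available at prediction time, for instance from the HMM-based tagger of the first assignment. The names train_bigram, suggest and hit_rate are illustrative only.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count next-token frequencies for every preceding token (<s> starts a sentence)."""
    succ = defaultdict(Counter)
    for toks in sentences:
        for prev, nxt in zip(["<s>"] + toks, toks):
            succ[prev][nxt] += 1
    return succ


def suggest(succ, prev, k=3):
    """Return the k most frequent continuations of `prev` (empty if unseen)."""
    return [tok for tok, _ in succ.get(prev, Counter()).most_common(k)]


def hit_rate(succ, sentences, k=3, surface=lambda tok: tok):
    """Fraction of positions whose true next word is among the top-k suggestions.

    `surface` maps a token to the word shown to the user, so that tagged
    tokens such as "dog/NN" (hypothetical format) can be scored as "dog".
    """
    hits = total = 0
    for toks in sentences:
        for prev, nxt in zip(["<s>"] + toks, toks):
            guesses = {surface(g) for g in suggest(succ, prev, k)}
            hits += surface(nxt) in guesses
            total += 1
    return hits / total if total else 0.0
```

Running hit_rate with the same k on the same held-out sentences, once with a raw-text model and once with a tagged model (passing `surface=lambda t: t.rsplit("/", 1)[0]`), gives two directly comparable numbers.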
Comparison of Discriminative vs. Generative Models
1. Design a POS tagger based on a discriminative model, i.e., one that learns P(tag sequence | sentence) directly. Use the bigram assumption.
2. Compare this model with the generative (HMM-based) POS tagger designed earlier in terms of overall precision and tag-wise precision, recall and F1 score.
3. Show the differences between the confusion matrices of the two taggers.
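A minimal sketch of a discriminative bigram tagger, assuming P(T | W) is factored as the product over positions of P(t_i | t_{i-1}, w_i). Here each local distribution is estimated by relative frequency, backing off to P(t | w) for unseen (previous tag, word) contexts and to a uniform distribution for unknown words; this count-based estimate is an assumption made for brevity, and a maximum-entropy or CRF model over richer features would be the more usual choice. Decoding uses the same Viterbi-style recursion as the HMM tagger. The function names are hypothetical.

```python
import math
from collections import Counter


def train_disc(tagged_sents):
    """Counts for estimating P(t_i | t_{i-1}, w_i) and the back-off P(t | w)."""
    ctx_tag, ctx, word_tag, word = Counter(), Counter(), Counter(), Counter()
    tags = set()
    for sent in tagged_sents:
        prev = "<s>"
        for w, t in sent:
            ctx_tag[(prev, w, t)] += 1
            ctx[(prev, w)] += 1
            word_tag[(w, t)] += 1
            word[w] += 1
            tags.add(t)
            prev = t
    return ctx_tag, ctx, word_tag, word, sorted(tags)


def local_prob(model, prev, w, t, floor=1e-6):
    """Relative-frequency P(t | prev, w), backing off to P(t | w), then uniform."""
    ctx_tag, ctx, word_tag, word, tags = model
    if ctx[(prev, w)]:
        p = ctx_tag[(prev, w, t)] / ctx[(prev, w)]
    elif word[w]:
        p = word_tag[(w, t)] / word[w]
    else:
        p = 1.0 / len(tags)
    return max(p, floor)  # floor keeps log() defined for zero estimates


def disc_decode(model, words):
    """Viterbi over the product of local conditionals P(t_i | t_{i-1}, w_i)."""
    tags = model[4]
    best = {"<s>": (0.0, [])}  # tag -> (best log-prob, path ending in tag)
    for w in words:
        best = {
            t: max(
                ((s + math.log(local_prob(model, p, w, t)), path + [t])
                 for p, (s, path) in best.items()),
                key=lambda x: x[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda x: x[0])[1]
```

Unlike the HMM, the word is part of the conditioning context here, so the previous tag can directly change how an ambiguous word such as "run" is tagged.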
Parser projection
1. You are provided with parallel corpora in English and various Indian languages.
2. Obtain the parse tree of each English sentence using the Stanford parser.
3. Project the parse tree onto the chosen Indian language with the help of the equivalent sentence in that language.
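A sketch of one simple projection strategy, assuming a word alignment between each English sentence and its Indian-language translation is available (the assignment does not say how it is obtained; the alignment below is hand-written and hypothetical). Each English constituent is mapped to the smallest target span covering all target words aligned to its words, and constituents with no aligned word are dropped. The parse is read from the Penn-style bracketed string the Stanford parser prints, assuming every bracket carries a label, as in its "(ROOT ...)" output.

```python
import re


def constituent_spans(bracketed):
    """Return (label, start, end) word spans, end exclusive, for every bracket.

    Assumes every "(" is immediately followed by a label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    spans, stack, pos, i = [], [], 0, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append((tokens[i + 1], pos))  # (label, index of first word)
            i += 2
            continue
        if tok == ")":
            label, start = stack.pop()
            spans.append((label, start, pos))
        else:
            pos += 1  # a leaf word
        i += 1
    return spans


def project_spans(spans, alignment):
    """Map English spans to target spans through a word alignment.

    alignment: dict from English word index to a list of target word indices.
    """
    projected = []
    for label, start, end in spans:
        targets = [j for i in range(start, end) for j in alignment.get(i, [])]
        if targets:  # unaligned constituents are dropped
            projected.append((label, min(targets), max(targets) + 1))
    return projected


# Hypothetical example: "the dog runs" -> Hindi "kutta daudta hai"
en = "(ROOT (S (NP (DT the) (NN dog)) (VP (VBZ runs))))"
align = {1: [0], 2: [1, 2]}
print(project_spans(constituent_spans(en), align))
# [('NN', 0, 1), ('NP', 0, 1), ('VBZ', 1, 3), ('VP', 1, 3), ('S', 0, 3), ('ROOT', 0, 3)]
```

In this example the determiner has no Hindi counterpart, so the DT node disappears and the NP shrinks to a single word; crossing and one-to-many alignments need similar case-by-case decisions.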
A* search-based POS tagger
1. Starter code for A* search is provided.
2. Design an A* search-based POS tagger.
3. Perform 5-fold cross-validation and report overall precision, recall and F1 score as well as tag-wise precision, recall and F1 score.
4. Generate a confusion matrix (a matrix where A_ij denotes the number of times tag i is classified as tag j).
5. Compare these results with those of the HMM-based POS tagger.
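The starter code is not reproduced here, so the following is an independent sketch rather than an extension of it. It assumes transition and emission probabilities are already available as dictionaries (for example, estimated as for the HMM tagger), treats "first i words tagged, last tag p" as a search state, uses the negative log probability as the path cost g, and uses as heuristic h the sum, over the remaining words, of the cheapest possible single-step cost. Because h never overestimates the remaining cost, the first complete tagging popped from the queue is the most probable one under the model. The name astar_tag and the floor value for unseen events are hypothetical.

```python
import heapq
import math


def astar_tag(words, tags, trans, emit, floor=1e-8):
    """A* search for the most probable tag sequence.

    trans[(prev_tag, tag)] and emit[(tag, word)] are probabilities, "<s>" is
    the start symbol, and missing entries get the assumed `floor` value.
    """
    def cost(prev, tag, word):
        # Edge cost = -log P(tag | prev) - log P(word | tag)
        return (-math.log(trans.get((prev, tag), floor))
                - math.log(emit.get((tag, word), floor)))

    n = len(words)
    prevs = list(tags) + ["<s>"]
    # h[i]: cheapest possible step costs summed over words i..n-1 (admissible)
    h = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = h[i + 1] + min(cost(p, t, words[i]) for p in prevs for t in tags)

    # Queue entries: (f = g + h, g, words tagged so far, last tag, tag path)
    frontier = [(h[0], 0.0, 0, "<s>", ())]
    best_g = {(0, "<s>"): 0.0}
    while frontier:
        _, g, i, prev, path = heapq.heappop(frontier)
        if i == n:
            return list(path)
        if g > best_g[(i, prev)]:
            continue  # stale entry: a cheaper route to this state was found later
        for t in tags:
            g2 = g + cost(prev, t, words[i])
            if g2 < best_g.get((i + 1, t), math.inf):
                best_g[(i + 1, t)] = g2
                heapq.heappush(frontier, (g2 + h[i + 1], g2, i + 1, t, path + (t,)))
    return []
```

Given the same probabilities, this search should return the same tagging as Viterbi, which makes a convenient sanity check for item 5; the interesting comparison is then the number of states each method expands.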


Resources

Link to Last Semester's webpage
Link to Spring 2011 webpage
Link to Autumn 2009 webpage
Link to 2008 webpage
Link to 2006 webpage

Suggested Reading Material:
- Text Books:
- Journals: Computational Linguistics, Natural Language Engineering, Machine Learning, Machine Translation, Artificial Intelligence
- Conferences: Annual Meeting of the Association for Computational Linguistics (ACL), International Conference on Computational Linguistics (COLING), European Chapter of the ACL (EACL), Empirical Methods in NLP (EMNLP), ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Human Language Technology (HLT).


Student Seminars

Gender Classification by Speech Analysis [PPT]
Neurolinguistics [ODP]
Question Answering System and Watson [PPT]
Sentiment Analysis with Multi-Modality [ODP][PPT][PDF]
Smoothing Techniques - A Primer [PPT]
Document Summarization [PPT]
Automatic Detection of Spamming and Phishing [PPT]
Recognizing Textual Entailment [PDF][PPT]
Semantic Web - Making the Web more readable for Machines [PDF]
Neurolinguistics [PPT]
Where the Computers and Arts meet [PPT][PDF]


Assignment Final Presentations
Groups

Gr-3 Sanober, Soumyajit, Naveen [PPT]
Gr-4 Raksha, Aditya, Anamay [PDF]
Gr-5 Geetanjali, Sachin, Deepak [PPT]
Gr-6 Mandar, Piyush, Abhirut [ZIP]
Gr-7 Shubham, Kallol, Rahul [TAR]
Gr-8 Akansha, Maunik, Hemant [PPT]
Gr-9 Nikhil, Subhash, Jayaprakash [GZ]
Gr-10 Kritika, Vinita, Rucha [TAR]
Gr-11 Biplab, Amit, Ravi [ZIP]


Marks and Grades

Marks will be uploaded as the course progresses.
