Date | Topics |
05-01-2016 |
- Introduction to Machine Learning
- Unannotated Slides
- Annotated Slides
- Chapter 1 of Hastie et al.
- Homework: Intuitively analyse the machine learning problem of handwritten digit recognition (slides 8 and 9), taking cues from the analysis on slides 6 and 7.
|
08-01-2016 |
|
12-01-2016 |
- Basis functions/attributes, least squares regression and the geometrical interpretation of its solution
- Unannotated Slides
- Annotated Slides
- Sections 2.2, 3.1 and 3.2 of Hastie et al.
- Homework: Understand the concept of column space and the geometrical interpretation of the solution to the least squares regression problem. What is polynomial regression on k independent variables v1, v2, ..., vk, and how would you achieve it using linear regression? (A sketch of the reduction follows below.)
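The second homework question has a standard reduction worth seeing once: enumerate the monomials in v1, ..., vk up to the chosen degree, treat each monomial as a basis function/attribute, and run ordinary least squares on the expanded design matrix. A minimal NumPy sketch; the helper name `poly_features` and the toy data are illustrative, not from the course materials:

```python
import itertools
import numpy as np

def poly_features(X, degree):
    """Map an (n, k) matrix to all monomials of total degree <= degree."""
    n, k = X.shape
    cols = [np.ones(n)]  # the degree-0 monomial (bias term)
    for d in range(1, degree + 1):
        # each multiset of variable indices corresponds to one monomial
        for idx in itertools.combinations_with_replacement(range(k), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # k = 3 independent variables
y = 1 + 2 * X[:, 0] - X[:, 1] * X[:, 2]        # a degree-2 target
Phi = poly_features(X, degree=2)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # plain linear least squares on Phi
print(np.round(w, 3))                          # recovers the nonzero coefficients
```

The point of the exercise: the model stays linear in the coefficients, so everything said in the lecture about column spaces applies unchanged to the expanded matrix Phi.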
|
15-01-2016 |
- More matrix algebra for least squares regression, motivation for feature selection and regularized learning (through constraints), some basics of optimization
- Unannotated Slides
- Annotated Slides
- Tutorial 1
- Sections 3.2, 5.2.3 and Chapter 18 of Hastie et al. and Section 4.1.4 of Optimization notes (assuming you are already comfortable with the material up to Section 4.1.3 from your basic calculus course).
- Homework: On page 12 of Annotated Slides, based on the relative sizes of m and p, find the cases where this equation has (a) no solution, (b) one solution and (c) multiple solutions.
- Homework: On page 18 of Annotated Slides, find the solution to the least squares regression problem based on the necessary condition (gradient = 0). (A worked version follows below.)
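For the second homework item, one standard way the derivation goes (notation assumed: X is the design matrix, y the targets, w the weights):

```latex
\begin{align*}
  f(w) &= \lVert Xw - y \rVert_2^2
        = w^\top X^\top X w - 2\, y^\top X w + y^\top y \\
  \nabla f(w) &= 2 X^\top X w - 2 X^\top y = 0
  \;\Longrightarrow\; X^\top X w = X^\top y \quad \text{(normal equations)} \\
  w^* &= (X^\top X)^{-1} X^\top y
  \quad \text{whenever } X^\top X \text{ is invertible.}
\end{align*}
```

The invertibility caveat connects back to the previous homework: the cases on m versus p determine whether X^T X is invertible and hence whether the solution is unique.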
|
19-01-2016 |
|
22-01-2016 |
- Convex Sets, Convex Functions, Strictly Convex Functions, and the First-Order and Second-Order definitions of Convex Functions (the conditions are summarized below)
- Unannotated Slides
- Annotated Slides
- Tutorial 2
- Reference: Section 4.2 of Basics of Convex Optimization.
- Homework: On the last page of Annotated Slides, explain why the error on the training data decreases as the degree increases up to 7, and why the error on the test data also decreases up to degree 7. Then explain why the training error remains low even beyond degree 7 whereas the test error starts increasing.
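For quick reference, the conditions named in the topic line above, in their standard statements (for f defined on a convex domain; see Section 4.2 of the referenced notes):

```latex
\begin{align*}
  \text{Definition:} \quad
    & f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)
      \quad \forall x, y,\ \lambda \in [0,1] \\
  \text{Strict convexity:} \quad
    & \text{the inequality above is strict for } x \ne y,\ \lambda \in (0,1) \\
  \text{First order (differentiable } f\text{):} \quad
    & f(y) \ge f(x) + \nabla f(x)^\top (y - x) \quad \forall x, y \\
  \text{Second order (twice-differentiable } f\text{):} \quad
    & \nabla^2 f(x) \succeq 0 \quad \forall x
\end{align*}
```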
|
29-01-2016 |
|
02-02-2016 |
|
05-02-2016 |
- Solution to homework on two equivalent formulations of ridge regression, Lasso and its two equivalent formulations, solution to quiz 1 problem 3, Iterative Soft Thresholding Algorithm (ISTA) for Lasso (a short sketch follows below), Introduction to Support Vector Regression
- Unannotated Slides
- Annotated Slides
- Reference: Sections 3.4.2, 3.4.3, 3.8, 12.3.6 of Hastie et al.
- Homework: Try deriving the KKT conditions for the two-norm regularized Support Vector Regression problem on slide 18 of annotated slides.
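Returning to the ISTA topic above, a minimal sketch. Assumptions (mine, not from the slides): the objective is 0.5*||Xw - y||^2 + lam*||w||_1 and the step size is 1/L with L the largest eigenvalue of X^T X, so each iteration is a gradient step on the smooth part followed by soft thresholding:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each coordinate toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iters=500):
    L = np.linalg.eigvalsh(X.T @ X).max()   # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)            # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10); w_true[[1, 4]] = [3.0, -2.0]   # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=50)
print(np.round(ista(X, y, lam=1.0), 2))    # most coordinates driven exactly to 0
```

The soft-thresholding step is the proximal operator of the l1 penalty, which is why the iterates become exactly sparse rather than merely small (contrast ridge regression).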
|
09-02-2016 |
- Support Vector Regression: formulation of optimization problem and geometric interpretation, derivation of KKT conditions, geometric interpretation of KKT conditions (the primal is reproduced below for reference)
- Unannotated Slides
- Annotated Slides
- Reference: Sections 3.4.2, 3.4.3, 3.8, 12.3.6 of Hastie et al.
- Homework: Understand the summarized reasons (discussed so far) for regularization on page 2 of annotated slides
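For reference alongside the KKT derivation above, the epsilon-insensitive SVR primal in its usual textbook form (standard notation assumed; phi is the feature map):

```latex
\begin{align*}
  \min_{w, b, \xi, \xi^*} \quad & \tfrac{1}{2}\lVert w \rVert^2
      + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \\
  \text{s.t.} \quad & y_i - w^\top \phi(x_i) - b \le \epsilon + \xi_i, \\
                    & w^\top \phi(x_i) + b - y_i \le \epsilon + \xi_i^*, \\
                    & \xi_i,\ \xi_i^* \ge 0 \quad \forall i.
\end{align*}
```

The KKT conditions come from the Lagrangian of this problem: stationarity in w gives w as a combination of the phi(x_i) weighted by the dual variables, plus complementary slackness on each inequality.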
|
12-02-2016 |
|
16-02-2016 |
|
19-02-2016 |
|
01-03-2016 |
|
04-03-2016 |
- Rationale behind perceptron update, gradient descent and stochastic gradient descent for perceptron updates, convergence proof of perceptron update rule for the linearly separable case (update rule sketched below), kernel perceptron, Tutorial 6
- Unannotated Slides
- Annotated Slides
- Tutorial 6
- Reference: Section 4.5.1 of Hastie et al.
- Homework: Attempt tutorial 6
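A minimal sketch of the mistake-driven update whose convergence is proved in the lecture; labels are assumed in {-1, +1} and the toy data is illustrative:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # fold the bias into the weights
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # move the hyperplane toward the example
                mistakes += 1
        if mistakes == 0:            # converged: the data is separated
            break
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # linearly separable labels
print(perceptron(X, y))
```

Note the update fires only on mistakes; on linearly separable data the convergence proof bounds the total number of such updates.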
|
08-03-2016 |
|
11-03-2016 |
|
15-03-2016 |
|
18-03-2016 |
|
22-03-2016 |
|
29-03-2016 |
- Extensions to Logistic Regression: Kernelized Logistic Regression and Structured prediction through Conditional Random Fields (a Graphical Model), Training of a Multi-layer Logistic Neural Network using Backpropagation (a small worked sketch follows below), Convolutional Neural Networks and Structural Regularization, Structured Prediction through Recurrent Neural Networks, Decision Tree Learning
- Unannotated Slides Part 1
- Unannotated Slides Part 2 (Non-linear classification in Decision Trees)
- Annotated Slides Part 1
- Annotated Slides Part 2 (Non-linear classification in Decision Trees)
- References: Sections 11.4 (Neural Network training), 11.7 (Convolutional Networks), 9.2.3 (Decision Trees) of Hastie et al., Article by Trevor Hastie on Kernel Logistic Regression, CRF extension to Logistic Regression in a tutorial at CIKM 2008 (Sections 1.4, 3.1 and 4.1) and on slides 7 to 11 here.
- Extra Reading(s): Structured Prediction through Graphical Models.
- Homework: Build decision trees for the problems on page 6 of these class slides
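Returning to the backpropagation topic above, a minimal sketch: one hidden layer of sigmoid units, a sigmoid output, and cross-entropy loss, trained on XOR (the architecture, sizes and learning rate here are my illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([0.0, 1.0, 1.0, 0.0])                           # not linearly separable
W1 = rng.normal(scale=1.0, size=(2, 8)); b1 = np.zeros(8)    # hidden layer
W2 = rng.normal(scale=1.0, size=8);      b2 = 0.0            # output layer
lr = 0.5
for _ in range(10000):
    # forward pass
    H = sigmoid(X @ W1 + b1)            # hidden activations
    p = sigmoid(H @ W2 + b2)            # predicted probabilities
    # backward pass: sigmoid output + cross-entropy loss gives error p - y
    d_out = (p - y) / len(y)
    dW2 = H.T @ d_out
    db2 = d_out.sum()
    d_hid = np.outer(d_out, W2) * H * (1 - H)  # chain rule through the hidden sigmoids
    dW1 = X.T @ d_hid
    db1 = d_hid.sum(axis=0)
    # gradient-descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
print(np.round(p, 2))   # typically approaches [0, 1, 1, 0]
```

The only step specific to backpropagation is d_hid: the output error is pushed back through W2 and the sigmoid derivative H*(1-H), layer by layer.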
|
01-04-2016 |
- Decision Tree Learning, Top-down Induction of Decision Trees using the Information Gain Criterion (a small sketch of the criterion follows below), Structural Regularization in Decision Trees through Early Stopping, Post-pruning and Rule Pruning, Feature Selection, Evaluating Classifiers through Accuracy, Precision, Recall and F-measure, Non-linear Classification through Support Vector Classification and its Dual
- Unannotated Slides
- Annotated Slides
- Tutorial 9: Hints for solutions in Section 9.2.3 and Section 12.3 of Hastie et al. and a host of methods for feature selection
- Solutions to Tutorial 9
- References: Sections 9.2 (Decision Tree Induction and the Information Gain Criterion, Post-pruning using cost-complexity), 18.3.4 (Feature Selection) and 12.2 and 12.3 (Support Vector Classification and its kernelized dual) of Hastie et al., Sections 3 and 5 (on decision trees and pruning) of previous offering's class notes, Feature Selection using Information Gain and a host of other methods (survey paper)
- Extra Reading(s): Section 7 of previous offering notes for Hypothesis Testing for Decision Tree Building/Pruning.
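Returning to the information-gain criterion above, a minimal sketch: pick the attribute whose split most reduces the entropy of the class labels. The toy weather-style data is hypothetical, not the data from the class slides:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain = H(labels) - sum_v (|S_v|/|S|) * H(labels within S_v)."""
    n = len(labels)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    remainder = sum(len(s) / n * entropy(s) for s in split.values())
    return entropy(labels) - remainder

rows = [{"outlook": "sunny", "windy": True},  {"outlook": "sunny", "windy": False},
        {"outlook": "rain",  "windy": True},  {"outlook": "rain",  "windy": False}]
labels = ["no", "yes", "no", "yes"]
print(information_gain(rows, labels, "windy"))    # 1.0: windy decides the label
print(information_gain(rows, labels, "outlook"))  # 0.0: outlook is uninformative
```

Top-down induction greedily splits on the highest-gain attribute at each node and recurses on the resulting subsets.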
|
05-04-2016 |
- Generative Classifiers, Multinomial Distribution and its Maximum Likelihood and Bayesian Estimates (both estimates are summarized below), Dirichlet Distribution, Multinomial Naive Bayes and its Maximum Likelihood Estimate, Bayesian Estimate for Naive Bayes (Tutorial 10 problem), Gaussian Discriminant Classifier
- Unannotated Slides
- Annotated Slides
- References: Sections 6.6.3 (Naive Bayes Classifier) and 4.3 (Gaussian Discriminant Analysis: Quadratic and Linear Discriminant Analysis) of Hastie et al., Section 10 (Multivariate Bernoulli/Multinomial, Naive Bayes, Dirichlet distribution, ML and Bayesian Estimation) of previous offering's class notes
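The two multinomial estimates contrasted in this lecture, in their standard forms (counts n_k over K outcomes; notation assumed):

```latex
\begin{align*}
  \text{Maximum likelihood:} \quad
    & \hat{\theta}_k = \frac{n_k}{\sum_{j=1}^{K} n_j} \\
  \text{Bayesian (Dirichlet}(\alpha_1,\dots,\alpha_K)\text{ prior, posterior mean):} \quad
    & \hat{\theta}_k = \frac{n_k + \alpha_k}{\sum_{j=1}^{K} (n_j + \alpha_j)}
\end{align*}
```

With all alpha_k = 1 the Bayesian estimate reduces to Laplace (add-one) smoothing, which is what keeps Naive Bayes from assigning zero probability to words unseen in training.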
|
12-04-2016 |
- Quadratic and Linear Gaussian Discriminant Classifiers, Estimation for Gaussian Discriminant, Multi-label Classification, Mixture of Gaussians (GMM), EM Algorithm for GMM (a minimal EM sketch follows below)
- Unannotated Slides
- Annotated Slides
- Tutorial 10
- Solutions to Tutorial 10
- References: Sections 4.3 (Linear and Quadratic Discriminant Analysis), 4.3.1 (regularization for Gaussian Discriminant Analysis), 12.4 (Optional: Generalized Linear Discriminant), 13.2.3 (Mixture of Gaussians), 14.3.7 (EM for Mixture of Gaussians as soft-clustering) of Hastie et al., Section 11 (Gaussian Discriminant Analysis) and Section 12 (Mixture of Gaussians, EM and clustering) of previous offering's class notes and Section 7.8 (more reading on unsupervised learning and EM algorithm) of notes on learning and inferencing in probabilistic models
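Returning to the EM-for-GMM topic above, a minimal sketch for a one-dimensional two-component mixture, matching the E-step/M-step structure from the lecture (the data and initialization are illustrative):

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: responsibility r[i, k] = P(component k | x_i)
    r = pi * np.stack([normal_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood estimates given the responsibilities
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
print(np.round(mu, 2), np.round(pi, 2))   # should recover means near -2 and 3
```

Replacing the soft responsibilities with a hard argmax assignment gives exactly the hard-EM / K-means specialization covered in the next lecture.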
|
15-04-2016 |
- General EM Algorithm, special case for GMM, Hard EM for GMM as K-Means (sketched below), K-Medoids and K-Modes algorithms for clustering, Distance Measures and Hierarchical Clustering Methods
- Unannotated Slides
- Annotated Slides
- References: Sections 8.5.1 (EM Algorithm for Mixture of Gaussians), 8.5.2 (General EM Algorithm), 13.2.1 and 14.3.6 and 14.3.7 (K-means clustering), 14.3.2 and 14.3.3 (Distance/dissimilarity measures), 14.3.10 (K-medoids algorithm), 14.3.12 (Extra: Hierarchical Clustering) of Hastie et al., Section 13 (Clustering) and Section 12.2 (EM Algo for clustering) of previous offering's class notes and Section 7.8 (more reading on unsupervised learning and EM algorithm) of notes on learning and inferencing in probabilistic models
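A minimal K-means sketch, showing the hard-EM structure named in the topic line: a hard nearest-center assignment in place of responsibilities, and a plain mean update in place of the weighted M-step (toy data illustrative; empty clusters are not handled in this sketch):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # "hard E-step": assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # "M-step": each center becomes the mean of its assigned points
        new_centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # assignments stable: converged
            break
        centers = new_centers
    return centers, assign

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
centers, _ = kmeans(X, k=2)
print(np.round(centers, 1))   # two centers near (0, 0) and (4, 4)
```

Swapping the mean update for a medoid (most central data point) or mode gives the K-medoids and K-modes variants mentioned above, which is what lets them work with arbitrary dissimilarity measures.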
|