Date | Topics |
05-01-2016 |
- Introduction to Machine Learning
- Unannotated Slides
- Annotated Slides
- Chapter 1 of Hastie et al.
- Homework: Intuitively analyse the machine learning problem of handwritten digit recognition (slides 8 and 9), taking cues from the analysis on slides 6 and 7.
|
08-01-2016 |
|
12-01-2016 |
- Basis functions/attributes, least squares regression and the geometrical interpretation of its solution
- Unannotated Slides
- Annotated Slides
- Sections 2.2, 3.1 and 3.2 of Hastie et al.
- Homework: Understand the concept of column space and the geometrical interpretation of the solution to the least squares regression problem. What is polynomial regression on k independent variables v1, v2, ..., vk, and how would you achieve it using linear regression? (A sketch of the reduction follows below.)
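The second homework question has a standard reduction worth seeing once: enumerate the monomials in v1, ..., vk up to the chosen degree, treat each monomial as a basis function/attribute, and run ordinary least squares on the expanded design matrix. A minimal NumPy sketch; the helper name `poly_features` and the toy data are illustrative, not from the course materials:

```python
import itertools
import numpy as np

def poly_features(X, degree):
    """Map an (n, k) matrix to all monomials of total degree <= degree."""
    n, k = X.shape
    cols = [np.ones(n)]  # the degree-0 monomial (bias term)
    for d in range(1, degree + 1):
        # each multiset of variable indices corresponds to one monomial
        for idx in itertools.combinations_with_replacement(range(k), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # k = 3 independent variables
y = 1 + 2 * X[:, 0] - X[:, 1] * X[:, 2]        # a degree-2 target
Phi = poly_features(X, degree=2)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # plain linear least squares on Phi
print(np.round(w, 3))                          # recovers the nonzero coefficients
```

The point of the exercise: the model stays linear in the coefficients, so everything said in the lecture about column spaces applies unchanged to the expanded matrix Phi.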
|
15-01-2016 |
- More matrix algebra for least squares regression, motivation for feature selection and regularized learning (through constraints), some basics of optimization
- Unannotated Slides
- Annotated Slides
- Tutorial 1
- Sections 3.2, 5.2.3 and Chapter 18 of Hastie et al. and Section 4.1.4 of Optimization notes (assuming you are already comfortable with the material up to Section 4.1.3 from your basic calculus course).
- Homework: On page 12 of Annotated Slides, based on the relative sizes of m and p, find the cases where this equation has (a) no solution, (b) one solution and (c) multiple solutions.
- Homework: On page 18 of Annotated Slides, find the solution to the least squares regression problem based on the necessary condition (gradient = 0). (A worked version follows below.)
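For the second homework item, one standard way the derivation goes (notation assumed: X is the design matrix, y the targets, w the weights):

```latex
\begin{align*}
  f(w) &= \lVert Xw - y \rVert_2^2
        = w^\top X^\top X w - 2\, y^\top X w + y^\top y \\
  \nabla f(w) &= 2 X^\top X w - 2 X^\top y = 0
  \;\Longrightarrow\; X^\top X w = X^\top y \quad \text{(normal equations)} \\
  w^* &= (X^\top X)^{-1} X^\top y
  \quad \text{whenever } X^\top X \text{ is invertible.}
\end{align*}
```

The invertibility caveat connects back to the previous homework: the cases on m versus p determine whether X^T X is invertible and hence whether the solution is unique.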
|
19-01-2016 |
|
22-01-2016 |
- Convex Sets, Convex Functions, Strictly Convex Functions, and the First-Order and Second-Order definitions of Convex Functions (the conditions are summarized below)
- Unannotated Slides
- Annotated Slides
- Tutorial 2
- Reference: Section 4.2 of Basics of Convex Optimization.
- Homework: On the last page of Annotated Slides, explain why the error on the training data decreases as the degree increases up to 7, and why the error on the test data also decreases up to degree 7. Then explain why the training error remains low even beyond degree 7 whereas the test error starts increasing.
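For quick reference, the conditions named in the topic line above, in their standard statements (for f defined on a convex domain; see Section 4.2 of the referenced notes):

```latex
\begin{align*}
  \text{Definition:} \quad
    & f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)
      \quad \forall x, y,\ \lambda \in [0,1] \\
  \text{Strict convexity:} \quad
    & \text{the inequality above is strict for } x \ne y,\ \lambda \in (0,1) \\
  \text{First order (differentiable } f\text{):} \quad
    & f(y) \ge f(x) + \nabla f(x)^\top (y - x) \quad \forall x, y \\
  \text{Second order (twice-differentiable } f\text{):} \quad
    & \nabla^2 f(x) \succeq 0 \quad \forall x
\end{align*}
```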
|
29-01-2016 |
|
02-02-2016 |
|
05-02-2016 |
- Solution to homework on two equivalent formulations of ridge regression, Lasso and its two equivalent formulations, solution to quiz 1 problem 3, Iterative Soft Thresholding Algorithm (ISTA) for Lasso (a short sketch follows below), Introduction to Support Vector Regression
- Unannotated Slides
- Annotated Slides
- Reference: Sections 3.4.2, 3.4.3, 3.8, 12.3.6 of Hastie et al.
- Homework: Try deriving the KKT conditions for the two-norm regularized Support Vector Regression problem on slide 18 of annotated slides.
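Returning to the ISTA topic above, a minimal sketch. Assumptions (mine, not from the slides): the objective is 0.5*||Xw - y||^2 + lam*||w||_1 and the step size is 1/L with L the largest eigenvalue of X^T X, so each iteration is a gradient step on the smooth part followed by soft thresholding:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each coordinate toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iters=500):
    L = np.linalg.eigvalsh(X.T @ X).max()   # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)            # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10); w_true[[1, 4]] = [3.0, -2.0]   # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=50)
print(np.round(ista(X, y, lam=1.0), 2))    # most coordinates driven exactly to 0
```

The soft-thresholding step is the proximal operator of the l1 penalty, which is why the iterates become exactly sparse rather than merely small (contrast ridge regression).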
|
09-02-2016 |
- Support Vector Regression: formulation of optimization problem and geometric interpretation, derivation of KKT conditions, geometric interpretation of KKT conditions (the primal is reproduced below for reference)
- Unannotated Slides
- Annotated Slides
- Reference: Sections 3.4.2, 3.4.3, 3.8, 12.3.6 of Hastie et al.
- Homework: Understand the summarized reasons (discussed so far) for regularization on page 2 of annotated slides
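For reference alongside the KKT derivation above, the epsilon-insensitive SVR primal in its usual textbook form (standard notation assumed; phi is the feature map):

```latex
\begin{align*}
  \min_{w, b, \xi, \xi^*} \quad & \tfrac{1}{2}\lVert w \rVert^2
      + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \\
  \text{s.t.} \quad & y_i - w^\top \phi(x_i) - b \le \epsilon + \xi_i, \\
                    & w^\top \phi(x_i) + b - y_i \le \epsilon + \xi_i^*, \\
                    & \xi_i,\ \xi_i^* \ge 0 \quad \forall i.
\end{align*}
```

The KKT conditions come from the Lagrangian of this problem: stationarity in w gives w as a combination of the phi(x_i) weighted by the dual variables, plus complementary slackness on each inequality.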
|
12-02-2016 |
|
16-02-2016 |
|
19-02-2016 |
|
01-03-2016 |
|
04-03-2016 |
- Rationale behind perceptron update, gradient descent and stochastic gradient descent for perceptron updates, convergence proof of perceptron update rule for the linearly separable case (update rule sketched below), kernel perceptron, Tutorial 6
- Unannotated Slides
- Annotated Slides
- Tutorial 6
- Reference: Section 4.5.1 of Hastie et al.
- Homework: Attempt tutorial 6
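A minimal sketch of the mistake-driven update whose convergence is proved in the lecture; labels are assumed in {-1, +1} and the toy data is illustrative:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # fold the bias into the weights
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # move the hyperplane toward the example
                mistakes += 1
        if mistakes == 0:            # converged: the data is separated
            break
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # linearly separable labels
print(perceptron(X, y))
```

Note the update fires only on mistakes; on linearly separable data the convergence proof bounds the total number of such updates.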
|
08-03-2016 |
|
11-03-2016 |
|
15-03-2016 |
|
18-03-2016 |
|
22-03-2016 |
|
29-03-2016 |
- Extensions to Logistic Regression: Kernelized Logistic Regression and Structured prediction through Conditional Random Fields (a Graphical Model), Training of a Multi-layer Logistic Neural Network using Backpropagation (a small worked sketch follows below), Convolutional Neural Networks and Structural Regularization, Structured Prediction through Recurrent Neural Networks, Decision Tree Learning
- Unannotated Slides Part 1
- Unannotated Slides Part 2 (Non-linear classification in Decision Trees)
- Annotated Slides Part 1
- Annotated Slides Part 2 (Non-linear classification in Decision Trees)
- References: Sections 11.4 (Neural Network training), 11.7 (Convolutional Networks), 9.2.3 (Decision Trees) of Hastie et al., Article by Trevor Hastie on Kernel Logistic Regression, CRF extension to Logistic Regression in a tutorial at CIKM 2008 (Sections 1.4, 3.1 and 4.1) and on slides 7 to 11 here.
- Extra Reading(s): Structured Prediction through Graphical Models.
- Homework: Build decision trees for the problems on page 6 of these class slides
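Returning to the backpropagation topic above, a minimal sketch: one hidden layer of sigmoid units, a sigmoid output, and cross-entropy loss, trained on XOR (the architecture, sizes and learning rate here are my illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([0.0, 1.0, 1.0, 0.0])                           # not linearly separable
W1 = rng.normal(scale=1.0, size=(2, 8)); b1 = np.zeros(8)    # hidden layer
W2 = rng.normal(scale=1.0, size=8);      b2 = 0.0            # output layer
lr = 0.5
for _ in range(10000):
    # forward pass
    H = sigmoid(X @ W1 + b1)            # hidden activations
    p = sigmoid(H @ W2 + b2)            # predicted probabilities
    # backward pass: sigmoid output + cross-entropy loss gives error p - y
    d_out = (p - y) / len(y)
    dW2 = H.T @ d_out
    db2 = d_out.sum()
    d_hid = np.outer(d_out, W2) * H * (1 - H)  # chain rule through the hidden sigmoids
    dW1 = X.T @ d_hid
    db1 = d_hid.sum(axis=0)
    # gradient-descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
print(np.round(p, 2))   # typically approaches [0, 1, 1, 0]
```

The only step specific to backpropagation is d_hid: the output error is pushed back through W2 and the sigmoid derivative H*(1-H), layer by layer.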
|
01-04-2016 |
- Decision Tree Learning, Top-down Induction of Decision Trees using the Information Gain Criterion (a small sketch of the criterion follows below), Structural Regularization in Decision Trees through Early Stopping, Post-pruning and Rule Pruning, Feature Selection, Evaluating Classifiers through Accuracy, Precision, Recall and F-measure, Non-linear Classification through Support Vector Classification and its Dual
- Unannotated Slides
- Annotated Slides
- Tutorial 9: Hints for solutions in Section 9.2.3 and Section 12.3 of Hastie et al. and a host of methods for feature selection
- Solutions to Tutorial 9
- References: Sections 9.2 (Decision Tree Induction and the Information Gain Criterion, Post-pruning using cost-complexity), 18.3.4 (Feature Selection) and 12.2 and 12.3 (Support Vector Classification and its kernelized dual) of Hastie et al., Sections 3 and 5 (on decision trees and pruning) of previous offering's class notes, Feature Selection using Information Gain and a host of other methods (survey paper)
- Extra Reading(s): Section 7 of previous offering notes for Hypothesis Testing for Decision Tree Building/Pruning.
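Returning to the information-gain criterion above, a minimal sketch: pick the attribute whose split most reduces the entropy of the class labels. The toy weather-style data is hypothetical, not the data from the class slides:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain = H(labels) - sum_v (|S_v|/|S|) * H(labels within S_v)."""
    n = len(labels)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    remainder = sum(len(s) / n * entropy(s) for s in split.values())
    return entropy(labels) - remainder

rows = [{"outlook": "sunny", "windy": True},  {"outlook": "sunny", "windy": False},
        {"outlook": "rain",  "windy": True},  {"outlook": "rain",  "windy": False}]
labels = ["no", "yes", "no", "yes"]
print(information_gain(rows, labels, "windy"))    # 1.0: windy decides the label
print(information_gain(rows, labels, "outlook"))  # 0.0: outlook is uninformative
```

Top-down induction greedily splits on the highest-gain attribute at each node and recurses on the resulting subsets.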
|
05-04-2016 |
- Generative Classifiers, Multinomial Distribution and its Maximum Likelihood and Bayesian Estimates (both estimates are summarized below), Dirichlet Distribution, Multinomial Naive Bayes and its Maximum Likelihood Estimate, Bayesian Estimate for Naive Bayes (Tutorial 10 problem), Gaussian Discriminant Classifier
- Unannotated Slides
- Annotated Slides
- References: Sections 6.6.3 (Naive Bayes Classifier) and 4.3 (Gaussian Discriminant Analysis: Quadratic and Linear Discriminant Analysis) of Hastie et al., Section 10 (Multivariate Bernoulli/Multinomial, Naive Bayes, Dirichlet distribution, ML and Bayesian Estimation) of previous offering's class notes
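The two multinomial estimates contrasted in this lecture, in their standard forms (counts n_k over K outcomes; notation assumed):

```latex
\begin{align*}
  \text{Maximum likelihood:} \quad
    & \hat{\theta}_k = \frac{n_k}{\sum_{j=1}^{K} n_j} \\
  \text{Bayesian (Dirichlet}(\alpha_1,\dots,\alpha_K)\text{ prior, posterior mean):} \quad
    & \hat{\theta}_k = \frac{n_k + \alpha_k}{\sum_{j=1}^{K} (n_j + \alpha_j)}
\end{align*}
```

With all alpha_k = 1 the Bayesian estimate reduces to Laplace (add-one) smoothing, which is what keeps Naive Bayes from assigning zero probability to words unseen in training.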
|
12-04-2016 |
- Quadratic and Linear Gaussian Discriminant Classifiers, Estimation for Gaussian Discriminant, Multi-label Classification, Mixture of Gaussians (GMM), EM Algorithm for GMM (a minimal EM sketch follows below)
- Unannotated Slides
- Annotated Slides
- Tutorial 10
- Solutions to Tutorial 10
- References: Sections 4.3 (Linear and Quadratic Discriminant Analysis), 4.3.1 (regularization for Gaussian Discriminant Analysis), 12.4 (Optional: Generalized Linear Discriminant), 13.2.3 (Mixture of Gaussians), 14.3.7 (EM for Mixture of Gaussians as soft-clustering) of Hastie et al., Section 11 (Gaussian Discriminant Analysis) and Section 12 (Mixture of Gaussians, EM and clustering) of previous offering's class notes and Section 7.8 (more reading on unsupervised learning and EM algorithm) of notes on learning and inferencing in probabilistic models
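Returning to the EM-for-GMM topic above, a minimal sketch for a one-dimensional two-component mixture, matching the E-step/M-step structure from the lecture (the data and initialization are illustrative):

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: responsibility r[i, k] = P(component k | x_i)
    r = pi * np.stack([normal_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood estimates given the responsibilities
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
print(np.round(mu, 2), np.round(pi, 2))   # should recover means near -2 and 3
```

Replacing the soft responsibilities with a hard argmax assignment gives exactly the hard-EM / K-means specialization covered in the next lecture.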
|
15-04-2016 |
- General EM Algorithm, special case for GMM, Hard EM for GMM as K-Means (sketched below), K-Medoids and K-Modes algorithms for clustering, Distance Measures and Hierarchical Clustering Methods
- Unannotated Slides
- Annotated Slides
- References: Sections 8.5.1 (EM Algorithm for Mixture of Gaussians), 8.5.2 (General EM Algorithm), 13.2.1 and 14.3.6 and 14.3.7 (K-means clustering), 14.3.2 and 14.3.3 (Distance/dissimilarity measures), 14.3.10 (K-medoids algorithm), 14.3.12 (Extra: Hierarchical Clustering) of Hastie et al., Section 13 (Clustering) and Section 12.2 (EM Algo for clustering) of previous offering's class notes and Section 7.8 (more reading on unsupervised learning and EM algorithm) of notes on learning and inferencing in probabilistic models
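A minimal K-means sketch, showing the hard-EM structure named in the topic line: a hard nearest-center assignment in place of responsibilities, and a plain mean update in place of the weighted M-step (toy data illustrative; empty clusters are not handled in this sketch):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # "hard E-step": assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # "M-step": each center becomes the mean of its assigned points
        new_centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # assignments stable: converged
            break
        centers = new_centers
    return centers, assign

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
centers, _ = kmeans(X, k=2)
print(np.round(centers, 1))   # two centers near (0, 0) and (4, 4)
```

Swapping the mean update for a medoid (most central data point) or mode gives the K-medoids and K-modes variants mentioned above, which is what lets them work with arbitrary dissimilarity measures.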
|