CS705 Autumn 2006 Lecture Calendar

Scroll down for a tentative syllabus. JavaScript must be enabled to follow IIT-internal paper links.

2006-07-27

Administrative details
List of some prerequisites
Tentative course plan for the semester
Example applications of statistical machine learning
Introduction to regression and classification
Formalizing a learning task, experience, reward
Gaussian noise and linear least-squares fit

2006-07-31

A taste of Bayesian learning
Hypothesis, prior and posterior distributions
Posterior distribution and prediction
Point estimates of the posterior, e.g. MAP
Minimum description length and MAP
Back to regression: linear least squares with a squared regularizer

2006-08-03

Guest lecture by Dr. Sreeram Balakrishnan, IBM IRL Delhi: Long span contextual features for text classification and entity role labeling

~~2006-08-07~~

SIGIR

2006-08-10

MATLAB and Scilab tutorial, by Sandeep Deshmukh

2006-08-14

Linear least-squares demo, Ridge penalty
Lasso and its quadratic program
Contrast between Ridge and Lasso, model sparsity
From regression to classification
Loss functions for classification and regression

~~2006-08-15 4pm SIC301~~

Canceled: Guest lecture by Rajesh Parekh, Yahoo Research: Data Mining and Research at Yahoo!: Insights, Lessons, and Challenges

2006-08-17

"True loss" and various approximations
Square loss and its limitations
Choice of discriminant functions
Class density and class discrimination
Discriminants for multivariate Gaussian densities

2006-08-21

Guest lecture on spectral methods and singular value decomposition by Prof. Abhiram Ranade

~~2006-08-24~~

SIGKDD
SVD notes

2006-08-28

Eigen demo
Eigen-SVD connection
SVD demo with low-rank plus noise matrix
Connection between SVD and (regularized) least squares
PCA demo

2006-08-31

Return to linear discriminants and fitting criteria
Hill-climbing, step size, and Newton method
Derivation of the Perceptron from gradient descent considerations
Kernel regression and kernel density estimation
Bayesian interpretation and motivation for max-margin
Max-margin formulation

2006-09-04

Basic SVM QP for separable problems, Scilab demo
Inseparable problems and hinge loss
Smooth approximations to hinge loss, direct primal optimization
QP with slack variables, MATLAB/Scilab demo

2006-09-07

Primal-dual, Gordan's theorem, KKT necessary conditions
Lagrangian saddle point
Dual and Lagrangian, dualizing basic SVM, Scilab demo

2006-09-10

Midterm exam, 09:30--11:30 room A1/A2 Math

~~2006-09-11~~

~~2006-09-14~~

Midterm week

2006-09-18

Dual QP optimization via SMO
Using non-linear kernels with the dual formulation
Dual with kernels, Scilab demo
Lagrangian support vector machines

2006-09-21

Lagrangian and proximal support vector machines, Scilab demo
Finite Newton optimization of primal SVM by Keerthi and DeCoste

2006-09-25

Complete Keerthi and DeCoste algorithm
Non-convex optimization for SVMs
Transductive SVM: pair swaps

2006-09-28

Transductive SVM: deterministic annealing
Graph Laplacian, its spectral properties, mincut

2006-09-29

Bagging and boosting -- joint lecture with IT 608

~~2006-10-02~~

Gandhi Jayanti and Dussehra

2006-10-05

Ratio cuts and spectral transduction
Max-margin ranking and ordinal regression

2006-10-08

Extending max-margin formulation to general Ψ(x,y)
Example applications: Markov chains, PCFG, etc.
Max-margin classification with joint features Ψ(x,y)
Very large number of primal constraints and dual variables

2006-10-09

The cutting plane algorithm and StructSVM

2006-10-12

Approximate linear-time linear SVM and max-margin ranking
SVM for multivariate performance measures

2006-10-16

Complete multivariate performance measures
Risk and generalization bounds, intro

2006-10-19

Risk bounds: hypothesis consistent with training set
Bounds on true risk minus empirical (training) risk
Growth function, VC dimension

~~2006-10-23~~

Diwali

2006-10-26

Concluding part of growth function and VC-dimension
Role of max-margin in bounding growth function (Alekh Agarwal)

2006-10-30

Concluding part of max-margin and growth function (Alekh Agarwal)
On to probabilistic learning: Sampling the posterior if Pr(h|D) is simple
The Metropolis MCMC algorithm for complicated Pr(h|D)
Metropolis-Hastings algorithm, proof of correctness
Gibbs Q(h,h') for sampling

2006-11-02

MCMC intuition and example
Semisupervised or unsupervised generative models
Expectation maximization and its variational interpretation
EM demo and initialization issues
Conditional probabilistic models
Binomial deviance loss and logistic regression

2006-11-06

Comparison between logistic, hinge and true loss
Optimization techniques: IRLS, iterative scaling, generic Newton method
Structured prediction, constraints, dualization
Connections to StructSVM
The connection with maximum entropy

2006-11-09

Non-negative matrix factorization
Iterative scaling, NMF demo
Dyadic factors of boolean matrices
Cross-associations and co-clustering
Rate-distortion and relation to cross-association
Information bottleneck and approximating distributions

2006-11-18

Final exam, 09:00--13:00 room A1/A2 Math