CS705 Autumn 2006 Lecture Calendar
Scroll down for a tentative syllabus. JavaScript must
be enabled to follow IIT-internal paper links.
 20060727

 Administrative details
 List of some prerequisites
 Tentative course plan for the semester
 Example applications of statistical machine learning
 Introduction to regression and classification
 Formalizing a learning task, experience, reward
 Gaussian noise and linear least-square fit
 20060731

 A taste of Bayesian learning
 Hypothesis, prior and posterior distributions
 Posterior distribution and prediction
 Point estimates of posterior e.g. MAP
 Minimum description length and MAP
 Back to regression: linear least-square with square regularizer
 20060803

 Guest lecture by Dr. Sreeram Balakrishnan, IBM IRL Delhi:
Long span contextual features for text classification and
entity role labeling
20060807
 SIGIR
 20060810

 Matlab and scilab tutorial, by Sandeep Deshmukh
 20060814

 Linear least square demo, Ridge penalty
 Lasso and its quadratic program
 Contrast between Ridge and Lasso, model sparsity
 From regression to classification
 Loss functions for classification and regression
 Canceled:
Guest lecture by Rajesh Parekh, Yahoo Research:
Data Mining and Research at Yahoo!: Insights, Lessons,
and Challenges

 20060817

 "True loss" and various approximations
 Square loss and its limitations
 Choice of discriminant functions
 Class density and class discrimination
 Discriminants for multivariate Gaussian densities

 20060821
 Guest lecture on spectral methods and singular value
decomposition
by Prof. Abhiram Ranade.
20060824

 20060828

 Eigen demo
 Eigen-SVD connection
 SVD demo with low-rank plus noise matrix
 Connection between SVD and (regularized) least square
 PCA demo
 20060831

 Return to Linear discriminants and fitting criteria
 Hill-climbing, step size, and Newton method
 Derivation of the Perceptron from gradient descent considerations
 Kernel regression and kernel density estimation
 Bayesian interpretation and motivation for max-margin
 Max-margin formulation
 20060904

 Basic SVM QP for separable problems, scilab demo
 Inseparable problems and hinge loss
 Smooth approximations to hinge loss, direct primal optimization
 QP with slack variables, matlab/scilab demo
 20060907

 Primal-dual, Gordan's theorem, KKT necessary conditions
 Lagrangian saddle point
 Dual and Lagrangian, dualizing basic SVM, scilab demo
 Midterm exam,
09:30-11:30, room A1/A2 Math
20060911
20060914
 Midterm week
 20060918

 Dual QP optimization via SMO
 Using nonlinear kernels with the dual formulation
 Dual with kernels, scilab demo
 Lagrangian support vector machines
 20060921

 Lagrangian and proximal support vector machines, scilab demo
 Finite Newton optimization of primal SVM by Keerthi and DeCoste
 20060925

 Complete Keerthi and DeCoste algorithm
 Nonconvex optimization for SVMs
 Transductive SVM: pair swaps
 20060928

 Transductive SVM: deterministic annealing
 Graph Laplacian, its spectral properties, min-cut
 Bagging and boosting (joint lecture with IT 608)
20061002
 Gandhi Jayanti and Dussehra
 20061005

 Ratio cuts and spectral transduction
 Max-margin ranking and ordinal regression

 Extending max-margin formulation to general Ψ(x,y)
 Example applications: Markov chains, PCFG, etc.
 Max-margin classification with joint features Ψ(x,y)
 Very large number of primal constraints and dual variables
 20061009

 The cutting plane algorithm and StructSVM
 20061012

 Approximate linear-time linear SVM and max-margin ranking
 SVM for multivariate performance measures
 20061016

 Complete multivariate performance measures
 Risk and generalization bounds, intro
 20061019

 Risk bounds: hypothesis consistent with training set
 Bounds on true risk minus empirical (training) risk
 Growth function, VC dimension
20061023
 Diwali
 20061026

 Concluding part of growth function and VC-dimension
 Role of max-margin in bounding growth function (Alekh Agarwal)
 20061030

 Concluding part of max-margin and growth function (Alekh Agarwal)
 On to probabilistic learning:
Sampling the posterior if Pr(h|D) is simple
 The Metropolis MCMC algorithm for complicated Pr(h|D)
 Metropolis-Hastings algorithm, proof of correctness
 Gibbs Q(h,h') for sampling
 20061102

 MCMC intuition and example
 Semi-supervised or unsupervised generative models
 Expectation maximization and its variational interpretation
 EM demo and initialization issues
 Conditional probabilistic models
 Binomial deviance loss and logistic regression
 20061106

 Comparison between logistic, hinge and true loss
 Optimization techniques: IRLS, iterative scaling, generic Newton method
 Structured prediction, constraints, dualization
 Connections to StructSVM
 The connection with maximum entropy
 20061109

 Non-negative matrix factorization
 Iterative scaling, NMF demo
 Dyadic factors of boolean matrices
 Cross-associations and co-clustering
 Rate-distortion and relation to cross-association
 Information bottleneck and approximating distributions
 Final exam,
09:00-13:00, room A1/A2 Math