- 2005-07-28
- Administrative details
- Learning task, experience, reward
- Bayesian learning
- Hypothesis, prior and posterior distributions

- 2005-08-01
- Posterior distribution and prediction
- Point estimates of the posterior, e.g. MAP
- The connection between MAP and MDL
- Sampling the posterior if Pr(h|D) is simple
- The Metropolis MCMC algorithm for complicated Pr(h|D)

- 2005-08-04
- Metropolis algorithm, proof of correctness
- Gibbs Q(h,h') for sampling
- Posterior density and class discrimination
- Discriminants for multivariate Gaussian densities

~~2005-08-08~~- ICML
~~2005-08-11~~- ICML
~~2005-08-15~~- Freedom Day
- 2005-08-18
- Discriminants for multivariate Gaussian densities
- Linear discriminants and fitting criteria
- Hill-climbing, step size, and Newton's method

- 2005-08-22
- Loss functions for classification and regression
- "True loss" and approximations
- Square loss and its limitations
- Regularized least squares (RLS)

- 2005-08-23
- Quick intro to SVD
- RLS and the SVD connection
- Lasso and its quadratic program
- Contrast between Ridge and Lasso

- 2005-08-25
- Motivation for max-margin
- Basic SVM QP for separable problems, Matlab demo
- Gordan's theorem, KKT necessary conditions
- Dual and Lagrangian, dualizing basic SVM
- Significance of local regression and kernels

- 2005-08-30
- Recap of Lagrangian saddlepoint
- Inseparable problems and hinge loss
- QP with slack variables, Matlab demo

~~2005-09-01~~- IAS meeting
- 2005-09-05
- Dual QP optimization, Matlab demo
- Non-linear discriminants and kernels
- Direct dual QP optimization via SMO
- Proximal support vector machines

- 2005-09-08
- Proximal and Lagrangian support vector machines
- Binomial deviance loss and logistic regression
- Line search
- Newton updates and IRLS

- 2005-09-12
- More about logistic loss vs. true loss
- The connection with maximum entropy

- 2005-09-13
- Joint lecture with IT 655 on bagging and boosting (Sunita Sarawagi)

- 2005-09-15
- Comparison between logistic and hinge loss
- Extending max-margin formulation to general Φ(x,y)
- Iterative scaling tricks for log-linear models (Arpit Mathur)

~~2005-09-19~~- Midterm week
~~2005-09-22~~- Midterm week
- 2005-09-24
- Midterm exam, 9:30am--12:30pm, A1/A2 Math
- 2005-09-26
- Midterm review
- Max margin classification with joint features Φ(x,y)
- Very large number of primal constraints and dual variables

- 2005-09-29
- Very large number of primal constraints and dual variables
- The cutting plane algorithm and StructSVM
- Example applications: Markov chains, alignment, etc.

- 2005-10-03
- Encoding structured labeling applications: Markov chains, alignment, etc.
- Max-margin approach to ranking (as against labeling)
- Ordinal regression

~~2005-10-06~~- EMNLP
- 2005-10-10
- Semi-supervised and transductive learning
- The min-cut formulation
- Transductive SVM

- 2005-10-13
- Transductive SVM
- Graph Laplacian and its spectral properties
- Spectral graph transducer

- 2005-10-17
- Detour: Bayes-optimal decision vs. max-margin classification (Alekh Agarwal)
- Semi-supervised or unsupervised generative models and EM

- 2005-10-20
- EM demo and initialization issues
- The aspect model
- Non-negative matrix factorization

- 2005-10-24
- Iterative scaling, NMF demo
- Dyadic factors of Boolean matrices: cross-associations

- 2005-10-27
- Semi-supervised conditional learning using graph Laplacian
- Cross-associations and co-clustering

- 2005-10-31
- Applications of spectral methods and SVMs to Web analysis (Vivek Tawde, Yahoo! Research)
- Rate-distortion and relation to cross-association

~~2005-11-03~~- Diwali
- 2005-11-07
- Rate-distortion reloaded
- Information bottleneck

- 2005-11-10
- Bias-variance decomposition

- 2005-11-21
- Final exam, 14:30--17:30, A1/A2 Math