Course Details
Course Code: CS-689
Venue: SIC-301, KReSIT CSE building
Class Timing: Tue 3:30pm-4:55pm, Fri 3:30pm-4:55pm
Office: 306, KReSIT building (Extn: 7903, email: saketh {at} cse)
Office Timing: Wed, Fri 7:30am-9:00am.
Mailing List: cs689
Scope and Syllabus: pdf
Assignments
The assignment problems are here (Updated: 10-Mar-10); solutions are here (Updated: 08-Mar-10)
Lecture Schedule
Date | Topics | Resources |
---|---|---|
Jan 5 | Introduction to the course | Syllabus |
Jan 8 | Introduction to Statistical Learning Theory (SLT): Definitions of loss function, risk, empirical risk, motivation for Empirical Risk Minimization (ERM) | Further Reading, Supplementary |
Jan 12 | Consistency of ERM, Sufficient condition for consistency of ERM as one-sided uniform convergence, Analysis for finite sets of functions and extension to the general case using the symmetrization trick, Shattering coefficient | Further Reading, Supplementary |
Jan 15 | Shattering coefficient, growth function, VC dimension, annealed entropy; Necessary and sufficient conditions for exponentially fast ERM convergence in the distribution-independent and distribution-dependent cases; VC-style inequalities, principle of Structural Risk Minimization (SRM) -- application to SVMs | Further Reading |
Jan 19 | Motivation for studying canonical hyperplane classifiers, Statement and proof of theorem regarding VC dim. of such classifiers, (hard-margin) SVM formulation | Further Reading |
Jan 22 | Data-dependent SRM --- related theorem, Derivation of the hard- and soft-margin SVM formulations, Introduction to Lagrange multipliers and duality theory | Further Reading, Supplementary |
Jan 29 | Derivation of dual of SVM and its geometrical interpretation, key insights into the SVM solution, motivation for extensions to non-linear classifiers, case of polynomial discriminators and idea of kernel trick | Further Reading |
Feb 2 | Introduction to the theory of kernels: definition, relation to inner products, rough proof of Mercer's theorem, insights into the induced feature representation, notion of a hyperplane in feature space, examples of kernels on Euclidean spaces: polynomial and Gaussian kernels | Further Reading |
Feb 5 | Discussion on Mercer kernels and Mercer's theorem, Positive-definiteness-preserving operations on kernels (sum, scaling, product, normalization, polynomials, exponential), Kernels on probability spaces (rather than Euclidean spaces), Representer Theorem, derivation of the dual of the soft-margin SVM for an arbitrary kernel, uniqueness of the SVM solution | Further Reading, Supplementary |
Feb 9 | Notion of support vectors, Recovery of primal variables from dual variables in SVM, Interplay between the number of support vectors and computational efficiency --- choice during the recovery stage, Generalization of the Representer Theorem, Alternative loss functions and regularizers, Benefits of $l_1$- and $l_\infty$-based regularizers, Choosing model parameters for SVM | Further Reading, Supplementary |
Feb 23 | Revisiting SLT --- Bounds based on Rademacher Complexities --- Derivation of McDiarmid's Inequality | Further Reading, Supplementary |
Feb 26 | Derivation of learning bounds involving Rademacher Averages, Example of bounding RA of hyperplane classifiers in RKHS --- trace-margin bound | Further Reading |
Mar 02 | Kernel Learning formulation using trace-margin bound, Multiple Kernel Learning (MKL) formulation, posing MKL as QCLP formulation, insights into nature of solution | Further Reading |
Mar 05 | Motivation for solving large-scale MKL, MKL as SILP solved using Cutting-plane, MKL solved using reduced-gradient, review of first-order methods and Danskin's theorem | Further Reading |
Mar 09 | Application of projected gradient descent and reduced gradient descent for solving MKL --- simulation results, basics of Mirror-Descent | Further Reading |
Mar 12 | Application of Mirror-Descent to MKL --- simulation results, $l_p$-regularized MKL, Primal view of MKL | Further Reading |
Mar 19 | Primal view of MKL, equivalence to $l_1$ MKL and a related generic equivalence result (the lambda trick), MKL for multi-modal tasks | Further Reading |
Mar 23 | Discussion on Hierarchical Kernel Learning | Further Reading |
Mar 26 | Introduction and motivation for Multi-Task Learning (MTL), basic formulations (primal and dual view), standard assumptions for common feature-space, brief overview of learning bounds and simple algorithms for this case | Further Reading, Supplementary |
Mar 30 | Multi-task feature learning, Connection with kernel learning, Alternating minimization algorithm for solving the formulation, Equivalence to the trace-norm minimization problem, Discussion of matrix norms and analogy to the vector case, Nesterov's method for trace-norm minimization | Further Reading |
Apr 6 | Kernelization of the multi-task feature learning formulation, Extension of the Representer Theorem, Extensions to other Schatten norms | Further Reading |
Apr 9 | Motivation for algorithms for robust learning, principle of robust optimization, SVMs are already robust --- interpretation of $l_2$ regularization, formulations where explicit information regarding the noise is known (6 cases), Discussion of the bounded and Gaussian noise cases, notion of dual norms | Further Reading |
Apr 13 | One-sided Chebyshev's inequality, formulations without distributional assumptions --- moment-based variants, geometric interpretation of the formulations, issue of robustness to moment-estimation errors | Further Reading |
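As a companion to the Feb 2 and Feb 5 lectures on kernels, the following is a minimal sketch (not course-provided code; the function names are illustrative) of a Mercer kernel in action: the Gaussian kernel evaluated on a few points of $R^2$, assembled into a Gram matrix. By Mercer's theorem this matrix is symmetric and positive semi-definite, so every quadratic form in it is non-negative --- exactly the property the kernel trick relies on.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): a Mercer kernel on R^n
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    # K[i][j] = k(x_i, x_j); symmetric and PSD for any Mercer kernel
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, gaussian_kernel)

# Sanity checks mirroring Mercer's conditions:
n = len(points)
assert all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(n) for j in range(n))
c = [1.0, -2.0, 1.0]  # arbitrary coefficients: quadratic form must be >= 0
assert sum(c[i] * c[j] * K[i][j] for i in range(n) for j in range(n)) >= 0.0
```

The same `gram_matrix` routine works for any of the kernels discussed in class (polynomial, normalized, products of kernels), since the closure operations from the Feb 5 lecture preserve positive definiteness.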