Course Details

Course Code: CS-689
Venue: SIC-301, KReSIT CSE building
Class Timing: Tue 3:30pm-4:55pm, Fri 3:30pm-4:55pm
Office: 306, KReSIT building (Extn: 7903, email: saketh {at} cse)
Office Timing: Wed, Fri 7:30am-9:00am.
Mailing List: cs689
Scope and Syllabus: pdf

Assignments

The assignment problems are here (Updated: 10-Mar-10); solutions are here (Updated: 08-Mar-10).

Lecture Schedule

Date | Topics | Resources
Jan 5 | Introduction to the course | Syllabus
Jan 8 | Introduction to Statistical Learning Theory (SLT): definitions of loss function, risk, and empirical risk; motivation for Empirical Risk Minimization (ERM) (sketch below) | Further Reading, Supplementary
Jan 12 | Consistency of ERM; one-sided uniform convergence as a sufficient condition for ERM consistency; analysis for finite sets of functions and extension to the general case via the symmetrization trick; shattering coefficient | Further Reading, Supplementary
Jan 15 | Shattering coefficient, growth function, VC dimension, annealed entropy; necessary and sufficient conditions for exponentially fast ERM convergence in the distribution-independent and distribution-dependent cases; VC-style inequalities (sketch below); the principle of Structural Risk Minimization (SRM) and its application to SVMs | Further Reading
Jan 19 | Motivation for studying canonical hyperplane classifiers; statement and proof of the theorem on the VC dimension of such classifiers; the (hard-margin) SVM formulation | Further Reading
Jan 22 | Data-dependent SRM and the related theorem; derivation of the (hard- and soft-margin) SVM formulations (sketch below); introduction to Lagrange multipliers and duality theory | Further Reading, Supplementary
Jan 29 | Derivation of the SVM dual and its geometric interpretation; key insights into the SVM solution; motivation for extensions to non-linear classifiers; the case of polynomial discriminators and the idea of the kernel trick | Further Reading
Feb 2 | Introduction to the theory of kernels: definition and relation to inner products; outline of the proof of Mercer's theorem; insights into the induced feature representation; the notion of a hyperplane in feature space; examples of kernels on Euclidean spaces (polynomial, Gaussian) (sketch below) | Further Reading
Feb 5 | Discussion of Mercer kernels and Mercer's theorem; positiveness-preserving operations on kernels (sum, scaling, product, normalization, polynomials, exponential); kernels on probability spaces (rather than Euclidean spaces); the Representer Theorem; derivation of the soft-margin SVM dual for an arbitrary kernel; uniqueness of the SVM solution | Further Reading, Supplementary
Feb 9 | The notion of support vectors; recovery of primal variables from dual variables in SVMs; interplay between the number of support vectors and computational efficiency (a choice made at the recovery stage); generalization of the Representer Theorem; alternative loss functions and regularizers; benefits of $l_1$- and $l_\infty$-based regularizers; choosing model parameters for SVMs | Further Reading, Supplementary
Feb 23 | Revisiting SLT: bounds based on Rademacher complexities; derivation of McDiarmid's inequality (sketch below) | Further Reading, Supplementary
Feb 26 | Derivation of learning bounds involving Rademacher averages; example: bounding the Rademacher average of hyperplane classifiers in an RKHS (the trace-margin bound) | Further Reading
Mar 02 | A kernel-learning formulation based on the trace-margin bound; the Multiple Kernel Learning (MKL) formulation; posing MKL as a QCLP; insights into the nature of the solution | Further Reading
Mar 05 | Motivation for solving large-scale MKL; MKL as an SILP solved with cutting planes; MKL solved via reduced gradient; review of first-order methods and Danskin's theorem | Further Reading
Mar 09 | Application of projected gradient descent and reduced gradient descent to MKL, with simulation results; basics of mirror descent (sketch below) | Further Reading
Mar 12 | Application of mirror descent to MKL, with simulation results; $l_p$-regularized MKL; the primal view of MKL | Further Reading
Mar 19 | The primal view of MKL; equivalence to $l_1$ MKL and the related generic equivalence result (the lambda trick); MKL for multi-modal tasks | Further Reading
Mar 23 | Discussion of Hierarchical Kernel Learning | Further Reading
Mar 26 | Introduction and motivation for Multi-Task Learning (MTL); basic formulations (primal and dual views); standard assumptions for a common feature space; brief overview of learning bounds and simple algorithms for this case | Further Reading, Supplementary
Mar 30 | Multi-task feature learning and its connection with kernel learning; an alternating-minimization algorithm for solving the formulation; equivalence to the trace-norm minimization problem; discussion of matrix norms and the analogy to the vector case; Nesterov's method for trace-norm minimization | Further Reading
Apr 6 | Kernelization of the multi-task feature learning formulation; extension of the Representer Theorem; extensions to other Schatten norms | Further Reading
Apr 9 | Motivation for algorithms for robust learning; the principle of robust optimization; SVMs are already robust (an interpretation of $l_2$ regularization); formulations where explicit information regarding the noise is available (six cases); discussion of the bounded-noise and Gaussian-noise cases; the notion of dual norms (sketch below) | Further Reading
Apr 13 | The one-sided Chebyshev inequality (sketch below); formulations without distributional assumptions (moment-based variants); geometric interpretation of the formulations; the issue of robustness to moment-estimation errors | Further Reading
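Reference Sketches

The sketches below restate a few formulations from the schedule above for quick reference. Notation and constants are the standard textbook ones and may differ from those used in lecture.

For the Jan 8 lecture, the quantities behind ERM: given a loss $\ell$, a distribution $P$ over labelled examples, and a sample $(x_1,y_1),\dots,(x_n,y_n)$,

\[
R(f) = \mathbb{E}_{(x,y)\sim P}\,\ell(f(x),y), \qquad
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n}\ell(f(x_i),y_i), \qquad
f_n = \arg\min_{f\in\mathcal{F}}\hat{R}_n(f) \;\;\text{(ERM)}.
\]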
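For the Jan 15 lecture, a classical VC-style inequality for the 0-1 loss, stated in terms of the shattering coefficient $S_{\mathcal{F}}(n)$ (the exact constants vary across sources):

\[
\Pr\Big(\sup_{f\in\mathcal{F}}\big|R(f)-\hat{R}_n(f)\big| > \epsilon\Big) \;\le\; 4\,S_{\mathcal{F}}(2n)\,e^{-n\epsilon^2/8},
\]

so uniform convergence, and hence ERM, is exponentially fast whenever $\ln S_{\mathcal{F}}(2n)/n \to 0$.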
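For the Jan 22 and Jan 29 lectures, the standard soft-margin SVM primal and its dual for a kernel $k$ with feature map $\phi$ (the hard-margin case is recovered as $C \to \infty$):

\[
\min_{w,b,\xi}\;\tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\big(\langle w,\phi(x_i)\rangle + b\big) \ge 1-\xi_i,\;\; \xi_i \ge 0,
\]
\[
\max_{\alpha}\;\sum_{i=1}^{n}\alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j\,k(x_i,x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{n}\alpha_i y_i = 0.
\]

Points with $\alpha_i > 0$ are the support vectors, and $w = \sum_i \alpha_i y_i \phi(x_i)$ recovers the primal solution from the dual one.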
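For the Feb 2 and Feb 5 lectures, a minimal Python sketch (NumPy only; the function name is mine, not from the course) that builds a Gaussian-kernel Gram matrix and checks the symmetry and positive semidefiniteness that Mercer's condition demands:

    import numpy as np

    def gaussian_gram(X, sigma=1.0):
        # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2<a,b>
        sq = (X ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

    X = np.random.randn(50, 3)
    K = gaussian_gram(X)
    # A valid (Mercer) kernel yields a symmetric PSD Gram matrix
    print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-9)

The same check passes after any of the positiveness-preserving operations from Feb 5 (sums, products, scaling, normalization), since each maps PSD Gram matrices to PSD Gram matrices.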
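For the Feb 23 and Feb 26 lectures, McDiarmid's inequality and a typical Rademacher-average bound. If $g$ satisfies the bounded-differences property with constants $c_1,\dots,c_n$, then

\[
\Pr\big(g(Z_1,\dots,Z_n) - \mathbb{E}\,g \ge t\big) \;\le\; \exp\!\Big(\tfrac{-2t^2}{\sum_{i=1}^{n} c_i^2}\Big),
\]

and for a loss class taking values in $[0,1]$, with probability at least $1-\delta$, simultaneously for all $f\in\mathcal{F}$,

\[
R(f) \;\le\; \hat{R}_n(f) + 2\,\mathfrak{R}_n(\mathcal{F}) + \sqrt{\tfrac{\ln(1/\delta)}{2n}}
\]

(again, constants differ across sources). For hyperplanes $\{x \mapsto \langle w,\phi(x)\rangle : \|w\| \le B\}$ in an RKHS with Gram matrix $K$, the empirical Rademacher average is at most $(B/n)\sqrt{\operatorname{tr}K}$, which is the quantity behind the trace-margin bound.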
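For the Mar 05-12 lectures, a toy Python sketch of entropic mirror descent (the exponentiated-gradient update) on the probability simplex, the feasible set for MKL kernel weights. The quadratic objective is a stand-in for the MKL subproblem, not the actual MKL objective:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T  # PSD matrix defining a toy convex objective f(w) = w'Aw

    def grad(w):
        return 2.0 * A @ w

    w = np.ones(5) / 5.0  # start from the uniform kernel combination
    eta = 0.1
    for _ in range(200):
        w = w * np.exp(-eta * grad(w))  # multiplicative (mirror) step
        w /= w.sum()                    # renormalize back onto the simplex

    print(np.round(w, 3), w @ A @ w)

Replacing the multiplicative step with an additive gradient step followed by a Euclidean projection onto the simplex gives the projected-gradient variant discussed on Mar 09.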
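For the Apr 9 and Apr 13 lectures, the one-sided (Cantelli) Chebyshev inequality used in the moment-based robust formulations: for a random variable $X$ with mean $\mu$ and variance $\sigma^2$, and any $t > 0$,

\[
\Pr(X - \mu \ge t) \;\le\; \frac{\sigma^2}{\sigma^2 + t^2}.
\]

In the bounded-noise case, perturbations $\|\delta_i\| \le \rho$ of the inputs lead (by taking the worst case over $\delta_i$) to constraints of the form $y_i(\langle w, x_i\rangle + b) \ge 1 + \rho\,\|w\|_*$, where $\|\cdot\|_*$ is the dual norm; this is where dual norms enter the Apr 9 discussion.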