Course Details
Course Code: CS-689
Venue: SIC-301, KReSIT CSE building
Class Timing: Tue 3:30pm-4:55pm, Fri 3:30pm-4:55pm
Office: 306, KReSIT building (Extn: 7903, email: saketh {at} cse)
Office Timing: Wed, Fri 7:30am-9:00am.
Mailing List: cs689
Scope and Syllabus: pdf
Assignments
The assignment problems are here (Updated: 10-Mar-10); solutions are here (Updated: 08-Mar-10)
Lecture Schedule
Date | Topics | Resources |
---|---|---|
Jan 5 | Introduction to the course | Syllabus |
Jan 8 | Introduction to Statistical Learning Theory (SLT): Definitions of loss function, risk, empirical risk, motivation for Empirical Risk Minimization (ERM) | Further Reading, Supplementary |
Jan 12 | Consistency of ERM, Sufficient condition for consistency of ERM as one-sided uniform convergence, Analysis for finite sets of functions and extension to the general case using the symmetrization trick, Shattering coefficient | Further Reading, Supplementary |
Jan 15 | Shattering coefficient, growth function, VC dimension, annealed entropy; Necessary and sufficient conditions for exponentially fast ERM convergence in the distribution-independent and distribution-dependent cases; VC-style inequalities, principle of Structural Risk Minimization (SRM) -- application to SVMs | Further Reading |
Jan 19 | Motivation for studying canonical hyperplane classifiers, Statement and proof of theorem regarding VC dim. of such classifiers, (hard-margin) SVM formulation | Further Reading |
Jan 22 | Data-dependent SRM --- related theorem, Derivation of the hard- and soft-margin SVM formulations, Introduction to Lagrange multipliers and duality theory | Further Reading, Supplementary |
Jan 29 | Derivation of dual of SVM and its geometrical interpretation, key insights into the SVM solution, motivation for extensions to non-linear classifiers, case of polynomial discriminators and idea of kernel trick | Further Reading |
Feb 2 | Introduction to the theory of kernels: definition, relation to inner products, rough proof of Mercer's theorem, insights into the induced feature representation, notion of a hyperplane in feature space, examples of kernels on Euclidean spaces: polynomial and Gaussian kernels | Further Reading |
Feb 5 | Discussion on Mercer kernels and Mercer's theorem, Positive-definiteness-preserving operations on kernels (sum, scaling, product, normalization, polynomials, exponential), Kernels on probability spaces (rather than Euclidean spaces), Representer Theorem, derivation of the dual of the soft-margin SVM for an arbitrary kernel, uniqueness of the SVM solution | Further Reading, Supplementary |
Feb 9 | Notion of support vectors, Recovery of primal variables from dual variables in SVM, Interplay between the number of support vectors and computational efficiency --- choice during the recovery stage, Generalization of the Representer Theorem, Alternative loss functions and regularizers, Benefits of $l_1$- and $l_\infty$-based regularizers, Choosing model parameters for SVM | Further Reading, Supplementary |
Feb 23 | Revisiting SLT --- Bounds based on Rademacher Complexities --- Derivation of McDiarmid's Inequality | Further Reading, Supplementary |
Feb 26 | Derivation of learning bounds involving Rademacher Averages, Example of bounding RA of hyperplane classifiers in RKHS --- trace-margin bound | Further Reading |
Mar 02 | Kernel Learning formulation using trace-margin bound, Multiple Kernel Learning (MKL) formulation, posing MKL as QCLP formulation, insights into nature of solution | Further Reading |
Mar 05 | Motivation for solving large-scale MKL, MKL as SILP solved using Cutting-plane, MKL solved using reduced-gradient, review of first-order methods and Danskin's theorem | Further Reading |
Mar 09 | Application of projected gradient descent and reduced gradient descent for solving MKL --- simulation results, basics of Mirror-Descent | Further Reading |
Mar 12 | Application of Mirror-Descent to MKL --- simulation results, $l_p$-regularized MKL, Primal view of MKL | Further Reading |
Mar 19 | Primal view of MKL, equivalence to $l_1$ MKL and a related generic equivalence result (the lambda trick), MKL for multi-modal tasks | Further Reading |
Mar 23 | Discussion on Hierarchical Kernel Learning | Further Reading |
Mar 26 | Introduction and motivation for Multi-Task Learning (MTL), basic formulations (primal and dual view), standard assumptions for common feature-space, brief overview of learning bounds and simple algorithms for this case | Further Reading, Supplementary |
Mar 30 | Multi-task feature learning, Connection with kernel learning, Alternating minimization algorithm for solving the formulation, Equivalence to the trace-norm minimization problem, Discussion of matrix norms and analogy to the vector case, Nesterov's method for trace-norm minimization | Further Reading |
Apr 6 | Kernelization of the multi-task feature learning formulation, Extension of the Representer Theorem, Extensions to other Schatten norms | Further Reading |
Apr 9 | Motivation for algorithms for robust learning, principle of robust optimization, SVMs are already robust --- interpretation of $l_2$ regularization, formulations where explicit information regarding the noise is known (6 cases), Discussion of the bounded and Gaussian noise cases, notion of dual norms | Further Reading |
Apr 13 | One-sided Chebyshev's inequality, formulations without distributional assumptions --- moment-based variants, geometric interpretation of the formulations, issue of robustness to moment-estimation errors | Further Reading |
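As a companion to the Feb 2 and Feb 5 lectures on kernels, the following is a minimal sketch (not course-provided code; the function names are illustrative) of a Mercer kernel in action: the Gaussian kernel evaluated on a few points of $R^2$, assembled into a Gram matrix. By Mercer's theorem this matrix is symmetric and positive semi-definite, so every quadratic form in it is non-negative --- exactly the property the kernel trick relies on.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): a Mercer kernel on R^n
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    # K[i][j] = k(x_i, x_j); symmetric and PSD for any Mercer kernel
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, gaussian_kernel)

# Sanity checks mirroring Mercer's conditions:
n = len(points)
assert all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(n) for j in range(n))
c = [1.0, -2.0, 1.0]  # arbitrary coefficients: quadratic form must be >= 0
assert sum(c[i] * c[j] * K[i][j] for i in range(n) for j in range(n)) >= 0.0
```

The same `gram_matrix` routine works for any of the kernels discussed in class (polynomial, normalized, products of kernels), since the closure operations from the Feb 5 lecture preserve positive definiteness.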