Math 605 Theory of Machine Learning. Spring 2022.

  • Instructor: Xingye Qiao

  • Email:

  • Office: WH-134

  • Meeting time & location: TR 10:10 to 11:50 am at WH-100E.

  • Office hours: Friday 10 to 11 am, excluding holidays. I will usually be there, but you are encouraged to email me beforehand to confirm, just in case.

This course is a 4-credit course, which means that in addition to the scheduled lectures/discussions, students are expected to do at least 9.5 hours of course-related work each week during the semester. This includes completing assigned readings, participating in lab sessions, studying for tests and examinations, preparing written assignments, completing internship or clinical placement requirements, and other tasks that must be completed to earn credit in the course.


Basic prerequisites include probability theory, statistics, real analysis, and linear algebra. This is a theory class: although many tools will be reviewed in lectures, a strong mathematical background is necessary.


Statistical learning theory is a flourishing research field at the intersection of probability, statistics, computer science, and optimization that studies the performance of algorithms for making predictions based on training data. This course will provide an introduction to the theoretical analysis of prediction methods, focusing on statistical and computational aspects. The following topics will be covered: basics of statistical decision theory; concentration inequalities; supervised and unsupervised learning; empirical risk minimization; complexity-regularized estimation; generalization bounds for learning algorithms; VC dimension and Rademacher complexities; minimax lower bounds; online learning and optimization. It will focus on tools for the theoretical analysis of the performance of learning algorithms and the inherent difficulty of learning problems.


  1. The Statistical Theory of Machine Learning.

    1. Classification, Regression, Aggregation

    2. Empirical Risk Minimization, Regularization

    3. Concentration inequalities, Uniform convergence

    4. Suprema of Empirical Processes

  2. Algorithms and Convexity.

    1. Boosting

    2. Kernel Methods

    3. Convex Optimization

  3. Kernel Methods

    1. Reproducing kernel Hilbert spaces, Mercer's theorem

    2. Convex optimization for kernel methods

    3. Representer theorem

  4. Online Learning (time permitting).

    1. Online Convex Optimization

    2. Partial Information: Bandit Problems

    3. Blackwell's Approachability

  5. Neural networks (time permitting).

    1. Stochastic gradient methods

    2. Combinatorial dimensions and Rademacher averages

    3. Hardness results for learning

    4. Efficient learning algorithms


There will be no designated textbook. I will develop a set of notes drawing content from those written by Philippe Rigollet (MIT), Percy Liang (Stanford), Peter Bartlett (UC Berkeley), and Martin Wainwright (UC Berkeley). Additional reading materials will be posted on this webpage.


  • Assignments: 40%. There will be 3 to 5 homework assignments.

  • Final Project: 50%. The final project should be in an area related to one of the topics of the course, or use tools that are developed in class. Examples include implementing an algorithm on real data, extending an existing method, or proving a theoretical result (or a combination of these). You will need to submit a written report (10 pages) and give a presentation in class in the last week of the semester or earlier (the duration of the talk will depend on the size of the class).

  • Lecture Notes Scribing: 10%. Everyone will contribute to developing the lecture notes for this class.


Students in the Spring 2021 offering of this class helped scribe the notes, which now form an online book: Online Lecture Notes