- Instructor: Adam Oberman
- Fall 2021

A mathematically rigorous approach to machine learning.

Math and Stats Majors/Honours students, CS students.

- Math 223/Math 248 or equivalent (linear algebra majors/honours)
- Math 222 (Calculus 3)
- Math 324/Math 357 (Statistics)
- Probability course (implied, since it is a prerequisite for 324/357)
- Math 358 Honours Vector Calculus/Math 314 Advanced Calculus

- Math 562, Winter 2022
- This is a sequel to Math 462. There will be a small amount of topic overlap, since some students will not have taken 462. However, to avoid repeated material, students from 462 will work on their project and be excused from the HW problems that overlap.

- COMP 551 Applied Machine Learning https://www.siamak.page/courses/COMP551F21/index.html
- This course focuses on implementation rather than theory. It is a complementary course.

- COMP 451 Fundamentals of Machine Learning https://cs.mcgill.ca/~wlh/comp451/
- CS theory course, not currently offered. Math 462 and COMP 451 are mutually exclusive.

- Math 308 Fundamentals of Statistical Learning
- not offered this year.

- Mathematics for Machine Learning by Deisenroth. This book is elementary, but can be used as a reference or a review of topics from the prerequisites.
- Probabilistic Machine Learning: An Introduction by Kevin Patrick Murphy. This book is encyclopedic and covers many topics; a good reference, but not presented as digestible lectures.
- Understanding Machine Learning: From Theory to Algorithms by Shalev-Shwartz and Ben-David. This book is very good for presenting machine learning problems, but less detailed on the proofs, particularly in Part 3.
- Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar. This book contains rigorous proofs of generalization bounds, but assumes the reader is already familiar with the problems.
- High-Dimensional Statistics: A Non-Asymptotic Viewpoint by Martin J. Wainwright.

- 5 HW assignments : 25%
- Group Project and Presentation: 15%
- Attendance 5%, Participation 5%.
- 2 Midterm Exams : 20%
- Final exam : 30%
*Soft grading policy:* You are encouraged to make your best effort to complete all the work. However, if you need to miss anything (an assignment or an exam), the soft grading policy allows one missed assignment and one missed midterm exam, with a small penalty. Your final grade will be your average on the remaining work, minus a penalty of:

- 1% for each missed assignment,
- 2% for a missed midterm,
- 3% for a missed final exam.

E.g. if you missed one assignment and one midterm, with an average of 87% on the rest, your final grade would be 87 − (1 + 2) = 84%.

Refer to McGill Key Dates.

- Classes begin: Weds Sept 1.
- Fall reading break: Tues-Weds Oct. 12-13
- Makeup day: Oct 15, no class.
- Last class: Friday Dec 3rd.
- Midterm dates: TBD

- Course notes Part 1 (intro ML)
- Course notes Part 2 (regression and binary classification)
- Course notes part 3 (convex analysis and optimization)
- Course notes part 4 (multiclass)
- Course Notes (RL) HW version

- Homework 1 due noon, Monday Sept 20th Homework 1 Revised and Homework 1 Solution
- Homework 2, due 5pm, Tuesday Oct 5th, Homework 2, and HW 2 Solution
- Homework 3, due 5pm, Tuesday Oct 19th, Homework 3 and HW 3 Solution
- Homework 4, due 5pm, Tuesday Nov 16th, HW_4.pdf
- Homework 5, due 5pm, Monday Dec 6th (It's short!) HW_5

- Reference: Sutton and Barto textbook. Refer to Part I: tabular RL (selected). Part II, 9.1 and Ch 13 policy gradient method. Ch 16 Case studies.
- Lecture 18
- Lecture 19
- Lecture 20 - no notes, group work on projects.
- Lecture 21 19.11.2021.Lecture21.pdf

- (Weds) Project Outline and Examples (see links above)
- (Friday) Lecture 15 Deep Neural networks.
- Additional reference for CNNs: https://www.deeplearningbook.org/ Chapter 6 and Chapter 9
- (Weds) Face Verification Problem. Generalization: in distribution and out of distribution Lecture 16
- (Friday) Unsupervised: cluster energy. Semi-supervised SVM and margin. Reinforcement learning warmup. Lecture 17

- Reference: Shalev-Shwartz Ch 17, Mohri Ch 9
- Lecture 13
- Lecture 14

- Reference: Understanding Machine Learning, Shalev-Shwartz and Ben-David, Chapters 12 and 14.
- Reference: Boyd, Convex Optimization, Chapters 2, 3, and 9.
- Class notes: Lecture 9 Lecture 11 Lecture 12

- Loss design for classification: zero-one loss and hinge loss.
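As a small illustrative sketch of this topic (hypothetical data, not from the course notes): for a label y in {−1, +1} and a classifier score s, the zero-one loss counts a misclassification whenever y·s ≤ 0, while the hinge loss max(0, 1 − y·s) is its convex surrogate.

```python
import numpy as np

def zero_one_loss(y, score):
    """Zero-one loss: 1 when the sign of the score disagrees with the label y in {-1,+1}."""
    return np.where(y * score > 0, 0.0, 1.0)

def hinge_loss(y, score):
    """Hinge loss: max(0, 1 - y*score), a convex upper bound on the zero-one loss."""
    return np.maximum(0.0, 1.0 - y * score)

y = np.array([+1, +1, -1, -1])
score = np.array([2.0, -0.5, -3.0, 0.25])
print(zero_one_loss(y, score))  # [0. 1. 0. 1.]
print(hinge_loss(y, score))     # [0.   1.5  0.   1.25]
```

Note that the hinge loss is nonzero even for correctly classified points inside the margin (0 < y·s < 1), which is what drives margin maximization.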

- Read the working version of course notes Part 2 and discuss in class.

- Introduction: example problems and datasets, meet and greet
- Reference: Murphy Ch1

- Set up the supervised learning problem for regression

- Regression, other losses, compare losses
- Calculus to find the minimizer of the (ELM) problem
- Reference: Course notes
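A minimal sketch of the calculus step for squared loss, on hypothetical data (not from the course notes): setting the gradient of the empirical loss to zero yields the normal equations X^T X w = X^T y.

```python
import numpy as np

# Hypothetical noiseless data: n samples, d features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Setting the gradient of (1/n)||Xw - y||^2 to zero gives
# the normal equations X^T X w = X^T y; solve them directly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_hat, w_true))  # True (data is noiseless)
```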

- Review of calculus and vector calculus, chain rule, gradient
- Compute gradient of the (EL) expected loss function
- Quadratic regression case
- References: Course Notes, Section "Gradients and Minimizers"
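The gradient computation above can be sketched as follows (hypothetical data; a sanity check, not the course's derivation): for the empirical squared loss L(w) = (1/n)||Xw − y||^2, the chain rule gives ∇L(w) = (2/n) X^T (Xw − y), which we verify against a finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
w = rng.normal(size=4)

def loss(w):
    # Empirical squared loss L(w) = (1/n) * ||Xw - y||^2
    r = X @ w - y
    return (r @ r) / len(y)

# Chain rule: grad L(w) = (2/n) * X^T (Xw - y)
grad = (2 / len(y)) * X.T @ (X @ w - y)

# Central finite-difference check along each coordinate direction.
eps = 1e-6
fd = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
               for e in np.eye(4)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```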

- Instead of a lecture we will have a problem session, to work on HW1

- (No recording for lectures 1 and 2)
- Lecture 3
- Lecture 4 and future lectures are only available in mycourses (due to McGill zoom recording policy).