The impact of deep neural networks in numerous application areas of science, engineering, and technology has never been greater than it is right now.
Still, progress in practical applications of deep learning has considerably outpaced our understanding of its foundations. Many fundamental questions remain unanswered. Why are we able to train neural networks so efficiently? Why do they perform so well on unseen data? Is there any benefit of one network architecture over another?
These lecture notes are an attempt to sample a growing body of work in theoretical machine learning research that addresses some of these questions. They supplement a graduate-level course taught by me in the Spring of 2022.
All pages are under construction. Corrections, pointers to omitted results, and other feedback are welcome: just email me, or open a GitHub pull request at this repository.
Table of contents
- Chapter 1 - Memorization
- Chapter 2 - Universal approximators
- Chapter 3 - The role of depth
- Chapter 4 - A primer on optimization
- Chapter 5 - Optimizing wide networks
- Chapter 6 - Beyond NTK
- Chapter 7 - Implicit regularization
- Chapter 8 - PAC learning primer and error bounds
- Chapter 9 - Generalization bounds via stability