Foundations of Deep Learning
NYU Tandon, Spring 2022

Chapter 6 - Beyond NTK

Under construction.

  • Is the NTK the only explanation of why gradient descent succeeds at training deep networks? Almost surely not.

  • The crux of the NTK analysis: the weights do not move very far from their (random) initialization.

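  • Therefore, in the NTK regime, meaningful feature learning does not happen: the network behaves like a linear model over fixed features determined at initialization.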

  • However, we know (both from visualizations and from controlled experiments) that neural networks trained with gradient descent do learn features.

  • Hessian control: a theory that explains the training dynamics even “far away” from the initialization.
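The "weights do not move very far" claim can be checked numerically. The sketch below is illustrative only and rests on several assumptions that are not part of these notes: a two-layer tanh network with NTK-style 1/sqrt(m) output scaling, a fixed random ±1 second layer, full-batch gradient descent on the first layer only, and synthetic Gaussian data. The helper `relative_weight_change` is hypothetical. It reports ||W_T - W_0||_F / ||W_0||_F after training. Under this scaling the ratio should shrink roughly like 1/sqrt(m) as the width m grows, which is the "lazy" behavior the NTK analysis relies on.

```python
import numpy as np

def relative_weight_change(width, n=20, d=5, steps=500, lr=0.5, seed=0):
    """Train a hypothetical two-layer net with GD; return ||W_T - W_0|| / ||W_0||."""
    rng = np.random.default_rng(seed)
    # Synthetic data (assumption): drawn first, so it is identical across widths.
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # inputs with norm about 1
    y = rng.standard_normal(n)                     # arbitrary regression targets
    W0 = rng.standard_normal((width, d))           # first-layer weights ~ N(0, 1)
    a = rng.choice([-1.0, 1.0], size=width)        # fixed (untrained) second layer
    W = W0.copy()
    for _ in range(steps):
        act = np.tanh(X @ W.T)                     # (n, width) hidden activations
        f = act @ a / np.sqrt(width)               # NTK-style 1/sqrt(m) output scaling
        r = f - y                                  # residuals
        # Gradient of L(W) = (1 / 2n) * sum_i r_i^2 with respect to W.
        dH = r[:, None] * (1.0 - act**2) * a[None, :] / np.sqrt(width)
        grad = dH.T @ X / n                        # (width, d)
        W -= lr * grad                             # one full-batch GD step
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for m in (64, 512, 4096):
    print(m, relative_weight_change(m))
```

Comparing the printed ratios across widths shows the relative movement of the weights shrinking as the network gets wider. This sketch only illustrates the lazy regime; it says nothing about the feature-learning or Hessian-control regimes discussed in this chapter.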

Setup

The PL* condition

Hessian control