Chapter 6 - Beyond NTK

Under construction.

Is NTK the only explanation? Almost surely not.
The crux of the NTK analysis: the weights barely move from their (random) initialization.
Therefore, in the NTK regime, meaningful feature learning does not happen!
However, we know (both from visualizations and from controlled experiments) that neural networks do learn features when trained with gradient descent (GD).
Hessian control: a theory that explains dynamics “far away” from the initialization.
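
To make the claim that NTK-regime weights stay close to their initialization concrete, here is a minimal NumPy sketch. Everything in it is an assumption chosen for illustration, not a setup taken from these notes: a two-layer ReLU network with 1/sqrt(width) output scaling, a fixed random second layer, synthetic unit-norm inputs with random +/-1 labels, and the hypothetical helper name relative_weight_change. It trains only the first layer with full-batch GD and reports how far the weights moved relative to their initial norm.

```python
import numpy as np


def relative_weight_change(width, steps=200, lr=0.5, n=20, d=5, seed=0):
    """Train the first layer of a two-layer ReLU net with GD and return
    ||W - W0||_F / ||W0||_F. Illustrative setup only (see assumptions above)."""
    rng = np.random.default_rng(seed)

    # Synthetic data: unit-norm inputs, random +/-1 labels (drawn first,
    # so the same seed gives the same dataset for every width).
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.choice([-1.0, 1.0], size=n)

    # Random init: first layer W0 ~ N(0, 1), fixed second layer a in {-1, +1}.
    W0 = rng.normal(size=(width, d))
    a = rng.choice([-1.0, 1.0], size=width)

    W = W0.copy()
    for _ in range(steps):
        pre = X @ W.T                                  # (n, width) pre-activations
        f = np.maximum(pre, 0.0) @ a / np.sqrt(width)  # (n,) network outputs
        resid = f - y
        # Gradient of (1 / 2n) * sum_i resid_i^2 with respect to W.
        grad = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X
        grad /= n * np.sqrt(width)
        W -= lr * grad

    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)


for m in (64, 512, 4096):
    print(f"width={m:5d}  relative change={relative_weight_change(m):.4f}")
```

Under these assumptions, the relative change should shrink as the width grows, since the absolute movement stays roughly constant while the initial norm scales like sqrt(width). This is the sense in which wide networks in this regime stay near their random initialization.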

Setup