Chapter 6 - Beyond NTK
Under construction.
-
Is the NTK the only explanation of why gradient descent succeeds at training neural networks? Almost surely not.
-
The crux of NTK: during training, the weights don’t move very far from their (random) initialization, so the network is well approximated by its linearization around init.
-
Therefore, meaningful feature learning does not happen: the features are essentially frozen at their (random) initial values!
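The "weights barely move" claim can be checked numerically. Below is a minimal sketch (the task, architecture, hyperparameters, and the helper name `relative_weight_movement` are all illustrative, not from the chapter): a two-layer relu net in the NTK parameterization (output scaled by 1/sqrt(width)), first layer trained by full-batch GD on a toy 1-d regression task. The relative movement of the weights from initialization shrinks as the width grows, roughly like 1/sqrt(width).

```python
import numpy as np

def relative_weight_movement(width, steps=200, lr=0.5, seed=0):
    """Train a 2-layer relu net (NTK parameterization, first layer only)
    with full-batch GD on a toy task; return ||W_T - W_0|| / ||W_0||."""
    rng = np.random.default_rng(seed)
    n = 20
    X = rng.normal(size=(n, 1))          # toy 1-d inputs
    y = np.sin(2 * X[:, 0])              # toy regression target
    W = rng.normal(size=(width, 1))      # first layer, trained
    a = rng.choice([-1.0, 1.0], size=width)  # second layer, frozen
    W0 = W.copy()
    for _ in range(steps):
        pre = X @ W.T                    # (n, width) pre-activations
        h = np.maximum(pre, 0)           # relu features
        f = h @ a / np.sqrt(width)       # NTK scaling: 1/sqrt(width)
        r = f - y                        # residual
        # gradient of 0.5 * mean squared error w.r.t. W
        gW = ((r[:, None] * (pre > 0) * a / np.sqrt(width)).T @ X) / n
        W -= lr * gW
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for m in [10, 100, 1000, 10000]:
    print(m, relative_weight_movement(m))
```

At small width the weights move by a constant-order fraction of their initial norm; at large width the relative movement becomes tiny, which is exactly the lazy/NTK regime.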
-
However, we know (both from visualizations and from controlled experiments) that neural networks trained with GD do learn features.
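One such controlled experiment can be sketched in a few lines (again, the task, width, initialization scale, and learning rate are illustrative choices, not the chapter's): the same toy regression task, but at small width, with a small-scale initialization, and with both layers trained under standard parameterization (no 1/sqrt(width) factor). Here the first-layer weights move by many times their initial norm, consistent with features changing during training rather than staying frozen.

```python
import numpy as np

# Small width + small init + both layers trained: the "rich" regime,
# where weights travel far from initialization.
rng = np.random.default_rng(1)
n, m, steps, lr = 20, 5, 2000, 0.05
X = rng.normal(size=(n, 1))              # toy 1-d inputs
y = np.sin(2 * X[:, 0])                  # toy regression target

W = 0.1 * rng.normal(size=(m, 1))        # first layer, small init
a = 0.1 * rng.normal(size=m)             # second layer, small init
W0 = W.copy()

for _ in range(steps):
    pre = X @ W.T                        # (n, m) pre-activations
    h = np.maximum(pre, 0)               # relu features
    r = h @ a - y                        # residual
    gA = h.T @ r / n                     # grad of 0.5*MSE w.r.t. a
    gW = ((r[:, None] * (pre > 0) * a).T @ X) / n  # grad w.r.t. W
    a -= lr * gA
    W -= lr * gW

print(np.linalg.norm(W - W0) / np.linalg.norm(W0))
```

Contrast this with the wide NTK-parameterized setting, where the same ratio goes to zero as width grows.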
-
Hessian control: a theory that explains training dynamics “far away” from the initialization, by bounding the Hessian of the loss along the optimization trajectory.
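As a sketch of the template such arguments build on (the chapter's precise notion may differ), the standard descent lemma controls progress through a Hessian bound along the trajectory, with no reference to distance from init:

```latex
Suppose $\lambda_{\max}\!\left(\nabla^2 L(w)\right) \le \beta$ for all $w$ on the
segment between $w_t$ and $w_{t+1} = w_t - \eta \nabla L(w_t)$. Then by Taylor's
theorem,
\[
  L(w_{t+1}) \;\le\; L(w_t) \;-\; \eta\Bigl(1 - \tfrac{\beta\eta}{2}\Bigr)
  \bigl\|\nabla L(w_t)\bigr\|^2 ,
\]
so any step size $\eta \le 1/\beta$ guarantees
\[
  L(w_{t+1}) \;\le\; L(w_t) \;-\; \tfrac{\eta}{2}\,\bigl\|\nabla L(w_t)\bigr\|^2 .
\]
```

The point: as long as the Hessian stays controlled along the path GD actually takes, the loss decreases at every step, no matter how far the weights have traveled from their initialization.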