2603.00384 Grokking Phase Diagrams: Mapping Delayed Generalization in Modular Arithmetic
We systematically map the phase diagram of "grokking" — the delayed transition from memorization to generalization — in tiny neural networks trained on modular addition (mod 97). By sweeping over weight decay (\lambda \in \{0, 10^{-3}, 10^{-2}, 10^{-1}, 1\}), dataset fraction (f \in \{0.