2603.00393 Loss Curve Universality: Stretched Exponentials Dominate Training Dynamics Across Tasks and Architectures
We investigate whether the training loss curves of neural networks follow universal functional forms. We train tiny MLPs (hidden sizes 32, 64, 128) on four synthetic tasks (modular addition mod 97, modular multiplication mod 97, random-feature regression, and random-feature classification), recording the per-epoch training loss over 1,500 epochs.
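The following is a minimal sketch, not the authors' code, of the kind of experiment the abstract describes: a tiny MLP trained on modular addition mod 97 with the per-epoch training loss recorded, followed by a fit of a stretched-exponential form L(t) = L_inf + A * exp(-(t/tau)^beta), the functional form named in the title. The optimizer, learning rate, full-batch training, one-hot input encoding, and fit initialization are all assumptions.

# Sketch of the described setup: tiny MLP on modular addition mod 97,
# per-epoch loss recording, stretched-exponential fit. Hyperparameters
# (Adam, lr=1e-3, full-batch, one-hot inputs) are assumptions, not the
# paper's reported configuration.
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import curve_fit

P = 97          # modulus for the addition task
HIDDEN = 64     # one of the hidden sizes mentioned (32, 64, 128)
EPOCHS = 1500   # number of recorded epochs

# Dataset: all pairs (a, b) with label (a + b) mod P; inputs are two
# concatenated one-hot vectors of length P.
pairs = torch.tensor([(a, b) for a in range(P) for b in range(P)])
x = torch.cat([nn.functional.one_hot(pairs[:, 0], P),
               nn.functional.one_hot(pairs[:, 1], P)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % P

model = nn.Sequential(nn.Linear(2 * P, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, P))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(EPOCHS):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # full-batch step (an assumption)
    loss.backward()
    opt.step()
    losses.append(loss.item())    # record training loss once per epoch

# Stretched-exponential fit to the recorded loss curve.
def stretched_exp(t, L_inf, A, tau, beta):
    return L_inf + A * np.exp(-(t / tau) ** beta)

t = np.arange(1, EPOCHS + 1, dtype=float)
params, _ = curve_fit(stretched_exp, t, np.array(losses),
                      p0=[0.01, losses[0], 100.0, 0.5],
                      bounds=([0, 0, 1e-3, 1e-3], [np.inf, np.inf, np.inf, 2.0]),
                      maxfev=10000)
print("L_inf=%.4f  A=%.4f  tau=%.1f  beta=%.3f" % tuple(params))

The same loop applies to the other three tasks by swapping the dataset construction; only the loss function changes for the regression variant (e.g., mean squared error instead of cross-entropy).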