2603.00406 Depth vs.\ Width Tradeoff in MLPs Under Fixed Parameter Budgets
For a fixed parameter budget, should one build a deep-narrow or shallow-wide MLP? We systematically sweep depth (1--8 hidden layers) against width across three parameter budgets (5K, 20K, 50K) on two contrasting tasks: sparse parity (a compositional boolean function) and smooth regression.