Random Matrix Theory Analysis of Trained Neural Network Weights: Marchenko-Pastur Deviations as a Measure of Learned Structure
Introduction
Understanding what neural networks learn during training remains a central question in deep learning theory. Random Matrix Theory provides a principled null model: if a weight matrix were purely random (i.i.d. entries), the eigenvalue distribution of its correlation matrix would follow the Marchenko-Pastur (MP) law [marchenko1967distribution]. Deviations from MP therefore indicate that training has imposed structure on the weights beyond what random initialization provides.
Martin and Mahoney [martin2021implicit] applied this framework to production-quality deep neural networks, discovering that the empirical spectral density (ESD) of weight matrices displays signatures of self-regularization. They identified phases of training characterized by increasingly heavy-tailed eigenvalue distributions. Recent work has extended these ideas to locate learned information in large language models by identifying eigenvalues and eigenvectors that deviate from RMT predictions [he2024rmt].
We contribute an agent-executable skill that applies RMT analysis to tiny MLPs trained on two synthetic tasks: modular arithmetic (mod 97), which requires learning structured algebraic relationships, and polynomial regression. By comparing trained and untrained weight spectra across varying network widths, we test whether MP deviation metrics capture meaningful differences in learned structure.
Background
Marchenko-Pastur Distribution
For an $M \times N$ random matrix $W$ with i.i.d. entries of mean 0 and variance $\sigma^2$, the eigenvalue density of $C = \frac{1}{M} W^\top W$ converges (as $M, N \to \infty$ with $\gamma = N/M$ fixed) to
$$\rho(\lambda) = \frac{1}{2\pi \sigma^2 \gamma \lambda} \sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)},$$
with support on $[\lambda_-, \lambda_+]$, where $\lambda_\pm = \sigma^2 (1 \pm \sqrt{\gamma})^2$.
Eigenvalues outside this "bulk" region represent signal—structure that cannot be explained by random fluctuations alone.
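The bulk edges above can be checked numerically. The sketch below (variable names are illustrative, not from the paper's pipeline) draws an i.i.d. Gaussian matrix and verifies that essentially all eigenvalues of its correlation matrix fall inside $[\lambda_-, \lambda_+]$:

```python
import numpy as np

# Empirical check of the MP bulk edges for an i.i.d. Gaussian matrix.
rng = np.random.default_rng(0)
M, N, sigma2 = 2000, 500, 1.0
W = rng.normal(0.0, np.sqrt(sigma2), size=(M, N))

C = W.T @ W / M                      # N x N correlation matrix
eigs = np.linalg.eigvalsh(C)

gamma = N / M
lam_minus = sigma2 * (1 - np.sqrt(gamma)) ** 2
lam_plus = sigma2 * (1 + np.sqrt(gamma)) ** 2

# For a purely random matrix, (almost) all eigenvalues lie in the bulk;
# only finite-size fluctuations at the edges can produce stray outliers.
outliers = np.sum((eigs < lam_minus) | (eigs > lam_plus))
print(f"bulk edges: [{lam_minus:.3f}, {lam_plus:.3f}], outliers: {outliers}")
```

For a trained weight matrix, by contrast, eigenvalues well beyond `lam_plus` are exactly the "signal" the deviation metrics below are designed to detect.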
Deviation Metrics
We measure deviation from MP using four complementary metrics:
- **KS statistic:** Maximum distance between empirical and theoretical CDFs. Higher values indicate greater deviation from randomness.
- **Outlier fraction:** Proportion of eigenvalues outside $[\lambda_-, \lambda_+]$.
- **Spectral norm ratio:** $\lambda_{\max} / \lambda_+$, where values $> 1$ indicate signal spikes beyond the random bulk.
- **KL divergence:** Binned approximation of $D_{\mathrm{KL}}(P_{\text{emp}} \,\|\, P_{\text{MP}})$.

Methodology
Tasks
Modular Addition (mod 97): Inputs are pairs $(a, b)$ with $a, b \in \{0, \dots, 96\}$, one-hot encoded. The target is $(a + b) \bmod 97$, a 97-way classification problem. This task requires learning modular arithmetic structure and is known to exhibit "grokking" [power2022grokking].
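The dataset construction can be sketched as follows. This is an illustrative reimplementation, assuming the two one-hot codes are concatenated into a single 194-dimensional input (the paper's generator lives in `src/data.py`):

```python
import numpy as np

# All 97*97 input pairs (a, b) for the modular-addition task.
P = 97
a, b = np.meshgrid(np.arange(P), np.arange(P), indexing="ij")
pairs = np.stack([a.ravel(), b.ravel()], axis=1)

# One-hot encode a into the first 97 slots and b into the last 97,
# giving a 2*97 = 194-dimensional input vector (assumed encoding).
X = np.zeros((len(pairs), 2 * P), dtype=np.float32)
X[np.arange(len(pairs)), pairs[:, 0]] = 1.0
X[np.arange(len(pairs)), P + pairs[:, 1]] = 1.0

y = (pairs[:, 0] + pairs[:, 1]) % P   # 97-way class label
print(X.shape, y.shape)  # (9409, 194) (9409,)
```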
Polynomial Regression: Scalar inputs are expanded into polynomial features, and the target is a fixed polynomial of the input.
Models and Training
We use 3-layer MLPs (Linear-ReLU-Linear-ReLU-Linear) with hidden dimensions $\{32, 64, 128, 256\}$. All models are trained with Adam for 500 epochs with batch size 512 and seed 42. For each configuration, we save both the trained weights and a copy of the randomly initialized weights as a baseline.
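The architecture's shapes, which determine the three weight matrices analyzed below, can be sketched in plain NumPy (the pipeline's actual model is `TinyMLP` in `src/model.py`; the 194/97 dimensions follow the modular task, and the Kaiming-style init scale is an assumption):

```python
import numpy as np

# Shape sketch of the 3-layer MLP (Linear-ReLU-Linear-ReLU-Linear).
def init_mlp(d_in=194, hidden=128, d_out=97, seed=42):
    rng = np.random.default_rng(seed)
    shapes = [(d_in, hidden), (hidden, hidden), (hidden, d_out)]
    # Fan-in scaled Gaussian init (assumption; mirrors common defaults).
    return [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)) for m, n in shapes]

def forward(weights, x):
    h = np.maximum(x @ weights[0], 0.0)   # fc1 + ReLU
    h = np.maximum(h @ weights[1], 0.0)   # fc2 + ReLU
    return h @ weights[2]                 # fc3 logits

W1, W2, W3 = init_mlp()
logits = forward([W1, W2, W3], np.zeros((4, 194)))
print(logits.shape)  # (4, 97)
```

Each model thus contributes three matrices (fc1, fc2, fc3) to the spectral analysis.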
Spectral Analysis
For each weight matrix $W$ of shape $M \times N$:
- If $M < N$, transpose so that $M \geq N$
- Compute $C = \frac{1}{M} W^\top W$ (size $N \times N$)
- Compute the eigenvalues of $C$
- Estimate $\sigma^2 = \mathrm{Var}(W_{ij})$ and $\gamma = N/M$
- Compute the MP bounds $\lambda_\pm$ and all deviation metrics

For trained-vs-untrained comparisons, we compute paired layer-wise deltas $\Delta\mathrm{KS} = \mathrm{KS}_{\mathrm{trained}} - \mathrm{KS}_{\mathrm{untrained}}$. We summarize these deltas with (i) a one-sided sign test ($H_1\colon \Delta\mathrm{KS} > 0$) and (ii) a bootstrap 95% confidence interval for the mean $\Delta\mathrm{KS}$ (2000 resamples, seed 42).
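The per-matrix steps above can be sketched as follows. This is an independent illustration, not the pipeline's `analyze_weight_matrix()`; it omits the binned KL divergence, computes the MP CDF by numerical integration, and assumes $\gamma < 1$ (non-square matrices):

```python
import numpy as np
from scipy import integrate

def mp_cdf(x, sigma2, gamma):
    """CDF of the Marchenko-Pastur law (gamma < 1), via numerical integration."""
    lam_m = sigma2 * (1 - np.sqrt(gamma)) ** 2
    lam_p = sigma2 * (1 + np.sqrt(gamma)) ** 2
    def pdf(t):
        return np.sqrt(max((lam_p - t) * (t - lam_m), 0.0)) / (2 * np.pi * sigma2 * gamma * t)
    return np.array([integrate.quad(pdf, lam_m, min(max(xi, lam_m), lam_p))[0]
                     for xi in np.atleast_1d(x)])

def mp_deviation_metrics(W):
    """Illustrative sketch of three of the four deviation metrics (KL omitted)."""
    if W.shape[0] < W.shape[1]:
        W = W.T                               # ensure M >= N
    M, N = W.shape
    eigs = np.linalg.eigvalsh(W.T @ W / M)    # spectrum of the correlation matrix
    sigma2, gamma = W.var(), N / M
    lam_m = sigma2 * (1 - np.sqrt(gamma)) ** 2
    lam_p = sigma2 * (1 + np.sqrt(gamma)) ** 2
    # Two-sided KS distance between empirical and MP CDFs.
    sorted_e = np.sort(eigs)
    theo = mp_cdf(sorted_e, sigma2, gamma)
    ks = float(max(np.max(np.arange(1, N + 1) / N - theo),
                   np.max(theo - np.arange(N) / N)))
    return {
        "ks": ks,
        "outlier_fraction": float(np.mean((eigs < lam_m) | (eigs > lam_p))),
        "spectral_norm_ratio": float(eigs.max() / lam_p),
    }
```

For a purely random matrix, `ks` and `outlier_fraction` should be small and `spectral_norm_ratio` close to 1; training pushes all three upward.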
Results
We analyze 48 weight matrices in total: 8 models $\times$ 3 layers $\times$ 2 conditions (trained/untrained).
Trained vs. Untrained
Across all configurations, the mean KS increase after training is positive at the model level. At the layer level, 20 of 24 trained layers show higher KS statistics than their untrained counterparts, indicating that training typically moves weight spectra away from MP predictions. The four non-increasing cases are the regression output layers with a single eigenvalue, where MP comparisons collapse to a degenerate KS value of zero. The paired sign test is strongly significant (one-sided, on non-tied pairs), and the 95% bootstrap CI for the mean $\Delta\mathrm{KS}$ lies above zero.
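The paired statistics can be reproduced as follows. The $\Delta\mathrm{KS}$ values below are made-up placeholders (including one tie, mirroring the degenerate zero-delta layers), not the paper's numbers:

```python
import numpy as np
from scipy import stats

# Placeholder layer-wise Delta-KS values; one tie at 0.0 is excluded
# from the sign test, as in the paper's "non-tied pairs" convention.
delta_ks = np.array([0.12, 0.08, 0.15, 0.05, 0.0, 0.09, 0.11, 0.07])

# One-sided sign test: under H0 each non-tied delta is positive with
# probability 1/2, so count positives against Binomial(n, 0.5).
nonzero = delta_ks[delta_ks != 0]
p_value = stats.binomtest(int((nonzero > 0).sum()), n=len(nonzero),
                          p=0.5, alternative="greater").pvalue

# Bootstrap 95% CI for the mean delta (2000 resamples, fixed seed).
rng = np.random.default_rng(42)
means = [rng.choice(delta_ks, size=len(delta_ks), replace=True).mean()
         for _ in range(2000)]
ci_low, ci_high = np.percentile(means, [2.5, 97.5])
print(p_value, ci_low, ci_high)
```

With 7 of 7 non-tied deltas positive, the sign test gives $p = 0.5^7 \approx 0.0078$; the paper applies the same recipe to its 24 layer pairs.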
Untrained networks show KS statistics consistent with finite-size deviations from MP (small matrices deviate more due to limited sample size), while trained networks show systematically larger deviations that increase with training effectiveness.
Layer-wise Structure
Later layers (fc2, fc3) tend to show stronger deviations from MP than the first layer (fc1), particularly for the modular addition task. This aligns with the interpretation that later layers encode more task-specific transformations.
Width Effects
Wider networks (hidden dim 256) show better MP fits for untrained weights, consistent with the theoretical prediction that finite-size effects decrease with matrix dimension. For trained weights, the relationship between width and deviation depends on the task and training dynamics.
Task Comparison
Modular addition, which requires learning structured algebraic relationships, generally produces stronger spectral deviations than polynomial regression. This suggests that the complexity of the learned function is reflected in the weight spectrum.
Discussion
Limitations. Our models are intentionally tiny (32–256 hidden units), far smaller than production DNNs. The MP distribution is an asymptotic result ($M, N \to \infty$), so finite-size effects are significant for our smallest models. The 500-epoch training budget may not be sufficient for full grokking on modular arithmetic, which can require thousands of epochs. We report a single deterministic seed, so the current submission does not estimate across-seed variability in the deviation metrics.
Connection to prior work. Martin and Mahoney [martin2021implicit] observe heavy-tailed distributions in large pre-trained models. Our tiny models operate in a different regime where MP deviations are more subtle, but the same principle applies: deviation from RMT predictions indicates learned structure.
Reproducibility.
This analysis is fully agent-executable via the accompanying SKILL.md.
All random seeds are pinned, all dependencies are version-locked, and the pipeline emits a SHA256 manifest (checksums.sha256) so generated artifacts can be verified byte-for-byte across reruns.
The default CPU workflow completes in under 3 minutes, and alternate experiment settings can be specified via documented CLI flags.
Conclusion
We demonstrate that Random Matrix Theory provides a useful lens for understanding learned structure in neural network weights. The Marchenko-Pastur distribution serves as a principled null model: trained weights usually deviate more from it than matched random initializations, while randomly initialized baselines mostly follow MP except in very low-dimensional layers. The degree of deviation correlates with task complexity and layer depth. Our contribution is primarily methodological: a reproducible, agent-executable pipeline that makes RMT analysis accessible for any AI agent to run independently.
References
[marchenko1967distribution] V. A. Marchenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Matematicheskii Sbornik, vol. 114, no. 4, pp. 507--536, 1967.
[martin2021implicit] C. H. Martin and M. W. Mahoney, "Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning," JMLR, vol. 22, no. 165, pp. 1--73, 2021.
[power2022grokking] A. Power, Y. Burda, H. Edwards, I. Babuschkin, and V. Misra, "Grokking: Generalization beyond overfitting on small algorithmic datasets," arXiv preprint arXiv:2201.02177, 2022.
[he2024rmt] Y. He et al., "Locating information in large language models via random matrix theory," arXiv preprint arXiv:2410.17770, 2024.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: rmt-weight-analysis
description: Analyze eigenvalue spectra of trained MLP weight matrices against the Marchenko-Pastur distribution from Random Matrix Theory. Trains tiny MLPs on modular arithmetic (mod 97) and polynomial regression, then measures how trained weights deviate from random predictions using KS statistics, outlier fractions, and spectral norm ratios.
allowed-tools: Bash(python *), Bash(python3 *), Bash(pip *), Bash(.venv/*), Bash(cat *), Read, Write
---

# Random Matrix Theory Analysis of Neural Network Weights

This skill trains tiny MLPs on synthetic tasks and analyzes their weight matrix eigenvalue spectra using Random Matrix Theory (RMT). It compares empirical spectra to the Marchenko-Pastur distribution to quantify how much structure each layer has learned.

## Prerequisites

- Requires **Python 3.10+**. No internet access needed (all data is synthetic).
- Expected runtime: **1-3 minutes** on CPU.
- All commands must be run from the **submission directory** (`submissions/rmt/`).

## Step 1: Environment Setup

Create a virtual environment and install dependencies:

```bash
python3 -m venv .venv
.venv/bin/python -m pip install --upgrade pip
.venv/bin/python -m pip install -r requirements.txt
```

Verify installation by running the test suite (Step 2), which will catch any missing dependencies.

## Step 2: Run Unit Tests

Verify the analysis modules work correctly:

```bash
.venv/bin/python -m pytest tests/ -v
```

Expected: Pytest exits with all tests passed and exit code 0.

## Step 3: Run the Analysis

Execute the full RMT analysis pipeline:

```bash
.venv/bin/python run.py
```

Optional flags for extensions without code edits:

```bash
.venv/bin/python run.py --hidden-dims 64,128 --mod-epochs 300 --reg-epochs 300 --output-dir results_alt
```

Expected: Script prints progress through 5 stages and exits with code 0.
Files created in `results/`:

- `results.json` — raw metrics for all models and layers
- `report.md` — human-readable summary with tables
- `eigenvalue_spectra.png` — eigenvalue histograms vs MP overlay (trained)
- `eigenvalue_spectra_untrained.png` — eigenvalue histograms vs MP overlay (untrained)
- `ks_summary.png` — KS statistics, outlier fractions, and spectral norm ratios
- `checksums.sha256` — SHA256 manifest for deterministic artifact verification

This will:

1. Generate modular addition (mod 97) and polynomial regression datasets
2. Train 8 tiny MLPs (4 hidden dims x 2 tasks) with seed=42
3. Extract weight matrices from each layer (3 per model)
4. Compute eigenvalue spectra of correlation matrices W^T W / M
5. Compare to Marchenko-Pastur theoretical predictions
6. Measure KS statistic, outlier fraction, spectral norm ratio, KL divergence
7. Generate comparison plots and summary report

## Step 4: Validate Results

Check that results are complete and scientifically valid:

```bash
.venv/bin/python validate.py
```

Expected: Prints metric summaries and `Validation passed.`

The validator checks:

- All 8 models trained successfully
- All 24 layer analyses (8 models x 3 layers) completed
- Metrics in valid ranges (KS in [0,1], outlier fraction in [0,1])
- Core hypothesis holds: trained models deviate more from MP than untrained
- Paired delta-KS summary is internally consistent (recomputed sign test + bootstrap CI)
- `checksums.sha256` matches all generated artifacts

## Step 5: Review the Report

Read the generated report:

```bash
cat results/report.md
```

The report contains:

- Training summary (loss, accuracy/MSE per model)
- RMT analysis table for trained models (KS, outlier fraction, spectral norm ratio, KL divergence)
- RMT analysis table for untrained baselines
- Trained vs untrained comparison with delta KS
- Statistical confidence section (one-sided sign test p-value and 95% bootstrap CI for mean delta KS)
- Key findings

## Step 6: (Optional) Determinism Check
Run the default pipeline twice and compare output hashes:

```bash
.venv/bin/python run.py --quiet
shasum -a 256 results/results.json results/report.md results/checksums.sha256 > results/hash_run1.txt
.venv/bin/python run.py --quiet
shasum -a 256 results/results.json results/report.md results/checksums.sha256 > results/hash_run2.txt
diff results/hash_run1.txt results/hash_run2.txt
```

Expected: no diff output.

## How to Extend

- **Change experiment config from CLI:** use `run.py` flags like `--seed`, `--hidden-dims`, `--modulus`, `--reg-samples`, `--mod-epochs`, `--reg-epochs`, `--learning-rate`, `--batch-size`, `--output-dir`, `--quiet`.
- **Add a task:** Create a new data generator in `src/data.py` and add it to the training loop in `run.py`.
- **Change network architecture:** Modify `TinyMLP` in `src/model.py` (e.g., add layers, change activation).
- **Add RMT metrics:** Extend `analyze_weight_matrix()` in `src/rmt_analysis.py`.
- **Test on pre-trained models:** Load weights from a saved checkpoint and pass to `analyze_model_weights()`.