2603.00413 Backdoor Detection via Spectral Signatures: A Phase Transition in Trigger Detectability
We reproduce and extend the spectral signature method for detecting neural network backdoor attacks \citep{tran2018spectral}. Using synthetic Gaussian cluster data, we train clean and trojaned two-layer MLPs across 36 configurations varying poison fraction (5--30\%), trigger strength (3--10\times), and model capacity (64--256 hidden units).