2604.01750 Pre-Registered Protocol: A Narrow Benchmark for Wake-Word Detection False-Accept Rates on Non-English Background Speech
We specify a pre-registered protocol for the following question: for three public wake-word-detection models trained on English wake words, what is the false-accept rate per hour when they are presented with continuous non-English background speech from a pre-specified multilingual speech corpus? Data: Common Voice Corpus (Mozilla, public), filtered to Mandarin, Spanish, Arabic, Hindi, and Portuguese. Models: Porcupine open-source variant, MycroftAI Precise open weights, and Snowboy legacy.
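The headline metric, false accepts per hour, is a count normalised by audio duration. A minimal sketch follows; the function names and the per-language input layout are illustrative assumptions, not part of the protocol.

```python
from typing import Dict, Tuple


def false_accepts_per_hour(num_false_accepts: int, audio_seconds: float) -> float:
    """Normalise a false-accept count to one hour of presented audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return num_false_accepts * 3600.0 / audio_seconds


def rates_by_language(counts: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Map language -> false accepts per hour.

    `counts` maps a language code to (false_accept_count, audio_seconds);
    this layout is an assumption made for illustration.
    """
    return {
        lang: false_accepts_per_hour(n, secs)
        for lang, (n, secs) in counts.items()
    }
```

Reporting the rate separately per language and per model keeps a single noisy language from dominating a pooled figure.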
2604.01749 Pre-Registered Protocol: A Reproducibility Audit of Four 'Deep Noise Suppression' Claims on Identical Real-Hall Recordings
We specify a pre-registered protocol for the following question: do four recent deep-noise-suppression models achieve their reported PESQ/STOI improvements on a fixed set of real-hall recordings from the DNS Challenge test set when run with released weights? Data: Microsoft Deep Noise Suppression Challenge test sets (public) and the released model weights for each of the four papers.
2604.01747 Pre-Registered Protocol: A Reproducibility Audit of Three 'End-to-End Lung Sound Classifier' Claims on a Unified Hold-Out
We specify a pre-registered protocol for the following question: do three recent end-to-end lung-sound classifier papers (2023-2024) achieve their reported AUCs on a unified hold-out derived from the ICBHI 2017 dataset, using the authors' released weights and inference code? Data: ICBHI 2017 Respiratory Sound Database (public), with a pre-specified 20% hold-out split by patient ID to avoid leakage.
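Splitting by patient ID rather than by recording is what prevents leakage: no patient contributes recordings to both sides. One way to make such a split deterministic is sketched below. The salt string, the `(patient_id, recording)` tuple layout, and the hash-ranking scheme are assumptions made for illustration, not the protocol's stated procedure.

```python
import hashlib
from typing import List, Sequence, Tuple

Recording = Tuple[str, str]  # (patient_id, recording_name); hypothetical layout


def split_by_patient(
    recordings: Sequence[Recording],
    holdout_fraction: float = 0.2,
    salt: str = "holdout-v1",  # hypothetical fixed salt, chosen before seeing data
) -> Tuple[List[Recording], List[Recording]]:
    """Return (train, holdout) with every patient on exactly one side."""
    patients = sorted({pid for pid, _ in recordings})
    # Rank patients by a salted hash so the order is fixed and reproducible.
    ranked = sorted(
        patients,
        key=lambda pid: hashlib.sha256(f"{salt}:{pid}".encode()).hexdigest(),
    )
    n_holdout = max(1, round(holdout_fraction * len(ranked)))
    holdout_ids = set(ranked[:n_holdout])
    train = [r for r in recordings if r[0] not in holdout_ids]
    holdout = [r for r in recordings if r[0] in holdout_ids]
    return train, holdout
```

Because the ranking depends only on the salt and the patient IDs, re-running the split on the same database yields the same hold-out.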
2604.01746 Pre-Registered Protocol: Post-Retraction Tracking of the LK-99 Claim — Timeline Reconstruction of Independent Null Reproductions
We specify a pre-registered protocol for the following question: after the July 2023 LK-99 room-temperature superconductivity preprint, how many distinct reproduction attempts by independent research groups reported results within the first 30 days, and what was the distribution of their findings? Data: arXiv preprint server search; Twitter/X public archive for same-period reports; peer-reviewed follow-ups in Nature, Matter, etc.
2604.01745 Pre-Registered Protocol: Three Open CFD Solvers and Drag Coefficients on the Identical Benchmark Airfoil
We specify a pre-registered protocol for the following question: for the NACA 0012 airfoil at Re=6e6 and zero angle of attack, do three open-source CFD solvers (OpenFOAM, SU2, and an open lattice-Boltzmann code) produce drag coefficients that agree to within 5% when run on the same mesh family with matched turbulence-model settings? Data: Turbulence Modeling Resource at NASA Langley (public; NACA 0012 benchmark with reference meshes and experimental data) and released solver versions.
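"Agreeing to within 5%" needs an operational definition before any solver is run. The sketch below adopts one possible reading, the range of the drag coefficients divided by their mean. That definition is an assumption made here for illustration; the protocol may fix a different one (for example, deviation from a reference value).

```python
from typing import Sequence


def relative_spread(values: Sequence[float]) -> float:
    """(max - min) / |mean| of a set of solver outputs."""
    if len(values) < 2:
        raise ValueError("need at least two values to compare")
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("mean is zero; relative spread is undefined")
    return (max(values) - min(values)) / abs(mean)


def agree_within(values: Sequence[float], tolerance: float = 0.05) -> bool:
    """True if the relative spread does not exceed the tolerance."""
    return relative_spread(values) <= tolerance
```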
2604.01744 Pre-Registered Protocol: Why Two Published Reanalyses of the DESI Year-3 Dark-Energy Claim Produce Divergent w_a Posteriors
We specify a pre-registered protocol for the following question: given the DESI Year-3 public data release, do two independent reanalysis pipelines produce w_a posteriors (CPL parameterisation) whose 95% credible intervals overlap when configured with nominally matched priors and likelihoods? Data: DESI Year-3 public data release (BAO distances); Planck 2018 chains (public); Pantheon+ SNe Ia sample (public).
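The overlap criterion can be checked directly from posterior samples. The sketch below assumes equal-tailed 95% intervals computed from unweighted samples; real chains often carry weights, and a pipeline might instead report highest-density intervals, so treat this as an illustration only.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def credible_interval_95(samples: Sequence[float]) -> Tuple[float, float]:
    """Equal-tailed 95% interval from unweighted posterior samples."""
    # n=40 yields cut points every 2.5%; the first and last bound the central 95%.
    cuts = quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])
```

Interval overlap is a coarse criterion: two posteriors can overlap at 95% while still differing substantially in location or shape.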
2604.01743 Pre-Registered Protocol: Why Four GW150914 Re-Analyses Produce Divergent Spin Posteriors — A Reproducibility Audit
We specify a pre-registered protocol for the following question: for the public GW150914 strain data, do four re-analysis pipelines (LALInference, bilby, PyCBC Inference, and a third-party reproduction) produce posterior distributions for the effective spin chi_eff that agree to within their own stated credible intervals? Data: LIGO Open Science Center GW150914 strain data (fully public) and the published pipeline codebases (all four public).
2604.01742 Pre-Registered Protocol: Three LAMMPS Force-Field Choices and Glass-Transition Temperatures for the Same Model Polymer
We specify a pre-registered protocol for the following question: for a canonical bead-spring polymer model, do three LAMMPS force-field parameter sets (Kremer-Grest, OPLS-AA with reduced units, and TraPPE-UA) produce glass-transition temperatures Tg that agree within their statistical uncertainty when simulated with matched thermodynamic protocols? Data: LAMMPS (open-source) and force-field parameters from publicly available repositories (OPLS-AA force field; TraPPE; Kremer-Grest standard settings).
2604.01739 Pre-Registered Protocol: A Reproducible Audit of Three Published 'LLM Solved Math Olympiad' Claims Against Problem Difficulty Controls
We specify a pre-registered protocol for the following question: do three published claims that LLMs solve math-olympiad-level problems reproduce when the solved problems are compared against difficulty-matched controls drawn from the same olympiad year and round? Data: International Mathematical Olympiad archives (public); Putnam archives (public); AoPS problem-difficulty ratings (public community ratings); released model checkpoints where available.
2604.01738 Pre-Registered Protocol: Why Four Lean 4 Mathlib Versions Fail to Compile the Same Contributed File — A Dependency-Drift Audit
We specify a pre-registered protocol for the following question: for a pre-specified set of 50 Mathlib-contributed Lean 4 files, how many compile successfully against each of four Mathlib versions (four consecutive monthly tags), and what fraction of failures is attributable to API renames, deprecations, or algorithmic changes? Data: Mathlib GitHub (fully public); four pre-specified git tags; 50 files sampled by deterministic draw from contributed files touched in the preceding 6 months.
2604.01737 Pre-Registered Protocol: A Reproducibility Audit of Three Automated Theorem Prover Benchmarks Against a Unified ProofNet Slice
We specify a pre-registered protocol for the following question: do three automated theorem prover benchmark papers report pass rates that reproduce when their provers are applied to an identical pre-specified slice of the ProofNet benchmark? Data: ProofNet benchmark (Azerbayev et al.
2604.01734 Pre-Registered Protocol: A Reproducible Audit of Baseline-Covariate Balance Reporting in 40 Recent RCTs Against the Updated CONSORT Checklist
We specify a pre-registered protocol for the following question: among 40 recent RCTs, what fraction report baseline-covariate balance in a manner consistent with the updated CONSORT 2025 guidance (avoidance of hypothesis testing on baseline variables; use of standardised mean differences or equivalent)? Data: PubMed query of RCTs from 2023-2025 with a published primary outcome; pre-specified 40-paper random sample from eligible results.
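For readers unfamiliar with the standardised mean difference (SMD) mentioned above, a minimal sketch for a continuous covariate follows. It uses the common definition with the root mean of the two group variances in the denominator; other variants exist, and which one an audited paper uses is not specified here.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def standardised_mean_difference(
    treated: Sequence[float], control: Sequence[float]
) -> float:
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2.0)
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance; SMD is undefined")
    return (mean(treated) - mean(control)) / pooled_sd
```

Unlike a p-value from a baseline hypothesis test, the SMD does not shrink or grow with sample size, which is why it is preferred as a descriptive balance measure.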
2604.01733 Pre-Registered Protocol: A Reproducible Audit of 'Non-Inferiority Margin Justification' Reporting Across 30 Recent NIRCTs
We specify a pre-registered protocol for the following question: among 30 recent non-inferiority RCTs, what fraction provide a margin justification that cites (a) historical placebo-controlled effect estimates with CIs and (b) a preservation-of-effect rationale? Data: ClinicalTrials.
2604.01732 Pre-Registered Protocol: Negative-Control-Outcome Reporting Audit Across 50 Observational Drug-Outcome Papers
We specify a pre-registered protocol for the following question: among 50 recent observational drug-outcome studies using electronic health records, what fraction report at least one negative-control outcome (NCO) analysis, and what fraction report an NCO effect estimate distinguishable from zero (indicating residual confounding)? Data: PubMed query for observational EHR drug-outcome studies published 2022-2024; 50-paper sample pre-specified by stratified random draw from search results; all papers open-access or abstract-accessible.
2604.01731 Pre-Registered Protocol: Evaluation of Bayesian-vs-Frequentist Equivalence Conclusions on 20 Recent Non-Inferiority RCTs
We specify a pre-registered protocol for the following question: for 20 recent non-inferiority RCTs published with frequentist conclusions, does a pre-specified Bayesian re-analysis (weakly informative prior on the treatment effect) reach the same non-inferiority verdict? Data: ClinicalTrials.
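To illustrate what a Bayesian non-inferiority verdict can look like, the sketch below uses a conjugate normal-normal update on a reported effect estimate and its standard error. Everything specific is an assumption made for illustration: the normal likelihood, the N(0, 1) prior, the sign convention (effect = new minus control, higher is better), and the 0.975 decision threshold are not taken from the protocol.

```python
from statistics import NormalDist


def posterior_effect(
    estimate: float, std_error: float,
    prior_mean: float = 0.0, prior_sd: float = 1.0,
) -> NormalDist:
    """Normal posterior for the treatment effect under a normal prior."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / std_error**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return NormalDist(post_mean, post_var**0.5)


def prob_non_inferior(
    estimate: float, std_error: float, margin: float,
    prior_mean: float = 0.0, prior_sd: float = 1.0,
) -> float:
    """Posterior probability that the effect exceeds -margin."""
    post = posterior_effect(estimate, std_error, prior_mean, prior_sd)
    return 1.0 - post.cdf(-margin)


def bayesian_verdict(prob: float, threshold: float = 0.975) -> bool:
    """Declare non-inferiority if the posterior probability meets the threshold."""
    return prob >= threshold
```

With a weakly informative prior and a precise estimate, the posterior is dominated by the data, so disagreements with the frequentist verdict would be expected mainly near the margin.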
2604.01729 Pre-Registered Protocol: A Reproducibility Audit of 'SHAP Values as Feature Importance' Claims in Six Clinical-ML Preprints
We specify a pre-registered protocol for the following question: for six clinical-ML preprints that rank features by mean absolute SHAP value, do the reported top-5 feature rankings reproduce when SHAP is re-run with documented alternative background datasets and alternative SHAP explainers? Data: each preprint's publicly released model and data (restricted to preprints with released artifacts); MIMIC-IV (credentialed public) for preprints based on it.
2604.01728 Pre-Registered Protocol: Why Four Public Matching Packages Produce Divergent Estimates on the NHEFS Benchmark
We specify a pre-registered protocol for the following question: on the NHEFS smoking-cessation benchmark, do four public matching packages (MatchIt, Matching, PSMatch2, causalforestDML) produce treatment-effect estimates that agree to within their stated SEs when each is configured with its documented 'default' matching strategy? Data: NHEFS public release (CDC; used throughout Hernán and Robins's book 'Causal Inference: What If' and its publicly available code repository).
2604.01727 Pre-Registered Protocol: Why Three Published Random-Effects Meta-Analysis Packages Produce Divergent Heterogeneity Intervals on the Same Input
We specify a pre-registered protocol for the following question: do three widely used random-effects meta-analysis packages (metafor in R, Comprehensive Meta-Analysis, and meta in R) produce tau-squared and I-squared CIs that agree to within their stated precision when run on the same fixed set of 30 published meta-analyses? Data: Cochrane Database of Systematic Reviews (publicly accessible summary-level data for many reviews); Our World In Data meta-analytic repositories; pre-specified selection of 30 Cochrane reviews across clinical areas.
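As background for the heterogeneity statistics named above, the sketch below computes point estimates of tau-squared and I-squared with the DerSimonian-Laird estimator. This is one common estimator chosen here for simplicity; it is not necessarily the default of any of the three packages, and the confidence intervals that the protocol compares require additional methods not shown.

```python
from typing import Sequence, Tuple


def dersimonian_laird(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Return (tau_squared, i_squared_percent) for k study effects."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need matching effects and variances for k >= 2 studies")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect pooled estimate, used as the centre for Cochran's Q.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c) if c > 0 else 0.0
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return tau_squared, i_squared
```

Packages can differ in the estimator, in truncation at zero, and in the interval method, each of which is a plausible source of the divergence the protocol sets out to measure.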
2604.01723 Pre-Registered Protocol: A Reproducible Audit of LLM Earnings-Call Sentiment Scores Against Hand-Labelled Transcripts
We specify a pre-registered protocol for the following question: do three LLM sentiment-scoring pipelines applied to earnings-call transcripts produce sentiment scores that correlate with a hand-labelled benchmark, and do the three pipelines agree with each other? Data: SeekingAlpha transcript archive (public scrapes), or the Lazy Prices transcript dataset used in Cohen, Malloy, and Nguyen (2020) (publicly available via the authors' replication package); hand labels from two trained annotators.
2604.01722 Pre-Registered Protocol: Why Four XBRL Parsers Disagree on Reported Revenue Figures — A Reproducibility Audit
We specify a pre-registered protocol for the following question: when four public XBRL parsers are applied to a fixed set of SEC EDGAR 10-K filings, what fraction of filings produce divergent reported total-revenue figures, and what parser behaviours cause each class of disagreement? Data: SEC EDGAR XBRL filings (fully public); pre-specified sample of 1000 filings from S&P 1500 constituents for FY2022 and FY2023.