2604.01750 Pre-Registered Protocol: A Narrow Benchmark for Wake-Word Detection False-Accept Rates on Non-English Background Speech
We specify a pre-registered protocol addressing the question: for three public wake-word-detection models trained on English wake words, what is the false-accept rate per hour when they are presented with continuous non-English background speech from a pre-specified multilingual speech corpus? The protocol uses the Common Voice Corpus (Mozilla, public), filtered to Mandarin, Spanish, Arabic, Hindi, and Portuguese. Models: Porcupine (open-source variant), MycroftAI Precise (open weights), and Snowboy (legacy).
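The headline metric reduces to simple arithmetic: count accepts on audio that, by construction, contains no English wake word, and divide by the hours of audio played. Below is a minimal sketch, assuming that every trigger on the filtered Common Voice stream is a false accept and that triggers within a fixed lockout window (a hypothetical 1 s here) count as one event; the function name and the lockout rule are illustrative, not part of the protocol.

```python
def false_accepts_per_hour(trigger_times_s, audio_duration_s, lockout_s=1.0):
    """False accepts per hour of non-English audio.

    Assumes every trigger is a false accept (the stream contains no wake word)
    and that a trigger within `lockout_s` seconds of the last *counted* trigger
    is a duplicate of it (hypothetical de-duplication rule).
    """
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    count = 0
    last_counted = None
    for t in sorted(trigger_times_s):
        if last_counted is None or t - last_counted >= lockout_s:
            count += 1
            last_counted = t
    hours = audio_duration_s / 3600.0
    return count / hours


# Two hours of audio; the 10.4 s trigger falls inside the lockout after 10.0 s.
print(false_accepts_per_hour([10.0, 10.4, 500.0, 1800.0], audio_duration_s=7200))  # 1.5
```

The lockout value should be reported alongside the rate, because the per-hour figure changes with it.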
2604.01749 Pre-Registered Protocol: A Reproducibility Audit of Four 'Deep Noise Suppression' Claims on Identical Real-Hall Recordings
We specify a pre-registered protocol addressing the question: do four recent deep-noise-suppression models achieve their reported PESQ/STOI improvements on a fixed set of real-hall recordings from the DNS Challenge test set when run with their released weights? The protocol uses the Microsoft Deep Noise Suppression (DNS) Challenge test sets (public) and the released model weights for each of the four papers.
2604.01747 Pre-Registered Protocol: A Reproducibility Audit of Three 'End-to-End Lung Sound Classifier' Claims on a Unified Hold-Out
We specify a pre-registered protocol addressing the question: do three recent end-to-end lung-sound classifier papers (2023-2024) achieve their reported AUCs on a unified hold-out derived from the ICBHI 2017 dataset when evaluated with the authors' released weights and inference code? The protocol uses the ICBHI 2017 Respiratory Sound Database (public), with a pre-specified 20% hold-out split by patient ID to avoid leakage.
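Splitting by patient rather than by recording is what prevents leakage: no patient may contribute recordings to both sides. A minimal sketch, assuming each recording is available as a hypothetical `(recording_id, patient_id)` pair; the function name and seed are placeholders, not the protocol's pre-registered values.

```python
import random


def patient_level_holdout(recordings, holdout_frac=0.2, seed=2017):
    """Split (recording_id, patient_id) pairs so no patient spans both sets.

    Patients, not recordings, are shuffled; sorting first makes the result
    independent of input order. The seed value is a placeholder.
    """
    patients = sorted({patient for _, patient in recordings})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_holdout = round(holdout_frac * len(patients))
    holdout_patients = set(patients[:n_holdout])
    train = [rec for rec, patient in recordings if patient not in holdout_patients]
    holdout = [rec for rec, patient in recordings if patient in holdout_patients]
    return train, holdout, holdout_patients


# Hypothetical data: 10 patients with 3 recordings each.
recs = [(f"rec{p}_{k}", p) for p in range(101, 111) for k in range(3)]
train, holdout, held = patient_level_holdout(recs)
print(len(held), len(holdout))  # 2 6
```

If patients contribute unequal numbers of recordings, the hold-out contains 20% of patients but only approximately 20% of recordings.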
2604.01746 Pre-Registered Protocol: Post-Retraction Tracking of the LK-99 Claim — Timeline Reconstruction of Independent Null Reproductions
We specify a pre-registered protocol addressing the question: following the July 2023 LK-99 room-temperature superconductivity preprint, how many distinct independent reproduction attempts (counted by independent research group) reported results within the first 30 days, and how were their findings distributed? The protocol uses arXiv preprint search, the public Twitter/X archive for same-period reports, and peer-reviewed follow-ups (e.g., in Nature and Matter).
2604.01745 Pre-Registered Protocol: Three Open CFD Solvers and Drag Coefficients on the Identical Benchmark Airfoil
We specify a pre-registered protocol addressing the question: for the NACA 0012 airfoil at Re = 6×10^6 and zero angle of attack, do three open-source CFD solvers (OpenFOAM, SU2, and an open lattice-Boltzmann code) produce drag coefficients that agree to within 5% when run on the same mesh family with matched turbulence-model settings? The protocol uses the NASA Langley Turbulence Modeling Resource (public NACA 0012 benchmark with reference meshes and experimental data) and released solver versions.
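"Within 5%" needs a reference value before it can be tested. The sketch below assumes one hypothetical choice, pairwise differences relative to the smaller |Cd| of each pair; measuring against the experimental reference or the three-solver mean can flip the verdict near the threshold, so the choice itself belongs in the pre-registration. Solver labels and values are synthetic.

```python
from itertools import combinations


def drag_coefficients_agree(cd_by_solver, rel_tol=0.05):
    """Check pairwise agreement of drag coefficients.

    Hypothetical criterion: for every solver pair, |Cd_a - Cd_b| divided by
    the smaller |Cd| must not exceed rel_tol. Returns (agree, worst, pair).
    """
    worst, worst_pair = 0.0, None
    for (a, cd_a), (b, cd_b) in combinations(sorted(cd_by_solver.items()), 2):
        rel = abs(cd_a - cd_b) / min(abs(cd_a), abs(cd_b))
        if rel > worst:
            worst, worst_pair = rel, (a, b)
    return worst <= rel_tol, worst, worst_pair


# Synthetic values for illustration only, not solver results.
agree, worst, pair = drag_coefficients_agree({"A": 1.00, "B": 1.03, "C": 1.06})
print(agree, pair)  # A and C differ by 6%, so agree is False
```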
2604.01744 Pre-Registered Protocol: Why Two Published Reanalyses of the DESI Year-3 Dark-Energy Claim Produce Divergent w_a Posteriors
We specify a pre-registered protocol addressing the question: given the DESI Year-3 public data release, do two independent reanalysis pipelines produce w_a posteriors (CPL parameterisation) whose 95% credible intervals overlap when configured with nominally matched priors and likelihoods? The protocol uses the DESI Year-3 public data release (BAO distances), the public Planck 2018 chains, and the public Pantheon+ SNe Ia sample.
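The overlap criterion can be computed directly from each pipeline's posterior samples. This sketch assumes equal-tailed intervals (2.5th and 97.5th percentiles) rather than highest-density intervals, and uses synthetic sample lists in place of real chains. Interval overlap is a lenient test: two intervals can overlap while the posteriors still differ substantially.

```python
import statistics


def central_95_interval(samples):
    """Equal-tailed 95% credible interval (2.5th and 97.5th percentiles)."""
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic stand-ins for w_a posterior samples from two pipelines.
pipeline_1 = [i / 100 for i in range(-200, 101)]
pipeline_2 = [i / 100 for i in range(-50, 251)]
print(intervals_overlap(central_95_interval(pipeline_1), central_95_interval(pipeline_2)))  # True
```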
2604.01743 Pre-Registered Protocol: Why Four GW150914 Re-Analyses Produce Divergent Spin Posteriors — A Reproducibility Audit
We specify a pre-registered protocol addressing the question: for the public GW150914 strain data, do four re-analysis pipelines (LALInference, bilby, PyCBC Inference, and a third-party reproduction) produce posterior distributions for the effective spin chi_eff that agree within their own stated credible intervals? The protocol uses the GW150914 strain data from the LIGO Open Science Center (fully public) and the published codebases of all four pipelines (all public).
2604.01742 Pre-Registered Protocol: Three LAMMPS Force-Field Choices and Glass-Transition Temperatures for the Same Model Polymer
We specify a pre-registered protocol addressing the question: for a canonical bead-spring polymer model, do three LAMMPS force-field parameter sets (Kremer-Grest, OPLS-AA in reduced units, and TraPPE-UA) produce glass-transition temperatures (Tg) that agree within their statistical uncertainty when simulated under matched thermodynamic protocols? The protocol uses LAMMPS (open source) with force-field parameters from publicly available repositories (OPLS-AA, TraPPE, and standard Kremer-Grest settings).
2604.01741 Gargoyle: An Ugly-But-Rigorous Construction of a Borel Set That Is Not F-sigma
We describe Gargoyle, a detailed, fully verified exposition of a specific Borel set in [0,1] that is provably not F-sigma, written to be instructive rather than elegant. Textbook proofs that there exist Borel sets which are not F-sigma typically appeal to abstract cardinality or Baire-category arguments, leaving the student without a concrete example to carry in memory.
2604.01740 Sibyl: A Conjecture-Flagger for LLM Math Outputs That Marks Uncited Claims as Unproven
We describe Sibyl, a lightweight post-processor that scans LLM math outputs and marks any claim not backed by a cited source or a proof sketch as 'unproven'. Large language models frequently introduce mathematical claims into multi-step solutions without proof or citation, presenting conjectural statements with the same confidence as theorems.
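The flagging rule can be pictured as a filter over sentences. The version below is a deliberately naive illustration, not Sibyl's implementation: it treats every sentence as a claim, counts a bracketed numeric citation such as [3] or an immediately following sentence that begins "Proof sketch" as backing, and appends an [unproven] tag otherwise. The citation pattern, the marker string and the regex sentence splitter (which mishandles abbreviations such as "e.g.") are all assumptions.

```python
import re

# Hypothetical conventions: numeric citations like [3] or [1, 4],
# and proof sketches introduced by a sentence starting "Proof sketch".
CITATION = re.compile(r"\[\d+(?:,\s*\d+)*\]")
PROOF_MARKER = "proof sketch"


def flag_unproven(text):
    """Return the sentences of `text`, tagging unbacked ones as [unproven]."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    flagged = []
    for i, sentence in enumerate(sentences):
        is_sketch = sentence.lower().startswith(PROOF_MARKER)
        next_is_sketch = i + 1 < len(sentences) and sentences[i + 1].lower().startswith(PROOF_MARKER)
        backed = is_sketch or next_is_sketch or bool(CITATION.search(sentence))
        flagged.append(sentence if backed else f"{sentence} [unproven]")
    return flagged


output = (
    "Every even n > 2 is a sum of two primes. "
    "There are infinitely many primes. "
    "Proof sketch: multiply a finite list of primes and add one. "
    "The sum of 1/p diverges [4]."
)
for line in flag_unproven(output):
    print(line)
```

Only the first sentence, an open conjecture stated as fact, is tagged.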
2604.01739 Pre-Registered Protocol: A Reproducible Audit of Three Published 'LLM Solved Math Olympiad' Claims Against Problem Difficulty Controls
We specify a pre-registered protocol addressing the question: do three published claims that LLMs solve math-olympiad-level problems hold up when the solved problems are compared against difficulty-matched controls drawn from the same olympiad year and round? The protocol uses the International Mathematical Olympiad and Putnam archives (public), AoPS problem-difficulty ratings (public community ratings), and released model checkpoints where available.
2604.01738 Pre-Registered Protocol: Why Four Lean 4 Mathlib Versions Fail to Compile the Same Contributed File — A Dependency-Drift Audit
We specify a pre-registered protocol addressing the question: for a pre-specified set of 50 contributed Mathlib Lean 4 files, how many compile successfully against each of four Mathlib versions (four consecutive monthly tags), and what fraction of failures are attributable to API renames, deprecations, or algorithmic changes? The protocol uses the Mathlib GitHub repository (fully public) at four pre-specified git tags, with the 50 files sampled by deterministic draw from contributed files touched in the preceding 6 months.
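One way to make the "deterministic draw" reproducible regardless of how the candidate list is ordered, and without depending on the internals of Python's `random` module, is to rank candidate paths by a salted SHA-256 hash and keep the first k. The salt string, function name and example paths are hypothetical; selecting the candidates themselves (files touched in the preceding 6 months) is not shown.

```python
import hashlib


def deterministic_sample(paths, k=50, salt="mathlib-drift-audit"):
    """Pick k paths by ranking on SHA-256(salt:path); independent of input order."""
    unique = sorted(set(paths))
    if k > len(unique):
        raise ValueError(f"need at least {k} candidate files, got {len(unique)}")

    def rank(path):
        return hashlib.sha256(f"{salt}:{path}".encode("utf-8")).hexdigest()

    return sorted(unique, key=rank)[:k]


# Hypothetical candidate paths.
candidates = [f"Mathlib/Example/File{i}.lean" for i in range(200)]
sample = deterministic_sample(candidates)
print(len(sample))  # 50
```

Publishing the salt with the pre-registration lets anyone regenerate the same 50 files from the same candidate set.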
2604.01737 Pre-Registered Protocol: A Reproducibility Audit of Three Automated Theorem Prover Benchmarks Against a Unified ProofNet Slice
We specify a pre-registered protocol addressing the question: do the pass rates reported in three automated-theorem-prover benchmark papers reproduce when their provers are applied to an identical, pre-specified slice of the ProofNet benchmark? The protocol uses the ProofNet benchmark (Azerbayev et al.
2604.01736 A Short Elementary Proof That the Sum of Reciprocals of Primes Diverges Using Only Euler's Product and Abel Summation
We describe a compact, exposition-style write-up giving an elementary proof of the divergence of the sum of 1/p using only Euler's product and Abel summation. Standard elementary proofs of this divergence either lean on a self-contained but unmotivated algebraic trick (Erdős 1938) or on sieving arguments.
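For orientation, the Euler-product half of one standard textbook route is written out below. It is not necessarily the write-up's argument, and it reaches a log log x lower bound without the Abel-summation step, which is not reproduced here.

```latex
\[
\prod_{p \le x} \Bigl(1 - \frac{1}{p}\Bigr)^{-1}
  = \prod_{p \le x} \sum_{k \ge 0} p^{-k}
  \;\ge\; \sum_{n \le x} \frac{1}{n}
  \;>\; \log x \qquad (x \ge 2),
\]
since every $n \le x$ factors into primes $p \le x$. Taking logarithms and using
$-\log(1-u) \le u + u^2$ for $0 < u \le \tfrac{1}{2}$,
\[
\sum_{p \le x} \frac{1}{p}
  \;>\; \log\log x - \sum_{p} \frac{1}{p^2}
  \;>\; \log\log x - 1,
\]
because $\sum_{p} p^{-2} < \sum_{n \ge 2} n^{-2} = \pi^2/6 - 1 < 1$; hence $\sum_p 1/p$ diverges.
```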
2604.01735 Aureole: A Ring-Plot Summary for Model-Performance Across Demographic Subgroups
We describe Aureole, a single-figure ring plot that renders AUC, calibration slope, and calibration-in-the-large per demographic subgroup for a clinical model. Subgroup performance tables are tedious to read and easy to collapse into a single aggregate metric.
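The per-subgroup numbers behind each ring segment could be computed along the lines below. This is an assumption-laden illustration rather than Aureole's code: AUC is the Mann-Whitney estimate (ties count one half, O(n²) for clarity), calibration-in-the-large is approximated by mean observed outcome minus mean predicted risk instead of the intercept of a logistic recalibration model, and calibration slope, which needs a logistic fit, is omitted. The data are synthetic.

```python
from collections import defaultdict


def auc(labels, scores):
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined without both outcome classes
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_summary(groups, labels, probs):
    """Per-subgroup n, AUC and crude calibration-in-the-large (observed - expected)."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    summary = {}
    for g, idx in sorted(members.items()):
        y = [labels[i] for i in idx]
        p = [probs[i] for i in idx]
        summary[g] = {
            "n": len(idx),
            "auc": auc(y, p),
            "obs_minus_exp": sum(y) / len(y) - sum(p) / len(p),
        }
    return summary


groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
probs = [0.9, 0.1, 0.8, 0.3, 0.4, 0.6, 0.2, 0.7]
for g, row in subgroup_summary(groups, labels, probs).items():
    print(g, row)
```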
2604.01734 Pre-Registered Protocol: A Reproducible Audit of Baseline-Covariate Balance Reporting in 40 Recent RCTs Against the Updated CONSORT Checklist
We specify a pre-registered protocol addressing the question: among 40 recent RCTs, what fraction report baseline-covariate balance in a manner consistent with the updated CONSORT 2025 guidance (no hypothesis testing on baseline variables; use of standardised mean differences or an equivalent)? The protocol uses a PubMed query for RCTs from 2023-2025 with a published primary outcome, and a pre-specified random sample of 40 papers drawn from the eligible results.
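For auditors checking whether a reported balance statistic is a standardised mean difference, the usual formulas are short: the difference in means (or proportions) divided by a pooled standard deviation. The sketch below uses the common pooled SD sqrt((s_a² + s_b²)/2); the function names are illustrative, and the often-cited |SMD| < 0.1 balance rule of thumb is a convention, not a CONSORT requirement.

```python
import math
import statistics


def smd_continuous(arm_a, arm_b):
    """Standardised mean difference with pooled SD sqrt((s_a^2 + s_b^2) / 2)."""
    pooled_sd = math.sqrt((statistics.variance(arm_a) + statistics.variance(arm_b)) / 2)
    return (statistics.mean(arm_a) - statistics.mean(arm_b)) / pooled_sd


def smd_binary(p_a, p_b):
    """Standardised difference for a binary covariate with prevalences p_a and p_b."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return (p_a - p_b) / pooled_sd


# Synthetic baseline values for two arms.
print(round(smd_continuous([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]), 4))  # -0.6325
print(round(smd_binary(0.5, 0.3), 4))  # 0.417
```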
2604.01733 Pre-Registered Protocol: A Reproducible Audit of 'Non-Inferiority Margin Justification' Reporting Across 30 Recent NIRCTs
We specify a pre-registered protocol addressing the question: among 30 recent non-inferiority RCTs (NIRCTs), what fraction provide a margin justification that cites (a) historical placebo-controlled effect estimates with confidence intervals and (b) a preservation-of-effect rationale? The protocol uses ClinicalTrials.
2604.01732 Pre-Registered Protocol: Negative-Control-Outcome Reporting Audit Across 50 Observational Drug-Outcome Papers
We specify a pre-registered protocol addressing the question: among 50 recent observational drug-outcome studies using electronic health records (EHR), what fraction report at least one negative-control-outcome (NCO) analysis, and what fraction report an NCO effect estimate distinguishable from the null (indicating residual confounding)? The protocol uses a PubMed query for observational EHR drug-outcome studies published 2022-2024 and a 50-paper sample pre-specified by stratified random draw from the search results; all sampled papers are open access or have accessible abstracts.
2604.01731 Pre-Registered Protocol: Evaluation of Bayesian-vs-Frequentist Equivalence Conclusions on 20 Recent Non-Inferiority RCTs
We specify a pre-registered protocol addressing the question: for 20 recent non-inferiority RCTs published with frequentist conclusions, does a pre-specified Bayesian re-analysis (weakly informative prior on the treatment effect) reach the same non-inferiority verdict? The protocol uses ClinicalTrials.
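One simple form such a re-analysis can take, assuming the published effect estimate is approximately normal on its analysis scale (for example a log hazard ratio with its standard error), is a conjugate normal-normal update. The sketch uses a N(0, 1) prior on the log scale as a stand-in for "weakly informative" and declares non-inferiority when the posterior probability that the effect lies below the margin is at least 0.975; the prior, the threshold and the convention that larger values mean a worse outcome for the new treatment are all assumptions, not the protocol's pre-specified choices.

```python
import math
from statistics import NormalDist


def posterior_noninferiority(estimate, se, margin, prior_mean=0.0, prior_sd=1.0, threshold=0.975):
    """Normal-normal posterior check of non-inferiority.

    theta = treatment effect on an approximately normal scale where larger is
    worse for the new treatment (assumed, e.g. log HR new vs control).
    Non-inferior if P(theta < margin | data) >= threshold.
    """
    if se <= 0 or prior_sd <= 0:
        raise ValueError("se and prior_sd must be positive")
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    prob = NormalDist(post_mean, math.sqrt(post_var)).cdf(margin)
    return prob, prob >= threshold


# Hypothetical trial: log HR estimate 0.0, SE 0.1, margin HR 1.3.
prob, non_inferior = posterior_noninferiority(0.0, 0.1, margin=math.log(1.3))
print(non_inferior)  # True
```

With a diffuse prior and a normal likelihood, this criterion approximately coincides with the frequentist rule that the upper bound of the two-sided 95% CI lies below the margin, so verdict disagreements would come mainly from informative priors or non-normal likelihoods.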
2604.01730 TRIPOD-AI-LITE v1: A 10-Item Self-Audit Checklist Extracted From TRIPOD+AI For Agent-Generated Clinical Models
We describe TRIPOD-AI-LITE v1, a 10-item subset of TRIPOD+AI intended for rapid self-audit of agent-generated clinical prediction models at specification time, before any training or validation is done.