2604.00573 Cross-Dataset Reproducibility Audit of Endometriosis Diagnostic Gene Signatures via Permutation-Calibrated Overlap Testing
Endometriosis affects ~10% of reproductive-age women, yet diagnosis takes an average of 6.6 years.
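The permutation-calibrated overlap test named in the title can be sketched in a few lines. This is a hedged illustration, not the audit's actual procedure. It assumes both signatures are restricted to a shared gene universe (for example, genes measured on every platform), draws random signatures of the same sizes from that universe, and reports an add-one permutation p-value. `overlap_permutation_test` and the toy gene names are hypothetical.

```python
import random


def overlap_permutation_test(sig_a, sig_b, universe, n_perm=999, seed=0):
    """Compare the observed overlap of two gene signatures with the overlap
    of random same-size signatures drawn from a shared gene universe."""
    universe = sorted(set(universe))
    pool = set(universe)
    # Genes outside the universe cannot overlap, so drop them (assumption).
    a, b = set(sig_a) & pool, set(sig_b) & pool
    observed = len(a & b)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        ra = set(rng.sample(universe, len(a)))
        rb = set(rng.sample(universe, len(b)))
        null.append(len(ra & rb))
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    expected = sum(null) / n_perm
    return observed, expected, p_value
```

The mean null overlap (about |A|·|B|/N for a universe of N genes) is returned alongside the p-value so that fold-enrichment over chance can be reported as well.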
Statistical theory, methodology, applications, machine learning, and computation.
The standard genetic code is more error-robust than the vast majority of random alternatives, but the magnitude of this advantage varies when codons are weighted by organism-specific usage frequencies. We evaluate the real code against 100,000 degeneracy-preserving random codes for each of 29 prokaryotic genomes spanning GC content 27–73% and effective codon number (N_c) 31–55.
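A minimal sketch of this comparison, under stated assumptions: the property is Kyte-Doolittle hydropathy (this abstract does not name one), the cost is the usage-weighted mean squared property change over single-point mutations between sense codons, and "degeneracy-preserving" is implemented as a random relabeling of the 20 amino acids across the standard code's synonymous codon sets, with stop codons fixed. Names such as `error_cost` and `fraction_better` are illustrative, not the paper's code.

```python
import itertools
import random

BASES = "TCAG"
# Standard code (NCBI translation table 1), codons enumerated in TCAG order.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(p) for p in itertools.product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, AA_STRING))

# Kyte-Doolittle hydropathy, used here only as an example property.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def error_cost(code, usage=None):
    """Usage-weighted mean squared hydropathy change over single-point
    mutations between sense codons; mutations to or from stops are skipped.
    `usage` maps codon -> frequency; None means uniform weights."""
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        w = 1.0 if usage is None else usage.get(codon, 0.0)
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + base + codon[pos + 1:]]
                if aa2 != "*":
                    total += w * (KD[aa] - KD[aa2]) ** 2
                    weight_sum += w
    return total / weight_sum


def random_code(rng):
    """Relabel amino acids across synonymous codon sets; block sizes and
    stop codons are unchanged, so degeneracy is preserved."""
    amino_acids = sorted(KD)
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in STANDARD_CODE.items()}


def fraction_better(n_random=1000, usage=None, seed=0):
    """Return the real code's cost and the fraction of random codes beating it."""
    rng = random.Random(seed)
    real = error_cost(STANDARD_CODE, usage)
    wins = sum(error_cost(random_code(rng), usage) < real for _ in range(n_random))
    return real, wins / n_random
```

For the per-genome design described above, one would call `fraction_better(n_random=100_000, usage=genome_usage)` once per genome, where `genome_usage` is a hypothetical codon-frequency dict derived from that genome's coding sequences.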
Current embedding-based matching systems collapse multi-dimensional similarity into a single scalar score, conflating dimensions that should be independently queryable. This paper introduces a structured matching primitive that decomposes embedding similarity into three components: (1) dimensions to actively select for, (2) dimensions to actively control against, and (3) residual general similarity uncorrelated with the controlled dimensions.
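One way to read the three-part decomposition is with linear projections: compare query and candidate on the span of the "select" directions, on the span of the "control" directions, and on what remains after the control directions are projected out. This is a hedged sketch; the direction vectors, the use of cosine similarity within each subspace, and the name `structured_match` are assumptions rather than the paper's definitions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-12):
    """Modified Gram-Schmidt; near-dependent directions are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def project(v, basis):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def cosine(a, b, eps=1e-12):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na < eps or nb < eps else dot(a, b) / (na * nb)


def structured_match(query, item, select_dirs, control_dirs):
    """Three scores instead of one: similarity along selected directions,
    similarity along controlled directions, and residual similarity after
    the controlled directions are removed from both vectors."""
    s_basis = orthonormal_basis(select_dirs)
    c_basis = orthonormal_basis(control_dirs)
    q_res = [a - b for a, b in zip(query, project(query, c_basis))]
    i_res = [a - b for a, b in zip(item, project(item, c_basis))]
    return {
        "select": cosine(project(query, s_basis), project(item, s_basis)),
        "control": cosine(project(query, c_basis), project(item, c_basis)),
        "residual": cosine(q_res, i_res),
    }
```

Here the control score is reported rather than folded in; a matcher could, for instance, require it to stay below a threshold so that results differ from the query along that axis while still matching on the selected one.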
It is well established that embedding spaces encode relational structure as vector arithmetic — from word2vec analogies (Mikolov et al., 2013) through TransE translations (Bordes et al.
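The vector-arithmetic view can be made concrete with toy vectors. The sketch below uses hand-made two-dimensional embeddings (hypothetical, not trained) to solve an analogy by nearest cosine neighbour to b − a + c in the word2vec style, and scores a triple as −‖h + r − t‖ in the TransE style (L2 norm assumed).

```python
import math

# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
EMB = {
    "man": [0.0, 1.0], "woman": [0.0, -1.0],
    "king": [1.0, 1.0], "queen": [1.0, -1.0],
    "apple": [0.2, 0.0],
}


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c, emb=EMB):
    """a is to b as c is to ?: the word maximizing cos(w, b - a + c),
    excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def transe_score(h, r, t):
    """TransE-style plausibility: closer to 0 when h + r lands near t."""
    return -math.hypot(*(hi + ri - ti for hi, ri, ti in zip(h, r, t)))
```

With these vectors, `analogy("man", "king", "woman")` returns `"queen"`, and a relation vector `[1.0, 0.0]` translates `man` exactly onto `king`.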
Partial reprogramming reverses epigenetic age, but the relationship between PRC2-mediated chromatin restoration and transcriptomic changes is poorly characterized. We ran formal GSEA using MSigDB Hallmark gene sets (97–200 genes; Liberzon et al.
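For readers unfamiliar with GSEA, its core statistic is a weighted running-sum enrichment score over a ranked gene list. The sketch below computes that score with the common weighting exponent p = 1; it omits the permutation null, normalized enrichment score, and FDR that a formal analysis adds, and the function name is illustrative.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score. `ranked_genes` must be sorted by
    decreasing `scores`; hits step up in proportion to |score|**p, misses
    step down uniformly, and the ES is the maximum deviation from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a proper, non-empty subset of the ranking")
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running = best = 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A set concentrated at the top of the ranking scores near +1 and one concentrated at the bottom near −1.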
We present the Human Civilization Index (HCI) — a weighted composite of **six dimensions** (economic wealth, health/longevity, literacy, energy use, urbanization, and *computational/information capacity*) — covering 1800–2024 at decadal resolution with 2022 and 2024 anchor years. Dimension 6 (D6), anchored on internet user penetration data from the World Bank WDI (IT.
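A weighted composite of this kind is commonly built by putting each dimension on a common scale and taking a weighted mean. The sketch assumes min-max normalization pooled over all country-years (so decades stay comparable) and a weighted arithmetic mean; the HCI's actual weights, transforms, and aggregation rule are not stated in this excerpt, so the dimension names and equal weights in the test are hypothetical.

```python
def min_max(values):
    """Rescale to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(panel, weights):
    """panel: {(unit, year): {dimension: raw value}}; weights: {dimension: w}.
    Each dimension is normalized over the whole panel, then a weighted
    arithmetic mean is taken for every unit-year."""
    keys = list(panel)
    scaled = {k: {} for k in keys}
    for dim in weights:
        column = min_max([panel[k][dim] for k in keys])
        for k, v in zip(keys, column):
            scaled[k][dim] = v
    total = sum(weights.values())
    return {k: sum(w * scaled[k][d] for d, w in weights.items()) / total
            for k in keys}
```

Pooling the normalization across years is the design choice that lets an 1800 value and a 2024 value sit on one scale; skewed dimensions such as GDP per capita are often log-transformed before scaling.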
We study whether closed-source language models decline after release, and whether subjective user-facing signals match objective benchmark evidence. We use official LiveBench public snapshots for objective change, arena-catalog monthly leaderboard history as the main subjective signal, and LMArena pairwise preference as a robustness check.
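Pairwise preference data of the LMArena kind is usually summarized with a Bradley-Terry model. The sketch fits Bradley-Terry strengths with the classic minorization-maximization (Zermelo) iteration and maps them to an Elo-like 400·log10 scale. It is an assumed illustration, not LMArena's own estimator, which may handle ties or bootstrap intervals differently, and it assumes every model has at least one win and the comparison graph is connected.

```python
import math
from collections import defaultdict


def bradley_terry(wins, n_iter=500):
    """wins[(i, j)] = times i was preferred over j. Returns strengths, mean 1."""
    items = sorted({m for pair in wins for m in pair})
    total_wins = defaultdict(float)
    games = defaultdict(float)  # comparisons per unordered pair
    for (i, j), count in wins.items():
        total_wins[i] += count
        games[frozenset((i, j))] += count
    p = {m: 1.0 for m in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = sum(games[frozenset((i, j))] / (p[i] + p[j])
                        for j in items if j != i)
            new[i] = total_wins[i] / denom
        scale = len(items) / sum(new.values())
        p = {m: v * scale for m, v in new.items()}
    return p


def to_elo_scale(p, base=1000.0):
    """Elo-like ratings: a 400-point gap corresponds to 10:1 preference odds."""
    return {m: base + 400.0 * math.log10(v) for m, v in p.items()}
```

For example, `bradley_terry({("A", "B"): 3, ("B", "A"): 1})` gives strengths in a 3:1 ratio, about 191 Elo-like points apart.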
Do NAD+ precursors (NMN and NR) lower blood pressure? The answer depends on how you analyze 2-3 small randomized trials.
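The sensitivity this abstract points to can be illustrated with inverse-variance pooling. The sketch contrasts a fixed-effect estimate with a DerSimonian-Laird random-effects estimate. With only two or three trials the between-trial variance τ² is poorly estimated, so alternatives such as REML or a Hartung-Knapp adjustment can shift the interval noticeably. The inputs below are made-up numbers for illustration, not the NMN/NR trial results.

```python
import math


def pool(effects, variances):
    """Inverse-variance pooling. Returns fixed-effect and DerSimonian-Laird
    random-effects results as (estimate, standard error), plus tau^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"fixed": (fe, math.sqrt(1.0 / sw)),
            "random": (re, math.sqrt(1.0 / sum(w_re))),
            "tau2": tau2}


# Hypothetical systolic BP differences (mmHg) and variances, NOT real trial data.
result = pool([-6.0, -1.0], [2.0, 2.0])
for model in ("fixed", "random"):
    est, se = result[model]
    print(model, round(est, 2), (round(est - 1.96 * se, 2), round(est + 1.96 * se, 2)))
```

With these inputs both models give the same point estimate (−3.5), but the fixed-effect 95% interval excludes zero while the random-effects interval does not, because τ² = 10.5 widens the standard error from 1.0 to 2.5.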
DrugAge contains many promising lifespan-extension results, but striking effects in isolated experiments do not automatically become durable scientific claims. We present an offline automated pipeline that turns DrugAge into a robustness-first screen for longevity interventions.
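A robustness-first screen can be sketched as per-compound aggregation followed by replication filters. The thresholds (at least two independent studies, two species, and 75% of experiments showing extension) and the record field names are hypothetical choices for illustration; they are not DrugAge's schema or the paper's criteria. The point of the design is that one dramatic result cannot pass on effect size alone.

```python
from collections import defaultdict
from statistics import median


def robustness_screen(records, min_studies=2, min_species=2, min_positive=0.75):
    """Aggregate lifespan experiments per compound and flag compounds whose
    effect replicates across studies and species (hypothetical field names)."""
    by_compound = defaultdict(list)
    for rec in records:
        by_compound[rec["compound"]].append(rec)
    summary = {}
    for compound, rows in by_compound.items():
        effects = [r["lifespan_change_pct"] for r in rows]
        n_studies = len({r["study_id"] for r in rows})
        n_species = len({r["species"] for r in rows})
        positive = sum(e > 0 for e in effects) / len(effects)
        summary[compound] = {
            "n_experiments": len(rows),
            "n_studies": n_studies,
            "n_species": n_species,
            "median_effect": median(effects),
            "positive_fraction": positive,
            "robust": (n_studies >= min_studies and n_species >= min_species
                       and positive >= min_positive),
        }
    return summary
```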
Prior studies predicting the UN E-Government Development Index (EGDI) suffer from circularity — using internet penetration and education metrics that are direct EGDI sub-index inputs. We explain EGDI using four indicators with zero sub-component overlap: log GDP per capita, Corruption Perceptions Index, urbanization, and government expenditure.
We explain UN E-Government Development Index (EGDI) scores using four indicators with zero EGDI sub-component overlap: log GDP per capita, corruption perceptions, urbanization, and government expenditure. Internet penetration and schooling are excluded as they are direct EGDI sub-index inputs.
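The exclusion rule and a log-GDP regression can be sketched without external libraries. EGDI is the equal-weight mean of three sub-indices (Online Service Index, Telecommunications Infrastructure Index, Human Capital Index); internet users feed the TII and mean years of schooling feed the HCI. The screening table lists only those two inputs, and the predictor names, the OLS model, and the helper functions are assumptions for illustration, not the papers' code.

```python
import math

# Predictors known to be EGDI sub-index inputs (name -> sub-index); extend as needed.
EGDI_INPUTS = {"internet_penetration": "TII", "mean_years_schooling": "HCI"}


def screen_predictors(candidates):
    """Split candidate predictors into non-circular ones and EGDI inputs."""
    kept = [c for c in candidates if c not in EGDI_INPUTS]
    dropped = {c: EGDI_INPUTS[c] for c in candidates if c in EGDI_INPUTS}
    return kept, dropped


def design_row(record, predictors):
    """One regression row; GDP per capita enters in logs."""
    return [math.log(record[p]) if p == "gdp_per_capita" else record[p]
            for p in predictors]


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares with intercept via the normal equations (X'X) beta = X'y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)
```

The returned coefficient list starts with the intercept and then follows the order of the kept predictors.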
The standard genetic code places amino acids on codons in a pattern that has long been interpreted as minimizing the impact of point mutations on protein function. Prior analyses differ in which amino acid properties they test, which random code ensemble they use as a null distribution, and whether they account for realistic mutation biases.
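Mutation bias can enter the cost through per-mutation weights. The sketch assumes a single transition/transversion weight κ (a common simplification, not necessarily the model these analyses use) and Kyte-Doolittle hydropathy as the example property; it returns the κ-weighted mean squared property change for the standard code. `weighted_cost` and `is_transition` are illustrative names.

```python
import itertools

BASES = "TCAG"
# Standard code (NCBI translation table 1), codons enumerated in TCAG order.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(p) for p in itertools.product(BASES, repeat=3)), AA_STRING))
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
PURINES = {"A", "G"}


def is_transition(b1, b2):
    """Purine<->purine or pyrimidine<->pyrimidine substitution."""
    return b1 != b2 and ((b1 in PURINES) == (b2 in PURINES))


def weighted_cost(code=CODE, kappa=2.0):
    """Mean squared hydropathy change over sense-to-sense point mutations,
    with transitions weighted kappa and transversions weighted 1."""
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + base + codon[pos + 1:]]
                if aa2 == "*":
                    continue
                w = kappa if is_transition(codon[pos], base) else 1.0
                total += w * (KD[aa] - KD[aa2]) ** 2
                weight_sum += w
    return total / weight_sum
```

Setting `kappa=1.0` recovers the unbiased cost, so comparing the real code against a random ensemble at several κ values shows how much a conclusion depends on the assumed mutation bias.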
We present an executable workflow that explains UN E-Government Development Index (EGDI) scores using four socioeconomic indicators deliberately chosen to avoid overlap with EGDI sub-components: GDP per capita, corruption perceptions, urbanization, and government expenditure. Internet penetration and schooling are excluded because they are direct EGDI sub-index inputs.
We present an executable workflow that explains UN EGDI scores from four socioeconomic indicators deliberately chosen to avoid overlap with EGDI sub-components: GDP per capita, corruption perceptions, urbanization, and government expenditure. Internet penetration and schooling are excluded because they are direct EGDI inputs.
Synonymous codon usage in bacteria is shaped by mutational pressure, translational selection, and chromosomal context. The Wright (1990) Nc-GC3 trajectory provides a compact signature of codon usage bias and its mutational origins.
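Wright's expected effective number of codons under mutational bias alone is a closed-form function of GC3s: ENc = 2 + s + 29 / (s² + (1 − s)²). Genes falling well below this curve are usually read as showing codon bias beyond mutational pressure. The sketch computes the curve and GC3s from a coding sequence; excluding Met, Trp, and stop codons from GC3s is a common convention assumed here.

```python
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}  # Met, Trp, stops


def expected_nc(gc3s):
    """Wright (1990) expected Nc if codon bias reflects GC3s mutational bias only."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def gc3s(cds):
    """GC content at third codon positions, skipping Met, Trp and stop codons."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)
              if cds[i:i + 3] not in NON_SYNONYMOUS]
    return sum(b in "GC" for b in thirds) / len(thirds)
```

For instance, `expected_nc(0.5)` is 60.5, near the 61-codon ceiling, while `expected_nc(0.0)` is 31.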
How much of a country's digital governance maturity is explained by its socioeconomic development level? We train a Random Forest model on UN EGDI scores using four indicators that do not overlap with EGDI components — GDP per capita, corruption perceptions index, urbanization, and government expenditure — deliberately excluding internet penetration and schooling (which are EGDI sub-index inputs) to avoid circularity.
The UN E-Government Development Index (EGDI) measures digital governance maturity biennially for 193 countries, creating a two-year measurement gap. We train a Random Forest model on six publicly available socioeconomic indicators (GDP per capita, internet penetration, mean years of schooling, corruption perceptions index, urbanization rate, government expenditure as percentage of GDP) to predict EGDI scores.
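Why including internet penetration and schooling creates circularity can be quantified with a toy simulation. If EGDI is the mean of three sub-indices and a predictor coincides with one of them, it explains about one third of EGDI's variance by construction, with no substantive signal at all; a predictor that is a major input to a sub-index behaves similarly. The simulation is synthetic and assumes independent, equal-variance sub-indices, which real data need not satisfy.

```python
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mechanical_r2(n=20000, seed=0):
    """R^2 of a synthetic EGDI on one of its own sub-indices when all three
    sub-indices are independent Uniform(0, 1) draws (illustrative assumption)."""
    rng = random.Random(seed)
    osi = [rng.random() for _ in range(n)]
    tii = [rng.random() for _ in range(n)]
    hci = [rng.random() for _ in range(n)]
    egdi = [(a + b + c) / 3 for a, b, c in zip(osi, tii, hci)]
    return pearson(tii, egdi) ** 2
```

Analytically, Cov(TII, EGDI) = σ²/3 and Var(EGDI) = σ²/3, so the squared correlation is 1/3; the simulation should land close to that.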
Shannon's source coding theorem states that the entropy H(X) of a source is the fundamental lower bound on bits per symbol achievable by any lossless compression scheme. We present an executable, zero-dependency benchmark demonstrating this theorem empirically across five hardcoded public-domain English text excerpts (Gettysburg Address, Pride and Prejudice, A Tale of Two Cities, Declaration of Independence, Moby Dick).
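A comparison of this kind can be reproduced for any string with the standard library. The sketch computes the order-0 (single-character) entropy H, the average length L of a Huffman code built from the same character frequencies, and zlib's compressed size. For a symbol-by-symbol code on two or more symbols, H ≤ L < H + 1. Note that H here treats characters as independent; zlib exploits repeated substrings and, on long English text, can fall below this order-0 figure without contradicting the theorem, whose bound for sources with memory is the entropy rate. Function names are illustrative.

```python
import heapq
import math
import zlib
from collections import Counter


def entropy_bits(text):
    """Order-0 Shannon entropy in bits per character."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def huffman_lengths(text):
    """Codeword length per character for a Huffman code on character counts."""
    counts = Counter(text)
    if len(counts) == 1:  # a lone symbol still needs 1 bit
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth}); the tie-breaker
    # keeps heapq from ever comparing the dicts.
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_code_length(text):
    counts, lengths = Counter(text), huffman_lengths(text)
    return sum(counts[s] * lengths[s] for s in counts) / len(text)


def zlib_bits_per_char(text):
    return 8 * len(zlib.compress(text.encode("utf-8"), 9)) / len(text)


sample = "It was the best of times, it was the worst of times"
h, avg = entropy_bits(sample), average_code_length(sample)
print(f"H = {h:.3f}, Huffman = {avg:.3f}, zlib = {zlib_bits_per_char(sample):.3f} bits/char")
```

On a short excerpt like this one, zlib's header and block overhead dominate, so its bits per character are not a fair comparison until the text is much longer.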