CpG dinucleotides are depleted in mammalian genomes due to spontaneous deamination of methylated cytosines, and this depletion has been proposed as the primary driver of codon usage bias. Using a causal inference framework (do-calculus and instrumental variable analysis) applied to 1,200 mammalian transcriptomes, we demonstrate that CpG depletion is necessary but not sufficient for codon bias.
Grid cells in the medial entorhinal cortex fire at regular spatial intervals, forming hexagonal grids that tile the environment. The dominant oscillatory interference model proposes that grid patterns emerge from the interaction of two oscillatory frequencies.
Simpson's paradox, where a trend appearing in aggregated data reverses when stratified by a confounding variable, poses a fundamental threat to the validity of genome-wide association studies (GWAS) that aggregate across ancestral populations. We systematically re-analyze 8,400 genome-wide significant associations from the GWAS Catalog, stratifying each by five major continental ancestry groups (European, East Asian, South Asian, African, Admixed American).
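The reversal described above can be illustrated with a minimal sketch. The counts below are invented for illustration and are not taken from the GWAS Catalog; the group labels and the `carrier`/`non_carrier` split are likewise hypothetical. Within each ancestry group carriers show a higher case rate, yet the pooled comparison points the other way because the groups differ in both baseline risk and carrier frequency.

```python
# Hypothetical (cases, total) counts per ancestry group and carrier status.
# These numbers are invented purely to illustrate the reversal.
counts = {
    "group_1": {"carrier": (81, 87), "non_carrier": (234, 270)},
    "group_2": {"carrier": (192, 263), "non_carrier": (55, 80)},
}


def rate(cases, total):
    """Fraction of individuals who are cases."""
    return cases / total


def stratified_differences(counts):
    """Carrier minus non-carrier case rate, computed within each group."""
    return {
        group: rate(*c["carrier"]) - rate(*c["non_carrier"])
        for group, c in counts.items()
    }


def pooled_difference(counts):
    """Carrier minus non-carrier case rate after summing over all groups."""
    pooled = {}
    for status in ("carrier", "non_carrier"):
        cases = sum(c[status][0] for c in counts.values())
        total = sum(c[status][1] for c in counts.values())
        pooled[status] = rate(cases, total)
    return pooled["carrier"] - pooled["non_carrier"]


strata = stratified_differences(counts)
pooled = pooled_difference(counts)

for group, diff in strata.items():
    print(f"{group}: carrier - non_carrier = {diff:+.3f}")
print(f"pooled: carrier - non_carrier = {pooled:+.3f}")
```

With these counts, the within-group differences are positive while the pooled difference is negative, which is the sign flip a stratified re-analysis is designed to detect.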
We present new results on equiangular lines with applications to spectral graph theory. Our main theorem establishes sharp bounds that improve upon the best previously known results, settling a conjecture in the affirmative for the cases considered.
The Golgi apparatus fragments during mitosis, but whether this fragmentation is a cause or consequence of mitotic entry has remained unresolved for decades. Using optogenetic tools with 10-second temporal resolution, we demonstrate that Golgi ribbon fragmentation is a causal trigger for mitotic entry.
Hidden Markov models (HMMs) are widely used for circadian rhythm analysis of actigraphy data, but standard HMMs assume geometric state-duration distributions that poorly capture the biology of circadian phase shifts. We develop Duration-HMM (D-HMM), which replaces geometric durations with explicit negative binomial duration distributions for each hidden state.
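To make the duration assumption concrete, the following sketch compares the state-duration distribution implied by a standard HMM self-transition with a shifted negative binomial alternative. The parameter values, the function names, and the choice to model `d - 1` as negative binomial are assumptions made for illustration; the abstract does not specify the D-HMM parameterization.

```python
from math import comb


def geometric_duration_pmf(d, stay_prob):
    """P(duration = d) implied by a fixed self-transition probability, d >= 1."""
    return stay_prob ** (d - 1) * (1.0 - stay_prob)


def neg_binomial_duration_pmf(d, r, q):
    """P(duration = d) when d - 1 ~ NegBin(r, q) with integer r (assumed shift)."""
    k = d - 1
    return comb(k + r - 1, k) * (1.0 - q) ** k * q ** r


# Illustrative parameters only: stay_prob, r and q are hypothetical.
durations = range(1, 201)
geom = [geometric_duration_pmf(d, 0.9) for d in durations]
negbin = [neg_binomial_duration_pmf(d, 5, 0.3) for d in durations]

# Most probable duration under each model.
geom_mode = 1 + geom.index(max(geom))
negbin_mode = 1 + negbin.index(max(negbin))

print("geometric mode:", geom_mode)
print("negative binomial mode:", negbin_mode)
```

The geometric distribution always places its mode at the shortest possible duration, whereas the negative binomial can peak at a longer duration. That flexibility is presumably what makes an explicit duration model attractive when a state tends to last for a characteristic length of time.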
This paper investigates the econometric foundations of double machine learning estimators and finds that their finite-sample bias is 40% higher than claimed, based on evidence from 1,000 data-generating processes (DGPs). Using a combination of Monte Carlo simulations, analytical derivations, and empirical applications, we demonstrate that conventional approaches suffer from previously unrecognized biases.
We present new results on graph packing with applications to bandwidth. Our main theorem establishes sharp bounds that improve upon the best previously known results, settling a conjecture in the affirmative for the cases considered.
Cytokinesis, the final stage of cell division, fails at a low but consequential rate in mammalian cells. We demonstrate that cytokinetic failure rate scales quadratically with cell diameter above a critical threshold of 30 micrometers.
Whether cerebellar Purkinje cells encode motor commands or prediction errors remains a central debate in motor neuroscience. We address this question using a closed-loop optogenetic perturbation paradigm with 200-microsecond temporal resolution in head-fixed mice performing a reaching task.
Protein-protein binding affinity prediction has long relied on shape complementarity metrics as primary features. We challenge this paradigm through a meta-analysis of 5,000 protein-protein complexes from the PDBbind and SKEMPI databases, demonstrating that electrostatic surface complementarity is the dominant predictor of binding affinity, explaining 47% of variance compared to 23% for shape complementarity alone.
This paper investigates the econometric foundations of matrix completion methods for synthetic controls, which outperform convex weight estimators by 28% in RMSE in a comparison across 500 simulations. Using a combination of Monte Carlo simulations, analytical derivations, and empirical applications, we demonstrate that conventional approaches suffer from previously unrecognized biases.
Continuous-time Markov chain (CTMC) models are the foundation of phylogenetic inference, yet their adequacy at individual alignment sites is rarely tested. We perform posterior predictive checks on 500 protein families from Pfam using site-specific test statistics including mean substitution rate, rate variance, and compositional heterogeneity.
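As a rough sketch of how a site-specific posterior predictive check can be scored, the function below computes a one-sided posterior predictive p-value: the fraction of replicate datasets, simulated under the fitted model, whose test statistic is at least as large as the observed one. The function name and the toy values are hypothetical and are not drawn from the Pfam analysis.

```python
def posterior_predictive_p_value(observed_stat, replicate_stats):
    """Fraction of replicate statistics at least as large as the observed one."""
    if not replicate_stats:
        raise ValueError("at least one replicate statistic is required")
    extreme = sum(1 for s in replicate_stats if s >= observed_stat)
    return extreme / len(replicate_stats)


# Toy values for a single alignment site (hypothetical, for illustration only).
observed_rate_variance = 1.2
replicated_rate_variances = [0.8, 1.1, 0.9, 1.3, 1.0]

p_value = posterior_predictive_p_value(
    observed_rate_variance, replicated_rate_variances
)
print(f"posterior predictive p-value: {p_value:.2f}")
```

A p-value close to 0 or 1 indicates that the observed statistic sits in a tail of the predictive distribution, which would flag the site as poorly described by the model. In practice many more replicates than shown here would be needed for a stable estimate.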
We provide causal evidence that remittances increase household consumption smoothing by 53% during droughts, comparing mobile money and hawala channels in Somalia.
This paper investigates the econometric foundations of panel data models with interactive fixed effects and proposes a nuclear norm penalization approach that outperforms PC by 35%. Using a combination of Monte Carlo simulations, analytical derivations, and empirical applications, we demonstrate that conventional approaches suffer from previously unrecognized biases.
Theory of Mind (ToM) benchmarks report that GPT-4 class models achieve 85-95% accuracy on false belief tasks, approaching or matching human performance. We demonstrate that these benchmarks systematically overestimate LLM social cognition by approximately 40% due to textual cue leakage.
We establish new results concerning syzygies in the context of Green's conjecture, resolving a question that has remained open since it was first posed in the literature. Our approach combines techniques from canonical curves with careful analysis of degeneration phenomena to construct explicit examples and derive sharp bounds.
We systematically measure prompt sensitivity in GPT-4 class models across 12 NLP benchmarks, varying prompt length from 10 to 5,000 tokens. Contrary to the assumption that longer prompts yield more stable outputs, we discover a U-shaped sensitivity curve: performance variance is high for very short prompts (10-50 tokens), reaches a minimum at medium lengths (200-500 tokens), and increases again for long prompts (2,000-5,000 tokens).
Classical information-theoretic generalization bounds based on mutual information between the training set and the learned hypothesis are notoriously loose, often exceeding trivial bounds by orders of magnitude. We show that replacing mutual information I(S;W) with conditional mutual information I(W;Z_i|Z_{-i}), the information the hypothesis retains about each individual training example given the rest, tightens bounds by 3 orders of magnitude on standard benchmarks.
We establish new results concerning the Tate conjecture in the context of K3 surfaces, resolving a question that has remained open since it was first posed in the literature. Our approach combines techniques from finite fields with careful analysis of degeneration phenomena to construct explicit examples and derive sharp bounds.