Mutation rates are typically reported as genome-wide averages, yet individual genes within a single bacterium experience vastly different mutational pressures. We analyzed mutation accumulation experiment data spanning five bacterial species—Escherichia coli, Staphylococcus aureus, Mycobacterium tuberculosis, Pseudomonas aeruginosa, and Bacillus subtilis—encompassing 14,287 protein-coding genes and 38,412 observed de novo mutations.
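Comparing genes fairly requires normalizing raw mutation counts by gene length, because longer genes collect more mutations even at a uniform rate. The sketch below assumes the null model is a single uniform per-base-pair rate across all genes in a genome. That is an assumption made here; the analysis above may use a richer model. The function name `per_gene_enrichment` is hypothetical.

```python
def per_gene_enrichment(mut_counts, gene_lengths):
    """Observed / expected mutations per gene under a uniform per-bp rate.

    mut_counts: gene -> number of observed de novo mutations (missing genes count as 0)
    gene_lengths: gene -> coding length in bp
    """
    total_mut = sum(mut_counts.values())
    total_len = sum(gene_lengths.values())
    ratios = {}
    for gene, length in gene_lengths.items():
        # Expected count if mutations fell uniformly over every coding base pair.
        expected = total_mut * length / total_len
        ratios[gene] = mut_counts.get(gene, 0) / expected
    return ratios
```

A ratio above 1 marks a gene that collects more mutations than its length alone predicts. Extending the null with mutational context or strand would only change how `expected` is computed.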
Epigenetic clocks have become the dominant molecular estimators of biological age, yet systematic comparisons across clocks and tissues within the same individuals remain sparse. We applied four established epigenetic age predictors—Horvath's multi-tissue clock, Hannum's blood-based clock, PhenoAge, and GrimAge—to 500 samples spanning blood, liver, lung, and brain tissue from the Genotype-Tissue Expression (GTEx) project, where multiple tissues were available per donor.
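Horvath's multi-tissue clock does not regress age directly. It fits a penalized linear model (elastic net) to a log-linear transform of age, then maps predictions back to years through the inverse transform. The sketch below shows that transform with the adult-age constant of 20 years used in Horvath's clock. The CpG IDs and coefficients passed to `predict_age` are hypothetical placeholders, not the published clock.

```python
import math

ADULT_AGE = 20  # adult-age constant in Horvath's age transform


def horvath_transform(age):
    """Log scale below ADULT_AGE, linear scale above it."""
    if age <= ADULT_AGE:
        return math.log(age + 1) - math.log(ADULT_AGE + 1)
    return (age - ADULT_AGE) / (ADULT_AGE + 1)


def horvath_inverse(x):
    """Map a linear-predictor value back to years of age."""
    if x <= 0:
        return (ADULT_AGE + 1) * math.exp(x) - 1
    return (ADULT_AGE + 1) * x + ADULT_AGE


def predict_age(betas, intercept, coefs):
    """betas: CpG -> methylation beta value; coefs: CpG -> weight (hypothetical values)."""
    x = intercept + sum(coefs[cpg] * betas[cpg] for cpg in coefs)
    return horvath_inverse(x)
```

Because of the inverse transform, the same change in the linear predictor moves predicted age much less in children than in adults. Hannum's clock, by contrast, is linear in age.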
Whole-brain multivariate pattern analysis is widely assumed to outperform region-of-interest approaches by leveraging distributed neural representations. We tested this assumption by training linear support vector machine decoders on six fMRI task datasets—including the Human Connectome Project working memory and motor tasks, the Haxby face/object paradigm, and three additional cognitive paradigms—systematically varying the number of ANOVA-selected voxels from 10 to 5,000.
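ANOVA-based voxel selection ranks each voxel by its one-way F statistic across task conditions and keeps the top k. The sketch below is pure Python for illustration. In practice this step is often done with scikit-learn's `SelectKBest(f_classif, k=...)` feeding a `LinearSVC`. Either way, the selection has to be refit inside each cross-validation training fold, because selecting voxels on all data first leaks label information into the test folds. The helper names are hypothetical.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one voxel's values grouped by condition label."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_voxels(X, labels, k):
    """X: list of samples, each a list of voxel values. Returns indices of the k highest-F voxels."""
    n_vox = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_vox)]
    return sorted(range(n_vox), key=lambda j: scores[j], reverse=True)[:k]
```

Sweeping `k` from 10 to 5,000 and training the decoder on only the selected columns reproduces the design described above in miniature.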
Molecular docking scoring functions remain central to computational drug discovery pipelines, yet their quantitative accuracy against experimental binding affinities is rarely audited at scale. We benchmarked four widely deployed scoring functions (AutoDock Vina, Glide SP, GOLD ChemScore, and RF-Score) against 5,316 protein-ligand complexes from the PDBbind v2020 refined set, computing Pearson correlations between predicted scores and experimental binding affinities expressed as -log(Ki) or -log(Kd).
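The benchmark reduces to two steps: converting Ki or Kd (in molar units) to -log10 values, then computing Pearson's r against predicted scores. This is a minimal sketch. One assumption to flag is sign convention. Energy-like scores such as Vina's kcal/mol estimates are more negative for tighter binding, so they are negated here before correlating, which makes good performance show up as positive r. The affinity and score values in the example are made up for illustration.

```python
import math


def p_affinity(k_molar):
    """-log10 of a Ki or Kd given in molar units (1 nM -> 9.0)."""
    return -math.log10(k_molar)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative values only: Vina-style energies and experimental Kd in molar.
vina_scores = [-9.1, -7.4, -6.0]
kd_molar = [1e-9, 1e-7, 1e-5]
r = pearson_r([-s for s in vina_scores], [p_affinity(k) for k in kd_molar])
```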
Gene trees frequently conflict with species trees, but the magnitude, predictors, and functional distribution of this disagreement remain poorly quantified for most clades. We reconstructed a species tree from 150 fungal genomes using ASTRAL-III and compared it against individual maximum-likelihood gene trees for 2,000 single-copy orthologs identified via OrthoFinder.
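One common way to quantify gene-tree/species-tree disagreement is the Robinson-Foulds (RF) distance: the number of bipartitions (splits) found in one tree but not the other. The abstract does not say which metric was used, so RF is an illustrative choice. This sketch represents trees as nested tuples of taxon names, a hypothetical input format (a real pipeline would parse Newick). It compares unrooted bipartitions, so the arbitrary root of an ASTRAL or maximum-likelihood tree does not matter. It also assumes both trees share the same leaf set; gene trees missing taxa would first need the species tree pruned to the shared leaves.

```python
def _clusters(tree, out):
    """Collect the leaf set below every internal node; returns the leaf set of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial unrooted splits, each stored as the side that excludes a fixed reference taxon."""
    clusters = []
    all_leaves = _clusters(tree, clusters)
    ref = min(all_leaves)
    splits = set()
    for c in clusters:
        side = c if ref not in c else all_leaves - c
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def rf_distance(t1, t2):
    """Symmetric-difference (Robinson-Foulds) distance between two trees on the same taxa."""
    return len(bipartitions(t1) ^ bipartitions(t2))
```

For n taxa, a fully resolved unrooted tree has n - 3 non-trivial splits, so dividing by 2(n - 3) gives a normalized RF between 0 and 1.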
Normalization is a prerequisite for meaningful differential expression analysis of RNA-seq data, yet the choice among competing methods is typically made without quantifying its downstream impact on biological conclusions. We applied five normalization approaches—TMM, DESeq2 median-of-ratios, upper quartile, FPKM, and TPM—to 20 published RNA-seq datasets spanning cancer (n=10) and immunology (n=10) studies, then ran identical DESeq2 differential expression pipelines on each normalized dataset.
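DESeq2's median-of-ratios method scales each sample by a size factor: the median, over genes, of the ratio between that sample's count and the gene's geometric mean across all samples. The pure-Python sketch below computes the same quantity for illustration. It is not the DESeq2 code, which exposes this step as `estimateSizeFactors` in R. Genes with a zero in any sample are skipped, since their geometric mean is zero. Input layout and function name are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: sample -> list of raw gene counts, all samples in the same gene order."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo = []
    for g in range(n_genes):
        vals = [counts[s][g] for s in samples]
        if min(vals) == 0:
            log_geo.append(None)  # geometric mean is 0, so the gene cannot form a ratio
        else:
            log_geo.append(sum(math.log(v) for v in vals) / len(vals))
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo[g]
                      for g in range(n_genes) if log_geo[g] is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors
```

Normalized counts are then `count / factors[sample]`. Unlike FPKM and TPM, no gene-length term enters, which is one reason the methods can disagree downstream.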
The Codon Adaptation Index (CAI) remains the dominant metric for predicting gene expression from sequence data in bacterial genomics, yet its dependence on an externally supplied reference set of highly expressed genes introduces an underappreciated source of variability. We computed CAI for all protein-coding genes across 500 complete bacterial genomes using four distinct reference sets: ribosomal protein genes, RNA-seq-validated highly expressed genes, the top 5% of genes ranked by codon usage frequency, and the original Sharp and Li reference set.
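In the standard CAI calculation, the reference set enters only through the relative adaptiveness w of each codon: its count in the reference genes divided by the count of the most-used synonymous codon. A gene's CAI is the geometric mean of w over its codons, excluding stop codons and the single-codon amino acids Met and Trp. This minimal sketch assumes the standard genetic code. It also assigns a pseudocount of 0.5 to codons absent from the reference set, which is an assumption here; implementations differ on this detail.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code: codons in TCAG order mapped to one-letter amino acids ("*" = stop).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def codons(seq):
    """Split a coding sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = count / max count within each synonymous family, taken from the reference set."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TO_AA)
    families = {}
    for codon, aa in CODON_TO_AA.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        # Codons absent from the reference get a pseudocount so that log(w) stays finite.
        x = {c: counts[c] or pseudocount for c in family}
        x_max = max(x.values())
        for c in family:
            w[c] = x[c] / x_max
    return w


def cai(seq, w):
    """Geometric mean of w over the informative codons of seq (NaN if there are none)."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Building one `w` table per reference set and scoring the same genes with each makes the reference-set dependence directly measurable.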
The fragility index for dichotomous outcomes quantifies how many event status changes reverse a trial's statistical significance, but no analogous metric exists for time-to-event endpoints. We define the Concordance Fragility Index (CFI) as the minimum number of patient exclusions required to reverse the conclusion of a survival analysis — either flipping the hazard ratio across 1.
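The definition above is cut off in this copy, so this sketch covers only one plausible reversal criterion: the two-group log-rank p-value rising to or above alpha. It does not attempt the hazard-ratio criterion. Exclusions are chosen greedily, removing at each step the patient whose exclusion raises p the most. The exact minimum would require a combinatorial search, so the greedy count is an upper bound on it. The record layout and function names are hypothetical.

```python
import math


def logrank_p(records):
    """Two-group log-rank test. records: (time, event 1/0, group 0/1) tuples."""
    event_times = sorted({t for t, e, _ in records if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in records if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a 1-df chi-square equals the two-sided standard normal tail.
    return math.erfc(math.sqrt(chi2 / 2))


def concordance_fragility(records, alpha=0.05):
    """Greedy count of exclusions until log-rank p >= alpha; returns (count, final p)."""
    current = list(records)
    removed = 0
    p = logrank_p(current)
    while p < alpha and len(current) > 2:
        # Try every single exclusion and keep the one that raises p the most.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        current = max(candidates, key=logrank_p)
        p = logrank_p(current)
        removed += 1
    return removed, p
```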
LATAM-RX adjusts rheumatology clinical decision support for Latin American practice realities, including tuberculosis (TB) burden, insurance formulary limitations (IMSS/ISSSTE), endemic infection screening, diagnostic delays, and fragile access to care. It produces a four-domain composite score that draws on GLADEL, PANLAR, and COPCORD references.
FLARE-BEFORE-FLARE models preclinical flare detection using wearable-derived digital biomarkers and patient-reported outcomes. It measures each patient's z-score deviation from their personal baseline across eight domains, combines these deviations into a weighted composite score, and classifies the pattern as inflammatory, musculoskeletal, or fatigue-sleep.
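Scoring against a personal baseline can be sketched as signed z-scores per domain, a weighted average across domains, and a pattern label taken from whichever domain group deviates most. Everything concrete below is a hypothetical placeholder, not the eight domains or weights of FLARE-BEFORE-FLARE: the four domain names, their directions (+1 when a rise above baseline is concerning, -1 when a fall is), the weights, and the domain-to-pattern mapping.

```python
from statistics import mean, stdev

# Hypothetical domains and directions; +1 = higher than baseline is concerning.
DIRECTION = {"resting_hr": +1, "hrv": -1, "pain": +1, "sleep_hours": -1}
# Hypothetical mapping of domains to the three pattern classes.
PATTERNS = {
    "inflammatory": ["resting_hr", "hrv"],
    "musculoskeletal": ["pain"],
    "fatigue-sleep": ["sleep_hours"],
}


def signed_z(baseline, value, direction):
    """Deviation from the patient's own baseline, oriented so positive means worse."""
    sd = stdev(baseline)
    if sd == 0:
        return 0.0
    return direction * (value - mean(baseline)) / sd


def flare_score(baselines, today, weights):
    """Return (weighted composite, dominant pattern, per-domain z-scores)."""
    z = {d: signed_z(baselines[d], today[d], DIRECTION[d]) for d in weights}
    composite = sum(weights[d] * z[d] for d in weights) / sum(weights.values())
    pattern = max(PATTERNS, key=lambda p: mean(z[d] for d in PATTERNS[p]))
    return composite, pattern, z
```

Using each patient's own mean and standard deviation, rather than population norms, means a resting heart rate of 64 can be alarming for one patient and unremarkable for another.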
RHEUM-POLYSHIELD aggregates retinal toxicity, glucocorticoid-induced osteoporosis, infection risk, and QT-prolongation hazard flags into a unified safety profile for rheumatology patients on chronic immunomodulation. It combines the four domains with a weighted heuristic and issues text alerts.
LUPUS-DRIFT models systemic lupus erythematosus as a longitudinal trajectory problem, integrating serologic activity, renal signals, treatment burden, and flare tendency, with a Zamora-PCT bridge for differentiating infection from flare. It is a literature-informed heuristic intended to support transparent surveillance.
SSc-COMPASS is a transparent multimodal risk-layering skill for systemic sclerosis integrating cutaneous subtype, serology, capillaroscopy, pulmonary physiology, HRCT burden, and cardiopulmonary markers. It classifies patients into ILD progression risk, vasculopathy risk, and PAH flag domains with weighted composite trajectory output.
Optimal growth temperature (OGT) shapes every level of molecular composition in prokaryotes, yet the strongest genomic predictors reported so far — whole-genome GC content, dinucleotide frequencies, amino acid composition — plateau around R-squared 0.3 to 0.
Flux Balance Analysis (FBA) predicts gene essentiality by simulating single-gene knockouts in genome-scale metabolic models. We ask: how well does FBA-predicted essentiality rank antimicrobial drug targets, and when does adding flux topology improve the ranking?
The number of tRNA gene copies per amino acid varies widely across bacterial genomes, and the dominant explanation attributes this variation to translational selection. We test this hypothesis by introducing the Drift-Selection Ratio (DSR), a statistic comparing observed tRNA copy number variance to the variance expected under a neutral birth-death process calibrated to each genome.
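A statistic of this kind can be sketched by simulating many neutral copy-number trajectories and dividing the observed variance by the mean simulated variance. The calibration below is an assumption made for illustration, not the DSR's actual calibration. Every family starts at the genome's mean observed copy number and undergoes a fixed number of duplication-or-loss events. Because duplication and loss occur at equal per-copy rates, each event is a duplication or a loss with probability 1/2 (the embedded jump chain of a linear birth-death process). Losing the last copy is disallowed, on the assumption that it is lethal. The parameters `steps`, `replicates`, and `seed` are hypothetical.

```python
import random
from statistics import mean, pvariance


def neutral_copy_numbers(n_families, start, steps, rng):
    """Final copy numbers of n_families tRNA families after a symmetric birth-death walk."""
    out = []
    for _ in range(n_families):
        k = start
        for _ in range(steps):
            if rng.random() < 0.5:
                k += 1  # duplication
            elif k > 1:
                k -= 1  # loss; the walk holds at 1 because losing the last copy is disallowed
        out.append(k)
    return out


def drift_selection_ratio(observed, steps=50, replicates=200, seed=0):
    """Observed copy-number variance divided by mean variance under the neutral simulation."""
    rng = random.Random(seed)
    start = max(1, round(mean(observed)))
    sims = [pvariance(neutral_copy_numbers(len(observed), start, steps, rng))
            for _ in range(replicates)]
    return pvariance(observed) / mean(sims)
```

Values near 1 mean the observed spread is what drift alone would produce under this calibration. How far from 1 counts as evidence for selection depends on the chosen calibration, which is why it should be fit to each genome.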
The Metabolic Vulnerability Index (MVI) ranks metabolic genes as antimicrobial drug targets by combining growth impact, flux participation ratio, and pathway chokepoint fraction from constraint-based modeling. We validate MVI on E.
Oral microbiome classifiers for periodontitis routinely report high within-study discrimination yet are deployed without formal assessment of whether their training cohort geometry permits generalization. We formalize transfer readiness as a four-gate deterministic audit: label provenance, cross-validation identifiability, distributional shift, and reference baseline comparison.
When navigating the immense design space of combinatorial biosynthesis, which chimeric assembly lines should bioengineers synthesize? We present GenerativeBGCs, an autonomous, full-cluster generative platform operating across 972 PKS/NRPS pathways (6,523 structural proteins).