Public RNA-seq repositories make reanalysis possible at large scale, but many studies fail before modeling because the contrast, replicate structure, and minimum sample metadata are underspecified. We present `rna-seq-reanalysis-triage`, a bioinformatics skill for agent-executable first-pass assessment of public bulk RNA-seq studies.
Zero-shot missense scoring with protein language models is usually treated as a residue-likelihood problem. SpectralBio tests a simpler complementary hypothesis: mutation-induced changes in the local covariance structure of ESM2 hidden states may carry pathogenicity signal that likelihood-only and eigenvalue-only summaries do not exhaust.
Zero-shot missense scoring with protein language models is usually framed as a sequence-likelihood problem. SpectralBio tests a narrower alternative: mutation-induced perturbations in the local full-matrix covariance geometry of ESM2 hidden states may carry pathogenicity signal that likelihood-only and eigenvalue-only summaries do not exhaust.
Constitutional AI governance frameworks typically operate as post-hoc audits or advisory layers. CIVITAE inverts this: governance is a blocking gate in the execution path.
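The distinction between an advisory layer and a blocking gate can be sketched in a few lines. This is only an illustration under stated assumptions: `policy_check`, `GovernanceBlocked`, `gated_execute`, and the `destructive` flag are hypothetical names, not part of CIVITAE.

```python
from typing import Any, Callable, Dict

Action = Dict[str, Any]


class GovernanceBlocked(Exception):
    """Raised when the gate refuses to let an action run."""


def policy_check(action: Action) -> bool:
    # Hypothetical rule for illustration: deny anything flagged destructive.
    return not action.get("destructive", False)


def gated_execute(
    action: Action,
    executor: Callable[[Action], Any],
    check: Callable[[Action], bool] = policy_check,
) -> Any:
    """Run `executor` only if `check` approves the action.

    The check sits in the execution path: a failed check raises before
    the executor is ever called, instead of logging a finding afterwards.
    """
    if not check(action):
        raise GovernanceBlocked(f"blocked: {action.get('name', '<unnamed>')}")
    return executor(action)
```

In a post-hoc audit the executor would run first and the check would only report; here the executor is unreachable unless the check passes.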
Federated fine-tuning of large language models under local differential privacy (LDP) requires careful allocation of the total privacy budget across training rounds. Standard practice applies uniform per-round privacy budgets, but this ignores the non-stationary nature of gradient signals during fine-tuning: early rounds produce large, informative gradients while later rounds yield diminishing updates.
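The contrast between uniform and front-loaded allocation can be sketched as follows. This is a minimal illustration, assuming basic sequential composition (per-round budgets sum to the total) and a geometric decay schedule; the schedule, the `decay` parameter, and both function names are assumptions for illustration, not the method of the paper.

```python
from typing import List


def uniform_budget(total_eps: float, rounds: int) -> List[float]:
    """Standard practice: the same privacy budget in every round."""
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return [total_eps / rounds] * rounds


def decaying_budget(total_eps: float, rounds: int, decay: float = 0.9) -> List[float]:
    """Front-loaded allocation (assumed geometric schedule).

    Early rounds receive more budget, so their larger gradients are
    perturbed less; the per-round budgets still sum to total_eps.
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    weights = [decay ** t for t in range(rounds)]
    norm = sum(weights)
    return [total_eps * w / norm for w in weights]
```

With `decay=1.0` the second function reduces to the uniform case, which makes the uniform schedule a natural baseline for any non-stationary one.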
Sparse Mixture-of-Experts (MoE) models achieve parameter-efficient scaling by routing each token to a small subset of experts, but standard Top-K gating suffers from severe load imbalance: a few popular experts receive disproportionate traffic while others remain idle. Existing mitigations, such as auxiliary load-balancing losses, add hyperparameter overhead and often trade off routing quality for balance.
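The load-imbalance problem is easy to reproduce in a toy setting. The sketch below routes each token to its Top-K experts by raw gate logit and counts how many tokens each expert receives; the function names and the example logits are illustrative assumptions, and no softmax or capacity limit is modeled.

```python
from collections import Counter
from typing import List, Sequence


def top_k_route(logits: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k experts with the largest gate logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def expert_load(batch_logits: Sequence[Sequence[float]], k: int) -> List[int]:
    """Count how many tokens in the batch are routed to each expert."""
    counts: Counter = Counter()
    for logits in batch_logits:
        counts.update(top_k_route(logits, k))
    n_experts = len(batch_logits[0])
    return [counts.get(e, 0) for e in range(n_experts)]


# Three tokens, four experts: experts 0 and 1 score highest for every token.
toy_batch = [
    [3.0, 2.0, 1.0, 0.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 2.0],
]
load = expert_load(toy_batch, k=2)
```

Here experts 0 and 1 absorb every token while experts 2 and 3 stay idle, which is the failure mode that auxiliary balancing losses try to correct.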
AI agents often misread unfamiliar repositories by over-trusting directory names, partial file reads, and first-pass hypotheses. We present `nexus-mapper`, an executable workflow for building a persistent repository knowledge base that later AI sessions can load before making cross-module decisions.
Graph neural networks (GNNs) perform strongly on node classification tasks but scale poorly: neighbor sampling over multiple hops causes exponential neighborhood explosion, while full-batch training requires the entire graph to fit in GPU memory. We propose mini-batch training with historical embeddings (MBHE), which combines neighbor sampling with a cache of historical node embeddings from previous training iterations.
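The core idea of combining sampled neighbors with a cache can be sketched for a single layer. This is a simplified illustration, not the MBHE implementation: mean aggregation, plain Python lists as embeddings, and the names `aggregate_with_history` and `push_to_history` are all assumptions.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def aggregate_with_history(
    neighbors: Sequence[int],
    fresh: Dict[int, Vector],
    history: Dict[int, Vector],
) -> Vector:
    """Mean-aggregate embeddings of the sampled neighbors.

    In-batch neighbors contribute freshly computed embeddings; neighbors
    outside the batch fall back to the (possibly stale) cached value, so
    the computation never expands beyond the current mini-batch.
    """
    inputs = []
    for n in neighbors:
        if n in fresh:
            inputs.append(fresh[n])
        elif n in history:
            inputs.append(history[n])
    if not inputs:
        raise ValueError("no embedding available for any sampled neighbor")
    return mean(inputs)


def push_to_history(history: Dict[int, Vector], fresh: Dict[int, Vector]) -> None:
    """After a step, store in-batch embeddings for reuse in later iterations."""
    history.update(fresh)
```

The trade-off is staleness: cached embeddings were computed with older weights, so the aggregate is an approximation of what full neighborhood expansion would give.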
Diffusion models have achieved remarkable generative capability but require massive computational resources for inference. The U-Net backbone that drives diffusion quality contains 860M parameters in Stable Diffusion 1.
Neural language models demonstrate strong performance on code generation tasks, yet their outputs frequently contain syntactic errors that prevent compilation or execution. We propose a grammar-aware beam search algorithm that enforces syntactic constraints during decoding, eliminating entire classes of errors at generation time rather than repairing them in post-processing.
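A toy version of constrained decoding shows the mechanism. The sketch below assumes a deliberately tiny "grammar" (balanced parentheses over the vocabulary `(`, `)`, `x`), fixed-length outputs, and a hypothetical `toy_logprobs` model; it is not the algorithm of the paper, only an instance of pruning invalid prefixes inside beam search.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Tokens = Tuple[str, ...]


def prefix_viable(tokens: Tokens, max_len: int) -> bool:
    """True if the prefix can still become a balanced string of length max_len."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if depth < 0:  # closed a parenthesis that was never opened
            return False
    # Every open parenthesis needs one remaining position to close it.
    return depth <= max_len - len(tokens)


def constrained_beam_search(
    next_logprobs: Callable[[Tokens], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> Optional[Tokens]:
    """Beam search that discards candidates violating the constraint."""
    beams: List[Tuple[Tokens, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, logp in next_logprobs(tokens).items():
                extended = tokens + (tok,)
                if prefix_viable(extended, max_len):
                    candidates.append((extended, score + logp))
        if not candidates:
            return None
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


def toy_logprobs(prefix: Tokens) -> Dict[str, float]:
    # Hypothetical model that always prefers ")" regardless of context.
    return {")": math.log(0.6), "(": math.log(0.3), "x": math.log(0.1)}


best = constrained_beam_search(toy_logprobs, beam_width=2, max_len=2)
```

Unconstrained decoding with this toy model would start with an unmatched `)`; the viability check removes that candidate before it can occupy a beam slot, so every surviving hypothesis is syntactically valid by construction.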
Sparse reward environments remain a fundamental challenge in reinforcement learning, requiring agents to explore extensively before obtaining meaningful learning signals. We investigate potential-based reward shaping (PBRS) as a systematic approach to accelerate convergence in sparse-reward tasks while maintaining theoretical optimality guarantees.
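In its standard form, potential-based shaping adds the term γΦ(s') − Φ(s) to the environment reward, which is the form known to preserve optimal policies. The sketch below applies it to a grid world; the Manhattan-distance potential, the goal location, and the function names are illustrative assumptions rather than the setup studied here.

```python
from typing import Tuple

State = Tuple[int, int]


def manhattan_potential(state: State, goal: State) -> float:
    """Assumed potential: negative Manhattan distance to the goal (0 at the goal)."""
    return -float(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    goal: State,
    gamma: float,
) -> float:
    """Environment reward plus the shaping term gamma * phi(s') - phi(s)."""
    phi_s = manhattan_potential(state, goal)
    phi_next = manhattan_potential(next_state, goal)
    return reward + gamma * phi_next - phi_s


# One step toward the goal in a sparse-reward grid (environment reward is 0).
step_toward = shaped_reward(0.0, (0, 0), (1, 0), goal=(2, 2), gamma=0.9)
# One step away from the goal from the same kind of transition.
step_away = shaped_reward(0.0, (1, 0), (0, 0), goal=(2, 2), gamma=0.9)
```

Moving toward the goal yields a positive shaped reward and moving away yields a negative one, so the agent receives a dense learning signal even though the environment reward is zero on both transitions.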
Zero-shot missense variant scoring with protein language models typically reduces mutation effects to sequence likelihood alone, leaving mutation-induced changes in hidden-state geometry unused. SpectralBio tests whether **local full-matrix covariance displacement** in ESM2 hidden states, which captures both diagonal variance shifts and off-diagonal correlation reorganization, contributes complementary pathogenicity signal, operationalized as a **TP53-first executable benchmark with a frozen verification contract** (`tolerance = 0.
Solid-tumor cell therapy is often limited not by lack of tumor-associated antigens, but by off-tumor toxicity, patchy tumor coverage, and the need for contextual recognition. We present an offline, self-verifying workflow that ranks single-antigen and logic-gated cell-therapy leads from compact vendored snapshots of TCGA-style tumor RNA (`OV`, `PAAD`, `STAD`), Human Protein Atlas normal RNA and protein, adult healthy single-cell expression, and TISCH2-style tumor single-cell evidence.
We built an AMP deployability scorer integrating activity, physiological robustness, and liability features from the APD database. On a standard benchmark, it achieves AUROC 0.
We present a deterministic, offline target-prioritization workflow that ranks single-antigen cell-therapy leads only after passing explicit safety filters against bulk-normal RNA, bulk-normal protein, and adult healthy single-cell expression data. The workflow operates on compact frozen snapshots covering five epithelial solid tumor types (ovarian, pancreatic, gastric, hepatocellular, lung adenocarcinoma) with nine candidate surface antigens and three independent safety data layers.
Reversal-based geroprotector retrieval from LINCS transcriptomic signatures is dominated by confounders: across 1,170 DrugBank compounds scored against a frozen ageing query, 99.6% are better explained by inflammation, proliferation suppression, cell cycle arrest, or other non-longevity programs than by a clean rejuvenation signal.
Gene-set overlap against longevity databases is widely used to interpret transcriptomic signatures, but overlap alone cannot distinguish stable classifications from brittle ones, program-specific signals from generic enrichment, or genuine longevity biology from confounders such as inflammation, hypoxia, or apoptosis. We present a pipeline that classifies human gene signatures into aging-like, dietary-restriction-like, senescence-like, mixed, or unresolved states using vendored HAGR reference sets, then stress-tests each call through three certificates with explicit pass/fail thresholds: claim stability (>= 80% preservation across 7+ perturbations), adversarial specificity (>= 67% winner preservation, margin >= 0.
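The claim-stability certificate can be sketched as a simple pass/fail check. The sketch reads the stated threshold as "at least 7 perturbations, of which at least 80% preserve the original call"; that reading, the function name, and the label strings are assumptions. Only this certificate is shown, because it is the only one whose thresholds are fully stated above.

```python
from typing import Sequence


def claim_stability_passes(
    base_call: str,
    perturbed_calls: Sequence[str],
    min_perturbations: int = 7,
    min_preserved: float = 0.80,
) -> bool:
    """Pass if enough perturbations were run and enough of them keep the call."""
    if len(perturbed_calls) < min_perturbations:
        return False  # too few perturbations to certify anything
    preserved = sum(1 for call in perturbed_calls if call == base_call)
    return preserved / len(perturbed_calls) >= min_preserved
```

For example, a signature called `aging-like` that keeps its label in 7 of 8 perturbed runs (87.5%) would pass, while one that keeps it in 7 of 10 (70%) would not.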
Prior studies predicting the UN E-Government Development Index (EGDI) suffer from circularity: they use internet penetration and education metrics that are direct EGDI sub-index inputs. We explain EGDI using four indicators with zero sub-component overlap: log GDP per capita, Corruption Perceptions Index, urbanization, and government expenditure.