Diversity-aware training data curation has recently been shown to outperform naive data scaling for histopathology pre-training, yet no systematic study exists for fluorescence microscopy fine-tuning, a domain with fundamentally different image statistics and label structure (4-channel single-cell crops, 28 organelle classes, extreme class imbalance). We benchmark five curation strategies across four training-data fractions (25%–100%) of the HPA Single-Cell Classification dataset: random sampling, k-Center Greedy coreset selection, Furthest Point Sampling (FPS), class-balanced oracle selection, and a novel domain-specific BIO-Diversity score that combines per-channel entropy with patch-level boundary coverage.
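
k-Center Greedy and FPS share the same greedy core: repeatedly add the sample whose embedding is furthest from everything selected so far. Below is a minimal NumPy sketch of that shared loop; it assumes curation operates on precomputed per-crop feature embeddings, and the function name and signature are illustrative rather than taken from the benchmark.

    import numpy as np

    def furthest_point_sampling(embeddings: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
        # Greedy 2-approximation of the k-center objective: each step adds
        # the crop whose embedding is furthest from the current selection.
        rng = np.random.default_rng(seed)
        selected = [int(rng.integers(embeddings.shape[0]))]
        # Distance from every crop to its nearest already-selected crop.
        dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
        for _ in range(k - 1):
            nxt = int(dists.argmax())
            selected.append(nxt)
            dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
        return np.asarray(selected)

Because every strategy reduces to "return k training indices", random sampling, the class-balanced oracle, and the BIO-Diversity ranking can all share this interface and a single fine-tuning harness.
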
Pre-trained Masked Autoencoders (MAE) have demonstrated strong performance on natural image benchmarks, but their utility for subcellular biology remains poorly characterized. We introduce OrgBoundMAE, a benchmark that evaluates MAE representations on organelle localization classification using the Human Protein Atlas (HPA) single-cell fluorescence image collection — 31,072 four-channel immunofluorescence crops covering 28 organelle classes.
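
One practical detail any such benchmark has to resolve is that standard MAE encoders are pre-trained on 3-channel RGB while HPA crops carry four channels. The sketch below shows one common adaptation recipe (inflate the patch-embedding convolution, then attach a linear head); the helper name, the mean-of-RGB initialization for the fourth channel, and the ViT-B feature width are assumptions for illustration, not OrgBoundMAE's documented procedure.

    import torch
    import torch.nn as nn

    def inflate_patch_embed(conv3: nn.Conv2d, in_chans: int = 4) -> nn.Conv2d:
        # Extend a 3-channel patch-embedding conv to 4 channels, initializing
        # the extra channel with the mean of the pre-trained RGB kernels.
        conv4 = nn.Conv2d(in_chans, conv3.out_channels,
                          kernel_size=conv3.kernel_size, stride=conv3.stride)
        with torch.no_grad():
            extra = conv3.weight.mean(dim=1, keepdim=True)        # (out, 1, k, k)
            conv4.weight.copy_(torch.cat([conv3.weight, extra], dim=1))
            conv4.bias.copy_(conv3.bias)
        return conv4

    head = nn.Linear(768, 28)   # assumed ViT-B width -> 28 organelle classes
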
We present ModalDrop-JEPA, a self-supervised pretraining framework for clinical multimodal learning that applies JEPA's representation-space prediction principle at the modality level. Rather than masking image patches (as in V-JEPA) or predicting motion between frame pairs (as in MC-JEPA), ModalDrop-JEPA randomly drops entire clinical modalities (imaging, labs, notes, vitals) with probability p and trains a cross-modal predictor to reconstruct the missing modalities' representations from the available ones.
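
A minimal sketch of what one such training step could look like, assuming every modality encoder emits a fixed-width representation and targets come from stop-gradient (or EMA) encoders, as is typical for JEPA-style objectives; the mean-pooled context and the smooth-L1 distance are illustrative choices, not the paper's stated ones.

    import torch
    import torch.nn.functional as F

    def modaldrop_step(reps: dict, predictors: dict, p: float = 0.3):
        # reps: modality name -> target representation (B, d); predictors:
        # modality name -> head mapping a pooled context (B, d) to (B, d).
        names = list(reps)
        dropped = [m for m in names if torch.rand(()).item() < p]
        if not dropped or len(dropped) == len(names):
            dropped = names[:1]                 # keep both sides non-empty
        kept = [m for m in names if m not in dropped]
        context = torch.stack([reps[m] for m in kept]).mean(dim=0)
        loss = 0.0
        for m in dropped:
            pred = predictors[m](context)       # cross-modal prediction
            # Representation-space target with stop-gradient, per JEPA.
            loss = loss + F.smooth_l1_loss(pred, reps[m].detach())
        return loss / len(dropped)

Since each modality alternates between serving as context and as target, the trained predictor heads can plausibly double as imputation modules when a modality is genuinely missing at inference time.
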
We present MedOS-JEPA, an integration of the Motion-Content Joint Embedding Predictive Architecture (MC-JEPA) as the visual backbone of MedOS — a dual-process world model for clinical AI. MC-JEPA jointly learns optical flow and semantic content from surgical video via a shared ViT encoder, without pixel reconstruction.
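
The distinctive property here is that motion is supervised in feature space rather than by pixel reconstruction. A hedged sketch of that idea follows, assuming the encoder returns a dense feature map and a flow head regresses a displacement field; warp_features and joint_loss are illustrative helpers, and the pooled content term is a crude stand-in for MC-JEPA's full VICReg-style regularization.

    import torch
    import torch.nn.functional as F

    def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Backward-warp a feature map (B, C, H, W) by a flow field (B, 2, H, W)
        # using bilinear grid sampling; flow channel 0 is x, channel 1 is y.
        _, _, h, w = feat.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys)).float().to(feat)       # (2, H, W) pixel grid
        coords = base.unsqueeze(0) + flow                   # displaced coordinates
        gx = 2 * coords[:, 0] / (w - 1) - 1                 # normalize to [-1, 1]
        gy = 2 * coords[:, 1] / (h - 1) - 1
        return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)

    def joint_loss(encoder, flow_head, frame_t, frame_t1, lam: float = 0.1):
        f_t, f_t1 = encoder(frame_t), encoder(frame_t1)     # shared ViT encoder
        flow = flow_head(f_t, f_t1)                         # (B, 2, H, W)
        motion = F.l1_loss(warp_features(f_t1, flow), f_t)  # flow consistency, no pixels
        # Placeholder content term: pooled features of adjacent frames agree.
        content = F.mse_loss(f_t.mean(dim=(2, 3)), f_t1.mean(dim=(2, 3)).detach())
        return motion + lam * content
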