A Self-Verifying Transfer-Readiness Auditor for Oral Microbiome Cohorts
A Public-Recovery Saliva-Based Periodontitis Study with Cohort-Shift Diagnostics and Baseline Recommendation
Abstract
Oral-microbiome classifiers often report strong within-study performance yet fail when transported across cohorts. This repository implements an offline, self-verifying transfer-readiness auditor for saliva-based periodontitis panels built from publicly recoverable data, with cohort-shift diagnostics and explicit baseline recommendation. In the frozen canonical case, the auditor retained 722 of 796 public-backbone samples, excluded 74 unresolved rows, returned the verdict sparse_transfer_unreliable, and recommended abundance_only.
Frozen Benchmark Design
This repository is an offline, audit-first transfer-readiness benchmark for saliva-based periodontitis cohorts built from the publicly recoverable EPheClass PD_s backbone plus auditable sample-level metadata reconstruction. It does not claim to recreate the deleted batch-effect-removed workbook layer from the source paper.
- primary: 2 cohorts, 102 samples (control 39, periodontitis 63); cohorts BP41, BP48
- blind: 2 cohorts, 189 samples (control 55, periodontitis 134); cohorts BP34, BP49
- auxiliary: 5 cohorts, 431 samples (control 338, periodontitis 93); cohorts BP35, BP36, BP39, BP40, BP44
- excluded: 1 cohort, 74 samples (control 0, periodontitis 0); cohort BP43
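The panel tallies can be cross-checked against the abstract's totals (722 retained of 796, 74 excluded) with a few lines of arithmetic. The sketch below only restates the counts listed above; the `PANELS` dictionary layout and the function name are illustrative assumptions, not part of the repository.

```python
# Panel tallies copied from the frozen benchmark design.
# The dictionary layout is a hypothetical convenience, not the repo's schema.
PANELS = {
    "primary": {"control": 39, "periodontitis": 63, "samples": 102},
    "blind": {"control": 55, "periodontitis": 134, "samples": 189},
    "auxiliary": {"control": 338, "periodontitis": 93, "samples": 431},
    "excluded": {"control": 0, "periodontitis": 0, "samples": 74},
}


def retained_and_excluded(panels):
    """Return (retained, excluded) sample counts after checking label sums."""
    for name, panel in panels.items():
        if name != "excluded":
            # Every retained sample should carry exactly one of the two labels.
            assert panel["control"] + panel["periodontitis"] == panel["samples"], name
    retained = sum(p["samples"] for n, p in panels.items() if n != "excluded")
    excluded = panels["excluded"]["samples"]
    return retained, excluded


retained, excluded = retained_and_excluded(PANELS)
print(retained, excluded, retained + excluded)  # prints: 722 74 796
```

The excluded BP43 rows carry no resolved labels, which is why they are left out of the per-panel label check.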
Canonical Findings
- label provenance verdict: auditable
- mixed-cohort CV eligibility verdict: sparse_transfer_unreliable
- cohort-shift verdict: shifted_candidate
- shifted primary cohorts: BP41, BP48
- benchmark verdict: mixed
- recommended model: abundance_only
- pooled AUPRC: full_model 0.8973 vs abundance_only 0.9239
- durable feature core pooled AUPRC: 0.9079 with core improvement 0.0414
- blind cohorts withheld from tuning: BP34, BP49
- valid inner mixed-cohort splits: minimum 1, 0 reliable outer folds
Shift Diagnostics
- BP41: library-size ratio 2.7804, nonzero-feature ratio 1.8110
- BP48: library-size ratio 1.7032, nonzero-feature ratio 1.8775
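This section reports the two ratios without defining them, so the following is only one plausible reading: each ratio compares the held-out cohort's median to the training panel's median, for library size (total counts per sample) and for the number of nonzero features per sample. The function names, the toy counts, and the `tolerance=1.5` cut-off are all assumptions made for illustration.

```python
from statistics import median


def shift_ratios(held_out, training):
    """Return (library_size_ratio, nonzero_feature_ratio).

    Each argument is a list of samples; each sample is a list of feature counts.
    ASSUMPTION: ratio = held-out median / training median. The source does not
    state the exact definition used by the auditor.
    """
    def library_size(sample):
        return sum(sample)

    def nonzero_features(sample):
        return sum(1 for count in sample if count > 0)

    lib_ratio = median(map(library_size, held_out)) / median(map(library_size, training))
    nz_ratio = median(map(nonzero_features, held_out)) / median(map(nonzero_features, training))
    return lib_ratio, nz_ratio


def is_shifted(ratio, tolerance=1.5):
    """Flag a ratio far from 1 in either direction (tolerance is hypothetical)."""
    return ratio > tolerance or ratio < 1.0 / tolerance


# Toy counts, not real cohort data.
training = [[10, 0, 5, 0], [8, 2, 0, 0]]
held_out = [[30, 4, 6, 0], [20, 0, 5, 5]]
lib_ratio, nz_ratio = shift_ratios(held_out, training)
print(lib_ratio, nz_ratio, is_shifted(lib_ratio))
```

Under this reading, a ratio near 1 means the held-out cohort resembles its training panel in sequencing depth and sparsity. All four reported ratios (1.70 to 2.78) lie well above 1.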
Why The Recommendation Is Conservative
The result is scientifically useful precisely because it is a negative-transfer finding on auditable public data. The retained mixed panel contains only two primary mixed cohorts, both outer folds fall below the reliable tuning threshold, and at least one held-out cohort is materially shifted relative to its training panel. Under the frozen policy, the correct outcome is to recommend the abundance-only baseline instead of forcing a sparse transfer claim.
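The reasoning above can be written as a small decision function. This is a sketch under stated assumptions, not the repository's policy code. The verdict strings come from the skill file, but the order of the checks, the `min_mixed_cohorts=2` default, and the choice of `full_model` for a transfer-ready panel are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AuditInputs:
    labels_auditable: bool
    n_primary_mixed_cohorts: int
    n_reliable_outer_folds: int
    any_primary_cohort_shifted: bool
    full_model_auprc: float
    abundance_only_auprc: float


def recommend(a: AuditInputs, min_mixed_cohorts: int = 2) -> Tuple[str, Optional[str]]:
    """Return (verdict, recommended_model). Check order is an assumption."""
    if not a.labels_auditable:
        return "unrecoverable_labels", None
    if a.n_primary_mixed_cohorts < min_mixed_cohorts:
        return "insufficient_mixed_cohorts", None
    if a.n_reliable_outer_folds == 0:
        # No outer fold supports trustworthy sparse tuning.
        return "sparse_transfer_unreliable", "abundance_only"
    if a.any_primary_cohort_shifted or a.full_model_auprc <= a.abundance_only_auprc:
        return "baseline_only_recommended", "abundance_only"
    return "transfer_ready", "full_model"


# Values from the canonical findings.
canonical = AuditInputs(
    labels_auditable=True,
    n_primary_mixed_cohorts=2,
    n_reliable_outer_folds=0,
    any_primary_cohort_shifted=True,
    full_model_auprc=0.8973,
    abundance_only_auprc=0.9239,
)
print(recommend(canonical))  # prints: ('sparse_transfer_unreliable', 'abundance_only')
```

With the canonical inputs, the function stops at the reliable-fold check and never reaches the AUPRC comparison. That matches the paper's framing: the recommendation follows from panel reliability, not from a model contest.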
Skill Contract
The paired skill executes the same locked contract as the paper:
uv sync --frozen
uv run --frozen --no-sync oral-microbiome-benchmark build-freeze --config config/canonical_periodontitis.yaml --out data/benchmark/freeze
uv run --frozen --no-sync oral-microbiome-benchmark run --config config/canonical_periodontitis.yaml --out outputs/canonical
uv run --frozen --no-sync oral-microbiome-benchmark verify --config config/canonical_periodontitis.yaml --run-dir outputs/canonical
uv run --frozen --no-sync python scripts/prepare_submission_bundle.py --config config/canonical_periodontitis.yaml --run-dir outputs/canonical
uv run --frozen --no-sync python scripts/build_paper_pdf.py --config config/canonical_periodontitis.yaml
Reproducibility
The frozen source snapshot passed 20/20 verification checks, and both the smoke and full mini-venv replication paths pass from local assets only.
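For unattended reruns, the six Skill Contract commands could be driven from a small Python wrapper that stops at the first failing step. The command lines are copied from the contract; the wrapper itself (`run_contract`, the `runner` parameter, the `STEPS` list) is a hypothetical convenience and not a script shipped with the repository.

```python
import subprocess

CONFIG = "config/canonical_periodontitis.yaml"
RUN_DIR = "outputs/canonical"
UV_RUN = ["uv", "run", "--frozen", "--no-sync"]
CLI = UV_RUN + ["oral-microbiome-benchmark"]

# Commands exactly as listed in the Skill Contract, in order.
STEPS = [
    ["uv", "sync", "--frozen"],
    CLI + ["build-freeze", "--config", CONFIG, "--out", "data/benchmark/freeze"],
    CLI + ["run", "--config", CONFIG, "--out", RUN_DIR],
    CLI + ["verify", "--config", CONFIG, "--run-dir", RUN_DIR],
    UV_RUN + ["python", "scripts/prepare_submission_bundle.py",
              "--config", CONFIG, "--run-dir", RUN_DIR],
    UV_RUN + ["python", "scripts/build_paper_pdf.py", "--config", CONFIG],
]


def run_contract(steps=STEPS, runner=subprocess.run):
    """Run each step in order.

    Returns None if every step exits 0, otherwise (step_number, exit_code)
    for the first failing step. `runner` is injectable so the sequencing
    can be exercised without invoking uv.
    """
    for number, command in enumerate(steps, start=1):
        result = runner(command)
        if result.returncode != 0:
            return number, result.returncode
    return None
```

Calling `run_contract()` from the repository root would execute the real commands. Passing a stub `runner` exercises only the ordering and stop-on-failure behavior.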
Limitations
- saliva only in v1
- periodontitis vs control only
- taxonomy is optional and the signature-only baseline remains unavailable when taxonomy is absent
- the retained mixed transfer panel is intentionally small and the paper does not claim broad mechanistic completeness
- this is an audit/methods note, not a claim that the more complex sparse model wins
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: oral-microbiome-transfer-auditor
description: Execute the locked, offline oral microbiome transfer-readiness auditor for saliva-based periodontitis, including public-recovery freeze building, cross-cohort evaluation, cohort-shift diagnostics, baseline recommendation, and supporting benchmark artifacts.
allowed-tools: Bash(uv *, python *, python3 *, curl *, ls *, test *, shasum *, unzip *)
requires_python: "3.12.x"
package_manager: uv
repo_root: .
canonical_output_dir: outputs/canonical
---

# Oral Microbiome Transfer Auditor

This skill executes the audit-first transfer-readiness workflow exactly as frozen by the repository contract. It does not invent cohorts, corrected inputs, unverifiable benchmark rows, or fake sample labels.

## Runtime Expectations

- Platform: CPU-only
- Python: `3.12.x`
- Package manager: `uv`
- Offline after the freeze bundle exists locally
- Canonical freeze directory: `data/benchmark/freeze`
- Paper PDF build requires `tectonic`

## Scope Rules

- Saliva only in v1
- Adult samples only when age is available
- `periodontitis` vs `control` only
- `EPheClass` `PD_s` is the canonical abundance backbone
- Canonical v1 is ASV-first
- No corrected or batch-effect-removed table in the scored path
- Blind cohorts are excluded from thresholding, feature selection, hyperparameter selection, confounder-margin tuning, and durable feature-core distillation

## Step 1: Build Or Confirm The Public-Recovery Raw Bundle

The freeze builder will create these raw assets from the public `PD_s` backbone if they are absent:

- `data/benchmark/raw/epheclass_pd_s_abundance.tsv`
- `data/benchmark/raw/recovered_metadata.tsv`
- `data/benchmark/raw/recovered_taxonomy.tsv`

The source provenance and reconstruction rules are documented in `data/refs/source_provenance.md`.

## Step 2: Install The Locked Environment

```bash
uv sync --frozen
```

## Step 3: Build The Frozen Benchmark

```bash
uv run --frozen --no-sync oral-microbiome-benchmark build-freeze --config config/canonical_periodontitis.yaml --out data/benchmark/freeze
```

## Step 4: Run The Canonical Auditor

```bash
uv run --frozen --no-sync oral-microbiome-benchmark run --config config/canonical_periodontitis.yaml --out outputs/canonical
```

The primary outputs are now the audit verdict, model recommendation, and cohort-shift diagnostics. Legacy benchmark metrics remain as supporting evidence.

## Step 5: Verify The Canonical Run

```bash
uv run --frozen --no-sync oral-microbiome-benchmark verify --config config/canonical_periodontitis.yaml --run-dir outputs/canonical
```

## Step 6: Optional Triage

Triage v1 is evaluative only and requires a labeled external cohort:

```bash
uv run --frozen --no-sync oral-microbiome-benchmark triage --config config/canonical_periodontitis.yaml --input inputs/new_cohort.tsv --metadata inputs/new_metadata.tsv --out outputs/triage
```

## Step 7: Freeze The Submission Bundle

```bash
uv run --frozen --no-sync python scripts/prepare_submission_bundle.py --config config/canonical_periodontitis.yaml --run-dir outputs/canonical
```

This snapshots the verified run into `submission/freeze/source_canonical/`, writes paper-facing tables and figures into `submission/results/`, and regenerates `paper/generated/`.

## Step 8: Build The Paper PDF

```bash
uv run --frozen --no-sync python scripts/build_paper_pdf.py --config config/canonical_periodontitis.yaml
```

If `tectonic` is missing, install it with your local package manager first and then rerun Step 8.

## Optional Step 9: Clean-Room Replication

```bash
uv run --frozen --no-sync python scripts/create_mini_venv.py --force
uv run --frozen --no-sync python scripts/run_replication_check.py --profile smoke --venv-dir .venv-mini
uv run --frozen --no-sync python scripts/run_replication_check.py --profile full --venv-dir .venv-mini
```

The smoke profile uses fixture data and checks the end-to-end contract quickly. The full profile reproduces the canonical freeze, run, verify, submission bundle, paper build, and snapshot comparison from local assets only.

## How To Interpret Verdicts

- `transfer_ready`: the retained panel supports a non-baseline transfer claim.
- `baseline_only_recommended`: the panel is usable, but the safer recommendation is the abundance baseline.
- `sparse_transfer_unreliable`: the panel does not support trustworthy sparse tuning.
- `insufficient_mixed_cohorts`: too few mixed cohorts remain for canonical transfer scoring.
- `unrecoverable_labels`: label provenance fails.
- `shifted_candidate`: one or more retained primary cohorts are materially shifted.

## Canonical Success Criteria

The canonical scored path is successful only if:

- the freeze builder completes without dropping below the blind-panel requirement
- the canonical run completes successfully
- the verifier exits `0`
- all required outputs are present and nonempty
- the verifier reports `passed`
- the audit bundle contains a top-level verdict and recommended model
- if taxonomy is absent, the run still passes honestly with `signature_only` marked `unavailable_missing_taxonomy`
- the submission bundle and paper can be rebuilt from the frozen canonical snapshot without manual edits