DrugRescue: A Deterministic Pipeline for Open Targets Drug-Target-Disease Repurposing Recommendations
Karen Nguyen, Scott Hughes, Claw
Abstract
Drug repurposing, finding new indications for existing approved drugs, can substantially reduce the time and cost of bringing therapies to patients. The Open Targets Platform aggregates drug-target-disease associations from clinical trials, FDA labels, and mechanism-of-action databases, but navigating these data requires custom bioinformatics work. We present DrugRescue, a deterministic pipeline that pre-freezes Open Targets associations for 108 cancer drugs across 173 gene targets and 780 diseases, then compiles them into three decision primitives: (1) forward disease search, ranking drugs for a given disease by clinical phase, target evidence, and indication breadth; (2) reverse target search, finding all known modulators of a gene with clinical evidence; and (3) repurpose mode, identifying diseases where a drug's targets are implicated but the drug is not yet indicated. Applied to non-small cell lung carcinoma (NSCLC), the pipeline ranks 76 drugs, with carfilzomib, paclitaxel, and docetaxel scoring highest, largely on target coverage. For EGFR, it identifies 9 approved drugs, led by cetuximab and erlotinib. All outputs are deterministic and certificate-carrying, and are verified by 59 automated tests.
Introduction
Drug repurposing represents one of the most efficient paths from laboratory to patient. By leveraging existing safety and pharmacokinetic data for approved drugs, repurposing candidates can bypass years of preclinical development. Yet identifying which drugs might work for which diseases requires systematic analysis of drug-target-disease relationships -- data that exists in public databases but requires bioinformatics expertise to query and interpret.
The Open Targets Platform integrates evidence from clinical trials, FDA approvals, ChEMBL, and genetic associations into a unified drug-target-disease graph. A researcher asking "which approved drugs target EGFR?" or "what drugs might work for lung cancer?" must write GraphQL queries, parse nested JSON responses, and construct scoring models -- work repeated independently across teams.
The Open Targets web interface and API already support individual drug, disease, and target queries, but they require network access, produce non-deterministic outputs (results change as the database updates), and lack machine-checkable provenance. Existing drug-ranking algorithms (e.g., connectivity-map approaches [6], network-based methods) typically require expression data or protein interaction networks beyond what Open Targets provides. DrugRescue occupies a different niche: it pre-freezes the Open Targets association graph into compact derived assets and compiles them into ranked recommendations with certificate-carrying provenance, all offline, deterministic, and auditable. We use "compile" in the software-engineering sense: transforming a structured input (the drug-target-disease graph) into a structured output (ranked recommendations with provenance) via a fixed, reproducible transformation.
Data
We queried the Open Targets GraphQL API (v4) for 108 cancer drugs representing all Phase 3+ oncology therapeutics resolvable through our curated search list. Of these, 107 are FDA-approved. The scope covers major approved oncology drugs across all targeted therapy classes (EGFR, BRAF, ALK, HER2, VEGF, CDK4/6, BCL2, BTK, JAK, BCR-ABL, mTOR, PI3K, PARP, checkpoint inhibitors, HDAC, proteasome, and chemotherapy); the architecture is drug-count-agnostic and scales with expanded derived assets. Each drug was resolved by name search to its ChEMBL identifier, then mechanisms of action, targets, and clinical indications were retrieved. The resulting dataset contains:
- drug_target_disease.csv: 16,920 rows (one per drug-target-disease triple)
- drug_summary.csv: 108 drugs with type, max phase, target count, indication count
- target_drug_map.csv: 173 gene targets with drug counts
- disease_drug_map.csv: 780 diseases with drug and approval counts
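The two map files can be read as aggregations of the triple table. The following is a minimal sketch. It assumes `drug_target_disease.csv` has columns named `drug`, `target`, and `disease` (the actual schema is not given here) and that the map counts are of distinct drugs. `derive_maps` is a hypothetical helper, and the approval counts in `disease_drug_map.csv` are not reproduced.

```python
import csv
import io
from collections import defaultdict

# Toy excerpt in an ASSUMED layout; the real column names of
# drug_target_disease.csv are not documented here.
TRIPLES_CSV = """drug,target,disease
ERLOTINIB,EGFR,non-small cell lung carcinoma
GEFITINIB,EGFR,non-small cell lung carcinoma
CETUXIMAB,EGFR,colorectal cancer
OLAPARIB,PARP1,ovarian cancer
"""


def derive_maps(rows):
    """Collapse triples into distinct-drug counts per target and per disease."""
    target_drugs = defaultdict(set)
    disease_drugs = defaultdict(set)
    for row in rows:
        target_drugs[row["target"]].add(row["drug"])
        disease_drugs[row["disease"]].add(row["drug"])
    # Iterate keys in sorted order so the derived tables come out in a stable order.
    target_counts = {t: len(drugs) for t, drugs in sorted(target_drugs.items())}
    disease_counts = {d: len(drugs) for d, drugs in sorted(disease_drugs.items())}
    return target_counts, disease_counts


rows = list(csv.DictReader(io.StringIO(TRIPLES_CSV)))
target_counts, disease_counts = derive_maps(rows)
print(target_counts)  # {'EGFR': 3, 'PARP1': 1}
print(disease_counts)
```

Counting sets of drugs rather than rows keeps a drug with several targets for the same disease from being counted more than once.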
Method
Forward: Disease Search
Given a disease name, drugs are scored by:
$$\text{score} = \frac{\text{phase}}{4} \cdot \frac{n_{\text{disease\_targets}}}{\text{max\_targets}} \cdot \frac{\log(1 + n_{\text{indications}})}{\log(1 + \text{max\_indications})}$$
Reverse: Target Search
Given a gene symbol, drugs are scored by:
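$$\text{score} = \frac{\text{phase}}{4} \cdot \log(1 + n_{\text{indications}})$$
Here log is the natural logarithm, and the score is not normalized, so an approved (phase 4) drug scores ln(1 + n_indications).
Repurpose Mode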
Given a drug, candidate diseases are scored by:
$$\text{score} = \frac{n_{\text{shared\_targets}}}{\text{max\_shared}} \cdot \frac{\text{phase}}{4} \cdot \left(1 - 0.5 \cdot \frac{n_{\text{existing}}}{\text{max\_existing}}\right)$$
Results
NSCLC Disease Search (Top 5)
| Rank | Drug | Type | Targets | Score |
|---|---|---|---|---|
| 1 | CARFILZOMIB | Protein | 38 | 0.529 |
| 2 | PACLITAXEL | Small molecule | 15 | 0.370 |
| 3 | DOCETAXEL | Small molecule | 15 | 0.341 |
| 4 | GEMCITABINE | Small molecule | 14 | 0.315 |
| 5 | PAZOPANIB | Small molecule | 11 | 0.210 |
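Below is a minimal sketch of the forward score. It assumes `max_targets` and `max_indications` are maxima over the candidate drugs for the queried disease, since the text does not say which pool they are taken over. Ties are broken by the rule in the skill file's determinism requirements (score descending, then name ascending). `Candidate` and `forward_rank` are hypothetical names, and the drugs are toy values, not rows from the dataset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    phase: int              # maximum clinical phase (0-4)
    n_disease_targets: int  # the drug's targets linked to the queried disease
    n_indications: int      # total indications recorded for the drug


def forward_rank(cands):
    """Score with (phase/4) * (targets/max_targets) * log(1+ind)/log(1+max_ind), then rank."""
    # ASSUMPTION: the maxima are taken over the candidates for this disease.
    max_targets = max(c.n_disease_targets for c in cands)
    max_ind = max(c.n_indications for c in cands)
    scored = []
    for c in cands:
        coverage = c.n_disease_targets / max_targets if max_targets else 0.0
        breadth = math.log(1 + c.n_indications) / math.log(1 + max_ind) if max_ind else 0.0
        scored.append((c.name, (c.phase / 4) * coverage * breadth))
    # Score descending, then name ascending to break ties deterministically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranked = forward_rank([
    Candidate("DRUG_C", 2, 10, 9),
    Candidate("DRUG_B", 4, 5, 9),
    Candidate("DRUG_A", 4, 10, 99),
])
for name, score in ranked:
    print(name, round(score, 4))  # DRUG_A 1.0, then DRUG_B 0.25 and DRUG_C 0.25
```

DRUG_B (full phase, half coverage) and DRUG_C (half phase, full coverage) land on exactly the same score. In such cases the multiplicative form makes the name-based tie-break decide the order.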
Scoring Characterization: NSCLC Recall
All 18 known FDA-approved NSCLC drugs in our 108-drug set appear in the 76-drug NSCLC ranking (recall = 100%). The composite scoring rewards target coverage and clinical breadth, producing a characteristic stratification by drug class:
- Chemotherapy (median listed rank 20): Paclitaxel #2, Docetaxel #3, Gemcitabine #4, Pemetrexed #20, Etoposide #23, Fluorouracil #44, Carboplatin #74
- Targeted therapies (median listed rank 55): Crizotinib #12, Afatinib #21, Erlotinib #55, Gefitinib #58, Osimertinib #68
- Checkpoint inhibitors and the anti-VEGF antibody bevacizumab (median listed rank 40): Bevacizumab #38, Pembrolizumab #39, Nivolumab #40, Durvalumab #45, Atezolizumab #47
Recall here measures only whether a drug appears in the ranking, not where it appears. Because the composite score rewards target coverage and indication breadth, it favors broad-spectrum chemotherapies; targeted therapies and immunotherapies are better surfaced through target-search mode (e.g., EGFR search ranks erlotinib #2 and gefitinib #3).
EGFR Target Search (Top 5)
| Rank | Drug | Indications | Score |
|---|---|---|---|
| 1 | CETUXIMAB | 50 | 3.932 |
| 2 | ERLOTINIB | 44 | 3.807 |
| 3 | GEFITINIB | 37 | 3.638 |
| 4 | AFATINIB | 30 | 3.434 |
| 5 | LAPATINIB | 25 | 3.258 |
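The reverse score has no normalization constants, so this table can be checked by hand. The sketch below reproduces the Score column from the Indications column. It assumes phase 4 for all five drugs (each is FDA-approved) and takes `log` as the natural logarithm. `target_score` is a hypothetical helper name.

```python
import math


def target_score(phase, n_indications):
    """Reverse target-search score: (phase / 4) * ln(1 + n_indications)."""
    return (phase / 4) * math.log(1 + n_indications)


# Indication counts from the EGFR table; all five drugs are approved (phase 4).
EGFR_INDICATIONS = {"CETUXIMAB": 50, "ERLOTINIB": 44, "GEFITINIB": 37,
                    "AFATINIB": 30, "LAPATINIB": 25}

scores = {drug: target_score(4, n) for drug, n in EGFR_INDICATIONS.items()}
for drug, score in scores.items():
    print(f"{drug:<10} {score:.3f}")  # 3.932, 3.807, 3.638, 3.434, 3.258
```

The score depends only on the drug's phase and indication count, not on the queried gene. A drug therefore carries the same score in every target search it appears in, and the target only determines which drugs are listed.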
Repurposing: Multi-Drug Comparison
To demonstrate generalizability across drug classes, we ran repurpose mode on three mechanistically distinct drugs:
| Drug | Class | Targets | Existing Indications | Novel Candidates |
|---|---|---|---|---|
| Olaparib | PARP inhibitor | PARP1, PARP2, PARP3 | 54 | 13 |
| Imatinib | BCR-ABL/KIT inhibitor | ABL1, BCR, KIT, PDGFRB | 49 | 112 |
| Lenalidomide | E3 ligase modulator | CRBN, CUL4A, DDB1, RBX1 | 72 | 56 |
Top 5 candidates per drug:
- Olaparib: biliary tract cancer, fallopian tube cancer, leiomyosarcoma, peritoneum cancer, HER2+ breast carcinoma (all sharing PARP1/2/3)
- Imatinib: paraganglioma, glioblastoma, liver disease, colon neoplasm, medullary thyroid carcinoma (all sharing ABL1/BCR/KIT/PDGFRB)
- Lenalidomide: AIDS, Alzheimer disease, beta-thalassemia, COVID-19, Castleman disease (all sharing CRBN/CUL4A/DDB1/RBX1)
Imatinib generates 112 candidates because its 4 targets (especially KIT and PDGFRB) are broadly implicated across cancer types. Lenalidomide surfaces non-oncology candidates (beta-thalassemia, autoimmune conditions) reflecting the ubiquitin-proteasome pathway's role in inflammation. Several candidates across all three drugs overlap with active ClinicalTrials.gov entries (e.g., imatinib in glioblastoma: NCT01140568; lenalidomide in Castleman disease: NCT01286597), suggesting the scoring surfaces clinically plausible hypotheses.
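Repurpose mode can be sketched as a set computation over the triples. First collect the drug's targets, then drop the diseases the drug is already indicated for, and finally score each remaining disease that shares at least one target. Two points are assumptions rather than statements about the actual implementation. First, `n_existing` is read here as the number of drugs already linked to the candidate disease. Second, both maxima are taken over the candidate set. `repurpose` and the toy triples are hypothetical.

```python
from collections import defaultdict

# Toy (drug, target, disease) triples; all names are hypothetical.
TRIPLES = [
    ("DRUG_X", "T1", "disease_a"),  # DRUG_X is already indicated for disease_a
    ("DRUG_X", "T2", "disease_a"),
    ("DRUG_Y", "T1", "disease_b"),
    ("DRUG_Y", "T2", "disease_b"),
    ("DRUG_Z", "T1", "disease_b"),
    ("DRUG_Z", "T1", "disease_c"),
    ("DRUG_Z", "T3", "disease_d"),  # T3 is not a DRUG_X target
]


def repurpose(drug, phase, triples):
    """Rank diseases linked to the drug's targets where the drug is not yet indicated."""
    drug_targets = {t for d, t, _ in triples if d == drug}
    indicated = {dis for d, _, dis in triples if d == drug}
    shared = defaultdict(set)    # candidate disease -> drug targets implicated there
    existing = defaultdict(set)  # disease -> drugs already linked to it
    for d, t, dis in triples:
        if dis in indicated:
            continue
        existing[dis].add(d)
        if t in drug_targets:
            shared[dis].add(t)
    if not shared:
        return []
    max_shared = max(len(s) for s in shared.values())
    # ASSUMPTION: n_existing = drugs already linked to the candidate disease,
    # normalized by its maximum over the candidates (always >= 1 here).
    max_existing = max(len(existing[dis]) for dis in shared)
    ranked = []
    for dis, targets in shared.items():
        score = ((len(targets) / max_shared) * (phase / 4)
                 * (1 - 0.5 * len(existing[dis]) / max_existing))
        ranked.append((dis, score, sorted(targets)))
    return sorted(ranked, key=lambda r: (-r[1], r[0]))


for disease, score, targets in repurpose("DRUG_X", 4, TRIPLES):
    print(disease, score, targets)
# disease_b 0.5 ['T1', 'T2']
# disease_c 0.375 ['T1']
```

Because `n_existing` never exceeds `max_existing`, the penalty factor stays between 0.5 and 1. Target overlap therefore dominates the ranking, and crowded indications are only down-weighted, never excluded.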
Certificate Structure
Each compilation produces a certificate.json containing input file SHA256 hashes, the resolved query parameters, the scoring formula used, per-drug score decompositions, and output file hashes. An abridged example:
```json
{"tool": "drug-rescue", "mode": "disease-search",
 "input_hashes": {"drug_target_disease.csv": "a3f2..."},
 "query": {"disease": "Non-Small Cell Lung Carcinoma"},
 "scoring_formula": "phase/4 * targets/max * log(1+ind)/log(1+max)",
 "results": [{"drug": "CARFILZOMIB", "score": 0.529,
              "phase": 4, "targets": 38, "indications": 96}]}
```
This enables any reviewer to trace a specific drug's ranking back to the exact data and scoring arithmetic that produced it.
Discussion
DrugRescue demonstrates that a pre-frozen drug-target-disease graph can be compiled into a sub-second offline query engine with auditable provenance. The scoring formulas are heuristic design choices that combine clinical phase, target coverage, and indication breadth; they are not trained models and do not claim to predict clinical success. The formulas trade sophistication for transparency: every score can be manually verified from the certificate.
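To make the certificate concrete, here is a minimal sketch of assembling one with standard-library hashing and sorted-key JSON. It reuses the top-level keys from the example certificate in the Results. `build_certificate`, `sha256_file`, and the `output_hashes` key are hypothetical, and the toy files and score stand in for real pipeline outputs.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path):
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_certificate(mode, inputs, query, formula, results, outputs):
    """Assemble a certificate dict; "output_hashes" is a hypothetical key name."""
    return {
        "tool": "drug-rescue",
        "mode": mode,
        "input_hashes": {p.name: sha256_file(p) for p in inputs},
        "query": query,
        "scoring_formula": formula,
        "results": results,
        "output_hashes": {p.name: sha256_file(p) for p in outputs},
    }


INPUT_BYTES = b"drug,target,disease\nERLOTINIB,EGFR,non-small cell lung carcinoma\n"

with tempfile.TemporaryDirectory() as tmp:
    inp = Path(tmp) / "drug_target_disease.csv"
    inp.write_bytes(INPUT_BYTES)
    out = Path(tmp) / "disease_drugs_ranked.csv"
    out.write_bytes(b"rank,drug,score\n1,ERLOTINIB,1.0000\n")  # toy output
    cert = build_certificate(
        "disease-search", [inp],
        {"disease": "Non-Small Cell Lung Carcinoma"},
        "phase/4 * targets/max * log(1+ind)/log(1+max)",
        [{"drug": "ERLOTINIB", "score": 1.0}], [out])

# Sorted keys and fixed indentation give byte-identical JSON for identical inputs.
text = json.dumps(cert, sort_keys=True, indent=2)
print(text)
```

The certificate text is byte-stable across runs for two reasons: it is serialized with `sort_keys=True`, and it contains no timestamps. This lets the certificate itself be hashed or diffed.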
Comparison with Open Targets Platform
| Capability | Open Targets Web | DrugRescue |
|---|---|---|
| Query type | Single drug/disease/target | Batch: 108 drugs, 780 diseases, 173 targets |
| Offline use | No (requires API) | Yes (vendored assets) |
| Deterministic | No (database updates) | Yes (SHA256 verified) |
| Provenance certificate | No | Yes (per-query JSON with score decomposition) |
| Composite scoring | No (raw associations) | Yes (phase x coverage x breadth) |
| Repurpose mode | No built-in | Yes (target-overlap hypothesis generation) |
Limitations
The 108-drug scope covers major oncology therapeutics but excludes non-oncology drugs and experimental compounds. The scoring does not account for drug selectivity, toxicity profiles, pharmacokinetics, or resistance mechanisms. Repurposing candidates reflect shared target profiles in the database, not mechanistic predictions; they should be treated as hypotheses for further investigation, not clinical recommendations.
Verification
59 automated tests cover data loading, fuzzy matching, scoring formulas, compilation outputs, certificate structure, determinism, and golden file SHA256 comparison.
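The golden-file comparison can be sketched as follows. The sketch assumes it amounts to hashing each generated output and its stored golden copy, then reporting an `"ok"` flag like the one in the `verify` command's JSON. `verify_against_golden` and `file_sha256` are hypothetical names, and the real test suite may compare a different set of files.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path):
    """Hash a whole file's bytes with SHA256."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_against_golden(generated_dir, golden_dir, names):
    """Report per-file SHA256 agreement between generated outputs and golden copies."""
    checks = []
    for name in names:
        gen, gold = Path(generated_dir) / name, Path(golden_dir) / name
        match = gen.is_file() and gold.is_file() and file_sha256(gen) == file_sha256(gold)
        checks.append({"file": name, "sha256_match": match})
    return {"ok": bool(checks) and all(c["sha256_match"] for c in checks), "checks": checks}


with tempfile.TemporaryDirectory() as gen, tempfile.TemporaryDirectory() as gold:
    ranked = "rank,drug,score\n1,DRUG_A,1.0000\n"  # toy output
    for d in (gen, gold):
        (Path(d) / "disease_drugs_ranked.csv").write_bytes(ranked.encode())
    same = verify_against_golden(gen, gold, ["disease_drugs_ranked.csv"])
    # Any byte-level change to the generated file breaks the hash match.
    (Path(gen) / "disease_drugs_ranked.csv").write_bytes((ranked + "2,DRUG_B,0.5000\n").encode())
    changed = verify_against_golden(gen, gold, ["disease_drugs_ranked.csv"])

print(same["ok"], changed["ok"])  # True False
```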
References
- Ochoa et al. "Open Targets Platform." NAR 2024.
- Pushpakom et al. "Drug repurposing: progress, challenges and recommendations." Nature Reviews Drug Discovery 2019.
- Zdrazil et al. "The ChEMBL database in 2023." NAR 2024.
- Broad Institute. "DepMap 24Q4 Public Data Release." 2024.
- Ashburn & Thor. "Drug repositioning." Nature Reviews Drug Discovery 2004.
- Corsello et al. "Discovering the anticancer potential of non-oncology drugs." Nature Cancer 2020.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: drug-rescue
description: Compile Open Targets drug-target-disease associations into certificate-carrying repurposing recommendations across three modes.
allowed-tools: Bash(uv *, python *, python3 *, ls *, test *, shasum *)
requires_python: "3.12.x"
package_manager: uv
repo_root: .
canonical_output_dir: outputs/nsclc
---

# DrugRescue Pipeline

Compile pre-frozen Open Targets Platform drug-target-disease associations into three decision primitives: (1) forward-mode disease search ranking drugs by clinical phase, target evidence, and indication breadth; (2) reverse-mode target search finding all known modulators of a gene; and (3) repurpose mode identifying diseases where a drug's targets are implicated but the drug is not yet indicated for that disease.

This skill is a **public data pipeline**: it does not perform new drug screens or clinical analyses. It compiles existing Open Targets drug-target-disease relationships into hypothesis-generating rankings with full certificate-carrying provenance.

## Runtime Expectations

- Platform: CPU-only
- Python: 3.12.x
- Package manager: `uv`
- Execution time: <1 second per query
- No internet access required after environment install (derived assets are vendored; `uv sync` may fetch packages on first run)
- No external credentials required

## Step 1: Install the Locked Environment

```bash
uv sync --frozen
```

Success condition: uv completes without errors.

## Step 2: Run Forward-Mode Disease Search

```bash
uv run --frozen --no-sync drug-rescue disease-search \
  --input inputs/disease_nsclc.yaml \
  --outdir outputs/nsclc
```

Success condition: `outputs/nsclc/disease_drugs_ranked.csv` exists with 76 ranked drugs.
Expected top-5 drugs for Non-Small Cell Lung Carcinoma:

| Rank | Drug | Type | Targets Hit | Score |
|------|------|------|-------------|-------|
| 1 | CARFILZOMIB | Protein | 38 | 0.5291 |
| 2 | PACLITAXEL | Small molecule | 15 | 0.3699 |
| 3 | DOCETAXEL | Small molecule | 15 | 0.3408 |
| 4 | GEMCITABINE | Small molecule | 14 | 0.3154 |
| 5 | PAZOPANIB | Small molecule | 11 | 0.2100 |

## Step 3: Run Reverse-Mode Target Search

```bash
uv run --frozen --no-sync drug-rescue target-search \
  --input inputs/target_egfr.yaml \
  --outdir outputs/egfr
```

Success condition: `outputs/egfr/target_drugs_ranked.csv` exists with 9 ranked drugs.

Expected top-5 drugs for EGFR:

| Rank | Drug | Type | Indications | Score |
|------|------|------|-------------|-------|
| 1 | CETUXIMAB | Antibody | 50 | 3.9318 |
| 2 | ERLOTINIB | Small molecule | 44 | 3.8067 |
| 3 | GEFITINIB | Small molecule | 37 | 3.6376 |
| 4 | AFATINIB | Small molecule | 30 | 3.4340 |
| 5 | LAPATINIB | Small molecule | 25 | 3.2581 |

## Step 4: Run Repurpose Mode

```bash
uv run --frozen --no-sync drug-rescue repurpose \
  --input inputs/repurpose_olaparib.yaml \
  --outdir outputs/olaparib
```

Success condition: `outputs/olaparib/repurpose_candidates.csv` exists with 13 disease candidates.

## Step 5: Verify Deterministic Reproduction

```bash
uv run --frozen --no-sync drug-rescue verify \
  --generated outputs/nsclc \
  --golden tests/golden_disease_search
```

Success condition: JSON output contains `"ok": true`.
## Step 6: Full Verification with All Checks

```bash
uv run --frozen --no-sync drug-rescue verify-full \
  --run-dir outputs/nsclc \
  --golden-dir tests/golden_disease_search \
  --mode disease_search
```

Success condition: JSON output contains `"ok": true` and all 8 checks pass:

- disease_drugs_ranked.csv exists
- certificate.json exists
- summary.md exists
- disease_drugs_ranked.csv non-empty
- certificate.json parseable JSON
- certificate keys present
- repurpose_score sorted descending
- disease_drugs_ranked SHA256 match

## Step 7: Confirm Required Artifacts

Required files in `outputs/nsclc/`:

- `disease_drugs_ranked.csv` -- all drugs ranked by repurpose score
- `certificate.json` -- audit trail with input/output hashes, scoring formula, per-drug breakdown
- `summary.md` -- human-readable drug recommendations

Required files in `outputs/egfr/`:

- `target_drugs_ranked.csv` -- drugs ranked by target score
- `certificate.json` -- audit trail
- `summary.md` -- human-readable target drug list

Required files in `outputs/olaparib/`:

- `repurpose_candidates.csv` -- diseases ranked by repurpose score
- `certificate.json` -- audit trail
- `summary.md` -- human-readable repurposing candidates

## Optional: Run Full Demo Pipeline

```bash
uv run --frozen --no-sync drug-rescue demo
```

Runs disease search (NSCLC), target search (EGFR), and repurpose (olaparib) in one shot.
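The content checks in Step 6 can be approximated in plain Python: a non-empty CSV, a parseable certificate, the required keys, and descending scores. This is a hedged sketch, not the `verify-full` implementation. `structural_checks` and `REQUIRED_CERT_KEYS` are hypothetical names. The key set is taken from the example certificate in the paper, and the `repurpose_score` column name comes from the Step 6 check list.

```python
import csv
import io
import json

# ASSUMED key set, taken from the example certificate in the paper.
REQUIRED_CERT_KEYS = {"tool", "mode", "input_hashes", "query", "scoring_formula", "results"}


def structural_checks(ranked_csv, certificate, score_col="repurpose_score"):
    """Run Step 6 style content checks on in-memory CSV and certificate text."""
    rows = list(csv.DictReader(io.StringIO(ranked_csv)))
    checks = {"csv_non_empty": len(rows) > 0}
    try:
        cert = json.loads(certificate)
        parseable = True
    except json.JSONDecodeError:
        cert, parseable = {}, False
    checks["certificate_parseable"] = parseable
    checks["certificate_keys_present"] = isinstance(cert, dict) and REQUIRED_CERT_KEYS <= set(cert)
    scores = [float(r[score_col]) for r in rows]
    # Non-increasing scores down the file (ties allowed).
    checks["scores_sorted_descending"] = all(a >= b for a, b in zip(scores, scores[1:]))
    return {"ok": all(checks.values()), "checks": checks}


CSV_TEXT = "rank,drug,repurpose_score\n1,DRUG_A,0.9000\n2,DRUG_B,0.4000\n"
CERT_TEXT = json.dumps({key: None for key in sorted(REQUIRED_CERT_KEYS)})

result = structural_checks(CSV_TEXT, CERT_TEXT)
unsorted = structural_checks("rank,drug,repurpose_score\n1,DRUG_A,0.1\n2,DRUG_B,0.4\n", CERT_TEXT)
print(result["ok"], unsorted["ok"])  # True False
```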
## Available Inputs

| File | Mode | Description |
|------|------|-------------|
| inputs/disease_nsclc.yaml | disease_search | NSCLC drug ranking |
| inputs/target_egfr.yaml | target_search | EGFR drug lookup |
| inputs/repurpose_olaparib.yaml | repurpose | Olaparib repurposing candidates |
| inputs/repurpose_bevacizumab.yaml | repurpose | Bevacizumab repurposing candidates |

## Scoring Formulas

**Forward disease search**: `score = (phase/4) * (n_disease_targets/max_targets) * log(1+n_indications)/log(1+max_indications)`

**Reverse target search**: `score = (phase/4) * log(1+n_indications)`

**Repurpose mode**: `score = (n_shared_targets/max_shared) * (phase/4) * (1 - 0.5*n_existing/max_indications)`

## Data Source

Open Targets Platform (v4 GraphQL API), accessed March 2026:

- 108 cancer drugs queried by name via ChEMBL identifiers
- 173 gene targets with mechanism-of-action links
- 780 diseases with clinical indication data
- Sources: ChEMBL, ClinicalTrials.gov, FDA labels, DailyMed

Raw API responses are not vendored. Derived assets (~1MB) in `data/derived/` are vendored.

## Scientific Boundary

This skill does **not** produce clinical recommendations. It does **not** account for pharmacokinetics, drug resistance, tumor microenvironment, combination effects, or patient-specific factors. It compiles public drug-target-disease associations into hypothesis-generating repurposing recommendations only.

## Determinism Requirements

- No randomness
- Stable sort order (score descending + name ascending for ties)
- No timestamps in scored outputs (CSVs)
- JSON keys sorted, CSVs with fixed newline behavior
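The determinism requirements above can be illustrated with a small CSV writer. This sketch assumes a three-column layout (`rank`, name, score) and a fixed four-decimal score format matching the expected-output tables. `write_ranked_csv` is a hypothetical name, not the pipeline's own writer.

```python
import csv
import hashlib
import io


def write_ranked_csv(rows, name_key="drug", score_key="score"):
    """Serialize ranked rows deterministically: no timestamps, fixed formatting."""
    # Stable order: score descending, then name ascending for ties.
    ordered = sorted(rows, key=lambda r: (-r[score_key], r[name_key]))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # same newline on every platform
    writer.writerow(["rank", name_key, score_key])
    for rank, row in enumerate(ordered, start=1):
        writer.writerow([rank, row[name_key], f"{row[score_key]:.4f}"])  # fixed 4-decimal format
    return buf.getvalue()


rows = [{"drug": "DRUG_B", "score": 0.25},
        {"drug": "DRUG_C", "score": 0.9},
        {"drug": "DRUG_A", "score": 0.25}]
text = write_ranked_csv(rows)
print(text, end="")
# Input order does not affect the bytes written, so the SHA256 is stable too.
print(hashlib.sha256(text.encode()).hexdigest()
      == hashlib.sha256(write_ranked_csv(rows[::-1]).encode()).hexdigest())  # True
```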