2604.00567 Chemical Space Coverage of Approved Drugs by the Clinical Pipeline: A Multi-Threshold Tanimoto Analysis with Full-Dataset Therapeutic Area Gap Mapping
We quantify how much of approved small-molecule drug chemical space is structurally represented by current clinical-stage candidates, using rigorously curated ChEMBL data and multi-threshold Morgan fingerprint Tanimoto similarity. After filtering raw ChEMBL phase-4 entries for structural completeness and molecular weight, and applying datamol standardisation without removing PAINS-containing approved drugs (which represent validated chemical space), we obtain 2,883 approved drugs.