Computational Prediction of Protein-Protein Interaction Networks Using Graph Neural Networks and Evolutionary Features

Abstract

Protein-protein interactions (PPIs) are fundamental to virtually all biological processes, yet experimental determination of complete interactomes remains resource-intensive and error-prone. We present a computational framework that combines graph neural networks (GNNs) with evolutionary coupling analysis to predict high-confidence PPIs at proteome scale. Our approach integrates sequence-based co-evolution signals, structural embedding features, and network topology constraints, and achieves state-of-the-art performance on benchmark datasets. Cross-validation on the Human Reference Interactome (HuRI) yields an AUC-ROC of 0.94, an absolute gain of 0.12 over DPPI and 0.08 over GNN-PPI. We apply the framework to predict 2,347 previously uncharacterized interactions in cancer-related pathways, providing candidate targets for therapeutic intervention. In independent affinity purification-mass spectrometry (AP-MS) experiments, 78 of 100 tested predictions were confirmed (78%). This work demonstrates the value of integrating evolutionary information with deep representation learning for systematic mapping of cellular interaction networks.

Keywords

Protein-protein interactions, Graph neural networks, Co-evolution, Interactome, Deep learning, Cancer pathways

1. Introduction

Protein-protein interactions (PPIs) form the backbone of cellular machinery, governing processes from signal transduction to metabolic regulation. A complete map of the interactome, the full network of protein interactions within an organism, would provide unprecedented insight into cellular function and dysfunction. Despite two decades of high-throughput experimental efforts (Rolland et al. 2014; Luck et al. 2020), even well-studied organisms such as Homo sapiens have substantial gaps in their characterized interaction networks.

Experimental PPI determination methods, including yeast two-hybrid (Y2H) screening and affinity purification-mass spectrometry (AP-MS), suffer from high false-positive and false-negative rates. These limitations have motivated computational approaches that predict and prioritize interactions for experimental validation.

2. Methods

2.1 Data Collection and Preprocessing

Training datasets: We assembled PPIs from three sources: the Human Reference Interactome (HuRI; Luck et al. 2020), with 52,519 high-confidence interactions; BioGRID (v4.4), with 312,414 interactions; and IntAct (v4.3), with 201,387 interactions.
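Because the three databases are likely to overlap, their interaction counts cannot simply be summed. The sketch below shows one way to merge them into a single non-redundant set, assuming all sources have already been mapped to a shared identifier namespace (for example, UniProt accessions). Each unordered pair is stored once, together with the sources that support it. The function names and the choice to drop self-interactions are illustrative assumptions, not the pipeline used in this study.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def canonical(a: str, b: str) -> Pair:
    """Order the two identifiers so that (A, B) and (B, A) map to the same key."""
    return (a, b) if a <= b else (b, a)


def merge_sources(sources: Dict[str, Iterable[Pair]]) -> Dict[Pair, Set[str]]:
    """Union PPIs from several databases, recording which sources support each pair."""
    merged: Dict[Pair, Set[str]] = {}
    for name, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue  # hypothetical filtering choice: skip self-interactions
            merged.setdefault(canonical(a, b), set()).add(name)
    return merged


# Toy example with placeholder identifiers
sources = {
    "HuRI": [("P1", "P2")],
    "BioGRID": [("P2", "P1"), ("P3", "P4")],
    "IntAct": [("P3", "P4"), ("P5", "P5")],
}
merged = merge_sources(sources)
print(len(merged), merged[("P1", "P2")])
```

Keeping the set of supporting sources per pair also makes it easy to weight or filter interactions by how many databases report them.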

Negative sampling: We generated negative examples by randomly pairing proteins, subject to a subcellular localization constraint.
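The text does not specify which localization constraint is applied, so the sketch below implements two common variants behind a flag: keeping only random pairs with no shared compartment (`require_disjoint=True`), or only pairs that share at least one compartment (`require_disjoint=False`). Proteins without a localization annotation are skipped, and known positives are never sampled. The function name, its parameters, and the retry cap are hypothetical; this is a sketch under these assumptions, not the authors' procedure.

```python
import random
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def sample_negatives(proteins: List[str], localization: Dict[str, Set[str]],
                     positives: Set[Pair], n: int, require_disjoint: bool = True,
                     seed: int = 0, max_tries: int = 100_000) -> Set[Pair]:
    """Randomly pair proteins, keeping pairs that satisfy a localization constraint."""
    rng = random.Random(seed)  # seeded for reproducibility
    negatives: Set[Pair] = set()
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        a, b = rng.sample(proteins, 2)  # two distinct proteins
        pair = (a, b) if a <= b else (b, a)
        if pair in positives or pair in negatives:
            continue
        if a not in localization or b not in localization:
            continue  # no annotation, so the constraint cannot be checked
        shared = localization[a] & localization[b]
        if require_disjoint and shared:
            continue
        if not require_disjoint and not shared:
            continue
        negatives.add(pair)
    return negatives  # may hold fewer than n pairs if max_tries is exhausted


proteins = ["P1", "P2", "P3", "P4"]
localization = {"P1": {"nucleus"}, "P2": {"nucleus"},
                "P3": {"mitochondrion"}, "P4": {"membrane"}}
negatives = sample_negatives(proteins, localization, positives={("P1", "P3")}, n=3)
print(sorted(negatives))
```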

2.2 Model Architecture

EvoGraphPPI employs a dual-branch neural architecture: a sequence encoder based on the ESM-2 transformer protein language model, and a co-evolution encoder based on direct coupling analysis (DCA) scores.
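The description above names the two branches but not how they are combined. The following minimal sketch shows one plausible arrangement, assuming precomputed per-protein ESM-2 embeddings (treated here as plain vectors), one round of mean-aggregation message passing over the known PPI graph, a DCA branch that reduces the inter-protein coupling matrix to its maximum score, and a logistic fusion head. Every function name, the max-pooling of couplings, and the fusion layer are hypothetical illustrations rather than the EvoGraphPPI implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_aggregate(embeddings: Dict[str, Vector],
                   edges: Sequence[Tuple[str, str]]) -> Dict[str, Vector]:
    """One round of GNN-style message passing: average each node with its neighbours."""
    neighbours: Dict[str, List[str]] = {node: [] for node in embeddings}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated: Dict[str, Vector] = {}
    for node, vec in embeddings.items():
        group = [vec] + [embeddings[n] for n in neighbours[node]]
        # zip(*group) walks the embedding dimension by dimension
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


def dca_summary(coupling: Sequence[Sequence[float]]) -> float:
    """Collapse an inter-protein DCA coupling matrix to its strongest residue-pair score."""
    return max(max(row) for row in coupling)


def pair_probability(emb_a: Vector, emb_b: Vector,
                     coupling: Sequence[Sequence[float]],
                     weights: Vector, bias: float) -> float:
    """Fuse the two branches with a logistic layer (hypothetical fusion head)."""
    # Element-wise product keeps the sequence features symmetric in (A, B)
    features = [x * y for x, y in zip(emb_a, emb_b)] + [dca_summary(coupling)]
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy example: 2-dimensional embeddings stand in for real ESM-2 vectors
embeddings = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
smoothed = mean_aggregate(embeddings, [("P1", "P2")])
prob = pair_probability(smoothed["P1"], smoothed["P3"],
                        coupling=[[0.1, 0.7], [0.3, 0.2]],
                        weights=[0.5, 0.5, 2.0], bias=-1.0)
print(smoothed["P1"], round(prob, 3))
```

The element-wise product is used here instead of concatenation so that scoring (A, B) and (B, A) gives the same result.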

2.3 Training Procedure

EvoGraphPPI is trained with a multi-task loss that combines a binary cross-entropy term for interaction classification with a contrastive graph objective.
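The form of the contrastive term and the weighting between the two objectives are not specified. As an illustration only, the sketch below assumes an InfoNCE-style node-level contrastive loss between two views of the node embeddings (node i in one view should match node i in the other), with cosine similarity, a temperature, and a hypothetical weight `lam` on the contrastive term.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over predicted interaction probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # small constant guards against zero vectors


def info_nce(view1: Matrix, view2: Matrix, temperature: float = 0.5) -> float:
    """Node i in view1 should be most similar to node i in view2 (assumed InfoNCE form)."""
    loss = 0.0
    for i, z in enumerate(view1):
        logits = [cosine(z, w) / temperature for w in view2]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(view1)


def multitask_loss(probs, labels, view1, view2, lam=0.1, temperature=0.5) -> float:
    """Hypothetical combined objective: BCE + lam * contrastive term."""
    return bce(probs, labels) + lam * info_nce(view1, view2, temperature)


views = [[1.0, 0.0], [0.0, 1.0]]
print(round(multitask_loss([0.9, 0.2], [1, 0], views, views), 4))
```

When the two views agree, the contrastive term falls below log(N), the value expected if every node were equally similar to all nodes in the other view.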

3. Results

3.1 Performance on Benchmark Datasets

EvoGraphPPI achieves AUC-ROC of 0.94 on HuRI, compared to 0.82 for DPPI, 0.86 for GNN-PPI, and 0.88 for DCA-PI.
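AUC-ROC has a rank-based reading: it is the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair (the Mann-Whitney formulation). The sketch below computes it under that definition, and also the absolute and relative differences between the reported AUC-ROC values. The helper names are illustrative; this is not the evaluation code used in the study.

```python
from typing import Dict, Sequence, Tuple


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as P(positive outranks negative); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gains(model_auc: float, baselines: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Absolute and relative (%) AUC-ROC differences between a model and each baseline."""
    return {name: (round(model_auc - auc, 2), round(100 * (model_auc - auc) / auc, 1))
            for name, auc in baselines.items()}


reported = {"DPPI": 0.82, "GNN-PPI": 0.86, "DCA-PI": 0.88}
print(gains(0.94, reported))
# {'DPPI': (0.12, 14.6), 'GNN-PPI': (0.08, 9.3), 'DCA-PI': (0.06, 6.8)}
```

The pairwise loop is quadratic in the number of examples; at proteome scale a sort-based implementation would be preferable.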

3.2 Novel Cancer Pathway Predictions

The framework predicts 312 novel interactions in the p53 signaling pathway, 487 in the PI3K-Akt signaling pathway, and 298 in the MAPK signaling pathway.
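The criteria for calling a prediction "novel" and for assigning it to a pathway are not stated. One simple scheme is sketched below. It assumes a prediction counts as novel if its score passes a threshold and the pair is absent from the known interaction set, and that it is assigned to a pathway when both partners belong to that pathway's gene set. The threshold value, the function name, and the placeholder identifiers are all hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def novel_by_pathway(predictions: Dict[Pair, float], known: Set[Pair],
                     pathways: Dict[str, Set[str]], threshold: float = 0.9) -> Dict[str, int]:
    """Count high-scoring, previously unknown pairs whose partners both lie in a pathway."""
    counts = {name: 0 for name in pathways}
    for (a, b), score in predictions.items():
        pair = (a, b) if a <= b else (b, a)  # canonical order to match `known`
        if score < threshold or pair in known:
            continue
        for name, members in pathways.items():
            if a in members and b in members:
                counts[name] += 1  # a pair can count toward several pathways
    return counts


predictions = {("P1", "P2"): 0.95, ("P3", "P4"): 0.92, ("P1", "P3"): 0.97, ("P5", "P6"): 0.5}
known = {("P1", "P2")}
pathways = {"p53": {"P1", "P2"}, "PI3K-Akt": {"P3", "P4"}, "MAPK": {"P5", "P6"}}
print(novel_by_pathway(predictions, known, pathways))
```

Because one pair can fall into several overlapping pathways, per-pathway counts from a scheme like this need not add up to the overall number of novel predictions.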

3.3 Experimental Validation Results

Y2H confirmed 67 of 100 tested predicted interactions (67%), and AP-MS confirmed 78 of 100 (78%).
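With 100 tests per assay, the confirmation rates carry non-trivial sampling uncertainty. The sketch below attaches a Wilson score interval to each rate. It assumes each set of 100 is a random, independent sample of predictions, which the text does not state. The helper name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half_width, centre + half_width


for assay, confirmed in [("Y2H", 67), ("AP-MS", 78)]:
    low, high = wilson_interval(confirmed, 100)
    print(f"{assay}: {confirmed}/100 confirmed, 95% CI {low:.2f} to {high:.2f}")
```

The Wilson interval is used instead of the simple normal approximation because it behaves better when the proportion is near 0 or 1 or the sample is small.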

4. Discussion

Our results show that integrating evolutionary coupling information with deep representation learning substantially improves PPI prediction accuracy: on HuRI, EvoGraphPPI reaches an AUC-ROC of 0.94, compared with 0.88 for the strongest baseline, DCA-PI.

5. Conclusion

We presented EvoGraphPPI, which unifies graph neural networks with evolutionary coupling analysis and achieves state-of-the-art performance on benchmark PPI datasets.

6. Data and Code Availability

Source code and trained models are available on GitHub.

References

Rolland et al. A proteome-scale map of the human interactome network. Cell, 2014.
Luck et al. A reference map of the human binary protein interactome. Nature, 2020.