
Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

(2020) BRIEFINGS IN BIOINFORMATICS. 21(1). p.262-271
Abstract
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models.
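The two-step model and the leave-one-out shortcuts the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and signature are hypothetical, and the hold-out formula shown is the classical exact LOO identity for linear smoothers, applied here to the row step of two-step kernel ridge regression to get leave-one-row-vertex-out predictions.

```python
import numpy as np

def two_step_krr_loo_rows(K, G, Y, lam=1.0, mu=1.0):
    """Sketch of two-step kernel ridge regression (hypothetical helper).

    K : (n, n) kernel matrix over row vertices (e.g. drugs)
    G : (m, m) kernel matrix over column vertices (e.g. targets)
    Y : (n, m) label/adjacency matrix of the network
    Returns in-sample predictions and leave-one-row-vertex-out predictions.
    """
    n, m = Y.shape
    # "Hat" matrices mapping labels to fitted values; the cost is cubic
    # in the number of vertices (n, m), not in the number of edges (n*m).
    Hk = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    Hg = G @ np.linalg.solve(G + mu * np.eye(m), np.eye(m))
    F = Hk @ Y @ Hg.T  # in-sample predictions for all n*m pairs

    # Exact LOO identity for linear smoothers, applied to the row step:
    # subtract each row vertex's own contribution and renormalize by
    # (1 - H_ii), instead of refitting the model n times.
    dk = np.diag(Hk)
    Y_loo = (Hk @ Y - dk[:, None] * Y) / (1.0 - dk[:, None])
    F_loo_rows = Y_loo @ Hg.T  # predictions as if each row vertex were unseen
    return F, F_loo_rows
```

Under these assumptions, estimating performance for the "new vertex" setting needs only one eigendecomposition-sized solve per vertex set, rather than one full retraining per held-out vertex.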
Keywords
network inference, biological networks, cross-validation, kernel methods, DRUG-TARGET INTERACTIONS, PROTEIN INTERACTION PREDICTION, KERNEL METHODS, GENOMIC DATA, IDENTIFICATION, EVOLUTIONARY, INTERACTOME, INTEGRATION, SIMILARITY, MACHINES

Downloads

  • (...).pdf — full text (Published version) | UGent only | PDF | 533.06 KB

Citation

MLA
Stock, Michiel, et al. “Algebraic Shortcuts for Leave-One-out Cross-Validation in Supervised Network Inference.” BRIEFINGS IN BIOINFORMATICS, vol. 21, no. 1, 2020, pp. 262–71, doi:10.1093/bib/bby095.
APA
Stock, M., Pahikkala, T., Airola, A., Waegeman, W., & De Baets, B. (2020). Algebraic shortcuts for leave-one-out cross-validation in supervised network inference. BRIEFINGS IN BIOINFORMATICS, 21(1), 262–271. https://doi.org/10.1093/bib/bby095
Chicago author-date
Stock, Michiel, Tapio Pahikkala, Antti Airola, Willem Waegeman, and Bernard De Baets. 2020. “Algebraic Shortcuts for Leave-One-out Cross-Validation in Supervised Network Inference.” BRIEFINGS IN BIOINFORMATICS 21 (1): 262–71. https://doi.org/10.1093/bib/bby095.
Chicago author-date (all authors)
Stock, Michiel, Tapio Pahikkala, Antti Airola, Willem Waegeman, and Bernard De Baets. 2020. “Algebraic Shortcuts for Leave-One-out Cross-Validation in Supervised Network Inference.” BRIEFINGS IN BIOINFORMATICS 21 (1): 262–271. doi:10.1093/bib/bby095.
Vancouver
1. Stock M, Pahikkala T, Airola A, Waegeman W, De Baets B. Algebraic shortcuts for leave-one-out cross-validation in supervised network inference. BRIEFINGS IN BIOINFORMATICS. 2020;21(1):262–71.
IEEE
[1] M. Stock, T. Pahikkala, A. Airola, W. Waegeman, and B. De Baets, “Algebraic shortcuts for leave-one-out cross-validation in supervised network inference,” BRIEFINGS IN BIOINFORMATICS, vol. 21, no. 1, pp. 262–271, 2020.
@article{8655765,
  abstract     = {{Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models.}},
  author       = {{Stock, Michiel and Pahikkala, Tapio and Airola, Antti and Waegeman, Willem and De Baets, Bernard}},
  issn         = {{1467-5463}},
  journal      = {{BRIEFINGS IN BIOINFORMATICS}},
  keywords     = {{network inference,biological networks,cross-validation,kernel methods,DRUG-TARGET INTERACTIONS,PROTEIN INTERACTION PREDICTION,KERNEL METHODS,GENOMIC DATA,IDENTIFICATION,EVOLUTIONARY,INTERACTOME,INTEGRATION,SIMILARITY,MACHINES}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{262--271}},
  title        = {{Algebraic shortcuts for leave-one-out cross-validation in supervised network inference}},
  url          = {{https://doi.org/10.1093/bib/bby095}},
  volume       = {{21}},
  year         = {{2020}},
}
