NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms

(2014) PLOS ONE. 9(3).
Project
HPC-UGent: the central High Performance Computing infrastructure of Ghent University
Abstract
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
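The three ideas in the abstract (one ranking problem per target gene, a subsampling ensemble around any base ranker, and rank-wise averaging across methods) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the correlation-based base ranker, the `top_k` counting rule, and the default parameters are all assumptions made for illustration, and the paper's actual subsampling scheme may differ.

```python
import random
from statistics import mean


def rank_by_abs_correlation(X, y):
    """Hypothetical base ranker: order predictors by |Pearson r| with the target.
    Stands in for SVR, the elastic net, etc., which the abstract names."""
    def abs_r(x):
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0
    return sorted(X, key=lambda g: abs_r(X[g]), reverse=True)


def ensemble_importance(expr, target, ranker, n_rounds=50,
                        sample_frac=0.8, top_k=1, seed=0):
    """Assumed ensemble rule: importance of a predictor is the fraction of
    random subsamples in which the base ranker places it in the top_k."""
    rng = random.Random(seed)
    predictors = [g for g in sorted(expr) if g != target]
    n = len(expr[target])
    k = max(2, int(sample_frac * n))
    counts = dict.fromkeys(predictors, 0)
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # subsample the measurements
        X = {g: [expr[g][i] for i in idx] for g in predictors}
        y = [expr[target][i] for i in idx]
        for g in ranker(X, y)[:top_k]:
            counts[g] += 1
    return {g: c / n_rounds for g, c in counts.items()}


def infer_network(expr, ranker, **kwargs):
    """Decompose inference into one ranking problem per target gene and
    collect (regulator, target) -> importance for every ordered pair."""
    edges = {}
    for target in expr:
        scores = ensemble_importance(expr, target, ranker, **kwargs)
        for regulator, score in scores.items():
            edges[(regulator, target)] = score
    return edges


def rankwise_average(score_dicts):
    """Combine several methods by averaging each edge's rank (1 = best).
    Ties are broken by the stable sort order, which is a simplification."""
    avg = {}
    for scores in score_dicts:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, edge in enumerate(ordered, start=1):
            avg[edge] = avg.get(edge, 0.0) + rank / len(score_dicts)
    return avg
```

Here `expr` is assumed to map each gene name to a list of expression values of equal length. Running `infer_network` once per base ranker and passing the resulting dictionaries to `rankwise_average` gives a combined edge ranking in which a lower average rank indicates stronger putative evidence of a regulatory link.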
Keywords
FEATURE-SELECTION TECHNIQUES, IBCN

Downloads

  • 5916.pdf (full text, open access, PDF, 1.49 MB)

Citation

Chicago
Ruyssinck, Joeri, Vân Anh Huynh-Thu, Pierre Geurts, Tom Dhaene, Piet Demeester, and Yvan Saeys. 2014. “NIMEFI: Gene Regulatory Network Inference Using Multiple Ensemble Feature Importance Algorithms.” PLOS ONE 9 (3).
APA
Ruyssinck, J., Huynh-Thu, V. A., Geurts, P., Dhaene, T., Demeester, P., & Saeys, Y. (2014). NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms. PLOS ONE, 9(3).
Vancouver
1. Ruyssinck J, Huynh-Thu VA, Geurts P, Dhaene T, Demeester P, Saeys Y. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms. PLOS ONE. 2014;9(3).
MLA
Ruyssinck, Joeri et al. “NIMEFI: Gene Regulatory Network Inference Using Multiple Ensemble Feature Importance Algorithms.” PLOS ONE 9.3 (2014): n. pag. Print.
@article{4402238,
  abstract     = {One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. 
An implementation of NIMEFI has been made publicly available.},
  articleno    = {e92709},
  author       = {Ruyssinck, Joeri and Huynh-Thu, Vân Anh and Geurts, Pierre and Dhaene, Tom and Demeester, Piet and Saeys, Yvan},
  issn         = {1932-6203},
  journal      = {PLOS ONE},
  language     = {eng},
  number       = {3},
  pages        = {13},
  title        = {NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms},
  url          = {http://dx.doi.org/10.1371/journal.pone.0092709},
  volume       = {9},
  year         = {2014},
}