SpliceRover : interpretable convolutional neural networks for improved splice site prediction

Jasper Zuallaert (UGent), Fréderic Godin (UGent), Mi Jung Kim (UGent), Arne Soete (UGent), Yvan Saeys (UGent) and Wesley De Neve (UGent)
(2018) BIOINFORMATICS. 34(24). p.4180-4188
Abstract
Motivation: During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently mostly done using automated genome annotation platforms, which mainly rely on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone in a genome annotation system.
Results: In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state-of-the-art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show it consistently outperforms already existing approaches, with relative improvements in prediction effectiveness of up to 80.9% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their 'black box' nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g. several types of exclusion patterns near splice sites).
Keywords
ANNOTATION, SEQUENCE, EXONS

Downloads

  • (...).pdf: full text | UGent only | PDF | 738.06 KB

Citation

Chicago
Zuallaert, Jasper, Fréderic Godin, Mi Jung Kim, Arne Soete, Yvan Saeys, and Wesley De Neve. 2018. “SpliceRover : Interpretable Convolutional Neural Networks for Improved Splice Site Prediction.” Bioinformatics 34 (24): 4180–4188.
APA
Zuallaert, J., Godin, F., Kim, M. J., Soete, A., Saeys, Y., & De Neve, W. (2018). SpliceRover : interpretable convolutional neural networks for improved splice site prediction. BIOINFORMATICS, 34(24), 4180–4188.
Vancouver
1. Zuallaert J, Godin F, Kim MJ, Soete A, Saeys Y, De Neve W. SpliceRover : interpretable convolutional neural networks for improved splice site prediction. BIOINFORMATICS. 2018;34(24):4180–8.
MLA
Zuallaert, Jasper, Fréderic Godin, Mi Jung Kim, et al. “SpliceRover : Interpretable Convolutional Neural Networks for Improved Splice Site Prediction.” BIOINFORMATICS 34.24 (2018): 4180–4188. Print.
BibTeX
@article{8588051,
  abstract     = {Motivation: During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently mostly done using automated genome annotation platforms, which mainly rely on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone in a genome annotation system. 
Results: In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state-of-the-art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show it consistently outperforms already existing approaches, with relative improvements in prediction effectiveness of up to 80.9\% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their 'black box' nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g. several types of exclusion patterns near splice sites).},
  author       = {Zuallaert, Jasper and Godin, Fr{\'e}deric and Kim, Mi Jung and Soete, Arne and Saeys, Yvan and De Neve, Wesley},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  language     = {eng},
  number       = {24},
  pages        = {4180--4188},
  title        = {SpliceRover : interpretable convolutional neural networks for improved splice site prediction},
  url          = {http://dx.doi.org/10.1093/bioinformatics/bty497},
  volume       = {34},
  year         = {2018},
}
