Feature subset selection for splice site prediction

Sven Degroeve (UGent) , Bernard De Baets (UGent) , Yves Van de Peer (UGent) and Pierre Rouzé (UGent)
(2002) BIOINFORMATICS. 18(suppl. 2). p.S75-S83
Abstract
Motivation: The large amount of available annotated Arabidopsis thaliana sequences allows the induction of splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources or features from which the models can be computed. For splice site prediction, the features we consider in this study are the presence or absence of certain nucleotides in close proximity to the splice site. Since it is not known how many and which nucleotides are relevant for splice site prediction, the set of features is chosen large enough such that the probability that all relevant information sources are in the set is very high. Using only those features that are relevant for constructing a splice site prediction system might improve the system and might also provide us with useful biological knowledge. Using fewer features will of course also improve the prediction speed of the system.
Results: A wrapper-based feature subset selection algorithm using a support vector machine or a naive Bayes prediction method was evaluated against the traditional method for selecting features relevant for splice site prediction. Our results show that this wrapper approach selects features that improve the performance against the use of all features and against the use of the features selected by the traditional method.
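The wrapper idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the search strategy or the evaluation protocol, so the greedy forward search, the holdout accuracy criterion, the synthetic data (loosely modeled on the canonical GT donor dinucleotide), and all names (`encode`, `BernoulliNaiveBayes`, `holdout_accuracy`, `forward_select`, `make_data`) are assumptions made for illustration. Each feature records the presence or absence of one nucleotide at one position near the candidate site.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def encode(window):
    """One binary feature per (position, nucleotide): 1 if that base is present."""
    return [int(base == nuc) for base in window for nuc in NUCLEOTIDES]

class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.prob = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed estimate of P(feature = 1 | class) for every column.
            self.prob[c] = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        return self

    def predict(self, X):
        def score(x, c):
            return self.log_prior[c] + sum(
                math.log(p if xj else 1 - p) for xj, p in zip(x, self.prob[c]))
        return [max(self.classes, key=lambda c: score(x, c)) for x in X]

def holdout_accuracy(features, train, valid):
    """Train on the chosen feature columns only and score on held-out data."""
    (X_tr, y_tr), (X_va, y_va) = train, valid
    project = lambda X: [[x[j] for j in features] for x in X]
    model = BernoulliNaiveBayes().fit(project(X_tr), y_tr)
    hits = sum(p == t for p, t in zip(model.predict(project(X_va)), y_va))
    return hits / len(y_va)

def forward_select(train, valid, max_features):
    """Greedy wrapper (assumed strategy): add the feature that most improves accuracy."""
    n_features = len(train[0][0])
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        score, j = max((holdout_accuracy(selected + [j], train, valid), j)
                       for j in candidates)
        if score <= best:
            break  # no remaining candidate improves the held-out accuracy
        selected.append(j)
        best = score
    return selected, best

def make_data(n, rng, length=6, site=2):
    """Synthetic windows (assumption): positives carry GT at `site`, negatives are random."""
    X, y = [], []
    for i in range(n):
        bases = [rng.choice(NUCLEOTIDES) for _ in range(length)]
        label = i % 2
        if label:
            bases[site:site + 2] = "GT"
        X.append(encode(bases))
        y.append(label)
    return X, y

rng = random.Random(0)
train, valid = make_data(400, rng), make_data(400, rng)
selected, best = forward_select(train, valid, max_features=3)
# Report each selected feature as (position, nucleotide) with the held-out accuracy.
print([(j // 4, NUCLEOTIDES[j % 4]) for j in selected], round(best, 3))
```

Feature index `j` maps to position `j // 4` and nucleotide `NUCLEOTIDES[j % 4]`. On this toy data the search would be expected to pick G at position 2 and T at position 3 first, since only those features separate the two classes. The learner is wrapped inside the search loop, so a support vector machine could be substituted for the naive Bayes class without changing `forward_select`.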
Keywords
SEQUENCES, GENOMIC DNA

Citation

MLA
Degroeve, Sven, et al. “Feature Subset Selection for Splice Site Prediction.” BIOINFORMATICS, vol. 18, no. suppl. 2, 2002, pp. S75–83, doi:10.1093/bioinformatics/18.suppl_2.S75.
APA
Degroeve, S., De Baets, B., Van de Peer, Y., & Rouzé, P. (2002). Feature subset selection for splice site prediction. BIOINFORMATICS, 18(suppl. 2), S75–S83. https://doi.org/10.1093/bioinformatics/18.suppl_2.S75
Chicago author-date
Degroeve, Sven, Bernard De Baets, Yves Van de Peer, and Pierre Rouzé. 2002. “Feature Subset Selection for Splice Site Prediction.” BIOINFORMATICS 18 (suppl. 2): S75–83. https://doi.org/10.1093/bioinformatics/18.suppl_2.S75.
Chicago author-date (all authors)
Degroeve, Sven, Bernard De Baets, Yves Van de Peer, and Pierre Rouzé. 2002. “Feature Subset Selection for Splice Site Prediction.” BIOINFORMATICS 18 (suppl. 2): S75–S83. doi:10.1093/bioinformatics/18.suppl_2.S75.
Vancouver
1. Degroeve S, De Baets B, Van de Peer Y, Rouzé P. Feature subset selection for splice site prediction. BIOINFORMATICS. 2002;18(suppl. 2):S75–83.
IEEE
[1] S. Degroeve, B. De Baets, Y. Van de Peer, and P. Rouzé, “Feature subset selection for splice site prediction,” BIOINFORMATICS, vol. 18, no. suppl. 2, pp. S75–S83, 2002.
BibTeX
@article{157984,
  abstract     = {{Motivation: The large amount of available annotated Arabidopsis thaliana sequences allows the induction of splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources or features from which the models can be computed. For splice site prediction, the features we consider in this study are the presence or absence of certain nucleotides in close proximity to the splice site. Since it is not known how many and which nucleotides are relevant for splice site prediction, the set of features is chosen large enough such that the probability that all relevant information sources are in the set is very high. Using only those features that are relevant for constructing a splice site prediction system might improve the system and might also provide us with useful biological knowledge. Using fewer features will of course also improve the prediction speed of the system.
Results: A wrapper-based feature subset selection algorithm using a support vector machine or a naive Bayes prediction method was evaluated against the traditional method for selecting features relevant for splice site prediction. Our results show that this wrapper approach selects features that improve the performance against the use of all features and against the use of the features selected by the traditional method.}},
  author       = {{Degroeve, Sven and De Baets, Bernard and Van de Peer, Yves and Rouzé, Pierre}},
  issn         = {{1367-4803}},
  journal      = {{BIOINFORMATICS}},
  keywords     = {{SEQUENCES,GENOMIC DNA}},
  language     = {{eng}},
  location     = {{Saarbrücken, Germany}},
  number       = {{suppl. 2}},
  pages        = {{S75--S83}},
  title        = {{Feature subset selection for splice site prediction}},
  url          = {{http://doi.org/10.1093/bioinformatics/18.suppl_2.S75}},
  volume       = {{18}},
  year         = {{2002}},
}