Ghent University Academic Bibliography

Feature subset selection for splice site prediction

Sven Degroeve (UGent), Bernard De Baets (UGent), Yves Van de Peer (UGent) and Pierre Rouzé (2002) BIOINFORMATICS. 18(suppl. 2). p. S75-S83
abstract
Motivation: The large amount of available annotated Arabidopsis thaliana sequences allows the induction of splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources or features from which the models can be computed. For splice site prediction, the features we consider in this study are the presence or absence of certain nucleotides in close proximity to the splice site. Since it is not known how many and which nucleotides are relevant for splice site prediction, the set of features is chosen large enough such that the probability that all relevant information sources are in the set is very high. Using only those features that are relevant for constructing a splice site prediction system might improve the system and might also provide us with useful biological knowledge. Using fewer features will of course also improve the prediction speed of the system.
Results: A wrapper-based feature subset selection algorithm using a support vector machine or a naive Bayes prediction method was evaluated against the traditional method for selecting features relevant for splice site prediction. Our results show that this wrapper approach selects features that improve the performance against the use of all features and against the use of the features selected by the traditional method.
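The wrapper idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the data are synthetic six-nucleotide windows, the classifier is a small hand-written Bernoulli naive Bayes, and the search is greedy forward selection scored on a held-out set. The abstract does not state which search strategy the authors used, and all function names here (`encode`, `forward_select`, and so on) are hypothetical.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def encode(window):
    """One binary feature per (position, nucleotide): 1 if present, else 0."""
    return [1 if base == nuc else 0 for base in window for nuc in NUCLEOTIDES]

def nb_fit(X, y, features):
    """Fit Bernoulli naive Bayes on the given feature indices (Laplace smoothing)."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = {f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2) for f in features}
        model[label] = (prior, probs)
    return model

def nb_predict(model, x):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, (prior, probs) in model.items():
        s = math.log(prior)
        for f, p in probs.items():
            s += math.log(p if x[f] else 1 - p)
        scores[label] = s
    return max(scores, key=scores.get)

def accuracy(features, train, valid):
    """Train on `train` with the chosen features, score on `valid`."""
    model = nb_fit(train[0], train[1], features)
    hits = sum(nb_predict(model, x) == t for x, t in zip(*valid))
    return hits / len(valid[1])

def forward_select(n_features, train, valid, max_features):
    """Greedy wrapper: keep adding the feature that most improves validation accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, f = max((accuracy(selected + [f], train, valid), f) for f in candidates)
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(f)
        best = score
    return selected, best

# Synthetic data (an assumption for illustration only): positive windows
# tend to carry 'A' at position 1 and 'G' at position 4.
def make_window(rng, positive):
    w = [rng.choice(NUCLEOTIDES) for _ in range(6)]
    if positive:
        if rng.random() < 0.9:
            w[1] = "A"
        if rng.random() < 0.9:
            w[4] = "G"
    return "".join(w)

def make_set(rng, n):
    labels = [i % 2 for i in range(n)]
    return [encode(make_window(rng, bool(t))) for t in labels], labels

rng = random.Random(0)
train, valid = make_set(rng, 400), make_set(rng, 200)
selected, best = forward_select(24, train, valid, max_features=4)
print("selected feature indices:", selected, "validation accuracy:", best)
```

Feature index `4 * position + NUCLEOTIDES.index(nucleotide)` identifies which nucleotide at which position a selected feature refers to. Because the classifier is retrained and scored for every candidate subset, the selection is tied to the prediction method itself, which is what distinguishes a wrapper from a method that ranks features independently of the classifier.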
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-157984
type: journalArticle (proceedingsPaper)
publication status: published
keyword: SEQUENCES, GENOMIC DNA
journal title: BIOINFORMATICS
volume: 18
issue: suppl. 2
pages: S75-S83
conference name: European Conference on Computational Biology 2002 (ECCB 2002)
conference location: Saarbrücken, Germany
conference start: 2002-10-06
conference end: 2002-10-09
Web of Science type: Article
Web of Science id: 000178836800012
JCR category: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
JCR impact factor: 4.615 (2002)
JCR rank: 1/79 (2002)
JCR quartile: 1 (2002)
ISSN: 1367-4803
DOI: 10.1093/bioinformatics/18.suppl_2.S75
language: English
UGent publication?: yes
classification: A1
copyright statement: I have transferred the copyright for this publication to the publisher
id: 157984
handle: http://hdl.handle.net/1854/LU-157984
date created: 2004-01-14 13:39:00
date last changed: 2016-12-19 15:39:08
@article{157984,
  abstract     = {Motivation: The large amount of available annotated Arabidopsis thaliana sequences allows the induction of splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources or features from which the models can be computed. For splice site prediction, the features we consider in this study are the presence or absence of certain nucleotides in close proximity to the splice site. Since it is not known how many and which nucleotides are relevant for splice site prediction, the set of features is chosen large enough such that the probability that all relevant information sources are in the set is very high. Using only those features that are relevant for constructing a splice site prediction system might improve the system and might also provide us with useful biological knowledge. Using fewer features will of course also improve the prediction speed of the system.
Results: A wrapper-based feature subset selection algorithm using a support vector machine or a naive Bayes prediction method was evaluated against the traditional method for selecting features relevant for splice site prediction. Our results show that this wrapper approach selects features that improve the performance against the use of all features and against the use of the features selected by the traditional method.},
  author       = {Degroeve, Sven and De Baets, Bernard and Van de Peer, Yves and Rouz{\'e}, Pierre},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  keyword      = {SEQUENCES,GENOMIC DNA},
  language     = {eng},
  location     = {Saarbr{\"u}cken, Germany},
  number       = {suppl. 2},
  pages        = {S75--S83},
  title        = {Feature subset selection for splice site prediction},
  url          = {http://dx.doi.org/10.1093/bioinformatics/18.suppl\_2.S75},
  volume       = {18},
  year         = {2002},
}

Chicago
Degroeve, Sven, Bernard De Baets, Yves Van de Peer, and Pierre Rouzé. 2002. “Feature Subset Selection for Splice Site Prediction.” Bioinformatics 18 (suppl. 2): S75–S83.
APA
Degroeve, S., De Baets, B., Van de Peer, Y., & Rouzé, P. (2002). Feature subset selection for splice site prediction. BIOINFORMATICS, 18(suppl. 2), S75–S83. Presented at the European Conference on Computational Biology 2002 (ECCB 2002).
Vancouver
1. Degroeve S, De Baets B, Van de Peer Y, Rouzé P. Feature subset selection for splice site prediction. BIOINFORMATICS. 2002;18(suppl. 2):S75–S83.
MLA
Degroeve, Sven, Bernard De Baets, Yves Van de Peer, et al. “Feature Subset Selection for Splice Site Prediction.” BIOINFORMATICS 18.suppl. 2 (2002): S75–S83. Print.