Ghent University Academic Bibliography

ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles

Thomas Abeel (UGent), Yvan Saeys (UGent), Pierre Rouzé (UGent) and Yves Van de Peer (UGent) (2008) Bioinformatics 24(13), p. I24-I31
abstract
Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.
Results: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering by using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state-of-the-art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision.
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-439481
author: Thomas Abeel, Yvan Saeys, Pierre Rouzé and Yves Van de Peer
year: 2008
type: journalArticle (proceedingsPaper)
publication status: published
keyword: POLYMERASE-II PROMOTERS, TRANSCRIPTION START SITES, FACTOR-BINDING SITES, HUMAN GENOME, DROSOPHILA-MELANOGASTER, COMPUTATIONAL DETECTION, NEURAL-NETWORK, SEQUENCES, IDENTIFICATION, RECOGNITION
journal title: Bioinformatics
volume: 24
issue: 13
pages: I24-I31
conference name: 16th ISMB Conference on Intelligent Systems for Molecular Biology
conference location: Toronto, ON, Canada
conference start: 2008-07-19
conference end: 2008-07-23
Web of Science type: Proceedings Paper
Web of Science id: 000257169700025
JCR category: MATHEMATICAL & COMPUTATIONAL BIOLOGY
JCR impact factor: 4.328 (2008)
JCR rank: 2/28 (2008)
JCR quartile: 1 (2008)
ISSN: 1367-4803
DOI: 10.1093/bioinformatics/btn172
language: English
UGent publication?: yes
classification: A1
copyright statement: I have retained and own the full copyright for this publication
id: 439481
handle: http://hdl.handle.net/1854/LU-439481
date created: 2008-11-12 14:06:00
date last changed: 2012-12-05 17:42:47
@article{439481,
  abstract     = {Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work. 
Results: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering by using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state-of-the-art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98\% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision.},
  author       = {Abeel, Thomas and Saeys, Yvan and Rouz{\'e}, Pierre and Van de Peer, Yves},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  keyword      = {POLYMERASE-II PROMOTERS,TRANSCRIPTION START SITES,FACTOR-BINDING SITES,HUMAN GENOME,DROSOPHILA-MELANOGASTER,COMPUTATIONAL DETECTION,NEURAL-NETWORK,SEQUENCES,IDENTIFICATION,RECOGNITION},
  language     = {eng},
  location     = {Toronto, ON, Canada},
  number       = {13},
  pages        = {I24--I31},
  title        = {ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btn172},
  volume       = {24},
  year         = {2008},
}

Chicago
Abeel, Thomas, Yvan Saeys, Pierre Rouzé, and Yves Van de Peer. 2008. “ProSOM: Core Promoter Prediction Based on Unsupervised Clustering of DNA Physical Profiles.” Bioinformatics 24 (13): I24–I31.
APA
Abeel, T., Saeys, Y., Rouzé, P., & Van de Peer, Y. (2008). ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics, 24(13), I24–I31. Presented at the 16th ISMB Conference on Intelligent Systems for Molecular Biology.
Vancouver
1. Abeel T, Saeys Y, Rouzé P, Van de Peer Y. ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics. 2008;24(13):I24–I31.
MLA
Abeel, Thomas, Yvan Saeys, Pierre Rouzé, et al. “ProSOM: Core Promoter Prediction Based on Unsupervised Clustering of DNA Physical Profiles.” Bioinformatics 24.13 (2008): I24–I31. Print.