Ghent University Academic Bibliography

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Bart Hooghe (UGent), Stefan Broos (UGent), Frans Van Roy (UGent) and Pieter De Bleser (UGent) (2012) Nucleic Acids Research. 40(14): e106.
abstract
Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
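As a rough illustration of two of the ideas named in the abstract, the sketch below builds a log-odds position weight matrix (PWM) from aligned sites and lists adjacent dinucleotide features as one simple way to expose positional dependencies between nucleotides. This is not the authors' implementation: the example sites, the pseudocount, the uniform background frequency and all function names are assumptions made for illustration, and the structural features and random forest training described in the paper are not reproduced here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum per-position log-odds; a PWM treats positions as independent."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


def adjacent_pair_features(seq):
    """List (position, dinucleotide) pairs for adjacent nucleotides.

    A hypothetical feature encoding that lets a classifier see
    dependencies between neighbouring positions, which a PWM ignores.
    """
    return [(i, seq[i:i + 2]) for i in range(len(seq) - 1)]


# Made-up TATA-like sites, used only to exercise the functions.
sites = ["TATAAA", "TATATA", "TATAAG", "TTTAAA"]
pwm = build_pwm(sites)
consensus_score = pwm_score(pwm, "TATAAA")
random_score = pwm_score(pwm, "GCGCGC")
```

In a combined approach of the kind the abstract describes, a PWM score and features such as these pairs would be supplied together as inputs to a classifier, so that the classifier can weigh them against each other.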
type
journalArticle (original)
publication status
published
keyword
TATA BOX, B-DNA, IN-VITRO, HUMAN GENOME, TARGET SITES, STRUCTURAL-ANALYSIS, MOLECULAR-DYNAMICS SIMULATIONS, ESCHERICHIA-COLI, PROTEIN-DNA RECOGNITION, SEQUENCE
journal title
Nucleic Acids Research
Nucleic Acids Res.
volume
40
issue
14
article_number
e106
pages
15 pages
Web of Science type
Article
Web of Science id
000307504700003
JCR category
BIOCHEMISTRY & MOLECULAR BIOLOGY
JCR impact factor
8.278 (2012)
JCR rank
27/288 (2012)
JCR quartile
1 (2012)
ISSN
0305-1048
DOI
10.1093/nar/gks283
language
English
UGent publication?
yes
classification
A1
copyright statement
I have transferred the copyright for this publication to the publisher
id
3009502
handle
http://hdl.handle.net/1854/LU-3009502
date created
2012-10-09 10:21:28
date last changed
2012-10-11 11:55:30
@article{3009502,
  abstract     = {Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.},
  articleno    = {e106},
  author       = {Hooghe, Bart and Broos, Stefan and Van Roy, Frans and De Bleser, Pieter},
  doi          = {10.1093/nar/gks283},
  issn         = {0305-1048},
  journal      = {Nucleic Acids Research},
  keywords     = {TATA BOX, B-DNA, IN-VITRO, HUMAN GENOME, TARGET SITES, STRUCTURAL-ANALYSIS, MOLECULAR-DYNAMICS SIMULATIONS, ESCHERICHIA-COLI, PROTEIN-DNA RECOGNITION, SEQUENCE},
  language     = {eng},
  number       = {14},
  pages        = {e106},
  title        = {A flexible integrative approach based on random forest improves prediction of transcription factor binding sites},
  url          = {http://dx.doi.org/10.1093/nar/gks283},
  volume       = {40},
  year         = {2012},
}

Chicago
Hooghe, Bart, Stefan Broos, Frans Van Roy, and Pieter De Bleser. 2012. “A Flexible Integrative Approach Based on Random Forest Improves Prediction of Transcription Factor Binding Sites.” Nucleic Acids Research 40 (14): e106.
APA
Hooghe, B., Broos, S., Van Roy, F., & De Bleser, P. (2012). A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic Acids Research, 40(14), e106.
Vancouver
1. Hooghe B, Broos S, Van Roy F, De Bleser P. A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic Acids Res. 2012;40(14):e106.
MLA
Hooghe, Bart, Stefan Broos, Frans Van Roy, and Pieter De Bleser. “A Flexible Integrative Approach Based on Random Forest Improves Prediction of Transcription Factor Binding Sites.” Nucleic Acids Research 40.14 (2012): e106. Print.