Ghent University Academic Bibliography

Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences

Guy Baele, Yves Van de Peer (UGent) and Stijn Vansteelandt (UGent) (2009) BMC Evolutionary Biology. 9. p.87.1-87.23
abstract
Background: Many recent studies that relax the assumption of independent evolution of sites have done so at the expense of a drastic increase in the number of substitution parameters. While additional parameters cannot be avoided to model context-dependent evolution, a large increase in model dimensionality is only justified when accompanied with careful model-building strategies that guard against overfitting. An increased dimensionality leads to increases in numerical computations of the models, increased convergence times in Bayesian Markov chain Monte Carlo algorithms and even more tedious Bayes Factor calculations.

Results: We have developed two model-search algorithms which reduce the number of Bayes Factor calculations by clustering posterior densities to decide on the equality of substitution behavior in different contexts. The selected model's fit is evaluated using a Bayes Factor, which we calculate via model-switch thermodynamic integration. To reduce computation time and to increase the precision of this integration, we propose to split the calculations over different computers and to appropriately calibrate the individual runs. Using the proposed strategies, we find, in a dataset of primate Ancestral Repeats, that careful modeling of context-dependent evolution may increase model fit considerably and that the combination of a context-dependent model with the assumption of varying rates across sites offers even larger improvements in terms of model fit. Using a smaller nuclear SSU rRNA dataset, we show that context-dependence may only become detectable upon applying model-building strategies.

Conclusion: While context-dependent evolutionary models can increase the model fit over traditional independent evolutionary models, such complex models will often contain too many parameters. Justification for the added parameters is thus required so that only those parameters that model evolutionary processes previously unaccounted for are added to the evolutionary model. To obtain an optimal balance between the number of parameters in a context-dependent model and the performance in terms of model fit, we have designed two parameter-reduction strategies and we have shown that model fit can be greatly improved by reducing the number of parameters in a context-dependent evolutionary model.
author
Guy Baele, Yves Van de Peer and Stijn Vansteelandt
year
2009
type
journalArticle (original)
publication status
published
keyword
CHLOROPLAST GENOME, BAYES FACTORS, PHYLOGENETIC ESTIMATION, DNA-SEQUENCES, MAXIMUM-LIKELIHOOD, SUBSTITUTION RATES, NUCLEOTIDE SUBSTITUTION, EVOLUTION, BIAS, INTEGRATION
journal title
BMC Evolutionary Biology
BMC Evol. Biol.
volume
9
pages
87.1 - 87.23
Web of Science type
Article
Web of Science id
000267757400001
JCR category
GENETICS & HEREDITY
JCR impact factor
4.294 (2009)
JCR rank
29/142 (2009)
JCR quartile
1 (2009)
ISSN
1471-2148
DOI
10.1186/1471-2148-9-87
language
English
UGent publication?
yes
classification
A1
copyright statement
I don't know the status of the copyright for this publication
id
721885
handle
http://hdl.handle.net/1854/LU-721885
date created
2009-08-04 19:48:41
date last changed
2016-12-19 15:46:44
@article{721885,
  author       = {Baele, Guy and Van de Peer, Yves and Vansteelandt, Stijn},
  issn         = {1471-2148},
  journal      = {BMC Evolutionary Biology},
  keyword      = {CHLOROPLAST GENOME,BAYES FACTORS,PHYLOGENETIC ESTIMATION,DNA-SEQUENCES,MAXIMUM-LIKELIHOOD,SUBSTITUTION RATES,NUCLEOTIDE SUBSTITUTION,EVOLUTION,BIAS,INTEGRATION},
  language     = {eng},
  pages        = {87.1--87.23},
  title        = {Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences},
  url          = {http://dx.doi.org/10.1186/1471-2148-9-87},
  volume       = {9},
  year         = {2009},
}

Chicago
Baele, Guy, Yves Van de Peer, and Stijn Vansteelandt. 2009. “Efficient Context-dependent Model Building Based on Clustering Posterior Distributions for Non-coding Sequences.” BMC Evolutionary Biology 9: 87.1–87.23.
APA
Baele, G., Van de Peer, Y., & Vansteelandt, S. (2009). Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences. BMC Evolutionary Biology, 9, 87.1–87.23.
Vancouver
1. Baele G, Van de Peer Y, Vansteelandt S. Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences. BMC Evolutionary Biology. 2009;9:87.1–87.23.
MLA
Baele, Guy, Yves Van de Peer, and Stijn Vansteelandt. “Efficient Context-dependent Model Building Based on Clustering Posterior Distributions for Non-coding Sequences.” BMC Evolutionary Biology 9 (2009): 87.1–87.23. Print.