Make the most of your samples : Bayes factor estimators for high-dimensional models of sequence evolution
 Author
 Guy Baele, Philippe Lemey and Stijn Vansteelandt (UGent)
 Organization
 Project
 Bioinformatics: from nucleotids to networks (N2N)
 Abstract
 Background: Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes.
 Results: We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process.
 Conclusions: We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational efforts in order to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
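 Illustration (not part of the original record): the idea of estimating a Bayes factor directly along a path between two models, rather than from two separate marginal likelihood estimates, can be sketched on a toy problem. The example below is a minimal sketch under stated assumptions, not the authors' phylogenetic implementation: the data are i.i.d. normal with unit variance, the two hypothetical models differ only in the standard deviation of a zero-mean normal prior on the mean (`s0` and `s1`), the path is q_beta(theta) = L(theta) * p1(theta)^beta * p0(theta)^(1 - beta), and the beta values are evenly spaced (an arbitrary choice for illustration). Every intermediate distribution on this path is Gaussian, so it is sampled exactly instead of by MCMC, and the exact log Bayes factor is available for comparison. All function names are made up for this sketch.

```python
import numpy as np


def log_prior_ratio(theta, s0, s1):
    """log p1(theta) - log p0(theta) for zero-mean normal priors with sd s1 and s0."""
    return np.log(s0 / s1) - 0.5 * theta**2 * (1.0 / s1**2 - 1.0 / s0**2)


def exact_log_bf(y, s0, s1):
    """Closed-form log Bayes factor (model 1 over model 0) for the toy model."""
    n, total = len(y), np.sum(y)

    def prior_dependent_part(s):
        # Terms of the log marginal likelihood that depend on the prior sd s;
        # the remaining terms are identical for both models and cancel.
        return -0.5 * np.log1p(n * s**2) + 0.5 * total**2 * s**2 / (1.0 + n * s**2)

    return prior_dependent_part(s1) - prior_dependent_part(s0)


def stepping_stone_log_bf(y, s0, s1, n_steps=16, n_samples=2000, seed=0):
    """Stepping-stone style estimate of the log Bayes factor along a direct path."""
    rng = np.random.default_rng(seed)
    n, total = len(y), np.sum(y)
    betas = np.linspace(0.0, 1.0, n_steps + 1)  # assumed: evenly spaced path
    log_bf = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # q_beta is Gaussian with this precision and mean total / precision.
        precision = n + b_prev / s1**2 + (1.0 - b_prev) / s0**2
        theta = rng.normal(total / precision, 1.0 / np.sqrt(precision), size=n_samples)
        # Importance weights (q_1 / q_0)^(b_next - b_prev); the likelihood cancels.
        log_w = (b_next - b_prev) * log_prior_ratio(theta, s0, s1)
        # Log of the sample mean of the weights, computed stably.
        log_bf += np.logaddexp.reduce(log_w) - np.log(n_samples)
    return log_bf


if __name__ == "__main__":
    y = np.random.default_rng(1).normal(0.5, 1.0, size=20)
    print("stepping-stone estimate:", stepping_stone_log_bf(y, 1.0, 3.0))
    print("exact log Bayes factor: ", exact_log_bf(y, 1.0, 3.0))
```

 In this toy setting the two printed values should agree closely with very little effort. That ease is a property of the one-dimensional example and should not be read as evidence about the high-dimensional models the abstract discusses, for which the authors report that far more computation is needed.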
 Keywords
 PHYLOGENETIC MODELS, PROBABILITY, NON-CODING SEQUENCES, MONTE-CARLO METHOD, NORMALIZING CONSTANTS, SUBSTITUTION PATTERNS, INTEGRATION, INFERENCE, SELECTION
Downloads
 1471-2105-14-85.pdf (full text, open access, 779.84 KB)
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-4241365
 MLA
 Baele, Guy, Philippe Lemey, and Stijn Vansteelandt. “Make the Most of Your Samples : Bayes Factor Estimators for High-dimensional Models of Sequence Evolution.” BMC BIOINFORMATICS 14 (2013): n. pag. Print.
 APA
 Baele, G., Lemey, P., & Vansteelandt, S. (2013). Make the most of your samples : Bayes factor estimators for high-dimensional models of sequence evolution. BMC BIOINFORMATICS, 14.
 Chicago author-date
 Baele, Guy, Philippe Lemey, and Stijn Vansteelandt. 2013. “Make the Most of Your Samples : Bayes Factor Estimators for High-dimensional Models of Sequence Evolution.” BMC Bioinformatics 14.
 Vancouver
 1. Baele G, Lemey P, Vansteelandt S. Make the most of your samples : Bayes factor estimators for high-dimensional models of sequence evolution. BMC BIOINFORMATICS. 2013;14.
 IEEE
 [1] G. Baele, P. Lemey, and S. Vansteelandt, “Make the most of your samples : Bayes factor estimators for high-dimensional models of sequence evolution,” BMC BIOINFORMATICS, vol. 14, 2013.
@article{4241365,
  abstract = {Background: Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. Results: We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. Conclusions: We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational efforts in order to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.},
  articleno = {85},
  author = {Baele, Guy and Lemey, Philippe and Vansteelandt, Stijn},
  issn = {1471-2105},
  journal = {BMC BIOINFORMATICS},
  keywords = {PHYLOGENETIC MODELS,PROBABILITY,NON-CODING SEQUENCES,MONTE-CARLO METHOD,NORMALIZING CONSTANTS,SUBSTITUTION PATTERNS,INTEGRATION,INFERENCE,SELECTION},
  language = {eng},
  pages = {18},
  title = {Make the most of your samples : Bayes factor estimators for high-dimensional models of sequence evolution},
  url = {http://dx.doi.org/10.1186/1471-2105-14-85},
  volume = {14},
  year = {2013},
}