Ghent University Academic Bibliography

Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences

Guy Baele, Yves Van de Peer (UGent) and Stijn Vansteelandt (UGent) (2010) BMC EVOLUTIONARY BIOLOGY. 10.
abstract
Background: Recent approaches for context-dependent evolutionary modelling assume that the evolution of a given site depends upon its ancestor and that ancestor's immediate flanking sites. Because such a dependency pattern cannot be imposed on the root sequence, we consider the use of different orders of Markov chains to model dependence at the ancestral root sequence. Root distributions which are coupled to the context-dependent model across the underlying phylogenetic tree are deemed more realistic than decoupled Markov chain models, as the evolutionary process is responsible for shaping the composition of the ancestral root sequence.
Results: We find strong support, in terms of Bayes Factors, for using a second-order Markov chain at the ancestral root sequence along with a context-dependent model throughout the remainder of the phylogenetic tree in an ancestral repeats dataset, and for using a first-order Markov chain at the ancestral root sequence in a pseudogene dataset. Relaxing the assumption of a single context-independent set of independent model frequencies, as presented in previous work, yields a further drastic increase in model fit. We show that the substitution rates associated with the CpG-methylation-deamination process can be modelled through context-dependent model frequencies and that their accuracy depends on the (order of the) Markov chain imposed at the ancestral root sequence. In addition, we provide evidence that this approach (which assumes that root distribution and evolutionary model are decoupled) outperforms an approach inspired by the work of Arndt et al., where the root distribution is coupled to the evolutionary model. We show that the continuous-time approximation of Hwang and Green has stronger support in terms of Bayes Factors, but the parameter estimates show minimal differences.
Conclusions: We show that the combination of a dependency scheme at the ancestral root sequence and a context-dependent evolutionary model across the remainder of the tree allows for accurate estimation of the model's parameters. The different assumptions tested in this manuscript clearly show that designing accurate context-dependent models is a complex process, with many different assumptions that require validation. Further, these assumptions are shown to change across different datasets, making the search for an adequate model for a given dataset quite challenging.
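As an illustration of what "different orders of Markov chains" at the root sequence means, the following minimal Python sketch scores a nucleotide sequence under an order-k Markov chain. It is not the authors' method: the abstract describes a Bayesian comparison using Bayes Factors, whereas this sketch uses simple smoothed count estimates taken from the sequence itself. The function name `markov_log_likelihood` and the parameters `pseudocount` and `start` are assumptions introduced here for illustration only.

```python
import math
from collections import Counter


def markov_log_likelihood(seq, order, alphabet="ACGT", pseudocount=1.0, start=None):
    """Log-likelihood of `seq` under an order-`order` Markov chain.

    Transition probabilities are estimated from `seq` itself with additive
    smoothing (a simplification, not a Bayesian treatment). Sites from
    index `start` onward are scored; `start` defaults to `order`, and
    passing the same `start` for several orders scores the same sites.
    """
    if start is None:
        start = order
    if start < order:
        raise ValueError("start must be at least the chain order")

    # Count how often each context (the preceding `order` bases) occurs,
    # and how often each base follows each context.
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(order, len(seq)):
        context = seq[i - order:i]
        context_counts[context] += 1
        transition_counts[(context, seq[i])] += 1

    # Sum the log of the smoothed conditional probability at each scored site.
    log_lik = 0.0
    for i in range(start, len(seq)):
        context = seq[i - order:i]
        numerator = transition_counts[(context, seq[i])] + pseudocount
        denominator = context_counts[context] + pseudocount * len(alphabet)
        log_lik += math.log(numerator / denominator)
    return log_lik


if __name__ == "__main__":
    toy_root = "ACGTACGTACGT"  # hypothetical toy sequence, not data from the paper
    for k in (0, 1, 2):
        print(k, markov_log_likelihood(toy_root, k, start=2))
```

Order 0 corresponds to independent sites, order 1 conditions each site on its left neighbour, and order 2 on its two left neighbours. A raw log-likelihood comparison like this one ignores the extra parameters of higher-order chains, which is one reason a formal criterion such as the Bayes Factor is used in the paper.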
type
journalArticle (original)
publication status
published
keyword
BIAS, RATES, CHLOROPLAST GENOME, MAMMALIAN EVOLUTION, BAYES FACTORS, SUBSTITUTION PATTERNS, POSTERIOR DISTRIBUTIONS, MUTATION-SELECTION MODELS, NEIGHBORING BASE COMPOSITION, PHYLOGENETIC MODELS
journal title
BMC EVOLUTIONARY BIOLOGY
BMC Evol. Biol.
volume
10
article number
244
pages
21 pages
Web of Science type
Article
Web of Science id
000282767200001
JCR category
GENETICS & HEREDITY
JCR impact factor
3.702 (2010)
JCR rank
49/154 (2010)
JCR quartile
2 (2010)
ISSN
1471-2148
DOI
10.1186/1471-2148-10-244
language
English
UGent publication?
yes
classification
A1
copyright statement
I have retained and own the full copyright for this publication
id
1070125
handle
http://hdl.handle.net/1854/LU-1070125
date created
2010-11-04 15:07:21
date last changed
2016-12-21 15:42:41
@article{1070125,
  articleno    = {244},
  author       = {Baele, Guy and Van de Peer, Yves and Vansteelandt, Stijn},
  issn         = {1471-2148},
  journal      = {BMC EVOLUTIONARY BIOLOGY},
  keyword      = {BIAS,RATES,CHLOROPLAST GENOME,MAMMALIAN EVOLUTION,BAYES FACTORS,SUBSTITUTION PATTERNS,POSTERIOR DISTRIBUTIONS,MUTATION-SELECTION MODELS,NEIGHBORING BASE COMPOSITION,PHYLOGENETIC MODELS},
  language     = {eng},
  pages        = {21},
  title        = {Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences},
  url          = {http://dx.doi.org/10.1186/1471-2148-10-244},
  volume       = {10},
  year         = {2010},
}

Chicago
Baele, Guy, Yves Van de Peer, and Stijn Vansteelandt. 2010. “Modelling the Ancestral Sequence Distribution and Model Frequencies in Context-dependent Models for Primate Non-coding Sequences.” BMC Evolutionary Biology 10.
APA
Baele, G., Van de Peer, Y., & Vansteelandt, S. (2010). Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences. BMC EVOLUTIONARY BIOLOGY, 10.
Vancouver
1. Baele G, Van de Peer Y, Vansteelandt S. Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences. BMC EVOLUTIONARY BIOLOGY. 2010;10.
MLA
Baele, Guy, Yves Van de Peer, and Stijn Vansteelandt. “Modelling the Ancestral Sequence Distribution and Model Frequencies in Context-dependent Models for Primate Non-coding Sequences.” BMC EVOLUTIONARY BIOLOGY 10 (2010): n. pag. Print.