
A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences

Guy Baele (UGent), Yves Van de Peer (UGent) and Stijn Vansteelandt (UGent)
(2008) SYSTEMATIC BIOLOGY. 57(5). p.675-692
Abstract
In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need for introducing new and high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use, in two non-coding datasets, in the case of nearest-neighbor dependencies (i.e., evolution directly depending only upon the immediate flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data as compared to the corresponding model with site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the 5' neighboring base identity. We hint at the possibility of a so-called TpA effect and show that the observed substitution behavior is very complex in the light of dinucleotide estimates. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.
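As a rough illustration of the idea in the abstract (standard model parameters that depend on the ancestral states at the flanking sites), the sketch below keeps one general time-reversible rate matrix per (5' neighbor, 3' neighbor) context and looks up the matrix for a site from its neighbors in an ancestral sequence. The function names, the handling of boundary sites, and all numeric values are assumptions made for illustration; they are not the authors' implementation or their parameter estimates, and the sketch does not cover the MCMC inference with data augmentation or the model comparison.

```python
from itertools import product

BASES = "ACGT"


def gtr_rate_matrix(exch, freqs):
    """Build a GTR rate matrix as nested dicts.

    exch maps unordered base pairs ("AC", "AG", ...) to exchangeabilities,
    freqs maps each base to its equilibrium frequency.
    Off-diagonal entries are exch[pair] * freqs[target]; each diagonal
    entry is set so that its row sums to zero.
    """
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                pair = "".join(sorted(i + j))
                q[i][j] = exch[pair] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def context_models(base_exch, freqs, overrides):
    """Return one rate matrix per (5' neighbor, 3' neighbor) context (16 total).

    overrides maps a context to the exchangeabilities that differ from
    base_exch in that context (hypothetical parameterization).
    """
    models = {}
    for left, right in product(BASES, repeat=2):
        exch = dict(base_exch)
        exch.update(overrides.get((left, right), {}))
        models[(left, right)] = gtr_rate_matrix(exch, freqs)
    return models


def site_rate_matrix(models, ancestral_seq, site):
    """Pick the rate matrix for a site from its two ancestral neighbors.

    Assumption: only interior sites are handled; how the first and last
    sites are treated is left open in this sketch.
    """
    if site <= 0 or site >= len(ancestral_seq) - 1:
        raise ValueError("this sketch only handles interior sites")
    return models[(ancestral_seq[site - 1], ancestral_seq[site + 1])]


# Illustrative values only, not estimates from the article.
freqs = {b: 0.25 for b in BASES}
base_exch = {"AC": 1.0, "AG": 2.0, "AT": 1.0, "CG": 1.0, "CT": 2.0, "GT": 1.0}
# Made-up CpG-like effect: raise the C<->T exchangeability when the 3' neighbor is G.
overrides = {(left, "G"): {"CT": 6.0} for left in BASES}
models = context_models(base_exch, freqs, overrides)

q_cpg = site_rate_matrix(models, "ACGT", 1)    # C flanked by A (5') and G (3')
q_other = site_rate_matrix(models, "ACAT", 1)  # C flanked by A (5') and A (3')
print(q_cpg["C"]["T"], q_other["C"]["T"])
```

With these made-up numbers the C-to-T rate is higher when the 3' neighbor is G than when it is A. Because a single exchangeability is raised, the reverse T-to-C rate rises in the same context as well, which is a simplification of this sketch.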
Keywords
context effect, Bayes factor, context-dependent evolution, CpG effect, likelihood function, Markov chain Monte Carlo, nearest-neighbor influences, thermodynamic integration, CHAIN MONTE-CARLO, BAYESIAN PHYLOGENETIC INFERENCE, DNA-SEQUENCES, NUCLEOTIDE SUBSTITUTION, MAXIMUM-LIKELIHOOD, EVOLUTIONARY TREES, CHLOROPLAST GENOME, TERTIARY STRUCTURE, MITOCHONDRIAL-DNA, PROTEIN EVOLUTION

Downloads

  • (...).pdf | full text | UGent only | PDF | 226.28 KB

Citation


Chicago
Baele, Guy, Yves Van de Peer, and Stijn Vansteelandt. 2008. “A Model-based Approach to Study Nearest-neighbor Influences Reveals Complex Substitution Patterns in Non-coding Sequences.” Systematic Biology 57 (5): 675–692.
APA
Baele, G., Van de Peer, Y., & Vansteelandt, S. (2008). A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences. Systematic Biology, 57(5), 675–692.
Vancouver
1. Baele G, Van de Peer Y, Vansteelandt S. A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences. Systematic Biology. 2008;57(5):675–92.
MLA
Baele, Guy, Yves Van de Peer, and Stijn Vansteelandt. “A Model-based Approach to Study Nearest-neighbor Influences Reveals Complex Substitution Patterns in Non-coding Sequences.” Systematic Biology 57.5 (2008): 675–692. Print.
BibTeX
@article{440131,
  abstract     = {In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need for introducing new and high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use, in two non-coding datasets, in the case of nearest-neighbor dependencies (i.e., evolution directly depending only upon the immediate flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data as compared to the corresponding model with site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the 5' neighboring base identity. We hint at the possibility of a so-called TpA effect and show that the observed substitution behavior is very complex in the light of dinucleotide estimates. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.},
  author       = {Baele, Guy and Van de Peer, Yves and Vansteelandt, Stijn},
  issn         = {1063-5157},
  journal      = {SYSTEMATIC BIOLOGY},
  language     = {eng},
  number       = {5},
  pages        = {675--692},
  title        = {A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences},
  url          = {http://dx.doi.org/10.1080/10635150802422324},
  volume       = {57},
  year         = {2008},
}
