Analysis of a Gibbs sampler method for model-based clustering of gene expression data

(2008) BIOINFORMATICS. 24(2). p.176-183
Author
Anagha Madhusudan Joshi, Yves Van de Peer, and Tom Michoel
Abstract
Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set.
Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.
Keywords
CELL-CYCLE, LEARNING ALGORITHMS, PROFILES, PATTERNS
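
Illustrative sketch (not taken from the paper): the abstract states that a collection of distinct clusterings can be summarized as a weighted graph on the set of genes. One simple way to picture this, assuming every clustering counts equally and the edge weight is the fraction of clusterings in which two genes share a cluster, is shown below. The function name, gene names, and cluster labels are hypothetical, and the graph spectral step used to extract fuzzy clusters is not covered.

```python
from collections import Counter
from itertools import combinations


def coclustering_graph(clusterings):
    """Map each gene pair to the fraction of clusterings that co-cluster it.

    `clusterings` is a list of dicts, each mapping gene -> cluster label.
    Equal weighting of clusterings is an assumption of this sketch.
    """
    counts = Counter()
    for assignment in clusterings:
        # Group genes by their cluster label in this clustering.
        clusters = {}
        for gene, label in assignment.items():
            clusters.setdefault(label, []).append(gene)
        # Every pair inside the same cluster gets one vote.
        for members in clusters.values():
            for a, b in combinations(sorted(members), 2):
                counts[(a, b)] += 1
    n = len(clusterings)
    return {pair: c / n for pair, c in counts.items()}


# Toy input: three made-up clusterings of four made-up genes.
clusterings = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    {"g1": 0, "g2": 0, "g3": 0, "g4": 1},
    {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
]
weights = coclustering_graph(clusterings)
for pair, w in sorted(weights.items()):
    print(pair, round(w, 2))
```

In this toy input, g1 and g2 are always clustered together and would sit in a cluster core, while g3 is only sometimes grouped with them, which is the kind of partial relation the abstract attributes to cluster overlaps.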

Citation

Chicago
Joshi, Anagha Madhusudan, Yves Van de Peer, and Tom Michoel. 2008. “Analysis of a Gibbs Sampler Method for Model-based Clustering of Gene Expression Data.” Bioinformatics 24 (2): 176–183.
APA
Joshi, A. M., Van de Peer, Y., & Michoel, T. (2008). Analysis of a Gibbs sampler method for model-based clustering of gene expression data. Bioinformatics, 24(2), 176–183.
Vancouver
1. Joshi AM, Van de Peer Y, Michoel T. Analysis of a Gibbs sampler method for model-based clustering of gene expression data. Bioinformatics. 2008;24(2):176–83.
MLA
Joshi, Anagha Madhusudan, Yves Van de Peer, and Tom Michoel. “Analysis of a Gibbs Sampler Method for Model-based Clustering of Gene Expression Data.” Bioinformatics 24.2 (2008): 176–183. Print.
BibTeX
@article{439462,
  abstract     = {Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. 
Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.},
  author       = {Joshi, Anagha Madhusudan and Van de Peer, Yves and Michoel, Tom},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  keyword      = {CELL-CYCLE,LEARNING ALGORITHMS,PROFILES,PATTERNS},
  language     = {eng},
  number       = {2},
  pages        = {176--183},
  title        = {Analysis of a Gibbs sampler method for model-based clustering of gene expression data},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btm562},
  volume       = {24},
  year         = {2008},
}
