
Analysis of a Gibbs sampler method for model-based clustering of gene expression data
- Author
- Anagha Madhusudan Joshi (UGent) , Yves Van de Peer (UGent) and Tom Michoel (UGent)
- Abstract
- Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.
- Keywords
- CELL-CYCLE, LEARNING ALGORITHMS, PROFILES, PATTERNS
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-439462
- MLA
- Joshi, Anagha Madhusudan, et al. “Analysis of a Gibbs Sampler Method for Model-Based Clustering of Gene Expression Data.” BIOINFORMATICS, vol. 24, no. 2, 2008, pp. 176–83, doi:10.1093/bioinformatics/btm562.
- APA
- Joshi, A. M., Van de Peer, Y., & Michoel, T. (2008). Analysis of a Gibbs sampler method for model-based clustering of gene expression data. BIOINFORMATICS, 24(2), 176–183. https://doi.org/10.1093/bioinformatics/btm562
- Chicago author-date
- Joshi, Anagha Madhusudan, Yves Van de Peer, and Tom Michoel. 2008. “Analysis of a Gibbs Sampler Method for Model-Based Clustering of Gene Expression Data.” BIOINFORMATICS 24 (2): 176–83. https://doi.org/10.1093/bioinformatics/btm562.
- Chicago author-date (all authors)
- Joshi, Anagha Madhusudan, Yves Van de Peer, and Tom Michoel. 2008. “Analysis of a Gibbs Sampler Method for Model-Based Clustering of Gene Expression Data.” BIOINFORMATICS 24 (2): 176–183. doi:10.1093/bioinformatics/btm562.
- Vancouver
- 1. Joshi AM, Van de Peer Y, Michoel T. Analysis of a Gibbs sampler method for model-based clustering of gene expression data. BIOINFORMATICS. 2008;24(2):176–83.
- IEEE
- [1] A. M. Joshi, Y. Van de Peer, and T. Michoel, “Analysis of a Gibbs sampler method for model-based clustering of gene expression data,” BIOINFORMATICS, vol. 24, no. 2, pp. 176–183, 2008.
@article{439462,
  abstract     = {{Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.}},
  author       = {{Joshi, Anagha Madhusudan and Van de Peer, Yves and Michoel, Tom}},
  issn         = {{1367-4803}},
  journal      = {{BIOINFORMATICS}},
  keywords     = {{CELL-CYCLE,LEARNING ALGORITHMS,PROFILES,PATTERNS}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{176--183}},
  title        = {{Analysis of a Gibbs sampler method for model-based clustering of gene expression data}},
  url          = {{http://doi.org/10.1093/bioinformatics/btm562}},
  volume       = {{24}},
  year         = {{2008}},
}