Ghent University Academic Bibliography


Analysis of a Gibbs sampler method for model-based clustering of gene expression data

Anagha Madhusudan Joshi (UGent), Yves Van de Peer (UGent) and Tom Michoel (UGent) (2008) BIOINFORMATICS. 24(2). p.176-183
abstract
Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set.

Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.
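The abstract summarizes several equivalent clusterings as a weighted graph on the set of genes. The following is a minimal sketch of one way such a graph could be built, assuming (this is an assumption, not stated in the record) that the weight of an edge between two genes is the fraction of clusterings in which the two genes share a cluster. The function name `coclustering_graph` and the gene labels are hypothetical, and neither the Gibbs sampler nor the graph spectral extraction step described in the abstract is reproduced here.

```python
from itertools import combinations


def coclustering_graph(clusterings):
    """Build a weighted gene graph from several clusterings.

    clusterings: list of dicts, one per clustering (e.g. one per sampler
    run), each mapping a gene identifier to its cluster label.

    Returns a dict mapping sorted gene pairs (a, b) to the fraction of
    clusterings in which a and b are assigned to the same cluster.
    This weighting is an illustrative assumption, not the paper's method.
    """
    # Collect every gene that appears in at least one clustering.
    genes = sorted(set().union(*(c.keys() for c in clusterings)))
    n_runs = len(clusterings)
    weights = {}
    for a, b in combinations(genes, 2):
        # Count the clusterings in which both genes share a label.
        together = sum(
            1 for c in clusterings
            if a in c and b in c and c[a] == c[b]
        )
        # Keep only pairs that co-cluster at least once (sparse graph).
        if together:
            weights[(a, b)] = together / n_runs
    return weights


# Hypothetical example: three clusterings of three genes.
runs = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 2, "g2": 2, "g3": 2},
    {"g1": 0, "g2": 1, "g3": 1},
]
print(coclustering_graph(runs))
```

Pairs that co-cluster in every run receive weight 1, while pairs that only occasionally share a cluster receive intermediate weights; under this assumption, such intermediate weights are what would let a downstream clustering step express partial coexpression as overlap between clusters.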
type
journalArticle (original)
publication status
published
keyword
CELL-CYCLE, LEARNING ALGORITHMS, PROFILES, PATTERNS
journal title
BIOINFORMATICS
Bioinformatics
volume
24
issue
2
pages
176 - 183
Web of Science type
Article
Web of Science id
000252498500005
JCR category
MATHEMATICAL & COMPUTATIONAL BIOLOGY
JCR impact factor
4.328 (2008)
JCR rank
2/28 (2008)
JCR quartile
1 (2008)
ISSN
1367-4803
DOI
10.1093/bioinformatics/btm562
language
English
UGent publication?
yes
classification
A1
copyright statement
I have transferred the copyright for this publication to the publisher
id
439462
handle
http://hdl.handle.net/1854/LU-439462
date created
2008-11-12 13:55:00
date last changed
2012-12-05 17:14:18
@article{439462,
  abstract     = {Motivation: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. 
Results: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.},
  author       = {Joshi, Anagha Madhusudan and Van de Peer, Yves and Michoel, Tom},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  keyword      = {CELL-CYCLE,LEARNING ALGORITHMS,PROFILES,PATTERNS},
  language     = {eng},
  number       = {2},
  pages        = {176--183},
  title        = {Analysis of a Gibbs sampler method for model-based clustering of gene expression data},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btm562},
  volume       = {24},
  year         = {2008},
}

Chicago
Joshi, Anagha Madhusudan, Yves Van de Peer, and Tom Michoel. 2008. “Analysis of a Gibbs Sampler Method for Model-based Clustering of Gene Expression Data.” Bioinformatics 24 (2): 176–183.
APA
Joshi, A. M., Van de Peer, Y., & Michoel, T. (2008). Analysis of a Gibbs sampler method for model-based clustering of gene expression data. BIOINFORMATICS, 24(2), 176–183.
Vancouver
1. Joshi AM, Van de Peer Y, Michoel T. Analysis of a Gibbs sampler method for model-based clustering of gene expression data. BIOINFORMATICS. 2008;24(2):176–83.
MLA
Joshi, Anagha Madhusudan, Yves Van de Peer, and Tom Michoel. “Analysis of a Gibbs Sampler Method for Model-based Clustering of Gene Expression Data.” BIOINFORMATICS 24.2 (2008): 176–183. Print.