Ghent University Academic Bibliography

Validating module network learning algorithms using simulated data

Tom Michoel UGent, Steven Maere UGent, Eric Bonnet UGent, Anagha Madhusudan Joshi UGent, Yvan Saeys UGent, Tim Van den Bulcke, Koenraad Van Leemput, Piet Van Remortel, Martin Kuiper, Kathleen Marchal UGent, et al. (2007) BMC Bioinformatics 8(suppl. 2).
abstract
Background: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance.
Results: Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators.
Conclusion: We show that data simulators such as SynTReN are very well suited for the purpose of developing, testing and improving module network algorithms. We used SynTReN data to develop and test an alternative module network learning strategy, which is incorporated in the software package LeMoNe, and we provide evidence that this alternative strategy has several advantages with respect to existing methods.
type: journalArticle (proceedingsPaper)
publication status: published
keyword: DISCOVERY, ONTOLOGY, GENOME, GENE-EXPRESSION DATA, REGULATORY NETWORKS, BIOLOGY, CELLS
journal title: BMC Bioinformatics
volume: 8
issue: suppl. 2
article_number: S5
pages: 15 pages
Web of Science type: Article; Proceedings Paper
Web of Science id: 000246602300005
JCR category: MATHEMATICAL & COMPUTATIONAL BIOLOGY
JCR impact factor: 3.493 (2007)
JCR rank: 3/24 (2007)
JCR quartile: 1 (2007)
ISSN: 1471-2105
DOI: 10.1186/1471-2105-8-S2-S5
language: English
UGent publication: yes
classification: A1
copyright statement: I have retained and own the full copyright for this publication
id: 410092
handle: http://hdl.handle.net/1854/LU-410092
date created: 2008-05-16 14:38:00
date last changed: 2013-09-16 15:41:11
@article{410092,
  abstract     = {Background: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. 
Results: Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. 
Conclusion: We show that data simulators such as SynTReN are very well suited for the purpose of developing, testing and improving module network algorithms. We used SynTReN data to develop and test an alternative module network learning strategy, which is incorporated in the software package LeMoNe, and we provide evidence that this alternative strategy has several advantages with respect to existing methods.},
  articleno    = {S5},
  author       = {Michoel, Tom and Maere, Steven and Bonnet, Eric and Joshi, Anagha Madhusudan and Saeys, Yvan and Van den Bulcke, Tim and Van Leemput, Koenraad and Van Remortel, Piet and Kuiper, Martin and Marchal, Kathleen and Van de Peer, Yves},
  issn         = {1471-2105},
  journal      = {BMC Bioinformatics},
  keyword      = {DISCOVERY,ONTOLOGY,GENOME,GENE-EXPRESSION DATA,REGULATORY NETWORKS,BIOLOGY,CELLS},
  language     = {eng},
  number       = {suppl. 2},
  pages        = {15},
  title        = {Validating module network learning algorithms using simulated data},
  url          = {http://dx.doi.org/10.1186/1471-2105-8-S2-S5},
  volume       = {8},
  year         = {2007},
}

Chicago
Michoel, Tom, Steven Maere, Eric Bonnet, Anagha Madhusudan Joshi, Yvan Saeys, Tim Van den Bulcke, Koenraad Van Leemput, et al. 2007. “Validating Module Network Learning Algorithms Using Simulated Data.” BMC Bioinformatics 8 (suppl. 2).
APA
Michoel, T., Maere, S., Bonnet, E., Joshi, A. M., Saeys, Y., Van den Bulcke, T., Van Leemput, K., et al. (2007). Validating module network learning algorithms using simulated data. BMC Bioinformatics, 8(suppl. 2).
Vancouver
1. Michoel T, Maere S, Bonnet E, Joshi AM, Saeys Y, Van den Bulcke T, et al. Validating module network learning algorithms using simulated data. BMC Bioinformatics. 2007;8(suppl. 2).
MLA
Michoel, Tom, Steven Maere, Eric Bonnet, et al. “Validating Module Network Learning Algorithms Using Simulated Data.” BMC Bioinformatics 8.suppl. 2 (2007): n. pag. Print.