Improving the performance of machine learning models for biotechnology : the quest for deus ex machina
- Author
- Friederike Mey, Jim Clauwaert (UGent), Kirsten Van Huffel, Willem Waegeman (UGent) and Marjan De Mey (UGent)
- Project
- Research Programme Artificial Intelligence - 2021
- Interlocking synthetic biology, systems biology and artificial intelligence to develop a more efficient metabolic engineering workflow: a highly efficient biotechnological production platform for monoclonal chitooligosaccharides.
- Syn/SysBio4COS: Combining synthetic and systems biology to unlock the potential of the hexosamine biosynthesis pathway for chitooligosaccharide production
- Unlocking powerful non-model organisms in microbial synthetic biology - POSSIBL
- BioRoBoost (BioRoBoost - Fostering Synthetic Biology standardisation through international collaboration)
- Synthetic biology and artificial intelligence, mutual learning to advance together and jointly drive industrial biotechnology - accelerating knowledge discovery and strain engineering
- Abstract
- Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets such as omics data and predict a defined outcome, which has led to both production improvements and predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of the function of such systems, but also provides significant savings of development time. However, many pitfalls when modeling biological data - bad fit, noisy data, model instability, low data quantity and imbalances in the data - cause models to suffer in their performance. Here we provide an accessible, in-depth analysis on the problems created by these pitfalls, as well as means of their detection and mediation, with a focus on supervised learning. Assessing the state of the art, we show that, currently, in-depth analyses of model performance are often absent and must be improved. This review provides a toolbox for the analysis of model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is further accompanied by an interactive online tutorial on the discussed issues.
- Keywords
- Machine Learning, Biotechnology, Synthetic Biology, Model evaluation, METABOLIC PATHWAYS, SYNTHETIC BIOLOGY, GENE-EXPRESSION, PREDICTION, NOISE, FLUX
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8725887
- MLA
- Mey, Friederike, et al. “Improving the Performance of Machine Learning Models for Biotechnology : The Quest for Deus Ex Machina.” BIOTECHNOLOGY ADVANCES, vol. 53, 2021, doi:10.1016/j.biotechadv.2021.107858.
- APA
- Mey, F., Clauwaert, J., Van Huffel, K., Waegeman, W., & De Mey, M. (2021). Improving the performance of machine learning models for biotechnology : the quest for deus ex machina. BIOTECHNOLOGY ADVANCES, 53. https://doi.org/10.1016/j.biotechadv.2021.107858
- Chicago author-date
- Mey, Friederike, Jim Clauwaert, Kirsten Van Huffel, Willem Waegeman, and Marjan De Mey. 2021. “Improving the Performance of Machine Learning Models for Biotechnology : The Quest for Deus Ex Machina.” BIOTECHNOLOGY ADVANCES 53. https://doi.org/10.1016/j.biotechadv.2021.107858.
- Chicago author-date (all authors)
- Mey, Friederike, Jim Clauwaert, Kirsten Van Huffel, Willem Waegeman, and Marjan De Mey. 2021. “Improving the Performance of Machine Learning Models for Biotechnology : The Quest for Deus Ex Machina.” BIOTECHNOLOGY ADVANCES 53. doi:10.1016/j.biotechadv.2021.107858.
- Vancouver
- 1. Mey F, Clauwaert J, Van Huffel K, Waegeman W, De Mey M. Improving the performance of machine learning models for biotechnology : the quest for deus ex machina. BIOTECHNOLOGY ADVANCES. 2021;53.
- IEEE
- [1] F. Mey, J. Clauwaert, K. Van Huffel, W. Waegeman, and M. De Mey, “Improving the performance of machine learning models for biotechnology : the quest for deus ex machina,” BIOTECHNOLOGY ADVANCES, vol. 53, 2021.
@article{8725887,
abstract = {{Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets such as omics data and predict a defined outcome, which has led to both production improvements and predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of the function of such systems, but also provides significant savings of development time. However, many pitfalls when modeling biological data - bad fit, noisy data, model instability, low data quantity and imbalances in the data - cause models to suffer in their performance. Here we provide an accessible, in-depth analysis on the problems created by these pitfalls, as well as means of their detection and mediation, with a focus on supervised learning. Assessing the state of the art, we show that, currently, in-depth analyses of model performance are often absent and must be improved. This review provides a toolbox for the analysis of model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is further accompanied by an interactive online tutorial on the discussed issues.}},
articleno = {{107858}},
author = {{Mey, Friederike and Clauwaert, Jim and Van Huffel, Kirsten and Waegeman, Willem and De Mey, Marjan}},
issn = {{0734-9750}},
journal = {{BIOTECHNOLOGY ADVANCES}},
keywords = {{Machine Learning,Biotechnology,Synthetic Biology,Model evaluation,METABOLIC PATHWAYS,SYNTHETIC BIOLOGY,GENE-EXPRESSION,PREDICTION,NOISE,FLUX}},
language = {{eng}},
pages = {{10}},
title = {{Improving the performance of machine learning models for biotechnology : the quest for deus ex machina}},
url = {{http://doi.org/10.1016/j.biotechadv.2021.107858}},
volume = {{53}},
year = {{2021}},
}