Improving the performance of machine learning models for biotechnology : the quest for deus ex machina
- Author
- Friederike Mey, Jim Clauwaert (UGent) , Kirsten Van Huffel, Willem Waegeman (UGent) and Marjan De Mey (UGent)
- Abstract
- Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets, such as omics data, to predict a defined outcome; this has led to both production improvements and new predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of how such systems function, but also significantly shortens development time. However, many pitfalls in modeling biological data (poor fit, noisy data, model instability, small datasets and data imbalance) degrade model performance. Here we provide an accessible, in-depth analysis of the problems these pitfalls create, as well as means of detecting and mitigating them, with a focus on supervised learning. Assessing the state of the art, we show that in-depth analyses of model performance are currently often absent and need to be improved. This review provides a toolbox for analyzing model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is accompanied by an interactive online tutorial on the issues discussed.
- Keywords
- Machine Learning, Biotechnology, Synthetic Biology, Model evaluation, METABOLIC PATHWAYS, SYNTHETIC BIOLOGY, GENE-EXPRESSION, PREDICTION, NOISE, FLUX
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8725887
- MLA
- Mey, Friederike, et al. “Improving the Performance of Machine Learning Models for Biotechnology : The Quest for Deus Ex Machina.” BIOTECHNOLOGY ADVANCES, vol. 53, 2021, doi:10.1016/j.biotechadv.2021.107858.
- APA
- Mey, F., Clauwaert, J., Van Huffel, K., Waegeman, W., & De Mey, M. (2021). Improving the performance of machine learning models for biotechnology : the quest for deus ex machina. BIOTECHNOLOGY ADVANCES, 53. https://doi.org/10.1016/j.biotechadv.2021.107858
- Chicago author-date
- Mey, Friederike, Jim Clauwaert, Kirsten Van Huffel, Willem Waegeman, and Marjan De Mey. 2021. “Improving the Performance of Machine Learning Models for Biotechnology : The Quest for Deus Ex Machina.” BIOTECHNOLOGY ADVANCES 53. https://doi.org/10.1016/j.biotechadv.2021.107858.
- Chicago author-date (all authors)
- Mey, Friederike, Jim Clauwaert, Kirsten Van Huffel, Willem Waegeman, and Marjan De Mey. 2021. “Improving the Performance of Machine Learning Models for Biotechnology : The Quest for Deus Ex Machina.” BIOTECHNOLOGY ADVANCES 53. doi:10.1016/j.biotechadv.2021.107858.
- Vancouver
- 1. Mey F, Clauwaert J, Van Huffel K, Waegeman W, De Mey M. Improving the performance of machine learning models for biotechnology : the quest for deus ex machina. BIOTECHNOLOGY ADVANCES. 2021;53.
- IEEE
- [1] F. Mey, J. Clauwaert, K. Van Huffel, W. Waegeman, and M. De Mey, “Improving the performance of machine learning models for biotechnology : the quest for deus ex machina,” BIOTECHNOLOGY ADVANCES, vol. 53, 2021.
@article{8725887,
  abstract = {{Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets such as omics data and predict a defined outcome, which has led to both production improvements and predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of the function of such systems, but also provides significant savings of development time. However, many pitfalls when modeling biological data - bad fit, noisy data, model instability, low data quantity and imbalances in the data - cause models to suffer in their performance. Here we provide an accessible, in-depth analysis on the problems created by these pitfalls, as well as means of their detection and mediation, with a focus on supervised learning. Assessing the state of the art, we show that, currently, in-depth analyses of model performance are often absent and must be improved. This review provides a toolbox for the analysis of model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is further accompanied by an interactive online tutorial on the discussed issues.}},
  articleno = {{107858}},
  author = {{Mey, Friederike and Clauwaert, Jim and Van Huffel, Kirsten and Waegeman, Willem and De Mey, Marjan}},
  issn = {{0734-9750}},
  journal = {{BIOTECHNOLOGY ADVANCES}},
  keywords = {{Machine Learning,Biotechnology,Synthetic Biology,Model evaluation,METABOLIC PATHWAYS,SYNTHETIC BIOLOGY,GENE-EXPRESSION,PREDICTION,NOISE,FLUX}},
  language = {{eng}},
  pages = {{10}},
  title = {{Improving the performance of machine learning models for biotechnology : the quest for deus ex machina}},
  url = {{http://doi.org/10.1016/j.biotechadv.2021.107858}},
  volume = {{53}},
  year = {{2021}},
}