Benchmarking of cell type deconvolution pipelines for transcriptomics data
- Author
- Francisco Avila Cobos (UGent), José Alquicira-Hernandez, Joseph E. Powell, Pieter Mestdagh (UGent) and Katleen De Preter (UGent)
- Abstract
- Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
- Keywords
- computational deconvolution, transcriptomics, NORMALIZATION, SIGNATURES
Downloads
- s41467-020-19015-1.pdf: full text (Published version), open access, 1.72 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8680393
- MLA
- Avila Cobos, Francisco, et al. “Benchmarking of Cell Type Deconvolution Pipelines for Transcriptomics Data.” NATURE COMMUNICATIONS, vol. 11, no. 1, 2020, doi:10.1038/s41467-020-19015-1.
- APA
- Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P., & De Preter, K. (2020). Benchmarking of cell type deconvolution pipelines for transcriptomics data. NATURE COMMUNICATIONS, 11(1). https://doi.org/10.1038/s41467-020-19015-1
- Chicago author-date
- Avila Cobos, Francisco, José Alquicira-Hernandez, Joseph E. Powell, Pieter Mestdagh, and Katleen De Preter. 2020. “Benchmarking of Cell Type Deconvolution Pipelines for Transcriptomics Data.” NATURE COMMUNICATIONS 11 (1). https://doi.org/10.1038/s41467-020-19015-1.
- Chicago author-date (all authors)
- Avila Cobos, Francisco, José Alquicira-Hernandez, Joseph E. Powell, Pieter Mestdagh, and Katleen De Preter. 2020. “Benchmarking of Cell Type Deconvolution Pipelines for Transcriptomics Data.” NATURE COMMUNICATIONS 11 (1). doi:10.1038/s41467-020-19015-1.
- Vancouver
- 1. Avila Cobos F, Alquicira-Hernandez J, Powell JE, Mestdagh P, De Preter K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. NATURE COMMUNICATIONS. 2020;11(1).
- IEEE
- [1] F. Avila Cobos, J. Alquicira-Hernandez, J. E. Powell, P. Mestdagh, and K. De Preter, “Benchmarking of cell type deconvolution pipelines for transcriptomics data,” NATURE COMMUNICATIONS, vol. 11, no. 1, 2020.
@article{8680393,
  abstract     = {{Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.}},
  articleno    = {{5650}},
  author       = {{Avila Cobos, Francisco and Alquicira-Hernandez, José and Powell, Joseph E. and Mestdagh, Pieter and De Preter, Katleen}},
  issn         = {{2041-1723}},
  journal      = {{NATURE COMMUNICATIONS}},
  keywords     = {{computational deconvolution,transcriptomics,NORMALIZATION,SIGNATURES}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{14}},
  title        = {{Benchmarking of cell type deconvolution pipelines for transcriptomics data}},
  url          = {{http://doi.org/10.1038/s41467-020-19015-1}},
  volume       = {{11}},
  year         = {{2020}},
}