Peptide-level robust ridge regression improves estimation, sensitivity, and specificity in data-dependent quantitative label-free shotgun proteomics

Ludger Goeminne (UGent) , Kris Gevaert (UGent) and Lieven Clement (UGent)
Project
Bioinformatics: from nucleotids to networks (N2N)
Abstract
Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, still suffer from unbalanced datasets due to missing peptide intensities, outlying peptide intensities and overfitting. Here, we further improve upon peptide-based models by three modular extensions: ridge regression, improved variance estimation by borrowing information across proteins with empirical Bayes and M-estimation with Huber weights. We illustrate our method on the CPTAC spike-in study and on a study comparing wild-type and ArgP knock-out Francisella tularensis proteomes. We show that the fold change estimates of our robust approach are more precise and more accurate than those from state-of-the-art summarization-based methods and peptide-based regression models, which leads to an improved sensitivity and specificity. We also demonstrate that ionization competition effects come already into play at very low spike-in concentrations and confirm that analyses with peptide-based regression methods on peptide intensity values aggregated by charge state and modification status (e.g. MaxQuant’s peptides.txt file) are slightly superior to analyses on raw peptide intensity values (e.g. MaxQuant’s evidence.txt file).
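To make two of the three extensions named in the abstract concrete, the following is a minimal sketch of ridge-penalized M-estimation with Huber weights, fitted by iteratively reweighted least squares. It is a toy with a single coefficient and is not the authors' implementation. The function names, the penalty value, the tuning constant k = 1.345, the MAD-based scale estimate, and the example intensities are all assumptions made for illustration. The empirical Bayes variance step is not shown.

```python
import statistics


def huber_weight(residual, scale, k=1.345):
    """Huber weight: 1 within k*scale of zero, k*scale/|residual| beyond it."""
    if scale == 0:
        return 1.0  # assumption: no downweighting when the scale collapses
    u = abs(residual) / scale
    return 1.0 if u <= k else k / u


def robust_ridge_slope(x, y, penalty=1.0, k=1.345, n_iter=50, tol=1e-10):
    """Fit y ~ beta * x with a ridge penalty and Huber weights via IRLS.

    Hypothetical helper: one coefficient, no intercept.
    Returns the estimate and the final observation weights.
    """
    weights = [1.0] * len(x)
    beta = 0.0
    for _ in range(n_iter):
        # Weighted ridge solution for a single coefficient.
        num = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
        den = sum(w * xi * xi for w, xi in zip(weights, x)) + penalty
        new_beta = num / den
        # Robust residual scale: median absolute deviation, rescaled.
        residuals = [yi - new_beta * xi for xi, yi in zip(x, y)]
        med = statistics.median(residuals)
        scale = 1.4826 * statistics.median(abs(r - med) for r in residuals)
        weights = [huber_weight(r, scale, k) for r in residuals]
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta, weights


# Made-up log2 intensities: two groups coded -0.5 / +0.5, true difference 2,
# with one outlying peptide intensity in the last position.
x = [-0.5, -0.5, -0.5, -0.5, 0.5, 0.5, 0.5, 0.5]
y = [-1.1, -0.9, -1.0, -1.0, 1.0, 0.9, 1.1, 6.0]

ols, _ = robust_ridge_slope(x, y, penalty=0.0, n_iter=1)  # plain least squares
robust, w = robust_ridge_slope(x, y, penalty=0.0)         # Huber weights only
shrunk, _ = robust_ridge_slope(x, y, penalty=2.0)         # Huber + ridge
print(ols, robust, shrunk, w[-1])
```

In this toy setting the Huber weights pull the estimate back toward the true difference by downweighting the outlier, while the ridge penalty shrinks the estimate toward zero. How strongly to penalize is a modeling choice that this sketch does not address.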
Keywords
label-free quantification, bioinformatics, modeling, CPTAC, tandem mass spectrometry, empirical Bayes, ionization competition, ridge regression, peptide-based model, biostatistics, COMPLEX PROTEIN MIXTURES, MASS-SPECTROMETRY, NORMALIZATION, QUANTIFICATION, EXPRESSION, PERFORMANCE, MICROARRAY, ABUNDANCE, ARGININE, MS/MS

Downloads

  • (...).pdf: full text | UGent only | PDF | 456.27 KB

Citation

Chicago
Goeminne, Ludger, Kris Gevaert, and Lieven Clement. 2016. “Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.” Molecular & Cellular Proteomics 15 (2): 657–668.
APA
Goeminne, L., Gevaert, K., & Clement, L. (2016). Peptide-level robust ridge regression improves estimation, sensitivity, and specificity in data-dependent quantitative label-free shotgun proteomics. Molecular & Cellular Proteomics, 15(2), 657–668.
Vancouver
1. Goeminne L, Gevaert K, Clement L. Peptide-level robust ridge regression improves estimation, sensitivity, and specificity in data-dependent quantitative label-free shotgun proteomics. Molecular & Cellular Proteomics. 2016;15(2):657–68.
MLA
Goeminne, Ludger, Kris Gevaert, and Lieven Clement. “Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics.” Molecular & Cellular Proteomics 15.2 (2016): 657–668. Print.
@article{7077757,
  abstract     = {Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, still suffer from unbalanced datasets due to missing peptide intensities, outlying peptide intensities and overfitting. Here, we further improve upon peptide-based models by three modular extensions: ridge regression, improved variance estimation by borrowing information across proteins with empirical Bayes and M-estimation with Huber weights. We illustrate our method on the CPTAC spike-in study and on a study comparing wild-type and ArgP knock-out Francisella tularensis proteomes. We show that the fold change estimates of our robust approach are more precise and more accurate than those from state-of-the-art summarization-based methods and peptide-based regression models, which leads to an improved sensitivity and specificity. We also demonstrate that ionization competition effects come already into play at very low spike-in concentrations and confirm that analyses with peptide-based regression methods on peptide intensity values aggregated by charge state and modification status (e.g. MaxQuant{\textquoteright}s peptides.txt file) are slightly superior to analyses on raw peptide intensity values (e.g. MaxQuant{\textquoteright}s evidence.txt file).},
  author       = {Goeminne, Ludger and Gevaert, Kris and Clement, Lieven},
  issn         = {1535-9476},
  journal      = {Molecular \& Cellular Proteomics},
  keyword      = {label-free quantification,bioinformatics,modeling,CPTAC,tandem mass spectrometry,empirical Bayes,ionization competition,ridge regression,peptide-based model,biostatistics,COMPLEX PROTEIN MIXTURES,MASS-SPECTROMETRY,NORMALIZATION,QUANTIFICATION,EXPRESSION,PERFORMANCE,MICROARRAY,ABUNDANCE,ARGININE,MS/MS},
  language     = {eng},
  number       = {2},
  pages        = {657--668},
  title        = {Peptide-level robust ridge regression improves estimation, sensitivity, and specificity in data-dependent quantitative label-free shotgun proteomics},
  url          = {http://dx.doi.org/10.1074/mcp.M115.055897},
  volume       = {15},
  year         = {2016},
}