A posteriori quality control for the curation and reuse of public proteomics data

(2011) PROTEOMICS. 11(11). p.2182-2194
Abstract
Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date, much effort has been devoted to achieving the highest possible coverage of proteomes with the aim to inform future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in publicly available proteomics databases. These data are in turn increasingly reused for orthogonal downstream purposes such as data mining and machine learning. These downstream uses, however, need ways to validate a posteriori whether a particular data set is suitable for the envisioned purpose. Furthermore, the (semi-)automatic curation of repository data is dependent on analyses that can highlight misannotation and edge conditions for data sets. Such curation is an important prerequisite for efficient proteomics data reuse in the life sciences in general. We therefore present here a selection of quality control metrics and approaches for the a posteriori detection of potential issues encountered in typical proteomics data sets. We illustrate our metrics by relying on publicly available data from the Proteomics Identifications Database (PRIDE), and simultaneously show the usefulness of the large body of PRIDE data as a means to derive empirical background distributions for relevant metrics.
Keywords
TANDEM MASS-SPECTROMETRY, Quality control, LARGE-SCALE PROTEOMICS, LIQUID-CHROMATOGRAPHY, PEPTIDE IDENTIFICATION, PROTEIN MIXTURES, PLASMA-PROTEOME, QUANTIFICATION, QUANTITATION, REPOSITORY, STRATEGY, Quality assurance, PRIDE, Bioinformatics

Citation

MLA
Foster, Joseph M., et al. “A Posteriori Quality Control for the Curation and Reuse of Public Proteomics Data.” PROTEOMICS, vol. 11, no. 11, 2011, pp. 2182–94, doi:10.1002/pmic.201000602.
APA
Foster, J. M., Degroeve, S., Gatto, L., Visser, M., Wang, R., Griss, J., … Martens, L. (2011). A posteriori quality control for the curation and reuse of public proteomics data. PROTEOMICS, 11(11), 2182–2194. https://doi.org/10.1002/pmic.201000602
Chicago author-date
Foster, Joseph M, Sven Degroeve, Laurent Gatto, Matthieu Visser, Rui Wang, Johannes Griss, Rolf Apweiler, and Lennart Martens. 2011. “A Posteriori Quality Control for the Curation and Reuse of Public Proteomics Data.” PROTEOMICS 11 (11): 2182–94. https://doi.org/10.1002/pmic.201000602.
Chicago author-date (all authors)
Foster, Joseph M, Sven Degroeve, Laurent Gatto, Matthieu Visser, Rui Wang, Johannes Griss, Rolf Apweiler, and Lennart Martens. 2011. “A Posteriori Quality Control for the Curation and Reuse of Public Proteomics Data.” PROTEOMICS 11 (11): 2182–2194. doi:10.1002/pmic.201000602.
Vancouver
1. Foster JM, Degroeve S, Gatto L, Visser M, Wang R, Griss J, et al. A posteriori quality control for the curation and reuse of public proteomics data. PROTEOMICS. 2011;11(11):2182–94.
IEEE
[1] J. M. Foster et al., “A posteriori quality control for the curation and reuse of public proteomics data,” PROTEOMICS, vol. 11, no. 11, pp. 2182–2194, 2011.
@article{1887681,
  abstract     = {{Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date, much effort has been devoted to achieving the highest possible coverage of proteomes with the aim to inform future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in publicly available proteomics databases. These data are in turn increasingly reused for orthogonal downstream purposes such as data mining and machine learning. These downstream uses, however, need ways to validate a posteriori whether a particular data set is suitable for the envisioned purpose. Furthermore, the (semi-)automatic curation of repository data is dependent on analyses that can highlight misannotation and edge conditions for data sets. Such curation is an important prerequisite for efficient proteomics data reuse in the life sciences in general. We therefore present here a selection of quality control metrics and approaches for the a posteriori detection of potential issues encountered in typical proteomics data sets. We illustrate our metrics by relying on publicly available data from the Proteomics Identifications Database (PRIDE), and simultaneously show the usefulness of the large body of PRIDE data as a means to derive empirical background distributions for relevant metrics.}},
  author       = {{Foster, Joseph M and Degroeve, Sven and Gatto, Laurent and Visser, Matthieu and Wang, Rui and Griss, Johannes and Apweiler, Rolf and Martens, Lennart}},
  issn         = {{1615-9853}},
  journal      = {{PROTEOMICS}},
  keywords     = {{TANDEM MASS-SPECTROMETRY,Quality control,LARGE-SCALE PROTEOMICS,LIQUID-CHROMATOGRAPHY,PEPTIDE IDENTIFICATION,PROTEIN MIXTURES,PLASMA-PROTEOME,QUANTIFICATION,QUANTITATION,REPOSITORY,STRATEGY,Quality assurance,PRIDE,Bioinformatics}},
  language     = {{eng}},
  number       = {{11}},
  pages        = {{2182--2194}},
  title        = {{A posteriori quality control for the curation and reuse of public proteomics data}},
  url          = {{http://doi.org/10.1002/pmic.201000602}},
  volume       = {{11}},
  year         = {{2011}},
}
