Critical Assessment of Metaproteome Investigation (CAMPI): A Multi-Lab Comparison of Established Workflows

Abstract
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear.
Here, we carried out the first community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI).
Based on well-established workflows, we evaluated the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample.
We observed that variability at the peptide level was predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappeared at the protein group level. While differences were observed for predicted community composition, similar functional profiles were obtained across workflows.
CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.

Main
Microbial communities play a primary role in global biogeochemical cycling and form complex interactions that are crucial for the development and maintenance of health in humans, animals, and plants. Fully understanding microbial communities and their interplay with the environment requires knowledge not only of the microorganisms involved and their biodiversity, but also of their metabolic functions at both the cellular and community level 1 . As proteins constitute the key operational units performing these functions, metaproteomics has emerged as the most relevant approach to characterize the functional expression of a given microbiome 2,3 . Metaproteomics corresponds to the large-scale characterization of the entire set of proteins accumulated by all community members at a given point in time, known as the metaproteome 4 . Since its first introduction in 2004 5 , mass spectrometry (MS)-based metaproteomics has quickly developed into a powerful tool to functionally characterize a broad variety of microbial communities in situ.
This provides a direct link to phenotypes at the molecular level and shows how microorganisms adapt to their specific environment 6 . Metaproteomics thus complements other meta-omic approaches such as metagenomics and metatranscriptomics, which can assess the diversity and functional potential of microorganisms but cannot observe their actual phenotypes 7 .
In metaproteomics, proteins are commonly measured using a bottom-up approach in which proteins are first extracted, isolated, and digested into peptides. These peptides are then separated and analyzed using liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). The resulting MS/MS spectra are typically matched against in silico generated spectra derived from a protein sequence database to identify the analyzed peptides and infer the original proteins. The inferred proteins are then used to describe the various active taxa in the community, their functions, and the relative gene expression levels 8 .
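The in silico digestion that turns database protein sequences into theoretical peptides can be sketched as follows. This is a minimal illustration, assuming the simplified textbook trypsin rule (cleave after K or R unless the next residue is P) and illustrative defaults for missed cleavages and peptide length; the search engines used in this study have their own configurable enzyme definitions, and the function name `tryptic_digest` is hypothetical.

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 2,
                   min_len: int = 6, max_len: int = 50) -> set:
    """Hypothetical helper: digest one protein sequence in silico.

    Assumes a simplified trypsin rule (cleave C-terminal to K/R unless the
    next residue is P) and joins up to `missed_cleavages` adjacent fragments
    to model incomplete digestion.
    """
    # Zero-width split after K/R not followed by P (supported since Python 3.7).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Combine fragment i with up to `missed_cleavages` following fragments.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


# Toy sequence: the K is a cleavage site, the R followed by P is not.
print(sorted(tryptic_digest("GGGKLLLRPEEER", missed_cleavages=1, min_len=1)))
# → ['GGGK', 'GGGKLLLRPEEER', 'LLLRPEEER']
```

Each resulting peptide would then be converted into theoretical fragment spectra for matching; that step is omitted here.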
Each of the aforementioned steps can potentially influence the outcome of a metaproteomic analysis, and every step brings specific benefits as well as challenges. As a result, multiple workflows have been established. While such diversity brings flexibility, it also complicates the comparison of results across different experiments. Sample processing challenges include variable protein recovery from different sample matrices 9 , the presence of different types of microorganisms with different optimal lysis conditions 10,11 , and limited depth of analysis 3 and quantification 12 due to increased sample complexity. Environmental samples, such as feces or soil, are complex mixtures that can contain microbial cells, host cells, plant-derived fibrous materials, and abiotic components. Therefore, the composition and abundance of these components must be considered when choosing an appropriate method for cell lysis and protein extraction. Fortunately, the most commonly used methods nowadays are relatively robust, and generally provide a reasonably representative extraction of the proteins found in these complex mixtures. However, because differences exist, methods still need to be optimised for specific samples and projects 13,14 . Besides sample processing protocols, the use of different mass spectrometers might also lead to variation in results.
Moreover, metaproteomics comes with many specific bioinformatic challenges 8,13 . First, the choice of an appropriate sequence database is critical for peptide identification 14,15 .
Typically, large databases can strongly impact sensitivity and false discovery rate (FDR) estimation 16 , while incomplete reference databases can lead to missing or false positive identifications 17,18 . Second, the protein inference problem 19 is more pronounced in metaproteomics due to many homologous proteins from closely related organisms 20 . As a result, several dedicated bioinformatic tools have been developed or extended for metaproteomic analysis [21][22][23][24][25][26][27][28] . Despite these challenges, the added value of metaproteomics has already been demonstrated in numerous examples from both the environmental and medical fields, providing unprecedented insights into the functional activity of microbial communities 7,20,29-41 .
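The link between database size and FDR estimation can be made concrete with the target-decoy approach, a widely used strategy in which spectra are also searched against reversed or shuffled decoy sequences. The sketch below is a generic illustration with hypothetical scores, not the exact procedure of any search engine used in this study. A larger search space tends to produce more high-scoring random matches, so a stricter score threshold is needed to stay below 1% FDR, which lowers sensitivity.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for peptide-spectrum matches (PSMs).

    psms: list of (score, is_decoy) tuples, where a higher score is better.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Ties in score are not treated specially in this simplified sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR at this score threshold, estimated as #decoys / #targets.
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the lowest score upwards.
    qvalues, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_target_scores(psms, fdr=0.01):
    """Scores of target PSMs accepted at the given FDR threshold."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= fdr]


# Hypothetical PSM scores; True marks a decoy hit.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(accepted_target_scores(example, fdr=0.01))  # → [10.0, 9.0]
```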
Nevertheless, a lingering concern is the potential risk of unintended, approach-based biases inherent in various metaproteomic workflows. This is important because reproducibility is key to translating metaproteome studies into applications (e.g., clinical or industrial). Consequently, a comprehensive evaluation of widely used workflows is required to assess their respective outcomes. In the past, various reference data sets from defined microbial community samples (i.e., for which the composition is known a priori) have been used in individual benchmarking studies [42][43][44] . However, a ring trial with different laboratories involved has not yet been performed in the field of metaproteomics.
To fill this gap, the 3rd International Metaproteomics Symposium (December 2018, Leipzig, Germany) hosted a multi-laboratory benchmarking study in the form of a community challenge. Participating laboratories received two microbial samples: a simplified mock community simulating the gut microbiome (SIHUMIx) and a complex, natural stool sample (fecal sample). Each group was allowed to use any preferred sample preparation, analysis, and data evaluation pipeline.
Here, we describe the results of this community-driven study, referred to as the Critical Assessment of MetaProteome Investigation (CAMPI). We compare and discuss the employed workflows covering all analysis steps from sample preparation to the bioinformatic identification and quantification. Moreover, we compare the metaproteome results with sequencing read-based analyses (metagenomics and metatranscriptomics).
We found that meta-omic databases performed better than public reference databases across both samples. More importantly, even though larger differences were observed in identified spectra and unique peptide sequences, the different protein grouping strategies and the functional annotations provided similar results across the data sets from all laboratories. Where minor differences could be observed, these were largely due to differences in sample processing methods and partially to bioinformatic pipelines. Finally, for the taxonomic comparison, we found that overall profiles were similar between read-based methods and proteomics methods, with few exceptions.
Apart from these immediate conclusions, the CAMPI study also delivers highly valuable benchmark data sets that can serve as a foundation for future method development for metaproteomics.

Results
At the 3rd International Metaproteome Symposium in December 2018, individual lab outcomes of a collaborative, multi-laboratory effort to compare metaproteomic workflows were presented. In this study, metaproteomics data were acquired in seven laboratories, using a variety of well-established platforms. Figure 1 provides a general overview of the study design showing (i) the provision of two types of samples (SIHUMIx and fecal) to the study participants, (ii) the various experimental workflows of biomolecule extraction and MS/MS acquisition, and (iii) the bioinformatic processing steps from protein database generation to database search identification and follow-up analyses (more details in the Methods, see Supplementary Table 1 for an overview of all methods).
Figure 1: (i) The two samples (SIHUMIx and FECES) were aliquoted and distributed to the participating laboratories prior to the symposium. (ii) Pre-symposium work by participants (middle panels): every method used by the participants, from cell disruption to mass detection, is displayed. (iii) Post-symposium work by participants (right panel): the bioinformatic analyses, i.e., database creation and database search for peptide and protein identification, were harmonized to make the results of all participating laboratories comparable.
At the Symposium, the decision was made to re-analyse the acquired data with different bioinformatic pipelines, making this the first multi-laboratory effort in metaproteomics to independently evaluate available methodological and computational approaches, in line with similar community-driven benchmarking studies [45][46][47][48] . In the first part of the Results, we analyzed 42 raw files (21 for the SIHUMIx sample and 21 for the fecal sample) from 24 different workflow combinations with X!Tandem, using either public or in-house generated protein databases (see Figure 1 for a general overview, and Figure 2 for the results; see online Methods for the database construction). A more in-depth comparison of sample preparations, bioinformatic pipelines, and taxonomic and functional annotations using a sub-selection of ten data sets is presented in the subsequent sections.
Complex sample processing workflows and sample-specific meta-omic search databases lead to more identifications
In order to study the effect of the different sample processing and LC-MS/MS workflows on the identification outcome, we searched all submitted MS files using the widely used X!Tandem search engine 49 . To investigate the influence of the chosen database, we searched each file against a publicly available reference database (SIHUMIx_REF and GUT_REF) and against a multi-omic database (SIHUMIx_MO and GUT_MO). The comparison of all CAMPI workflows is displayed in Figure 2 (raw data in Supplementary …).
Figure 2: On the left side, the bar charts show the number of identified spectra using the reference (REF) database (orange), the number of identified spectra using the multi-omic (MO) database (dark blue), and the total amount of measured spectra (red). On the right side, the light blue bars represent the identification rate, calculated as the percentage of spectra that yielded a peptide identification at 1% FDR, for both the REF database (orange) and the MO database (dark blue). The specific protocols can be found in Supplementary Table 1. For database searching, X!Tandem was used as a single search engine.
The results differed greatly between the samples and workflows in terms of absolute numbers of acquired spectra, identified spectra, and relative amount of identified spectra (identification rates). For the SIHUMIx data set, the number of acquired spectra varied between 37k and 260k, and identification rates varied between 29.99% and 68.64% for SIHUMIx_REF and between 32.52% and 73.34% for SIHUMIx_MO. For the fecal data set, between 9k and 223k spectra were acquired, with identification rates between 11.99% and 34.79% for GUT_REF, and between 15.70% and 40.49% for GUT_MO.
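The identification rate reported above is the percentage of acquired spectra that yield a peptide identification at 1% FDR, as defined for Figure 2. A minimal sketch of the calculation and of the REF-versus-MO comparison for one raw file; the function names and spectrum counts are hypothetical and are not values from this study.

```python
def identification_rate(identified: int, acquired: int) -> float:
    """Percentage of acquired MS/MS spectra that yielded a peptide identification."""
    if acquired <= 0:
        raise ValueError("the number of acquired spectra must be positive")
    return 100.0 * identified / acquired


def compare_databases(acquired: int, identified_ref: int, identified_mo: int) -> dict:
    """Identification rates for a REF and an MO search of the same raw file."""
    ref = identification_rate(identified_ref, acquired)
    mo = identification_rate(identified_mo, acquired)
    # Gain in percentage points when switching from the REF to the MO database.
    return {"REF": round(ref, 2), "MO": round(mo, 2), "gain_pp": round(mo - ref, 2)}


# Hypothetical counts for one raw file.
print(compare_databases(acquired=200_000, identified_ref=60_000, identified_mo=70_000))
# → {'REF': 30.0, 'MO': 35.0, 'gain_pp': 5.0}
```

Because the denominator is the number of acquired spectra, a method that acquires many more spectra can report more identifications and still show a lower rate.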
The differences in acquired spectra show a clear relation to the method used, as similar methods or replicates show highly similar numbers of acquired spectra. As expected, more complex methods with longer gradient lengths (S03 and S04: 260 min, S05 and S06: 460 min, S08: 240 min, F01: 210 min, F02: 160 min), fractionation (S11, F07: 4 fractions), and additional separation methods such as MudPIT 50 (F01: 4 fractions) or ion mobility (PASEF) 51 (S13, F09) led to up to eight times more identified spectra, but at the cost of increased time and resources spent 52 (see Supplementary Table 1 for a detailed description, and Supplementary Table 2 for an overview of the samples). Notably, identification rates were not necessarily correlated with the total number of identifications.
For example, between analyses S03 and S05, which used a 260 min and 460 min LC gradient length, respectively, a higher absolute number of identified spectra was found for the 460 min gradient, but also a lower identification rate. As expected, if an MS instrument is given the opportunity to acquire more spectra, it will do so. However, the gains in spectral acquisition do not readily translate into gains in identification, so there is a potential for diminishing returns when opting for more complex methods. There is also a fairly consistent drop of around 10% in the number of acquired spectra when comparing SIHUMIx samples with fecal samples for similar workflows (e.g., S09-S10 with F05-F06, and S13 Reps 1-3 with F09 Reps 1-3), although occasionally this drop is much greater, as for S11_Fract1-4 and F07_Fract1-4. This drop, although generally limited, might be attributed to the higher complexity of the fecal sample and corresponding ion suppression effects. The differences in identification rate, in turn, are likely to derive from the choice of the search database: the identification rates for the publicly available databases were invariably lower, owing to their larger and less specific search space, consistent with the literature 14,16,18,42,53 . Here, the public reference databases (SIHUMIx_REF and GUT_REF) contained 1.6 and 16 times more unique in silico digested peptides, respectively, than the corresponding multi-omic databases (SIHUMIx_MO and GUT_MO) (Supplementary File 1).
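The search-space comparison above counts unique in silico digested peptides per database. A minimal sketch of such a count, assuming a simplified trypsin rule, one missed cleavage, and a minimum length of six residues (the parameters actually used are described in Supplementary File 1 and the Methods); the helper names and toy databases are hypothetical.

```python
import re


def digest(sequence: str, missed: int = 1, min_len: int = 6) -> set:
    """Simplified trypsin digestion (cleave after K/R unless followed by P)."""
    frags = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for i in range(len(frags)):
        for j in range(i, min(i + missed + 1, len(frags))):
            peptide = "".join(frags[i:j + 1])
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


def unique_peptides(database: dict) -> set:
    """Union of in silico peptides; a peptide shared by several proteins counts once."""
    peptides = set()
    for sequence in database.values():
        peptides |= digest(sequence)
    return peptides


def search_space_ratio(reference_db: dict, sample_db: dict) -> float:
    """How many times more unique peptides the reference database contains."""
    return len(unique_peptides(reference_db)) / len(unique_peptides(sample_db))


# Hypothetical toy databases (protein accession -> sequence).
sample_db = {"prot1": "GGGGGGKLLLLLLR"}
reference_db = {"prot1": "GGGGGGKLLLLLLR", "prot1_copy": "GGGGGGKLLLLLLR",
                "prot2": "AAAAAAKVVVVVVR"}
print(search_space_ratio(reference_db, sample_db))  # → 2.0
```

Counting unique peptides rather than proteins reflects the space a search engine actually scores against: redundant protein entries add nothing.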
Overall, our results indicate that generating a sample-specific meta-omic database can be advantageous for complex metaproteomics samples, such as the human gut microbiome, and is likely even more so for complex and poorly characterised samples such as soil microbiota. The smaller meta-omic databases require fewer computational resources (e.g., CPU and RAM) and tend to be more accurate due to their tailored composition.
However, for their generation, meta-omic databases require additional experimental and computational resources, and are often not as well assembled and/or annotated as reference databases. Because the composition of SIHUMIx was known, the benefit of using a tailored meta-omic database was limited and the analysis was feasible with available reference proteomes. In contrast, the community for the fecal sample was unknown, which represents the typical scenario in metaproteomics.
For known reference samples (such as SIHUMIx), it is therefore reasonable to simply use the reference database, while the largely unknown fecal sample community is best analysed using a tailored meta-omic database. In the following sections, we thus opted to use only the SIHUMIx_REF and GUT_MO search databases for SIHUMIx and fecal data sets, respectively.

Different bioinformatic pipelines resulted in highly similar peptide identifications
To investigate the effect of the bioinformatic pipelines on peptide identification, we compared the two data sets with the most identified peptides (S11 and F07) (Figure 3).
To ensure a robust and reliable comparison, we fixed the search parameters for the four different bioinformatic pipelines employed (see online Methods for details). For SIHUMIx, the majority of the identified peptides (54.2%) were found by all four bioinformatic pipelines (Figure 3A), while this ratio dropped to 40% for the more complex fecal F07 sample (Figure 3B). As expected, this percentage increased to 73% and 55%, respectively, when considering the peptides identified by at least three out of four tools.
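The consensus percentages above (peptides found by all four pipelines, or by at least three) can be computed from per-pipeline peptide sets. A minimal sketch with hypothetical pipeline labels and peptide IDs; the function name is ours.

```python
from collections import Counter


def consensus_fractions(results: dict) -> dict:
    """Fraction of all identified peptides found by at least k pipelines.

    results: pipeline name -> set of identified peptide sequences.
    Returns {k: fraction} for k = 1 .. number of pipelines.
    """
    # Each pipeline contributes at most once per peptide because values are sets.
    support = Counter(p for peptides in results.values() for p in peptides)
    total = len(support)
    return {k: sum(1 for n in support.values() if n >= k) / total
            for k in range(1, len(results) + 1)}


# Hypothetical pipelines A-D and peptide IDs P1-P6.
results = {
    "A": {"P1", "P2", "P3", "P4"},
    "B": {"P1", "P2", "P3"},
    "C": {"P1", "P2", "P5"},
    "D": {"P1", "P6"},
}
fractions = consensus_fractions(results)
print(round(fractions[4], 3), round(fractions[3], 3))  # → 0.167 0.333
```

The share of peptides found by exactly one pipeline follows as `fractions[1] - fractions[2]` (0.5 in this toy example).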
Interestingly, 16% of the peptides were uniquely identified by a single bioinformatic pipeline for the S11 data set (3138, 2670, 891, and 841 peptides for …). Furthermore, each algorithm uses its own score as a quality metric for finding the best matching peptide for a spectrum. This score varies between the search engines and can even result in different peptide identifications for the same spectrum 54 . Overall, the combination of multiple search engines, as performed by SearchGUI/PeptideShaker (four algorithms), resulted in the highest number of identifications, which is in line with previous studies in proteomics and proteogenomics 55,56 . This effect may be attributable to algorithms with more sophisticated scoring methods (e.g., MS-GF+ 57 , used in SearchGUI but not in MPA), which generally lead to more identifications overall. Moreover, we expect that novel search engines based on machine learning algorithms can further boost the number of peptide identifications in metaproteomics 58 .
Additionally, we compared the pipelines in terms of peptide features, namely peptide length and the number of missed cleavages (lower panels of Figure 3A and 3B). While a few outliers could be observed (e.g., peptide lengths over 50 AA for MaxQuant and more than two missed cleavages for SearchGUI/PeptideShaker and Proteome Discoverer), the features were overall similarly distributed between pipelines. Most of the differences thus seemed to be simply linked to the search engines used.
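The two peptide features compared here can be computed per pipeline with a small sketch like the one below. It assumes the simplified trypsin rule (an internal K or R not followed by P counts as a missed cleavage); the pipelines themselves may count missed cleavages slightly differently, and the helper names are hypothetical.

```python
import re
from collections import Counter


def missed_cleavages(peptide: str) -> int:
    """Count internal K/R residues not followed by P (simplified trypsin rule).

    The C-terminal residue is never counted because nothing follows it.
    """
    return len(re.findall(r"[KR](?=[^P])", peptide))


def feature_summary(peptides):
    """Distributions of peptide length and missed cleavages for one pipeline."""
    lengths = Counter(len(p) for p in peptides)
    missed = Counter(missed_cleavages(p) for p in peptides)
    return lengths, missed


print(missed_cleavages("LLLRPEEER"), missed_cleavages("AKRAR"))  # → 0 2
```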
Because the SearchGUI/PeptideShaker combination provided the most identifications, relatively few identifications were missed by excluding the other three pipelines. We therefore used only the results of the SearchGUI/PeptideShaker pipeline in the following sections, which investigate the effect of different sample processing workflows on downstream peptide identifications. These analyses are performed on ten representative data sets that were selected based on their type of fractionation and MS instrument: six SIHUMIx and four fecal data sets (Supplementary Table 2).

Differences between laboratory workflows are mostly attributable to low abundance proteins
After we ruled out bioinformatic workflows as a source of significant differences between samples, we investigated differences arising from the laboratory workflows. We compared the overlap and uniqueness of identifications at the level of peptides, protein subgroups, and the 50% most abundant protein subgroups for the selected laboratory workflows in Figure 4. The figure shows how many peptides and protein subgroups are uniquely identified by a single laboratory workflow and how many are identified by all laboratory workflows. At the peptide level (Figure 4A and B), more complex workflows, such as those with longer gradient length and fractionation, identified the most peptides in general (as shown earlier in Figure 2) as well as the most workflow-specific peptides, thus limiting the potential for overlap. The number of identified peptides shared between all workflows was quite limited: only 3,557 peptides (4.9% of all identified peptides) in the SIHUMIx data sets, and 2,186 peptides (3.4% of all identified peptides) in the fecal data sets. At the protein subgroup level (Figure 4C and D), the protein subgroups shared across all workflows accounted for 25.7% and 34.6% of all protein subgroups in the SIHUMIx and fecal data sets, respectively. These percentages increased to 51.5% and 67.4% when we only considered the 50% most abundant protein subgroups (Figure 4E and F). The large differences between laboratory workflows observed at the peptide level were thus attenuated at the protein subgroup level, and further reduced for the 50% most abundant protein subgroups. This trend was also clearly visible when considering all intersections, including partial agreement among some samples (Supplementary Figures 1 and 2). Of note, data sets that differed in only a single laboratory method parameter, such as LC gradient length (S03 and S05) or fractionation (F06 and F07), showed a much higher overlap.
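The attenuation from all protein subgroups to the most abundant ones can be illustrated on toy data. The sketch below assumes that the "top 50%" is selected per workflow by ranking subgroups on spectral counts before intersecting; the exact selection rule used for Figure 4 is not specified in this section, and all names and counts are hypothetical.

```python
def shared_fraction(workflows: dict, top_fraction=None) -> float:
    """Share of the union of protein subgroups that is identified in every workflow.

    workflows: workflow name -> {subgroup: abundance (e.g. spectral count)}.
    top_fraction: if given, keep only that fraction of the most abundant
    subgroups per workflow before comparing (an assumed selection rule).
    """
    selected = []
    for abundances in workflows.values():
        ranked = sorted(abundances, key=abundances.get, reverse=True)
        if top_fraction is not None:
            ranked = ranked[:max(1, int(len(ranked) * top_fraction))]
        selected.append(set(ranked))
    union = set.union(*selected)
    shared = set.intersection(*selected)
    return len(shared) / len(union)


# Hypothetical spectral counts for two workflows.
workflows = {
    "W1": {"G1": 100, "G2": 80, "G3": 5, "G4": 1},
    "W2": {"G1": 90, "G2": 70, "G5": 3, "G6": 2},
}
print(round(shared_fraction(workflows), 3), shared_fraction(workflows, 0.5))  # → 0.333 1.0
```

In this toy case the disagreement comes entirely from low-count subgroups, so restricting to the most abundant half removes it.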
Also, most of the protein subgroups identified uniquely in a single sample disappeared when only considering the 50% most abundant subgroups. Furthermore, subgroups that were identified with a single peptide, and therefore usually at the lowest abundance, track very closely with the subgroups identified in only a single sample. Finally, when considering the actual spectral abundance of subgroups, those subgroups that were found in all samples also explained at least 77% of the identified spectra. It is therefore clear that the low agreement between samples at the peptide level is mostly attributable to the identification of low-abundance proteins. The complexity of the samples and the limited speed of mass spectrometers in data-dependent acquisition (DDA) mode led to stochasticity in precursor selection at the low end of the dynamic range. Low-abundance protein subgroups with only one peptide thus behave more like peptides, where stochastic selection causes large differences between samples. It is worth noting that this issue is largely avoided by only selecting the top 50% of protein subgroups. Overall, it can be concluded that while different laboratory workflows provide very different peptide identifications, the protein subgroups are well preserved.
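The share of identified spectra explained by subgroups found in all workflows can be computed per workflow as in the sketch below. The function name and spectral counts are hypothetical and not the study's data.

```python
def spectral_share_of_shared(workflows: dict) -> dict:
    """Per workflow, fraction of identified spectra assigned to subgroups found in all workflows.

    workflows: workflow name -> {subgroup: spectral count}.
    """
    shared = set.intersection(*(set(a) for a in workflows.values()))
    return {name: sum(a[g] for g in shared) / sum(a.values())
            for name, a in workflows.items()}


# Hypothetical spectral counts for two workflows.
workflows = {
    "W1": {"G1": 100, "G2": 80, "G3": 5, "G4": 1},
    "W2": {"G1": 90, "G2": 70, "G5": 3, "G6": 2},
}
shares = spectral_share_of_shared(workflows)
print({k: round(v, 3) for k, v in shares.items()})  # → {'W1': 0.968, 'W2': 0.97}
```

Although only two of six subgroups are shared here, they carry almost all spectra. This matches the pattern described above, in which the subgroups unique to one workflow are the low-abundance ones.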
Because protein grouping plays such an important role in translating peptide identifications into biologically meaningful information, we decided to analyze two commonly used grouping methods in more detail. Protein grouping is achieved using the algorithms PAPPSO 59 … Table 4). While cross-sample correlation confirmed that the impact of bioinformatic pipelines on the analysis here was negligible, little else could be learned from this correlation analysis (Supplementary Figures 4 and 5). To shed some light on the differences between protein grouping methods, we analyzed the agreement between samples for different grouping approaches (Supplementary Figures 6 and 7). Notably, when applied to the fecal sample, both grouping algorithms resulted in an unusually high number of protein groups that are unique to F10. However, it remains unclear which of these approaches better captures the actual composition of the sample, or whether the performance of the approaches varies for different types of samples. Because PAPPSO grouping removes likely wrong identifications from homologues, it could be more appropriate for single-organism proteomics or for taxonomically well-defined samples like SIHUMIx. In contrast, the grouping from MPA could be more appropriate for complex, unknown samples like the fecal sample (where shared peptides become much more likely), as it retains all information for the grouping (Supplementary note 1.3). To conclude, both protein grouping methods provide highly similar results for the SIHUMIx sample, but diverge on the fecal sample, likely due to the increased complexity of the protein inference task in the latter.
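The role of shared peptides in protein grouping can be made concrete by contrasting two deliberately simplified, generic strategies. The strict strategy only groups proteins with identical peptide sets. The permissive strategy merges any proteins connected by a shared peptide. These are illustrative assumptions, not the PAPPSO or MPA algorithms. They only show why a permissive grouping may produce larger groups when, as in the fecal sample, shared peptides become more common. All protein and peptide names are hypothetical.

```python
from collections import defaultdict


def group_identical(protein_peptides: dict) -> list:
    """Strict grouping: proteins with identical peptide sets are indistinguishable."""
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


def group_by_shared_peptides(protein_peptides: dict) -> list:
    """Permissive grouping: merge proteins linked by any shared peptide (union-find)."""
    parent = {protein: protein for protein in protein_peptides}

    def find(protein):
        # Follow parent links to the root, halving the path along the way.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    first_seen = {}  # peptide -> first protein in which it was encountered
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            if peptide in first_seen:
                parent[find(protein)] = find(first_seen[peptide])
            else:
                first_seen[peptide] = protein
    components = defaultdict(list)
    for protein in protein_peptides:
        components[find(protein)].append(protein)
    return sorted(sorted(members) for members in components.values())


# Hypothetical proteins A-D with their identified peptides p1-p4.
proteins = {"A": {"p1", "p2"}, "B": {"p1", "p2"}, "C": {"p2", "p3"}, "D": {"p4"}}
print(group_identical(proteins))           # → [['A', 'B'], ['C'], ['D']]
print(group_by_shared_peptides(proteins))  # → [['A', 'B', 'C'], ['D']]
```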

Comparison of meta-omic methods reveals differences between peptide and protein-derived analysis of taxonomic community composition
To determine whether differences between sample processing workflows affect the overall biological conclusions, we quantitatively compared the identified taxa for each selected sample from both data sets using spectral counts, at the peptide, protein subgroup, and sequencing read levels.
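Spectral-count-based taxonomic profiles of the kind compared here can be sketched as follows. The features may be peptides or protein subgroups. Excluding unclassified features before normalizing is our assumption for this sketch. The function name, features, and counts are hypothetical.

```python
from collections import Counter


def relative_abundance(spectral_counts: dict, taxon_of: dict) -> dict:
    """Relative taxon abundances from spectral counts.

    spectral_counts: feature (peptide or protein subgroup) -> spectral count.
    taxon_of: feature -> assigned taxon, or None if unclassified.
    Unclassified features are left out before normalizing (an assumption).
    """
    totals = Counter()
    for feature, count in spectral_counts.items():
        taxon = taxon_of.get(feature)
        if taxon is not None:
            totals[taxon] += count
    classified = sum(totals.values())
    if classified == 0:
        return {}
    return {taxon: count / classified for taxon, count in totals.items()}


# Hypothetical peptide-level counts and assignments.
counts = {"pep1": 6, "pep2": 2, "pep3": 2}
taxa = {"pep1": "Bacteroides", "pep2": "Escherichia", "pep3": None}
print(relative_abundance(counts, taxa))  # → {'Bacteroides': 0.75, 'Escherichia': 0.25}
```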
We found different trends between the SIHUMIx and fecal samples (Figures 5 and 6).
For SIHUMIx, the taxonomic distributions were relatively similar between the metagenomic read, peptide, and protein group levels based on the principal component analysis. Hierarchical clustering highlighted clusters of samples, with the peptide and protein subgroup profiles for samples S07 and S14 clustering with the read-based profile (Figure 5A) (Supplementary Figure 8A and B). Interestingly, for samples processed with more complex wet-lab methods (S03, S05 and S08), the peptide-level and protein subgroup-level profiles did not cluster together. While species were found to be similar between methods overall, there were some notable differences (Figure 5B). All methods agreed that Bacteroides thetaiotaomicron was the most abundant species, and found Escherichia coli at 10-13% abundance. However, differences were found for Blautia producta, which was barely detected by the proteomics methods, while found at around 5% abundance by metagenomics. This might be caused by the construction of the reference database: at the time of construction, the UniProtKB reference proteome of Blautia producta was not available, and multiple Blautia sp. proteomes were therefore provided instead. A detailed look at the Unipept results showed that 15% of the peptides were associated with the genus Blautia (Supplementary Table 5), which indicates that the lower identification of Blautia producta at the peptide level is due to difficulties in resolving Blautia at the species level, rather than a lack of identified Blautia peptides during the metaproteomic search.
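The genus-level assignment of Blautia peptides follows from lowest common ancestor (LCA) logic: a peptide that matches proteins from several Blautia species can only be assigned to the genus. Below is a minimal sketch of plain LCA over lineages, assuming each lineage is given as a list ordered from higher to lower rank. Unipept's own implementation differs in details, the function name is hypothetical, and the example lineages are illustrative.

```python
def lowest_common_ancestor(lineages: list):
    """Deepest taxon shared by all lineages of the proteins a peptide matches.

    Each lineage is ordered from higher to lower rank, e.g. [family, genus, species].
    Returns None if the lineages already disagree at the first rank given.
    """
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # first rank at which the lineages diverge
        lca = ranks[0]
    return lca


# A peptide shared by two Blautia species resolves to the genus only.
blautia_hits = [
    ["Lachnospiraceae", "Blautia", "Blautia producta"],
    ["Lachnospiraceae", "Blautia", "Blautia obeum"],
]
print(lowest_common_ancestor(blautia_hits))  # → Blautia
```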
Additionally, Clostridium butyricum was not found by the read-based method, while Clostridiales bacterium and Bacteroides dorei were falsely found by the protein-centric method, as these are not present in the SIHUMIx sample. However, these last two were both found at very low abundance. For completeness, the comparisons of community composition for SIHUMIx at the genus level are provided in Supplementary Figure 9. For the fecal data set, which was analysed at the family level, relatively distinct assessments of community composition were obtained from the read-based, peptide, and protein subgroup levels (Figure 6A). While the same families were identified, their proportions differed across methods (Figure 6B). Metatranscriptomic information (Feces_MT) was available for the fecal sample, and RNA and DNA results were closely co-located, while proteins and peptides were spread out from the read-based methods, but also from each other (Figure 6A). The difference between metagenomics/metatranscriptomics and metaproteomics is not surprising because these methods highlight community profiles from different angles. As shown previously, metagenomics provides a good assessment of community composition in terms of cell numbers for each species, while metaproteomics reflects proteinaceous biomass for each species 43 .
Strikingly, for the fecal samples, the community composition as quantified at the peptide level proved to be more similar to the read-based than to the protein-based composition (Figure 6A) (Supplementary Figure 9A and B). This discrepancy is likely due to the fundamental issue of protein inference. Indeed, in metaproteomics, identification and quantification usually rely on discriminative peptides. As the data sets get more complex, higher levels of sequence homology will be observed for many proteins, leading to a much greater level of peptide degeneracy across taxa 61 . Direct taxon inference from peptides thus likely results in more stringent taxonomic filtering, due to the necessity to rely only on taxon-specific peptides. In fact, the proportion of unclassified peptides rose from 24.2% for the SIHUMIx samples to 73.4% for the fecal samples, due to the increased taxonomic complexity of the fecal data set. In contrast, the proportion of unclassified protein subgroups went down from 69.9% for SIHUMIx to 9.5% for the fecal samples. This latter difference, while large, is not that surprising, because protein subgroups were considered at the family level for the fecal sample but at the species level for the SIHUMIx sample, where only SIHUMIx species were considered, therefore greatly limiting peptide-level degeneracy. For the fecal sample, proteins within a subgroup are usually associated with the same family, which explains the higher proportion of protein subgroups that can be classified for the fecal samples. Additionally, regarding quantification, protein grouping for the fecal samples was done using MPA, which includes all peptides (shared as well as unique), while peptide-level quantification only took into account taxon-specific peptides. Depending on the sample and the method used, the taxonomic resolution will thus vary.
To illustrate this, we compared the taxonomic resolution across meta-omic levels and across protein grouping methods (Supplementary Figures 11A and B). There is usually a drop in resolution either at the species (SIHUMIx) or the genus (fecal) level, and the PAPPSO grouping method has a higher resolution for complex samples, as already discussed in Supplementary note 1.3.
Altogether, the degree of degeneracy at the peptide level, combined with the grouping method employed for the proteins, leads to different numbers of features used for each analysis and thus to different composition profiles between peptide-centric and protein-centric approaches.
Ultimately, due to the sequence homology issue, worse taxonomic resolution will be available for larger, more complex data sets, as illustrated by the differences between the SIHUMIx and the fecal data sets. A promising approach to tackle these limitations takes advantage of shared rather than taxon-specific peptides (thus avoiding the previously mentioned issues) to assess the biomass content of a given community 61 . However, regardless of the chosen approach, a higher level of peptide coverage will clearly be helpful for higher-resolution taxonomic annotation, and metaproteomics will therefore benefit from focusing on analysis depth at the peptide level.

The functional profile is similar between different metaproteomics workflows
A major strength of metaproteomics is the ability to provide functional information that reflects the phenotype of the analyzed sample. In order to investigate the influence of post-processing steps on this functional information, we compared functional community profiles for both the SIHUMIx and the fecal samples (Figure 7). We observed that the functional similarity between data sets acquired with different workflows on each sample is extremely high, regardless of the approach chosen. For the peptide-centric approach, we compared the Gene Ontology (GO) terms (GO domain "biological process") … In contrast, comparison between the different omics domains showed important differences in functional profiles. Notably, metagenomics and metaproteomics differ particularly strongly from each other, while metatranscriptomics tends to overlap more with metagenomics, highlighting once more the need for integrated meta-omics approaches (Supplementary Figures 13, 14 and 15) 30 .
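One simple way to quantify how similar two functional profiles are is the cosine similarity of their GO-term abundance vectors. This metric is an assumption for illustration and not necessarily the measure behind Figure 7. The GO identifiers serve only as example keys, and the counts are hypothetical.

```python
import math


def cosine_similarity(profile_a: dict, profile_b: dict) -> float:
    """Cosine similarity of two GO-term abundance profiles (term -> abundance)."""
    terms = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(t, 0) * profile_b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical spectral counts per GO term for two workflows on the same sample.
workflow_1 = {"GO:0006412": 120, "GO:0006096": 40}
workflow_2 = {"GO:0006412": 240, "GO:0006096": 80}  # same shape, twice the depth
print(round(cosine_similarity(workflow_1, workflow_2), 6))  # → 1.0
```

Cosine similarity ignores the overall scale of a profile. A deeper analysis that identifies more spectra with the same relative functional distribution therefore scores as identical, which suits a comparison of workflows with different analytical depth.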

Discussion
In this founding edition of CAMPI, we used both a simplified, laboratory-assembled sample and a human fecal sample to compare commonly used experimental methods and computational pipelines in metaproteomics at the peptide, protein subgroup, taxonomic, and functional levels, informed by and contrasted with metagenomics and metatranscriptomics. Our findings reveal some differences in the taxonomic profiles obtained by peptide-centric metaproteomics, protein-centric metaproteomics, and read-based metagenomics and metatranscriptomics. This agrees with previous findings that shotgun metagenomics and metaproteomics provide different information on microbial community structure: metagenomics has been shown to represent per-species cell numbers in a community well, whereas metaproteomics has been shown to represent per-species biomass well 43 . When comparing different proteomics approaches, differences tend to appear primarily at the finest resolution, namely the identified peptide sequences. From the protein subgroup level upward, much of this variation disappears. Different protocols primarily differ in analytical depth, which correlates with more extensive sample fractionation and faster instruments. Moreover, differences between search engines appear somewhat complementary, favoring integrative, multi-search engine approaches with more sophisticated scoring. Interestingly, the sequence database used for identification appears to contribute substantially to the observed differences. This is particularly evident in the protein inference step, where peptide-level degeneracy in the database strongly affects the outcome of protein grouping, as shown and discussed previously 63,64 .
Overall, the functional profiles of different proteomics workflows were quite similar, which is reassuring given the unique perspective that proteomics provides at the functional level.
Besides the direct conclusions of CAMPI as summarized here, another important outcome of this study is the availability of the acquired data sets. Indeed, these can serve as benchmark data sets for the field when developing novel algorithms and approaches for data processing and interpretation (see Data Availability).
Moreover, this first CAMPI study has highlighted that there is room for future editions of CAMPI. Relevant standardized samples will need to be defined for these studies, and these samples should be produced in sufficient amounts to allow their continued use by interested researchers after publication. They could take the form of a defined synthetic community with exactly known composition, including cell numbers and sizes, preferably stimulated under different biological conditions. With such a sample, we will be able to validate a variety of quantification methods, and also to investigate the effect of quantifying individual proteins relative to their background. The effect on taxonomic resolution and functional profiles, however, remains an open question.
Label-based approaches could also be extremely valuable for the field, as stable isotope labelling used as a spike-in reference has been shown to strongly improve quantification accuracy 65,66 . At a more technical level, future editions could investigate the opportunities and challenges of data-independent acquisition (DIA) for metaproteomics samples. New AI-driven search engines may also enter the field of (meta)proteomics, bringing further opportunities.
All these follow-up CAMPI studies will also contribute useful benchmark samples and data sets to the field, creating a strong, positive feedback loop with the metaproteomics community. Future CAMPI editions will be launched by the Metaproteomics Initiative (metaproteomics.org), a newly founded community of metaproteomics researchers that aims, among other things, to standardize and accelerate experimental and bioinformatic methodologies in this field. This initiative can combine forces with existing initiatives such as the ABRF iPRG study group, which recently provided a metaproteomics data set to be analysed by the proteomics informatics community 65 . We believe that such ongoing efforts will continue to advance the field of metaproteomics and make it more widely applicable, allowing metaproteomics to develop its full potential and further increase its relevance across the life sciences.

Methods
The CAMPI study aims to evaluate the impact of different protein extraction protocols, MS/MS acquisition strategies, and bioinformatic pipelines used in metaproteomics (see Figure 1 for a general overview, and Supplementary Table 1 for an overview of all methods). SIHUMIx was prepared as previously described, with an additional 24 h of cultivation in one control bioreactor to produce sufficient biomass to be sent to each participating laboratory 66 .

Human fecal microbiome sample
A natural human fecal microbiome sample was obtained, with informed consent, from a 33-year-old omnivorous, non-smoking woman, with approval by the ethics committee of the University Magdeburg (number 99/10). The sample was immediately homogenized, treated with RNAlater, aliquoted, frozen, and stored at -20°C until aliquots were sent to each participating laboratory.

Protein extraction and processing
In total, eight different protein extraction protocols were applied, resulting in 24 different workflows when combined with the MS/MS acquisition strategies (Figure 1). Key characteristics of each workflow are listed in Supplementary Table 1. The most pronounced differences between workflows concerned protein recovery, cleaning, and fractionation strategies. In a wide comparative approach, the protein extract was processed by either filter-aided sample preparation (FASP) 73 (workflows 1-3, 5, 7-9, 11-12, 19-23 in Supplementary Table 1 hours. Finally, peptides were recovered from the gel or eluted from filters (FASP) using a salt solution (workflows 1-3, 5-21, 24). In some protocols, peptides were desalted using different commercial devices (workflows 4, 21, and 24).

LC-MS/MS acquisition
Each laboratory used its own LC-MS/MS protocol; the main differences and similarities are highlighted below, and details are provided in Supplementary Table 1.
For LC, all laboratories separated peptides by reversed-phase chromatography with linear gradients ranging from 60 to 460 min. In addition, one group performed an additional separation using multidimensional protein identification technology (MudPIT), combining cation exchange and reversed-phase separation in a single column prepared in-house 74 . The four databases were digested in silico into tryptic peptides with an in-house script, allowing two missed cleavages, to compare their theoretical search spaces. Additionally, all peptides identified with each database in the explorative analysis, which was carried out on all data sets, were retrieved and compared.
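The search-space comparison can be illustrated with a small in silico digestion. The sketch below is not the authors' in-house script: it assumes the classic trypsin rule (cleave after K or R unless the next residue is P), applies no peptide length limits, and takes plain protein sequences rather than parsing FASTA files. The helper names `tryptic_peptides` and `search_space` are hypothetical.

```python
import re
from typing import Iterable, Set

# Zero-width match after K or R that is not followed by P (assumed trypsin rule).
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str, missed_cleavages: int = 2) -> Set[str]:
    """Return the distinct tryptic peptides of one protein, allowing up to
    `missed_cleavages` internal cleavage sites per peptide."""
    # Splitting on a zero-width pattern requires Python 3.7+; the empty
    # fragment produced when the sequence ends in K or R is dropped.
    fragments = [f for f in _TRYPSIN_SITE.split(protein.upper()) if f]
    peptides: Set[str] = set()
    for start in range(len(fragments)):
        # Concatenate 1 .. missed_cleavages + 1 consecutive fragments.
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start + 1, stop + 1):
            peptides.add("".join(fragments[start:end]))
    return peptides


def search_space(proteins: Iterable[str], missed_cleavages: int = 2) -> Set[str]:
    """Union of the tryptic peptides of all proteins in a database."""
    space: Set[str] = set()
    for sequence in proteins:
        space |= tryptic_peptides(sequence, missed_cleavages)
    return space
```

Comparing the sizes of `search_space(...)` for each database, or their set intersections, gives a rough view of how much theoretical peptide space the databases share.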
For metaproteomic data analysis, the number of spectra, PSMs, and identification rates (calculated by dividing the number of identified spectra by the total number of acquired MS/MS spectra) were extracted for all data sets searched against the selected databases (SIHUMIx_REF and GUT_MO) and compared. Finally, a representative subset of data sets, based on the different methods, was selected for further analysis (S03, S05, S07, S08, S11, S14 for SIHUMIx and F01, F06, F07, and F08 for the fecal sample).
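The identification rate defined above is a simple ratio. As a minimal sketch, assuming that accepted PSMs are available as a list of spectrum identifiers (the function name `identification_rate` is hypothetical), distinct spectra are counted so that several PSMs pointing to the same spectrum are not double-counted:

```python
from typing import Iterable


def identification_rate(identified_spectrum_ids: Iterable[str],
                        total_msms_spectra: int) -> float:
    """Percentage of acquired MS/MS spectra with at least one accepted PSM."""
    if total_msms_spectra <= 0:
        raise ValueError("total_msms_spectra must be positive")
    # Several PSMs can refer to the same spectrum, so count distinct IDs.
    n_identified = len(set(identified_spectrum_ids))
    return 100.0 * n_identified / total_msms_spectra
```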

Data analysis using four different bioinformatic pipelines
All submitted MS/MS raw files were first analyzed with a single commonly used database. Peptides were filtered by length (6 to 50 amino acids), charge state (+2, +3, and +4), and a maximum expectation value (e-value) of 0.1 78 .
The following database search engines were used for the pipeline comparison: (i) Thermo Fisher). The identification settings for all search engines were the same as for the explorative analysis described above. Refinement searches were allowed if implemented in the search engine (e.g., the refinement search of X!Tandem), as was the inclusion of post-processing tools (e.g., Percolator within Proteome Discoverer).
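The peptide filters above can be expressed as a single predicate. This is a sketch, assuming a hypothetical `PSM` record holding an unmodified peptide sequence, the precursor charge, and the search-engine e-value; real search-engine outputs would first need to be parsed into this shape.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PSM:
    peptide: str   # plain peptide sequence, modifications removed
    charge: int    # precursor charge state
    evalue: float  # search-engine expectation value


def filter_psms(
    psms: Iterable[PSM],
    min_len: int = 6,
    max_len: int = 50,
    charges: FrozenSet[int] = frozenset({2, 3, 4}),
    max_evalue: float = 0.1,
) -> List[PSM]:
    """Keep PSMs that pass the length, charge-state and e-value filters."""
    return [
        psm
        for psm in psms
        if min_len <= len(psm.peptide) <= max_len
        and psm.charge in charges
        and psm.evalue <= max_evalue
    ]
```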

Protein inference
To allow protein group comparison, groups were created using the combined peptide evidence of all compared samples. Two protein grouping methods were tested, MPA 26 and PAPPSO 59 , and analyses were performed on both protein groups and subgroups (Supplementary Note 1.3).
Assigning peptides to their correct protein can be a difficult task, notably due to the protein inference issue 3 , i.e., the same peptide can be found in different homologous proteins.
This is particularly challenging in metaproteomics, where the diversity and number of homologous proteins are much higher than in single-species proteomics. To address this issue, most bioinformatic pipelines automatically group homologous protein sequences into protein groups. However, each tool handles protein inference and protein groups in its own way, which prevents a straightforward comparison of outputs at the protein group level. To allow a robust comparison between approaches, the PSM output files of the four bioinformatic pipelines were combined. The peptides were then mapped to the protein sequences in the FASTA file, and the data were prepared for subsequent protein grouping. Two protein grouping approaches were used and evaluated in this study: PAPPSO grouping 59 , which excludes proteins based on the rule of maximum parsimony, and MPA grouping 26 , which does not exclude proteins. All data processing was done with a custom Java program, except for PAPPSO grouping, for which data were exported and imported in the appropriate XML format.
For both methods, protein groups were created using the loose rule "share at least one peptide" (groups) and the strict rule "share a common set of peptides" (subgroups), resulting in a total of four protein grouping analyses: (1) PAPPSO groups, (2) MPA groups, (3) PAPPSO subgroups, and (4) MPA subgroups. Finally, the resulting protein groups and subgroups were exported for further analysis (Supplementary Note 1.3).
These algorithms are also implemented in Pout2Prot 91 for independent use.
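The two grouping rules can be sketched on a protein-to-peptide mapping. This is a simplified illustration, not the MPA, PAPPSO, or Pout2Prot code: it assumes that groups are formed transitively (proteins linked through any chain of shared peptides end up in the same group), that subgroups contain proteins with exactly identical peptide sets, and it does not model the parsimony-based protein exclusion used by PAPPSO. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def protein_groups(protein_peptides: Dict[str, Set[str]]) -> List[Set[str]]:
    """Loose rule: proteins sharing at least one peptide, applied transitively."""
    parent = {prot: prot for prot in protein_peptides}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # first protein seen for each peptide
    for prot, peptides in protein_peptides.items():
        for pep in peptides:
            if pep in owner:
                parent[find(prot)] = find(owner[pep])
            else:
                owner[pep] = prot
    groups: Dict[str, Set[str]] = defaultdict(set)
    for prot in protein_peptides:
        groups[find(prot)].add(prot)
    return sorted(groups.values(), key=sorted)


def protein_subgroups(protein_peptides: Dict[str, Set[str]]) -> List[Set[str]]:
    """Strict rule: proteins with exactly the same set of peptides."""
    subgroups: Dict[frozenset, Set[str]] = defaultdict(set)
    for prot, peptides in protein_peptides.items():
        subgroups[frozenset(peptides)].add(prot)
    return sorted(subgroups.values(), key=sorted)
```

For example, with P1 = {a, b}, P2 = {b, c}, P3 = {a, b} and P4 = {d}, the loose rule yields the groups {P1, P2, P3} and {P4}, while the strict rule yields the subgroups {P1, P3}, {P2} and {P4}.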

Taxonomic and functional annotation
Annotations were performed at the peptide, protein, and sequencing read levels.
Unipept was used for the peptide-centric approach 22,25,87 Note 1.4). For the taxonomic annotation of the fecal data sets with Unipept, the desktop 87 and CLI 21,88 versions were used. In both analyses for SIHUMIx and the fecal data sets, isoleucine (I) and leucine (L) were equated.
The assigned taxonomies for each of the peptides can be found in Supplementary Files and 4.
For the functional analysis at the peptide level, we used the Unipept command-line tool to extract the GO terms for each identified peptide per data set (below 1% FDR). The functional similarity of these sets of GO terms was calculated with MegaGO 62 .
Prophane was used for the protein-centric approach 89 SIHUMIx and the feces sample, respectively.
Quantification was based on read counts for metagenomic and metatranscriptomic data, and on spectral counts for peptides and protein subgroups. If two subgroups contained the same peptide, its spectra were counted for both subgroups; this distorts the relative abundance of these subgroups within a measurement but keeps counts consistent for comparison with other samples. Comparisons were performed with normalized values, as described in detail below.
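The double counting described above can be made concrete. In this minimal sketch (the helper name `subgroup_spectral_counts` is hypothetical), each subgroup's abundance is the sum of the spectral counts of its peptides, so a peptide shared by two subgroups contributes its full count to both:

```python
from typing import Dict, Set


def subgroup_spectral_counts(peptide_counts: Dict[str, int],
                             subgroups: Dict[str, Set[str]]) -> Dict[str, int]:
    """Sum peptide spectral counts per subgroup; shared peptides count in each."""
    return {
        name: sum(peptide_counts.get(pep, 0) for pep in peptides)
        for name, peptides in subgroups.items()
    }
```

With peptide counts a = 3, b = 2, c = 1 and subgroups SG1 = {a, b} and SG2 = {b, c}, this gives SG1 = 5 and SG2 = 3: eight counted spectra from six actual spectra, because the two spectra of peptide b are counted twice.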

Taxonomic resolution
Taxonomic annotations from the Prophane protein group outputs were used for metaproteomics. This method uses only identified proteins and assigns annotations based on the lowest common ancestor (LCA) approach, thus generating results for each protein at the best possible taxonomic resolution. The mOTU2 profiler used for the metagenomic taxonomic annotation relies on marker genes and thus annotates everything at the OTU level. Since this approach does not allow comparison at each taxonomic level, Kraken2 103 was used to compare taxonomic resolution across omics domains. Kraken2 was run on the sequencing reads with the maxikraken2_1903 database and a confidence threshold of 0.7.
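The LCA principle can be illustrated with lineages written as root-to-leaf lists of taxon names. This sketch is not Prophane's implementation: it assumes all lineages list the same ranks in the same order, so the LCA is simply their longest shared prefix. The function name is hypothetical.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Longest shared root-to-leaf prefix of the given lineages."""
    lca: List[str] = []
    # zip stops at the shortest lineage, so the LCA is never deeper than any input.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca
```

A protein whose peptides map to both an Escherichia and a Bacteroides lineage would thus only be resolved to the deepest rank the two lineages share, which illustrates why homologous sequences lower taxonomic resolution.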

Functional comparison
Each sequence database (SIHUMIx_REF, SIHUMIx_MO and GUT_MO) was annotated with the consensus-driven protein annotation tool Mantis 104 . For metaproteomics, abundances from the Prophane outputs and annotations from Mantis were used to generate functional profiles. For metagenomics and metatranscriptomics, sequencing reads were mapped against the assembly contigs using bowtie2 105 , and ORF abundances were calculated using featureCounts 106 . KEGG 107 annotations were retrieved from Mantis and used to compare functional profiles across omics domains.

Statistical analyses
Differences and overlap at the peptide level, both between search engines and between approaches, were visualized from presence/absence data as UpSet plots with the UpSetR package 98 . For the peptides, sequences were extracted from each result file without modifications and with leucine (L) and isoleucine (I) treated as equal by replacing both with J, and a table indicating whether each peptide was found was prepared (Supplementary Note 1.4 and Supplementary Files 6 and 7). Similar tables and UpSet plots were generated to visualize differences and overlap between sample preparations for the peptides, the protein subgroups, and the top 50% of protein subgroups.
The top 50% of protein subgroups were selected based on abundance: the spectral counts of each subgroup were summed across all selected samples, and only the top 50% were kept for UpSet plot comparison. Results from the taxonomic annotations for all approaches (peptides, proteins, metagenomic and metatranscriptomic reads) were compared and visualized by principal component analysis (PCA) using the R function prcomp. For this comparison, abundance values (numbers of reads and spectral counts) were normalized to percentages. The taxonomic annotations were harmonized across methods, unclassified values were filtered out, and annotations with an abundance below 0.05% after filtering were grouped into "other".
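The presence/absence table can be sketched as follows. This is a minimal illustration under stated assumptions: peptide strings are bare sequences without flanking residues, modifications are written as bracketed or parenthesized annotations (e.g. `M[+15.995]`), and the function names are hypothetical. The resulting 0/1 rows correspond to the kind of input that UpSet plots summarize.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed modification syntax: "[...]" or "(...)" after a residue.
_MOD_ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def normalize_peptide(sequence: str) -> str:
    """Strip modification annotations and merge I and L into J."""
    stripped = _MOD_ANNOTATION.sub("", sequence)
    residues = "".join(ch for ch in stripped.upper() if "A" <= ch <= "Z")
    return residues.replace("I", "J").replace("L", "J")


def presence_absence(
    results: Dict[str, Iterable[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the column order and a peptide -> [1/0 per result file] table."""
    columns = list(results)
    found = {name: {normalize_peptide(p) for p in peptides}
             for name, peptides in results.items()}
    all_peptides = sorted(set().union(*found.values()))
    table = {pep: [int(pep in found[name]) for name in columns]
             for pep in all_peptides}
    return columns, table
```

Replacing both I and L with J ensures that peptides differing only at isobaric I/L positions, which mass spectrometry cannot distinguish, collapse into a single row.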
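The abundance selection and normalization steps above can be sketched in a few lines. Several details are assumptions rather than documented choices: "top 50%" is interpreted as the top half of subgroups ranked by summed spectral counts (rounded up, ties broken alphabetically), unclassified annotations are identified by a label such as "unclassified", and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping, Set


def top_fraction_subgroups(counts_per_sample: Mapping[str, Mapping[str, int]],
                           fraction: float = 0.5) -> Set[str]:
    """Subgroups in the top `fraction` by spectral counts summed over samples."""
    totals: Counter = Counter()
    for sample_counts in counts_per_sample.values():
        totals.update(sample_counts)  # adds counts per subgroup
    ranked = sorted(totals, key=lambda sg: (-totals[sg], sg))
    n_keep = math.ceil(len(ranked) * fraction)
    return set(ranked[:n_keep])


def taxon_percentages(counts: Mapping[str, float],
                      threshold_pct: float = 0.05,
                      unclassified: Iterable[str] = ("unclassified",)
                      ) -> Dict[str, float]:
    """Drop unclassified entries, convert to percentages, and pool taxa
    below `threshold_pct` percent into "other"."""
    drop = set(unclassified)
    kept = {taxon: c for taxon, c in counts.items() if taxon not in drop}
    total = sum(kept.values())
    if total == 0:
        return {}
    result: Dict[str, float] = {}
    other = 0.0
    for taxon, c in kept.items():
        pct = 100.0 * c / total
        if pct < threshold_pct:
            other += pct
        else:
            result[taxon] = pct
    if other > 0:
        result["other"] = other
    return result
```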
All correlation plots were based on both Pearson and Spearman correlations, with a significance threshold of p < 0.001. The correlations were calculated and plotted using the corrplot R package.
Hierarchical clustering was performed with the R function hclust, using the Manhattan distance and Ward's method.
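The two correlation coefficients differ only in their input: Spearman's coefficient is Pearson's coefficient computed on ranks, with tied values receiving their average rank. The following pure-Python sketch illustrates this relationship; it is not the corrplot implementation and computes no p-values, so the significance filtering described above is not covered. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman's coefficient therefore captures any monotonic relationship, whereas Pearson's coefficient measures only linear association; reporting both guards against conclusions driven by a few highly abundant features.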
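Agglomerative clustering with Ward's method can be written with the Lance-Williams update formula. The sketch below applies that update to unsquared Manhattan distances, which corresponds to the scheme R's hclust uses for method "ward.D"; whether the study used "ward.D" or "ward.D2" (which squares the dissimilarities first) is not stated, so this variant is an assumption. Ties are broken by insertion order, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Merge = Tuple[FrozenSet[int], FrozenSet[int], float]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_linkage(points: Sequence[Sequence[float]]) -> List[Merge]:
    """Return merges as (cluster_a, cluster_b, merge height)."""
    members: Dict[int, FrozenSet[int]] = {i: frozenset([i]) for i in range(len(points))}
    size: Dict[int, int] = {i: 1 for i in members}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist: Dict[Tuple[int, int], float] = {
        (i, j): manhattan(points[i], points[j])
        for i in range(len(points)) for j in range(i + 1, len(points))
    }

    def d(a: int, b: int) -> float:
        return dist[(a, b) if a < b else (b, a)]

    merges: List[Merge] = []
    next_id = len(points)
    while len(members) > 1:
        (i, j), height = min(dist.items(), key=lambda item: item[1])
        merges.append((members[i], members[j], height))
        new = next_id
        next_id += 1
        for k in members:
            if k in (i, j):
                continue
            n_i, n_j, n_k = size[i], size[j], size[k]
            # Lance-Williams update for Ward's method.
            dist[(k, new)] = ((n_i + n_k) * d(k, i) + (n_j + n_k) * d(k, j)
                              - n_k * d(i, j)) / (n_i + n_j + n_k)
        for key in [key for key in dist if i in key or j in key]:
            del dist[key]
        members[new] = members[i] | members[j]
        size[new] = size[i] + size[j]
        for old in (i, j):
            del members[old], size[old]
    return merges
```

Each row of `points` would correspond to one sample's abundance profile; the merge heights play the role of the dendrogram heights reported by hclust.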

Data availability
The metaproteomic data sets generated and analyzed in the current study are available via the PRIDE partner repository with the data set identifier PXD023217. Assemblies and raw metagenomic and metatranscriptomic reads are available through the European Nucleotide Archive under the study accession number PRJEB42466.

Code availability
All scripts are made available on github.com/metaproteomics/CAMPI.