Exploring 3D pooled next-generation sequencing using SNP-Cub^3

Joachim De Schrijver (UGent) , Bram De Wilde (UGent) , Geert Trooskens (UGent) , Jo Vandesompele (UGent) and Wim Van Criekinge (UGent)
Abstract
Next-generation amplicon resequencing allows rapid identification of novel or known SNPs and mutations. However, sample preparation remains a largely manual and sometimes tedious task, especially when large sets of samples need to be analyzed. Multidimensional pooling is a technique in which each sample is distributed over a combination of pools in such a way that every sample is included in a unique combination of pools. This pooling technique drastically reduces the number of pools to prepare (x^y samples fit into x*y pools; for example, 125 samples into 15 pools) but complicates downstream data analysis. We developed a simulation and analysis pipeline, called SNP-Cub^3, for 3D pooled Illumina resequencing that allows optimization and scaling of the multidimensional pools, optimization of the sequencing setup, and analysis of the sequencing data. A drawback of this pooling approach is that variant frequencies in a single pool are very low. For example, in a multidimensional pool where 25 samples are combined into 1 pool, a homozygous variant or SNP in a single sample will appear as a 4% variant in the total pool. Although this frequency is higher than the sequencing or PCR error rate, stochastic effects can make it difficult to distinguish such a variant from a sequencing/PCR error. The pipeline overcomes this limitation by comparing variant frequencies across the several pools in which a single sample is included. A real sample variant (unique to that sample) in a 3D-pooled setup should appear in 3 pools with the same frequency. SNP-Cub^3 assigns each observed variant a p-value: the probability that a real variant would be randomly sampled at the observed frequencies in the different pools. The simulation module of the pipeline lets a user test the limits of this approach by varying the number of samples and pools, in combination with a predefined PCR and sequencing error rate, which is used to generate simulated reads.
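The pooling arithmetic in the abstract (125 samples, 15 pools, a 4% variant frequency) can be illustrated with a minimal Python sketch. It is not the SNP-Cub^3 implementation: the function names `build_3d_pools` and `decode` are hypothetical, samples are assumed to sit on a 5 x 5 x 5 grid with one pool per plane along each axis, and every sample is assumed to contribute equally to each pool it is part of.

```python
from itertools import product


def build_3d_pools(side):
    """Place side**3 samples on a cube and build one pool per axis plane.

    Hypothetical layout: a sample is identified by its (x, y, z) coordinate
    and belongs to exactly three pools, one for each axis.
    """
    pools = {}
    for coord in product(range(side), repeat=3):
        for axis, index in enumerate(coord):
            pools.setdefault((axis, index), set()).add(coord)
    return pools


def decode(pools, positive_pools):
    """Return the samples that are members of every variant-positive pool."""
    return set.intersection(*(pools[p] for p in positive_pools))


side = 5
pools = build_3d_pools(side)

n_samples = side ** 3               # x^y samples
n_pools = len(pools)                # x*y pools
pool_size = len(pools[(0, 0)])      # samples combined in a single pool

# Assumption: equal input per sample, so a homozygous variant carried by
# one sample makes up 1/pool_size of the reads in each pool it belongs to.
expected_fraction = 1 / pool_size

# A variant unique to one sample shows up in exactly three pools, and the
# intersection of those pools points back to that sample.
carrier = (1, 3, 4)
positive = [(axis, index) for axis, index in enumerate(carrier)]
recovered = decode(pools, positive)

print(n_samples, n_pools, pool_size, expected_fraction, recovered)
```

Under these assumptions, a variant unique to one sample is expected at the same low frequency in each of its three pools, which is the property the pipeline's cross-pool comparison relies on. The sketch does not model sequencing or PCR errors, nor the p-value calculation, which the abstract does not specify.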

Citation

Chicago
De Schrijver, Joachim, Bram De Wilde, Geert Trooskens, Jo Vandesompele, and Wim Van Criekinge. 2010. “Exploring 3D Pooled Next-generation Sequencing Using SNP-Cub^3.” In Computational Biology, 9th European Conference, Abstracts.
APA
De Schrijver, J., De Wilde, B., Trooskens, G., Vandesompele, J., & Van Criekinge, W. (2010). Exploring 3D pooled next-generation sequencing using SNP-Cub^3. Computational Biology, 9th European conference, Abstracts. Presented at the 9th European Conference on Computational Biology (ECCB10).
Vancouver
1. De Schrijver J, De Wilde B, Trooskens G, Vandesompele J, Van Criekinge W. Exploring 3D pooled next-generation sequencing using SNP-Cub^3. Computational Biology, 9th European conference, Abstracts. 2010.
MLA
De Schrijver, Joachim, Bram De Wilde, Geert Trooskens, et al. “Exploring 3D Pooled Next-generation Sequencing Using SNP-Cub^3.” Computational Biology, 9th European Conference, Abstracts. 2010. Print.
@inproceedings{1042272,
  abstract     = {Next-generation amplicon resequencing allows rapid identification of novel or known SNPs and mutations. However, sample preparation remains a largely manual and sometimes tedious task, especially when large sets of samples need to be analyzed.
Multidimensional pooling is a technique in which each sample is distributed over a combination of pools in such a way that every sample is included in a unique combination of pools. This pooling technique drastically reduces the number of pools to prepare (x\^{ }y samples fit into x*y pools; for example, 125 samples into 15 pools) but complicates downstream data analysis.
We developed a simulation and analysis pipeline, called SNP-Cub\^{ }3, for 3D pooled Illumina resequencing that allows optimization and scaling of the multidimensional pools, optimization of the sequencing setup, and analysis of the sequencing data.
A drawback of this pooling approach is that variant frequencies in a single pool are very low. For example, in a multidimensional pool where 25 samples are combined into 1 pool, a homozygous variant or SNP in a single sample will appear as a 4\% variant in the total pool. Although this frequency is higher than the sequencing or PCR error rate, stochastic effects can make it difficult to distinguish such a variant from a sequencing/PCR error. The pipeline overcomes this limitation by comparing variant frequencies across the several pools in which a single sample is included. A real sample variant (unique to that sample) in a 3D-pooled setup should appear in 3 pools with the same frequency. SNP-Cub\^{ }3 assigns each observed variant a p-value: the probability that a real variant would be randomly sampled at the observed frequencies in the different pools.
The simulation module of the pipeline lets a user test the limits of this approach by varying the number of samples and pools, in combination with a predefined PCR and sequencing error rate, which is used to generate simulated reads.},
  author       = {De Schrijver, Joachim and De Wilde, Bram and Trooskens, Geert and Vandesompele, Jo and Van Criekinge, Wim},
  booktitle    = {Computational Biology, 9th European conference, Abstracts},
  language     = {eng},
  location     = {Ghent, Belgium},
  title        = {Exploring 3D pooled next-generation sequencing using SNP-Cub\^{ }3},
  year         = {2010},
}