
Development of a sequence analysis pipeline for variant detection and interpretation

Tom Sante (UGent)
(2015)
Abstract
Developing a sequence analysis pipeline for variant discovery and interpretation, together with the necessary supporting infrastructure, is a great challenge. The objectives presented in Chapter 2 were essential for handling the growing volume and velocity at which sequencing data have been generated in recent years. There is great demand for better platforms to drive our analyses, and, as introduced in Section 2.1 and discussed at the end of the thesis, a solid but elastic infrastructure is essential to support such a platform.

The early efforts focused on moving away from custom-built physical application servers to a more flexible private cloud platform that can offer compute power as a service to researchers, relieving them of lower-level system administration duties. At the same time, we implemented a distributed network file system that provides the capacity and performance expected for genome analysis and keeps the data replicated to protect against loss.

Building on that infrastructure, subsequent work focused on streamlining an analysis pipeline for structural variation. The introduction of new technologies such as mate-pair/paired-end sequencing and coverage-based analysis spurred many different tools, but no clear way to tackle the complete process, from sequence reads to a genetic report. The web-based platform ViVar handles intelligent processing of the data and provides careful filtering and a diverse set of visualizations to distill sequence reads into a set of clinically relevant structural variants. Additionally, a platform focused on SNP analysis was built. It runs a pipeline conforming to the GATK Best Practices guide and offers extensive filtering options that help users interpret the results and select the most relevant variants.

As the discussion makes clear, several focus areas remain for lifting the performance and flexibility of the analysis tools, and of the researchers using them, to the next level. The most important aspects are: building scalable, cost-effective, dependable storage systems; optimizing cycle times through better infrastructure and efficient tools that improve developer and user productivity; and respecting information security when working with sensitive data. These will be essential drivers of research innovation and of novel clinical applications based on sequencing data.

Downloads

  • (...).pdf: full text | UGent only | PDF | 4.41 MB

Citation

MLA
Sante, Tom. Development of a Sequence Analysis Pipeline for Variant Detection and Interpretation. Ghent University. Faculty of Medicine and Health Sciences, 2015.
APA
Sante, T. (2015). Development of a sequence analysis pipeline for variant detection and interpretation. Ghent University. Faculty of Medicine and Health Sciences, Ghent, Belgium.
Chicago author-date
Sante, Tom. 2015. “Development of a Sequence Analysis Pipeline for Variant Detection and Interpretation.” Ghent, Belgium: Ghent University. Faculty of Medicine and Health Sciences.
Vancouver
1.
Sante T. Development of a sequence analysis pipeline for variant detection and interpretation. [Ghent, Belgium]: Ghent University. Faculty of Medicine and Health Sciences; 2015.
IEEE
[1]
T. Sante, “Development of a sequence analysis pipeline for variant detection and interpretation,” Ghent University. Faculty of Medicine and Health Sciences, Ghent, Belgium, 2015.
BibTeX
@phdthesis{7139868,
  abstract     = {{Developing a sequence analysis pipeline for variant discovery and interpretation, together with the necessary supporting infrastructure, is a great challenge. The objectives presented in Chapter 2 were essential for handling the growing volume and velocity at which sequencing data have been generated in recent years. There is great demand for better platforms to drive our analyses, and, as introduced in Section 2.1 and discussed at the end of the thesis, a solid but elastic infrastructure is essential to support such a platform.
The early efforts focused on moving away from custom-built physical application servers to a more flexible private cloud platform that can offer compute power as a service to researchers, relieving them of lower-level system administration duties. At the same time, we implemented a distributed network file system that provides the capacity and performance expected for genome analysis and keeps the data replicated to protect against loss.
Building on that infrastructure, subsequent work focused on streamlining an analysis pipeline for structural variation. The introduction of new technologies such as mate-pair/paired-end sequencing and coverage-based analysis spurred many different tools, but no clear way to tackle the complete process, from sequence reads to a genetic report. The web-based platform ViVar handles intelligent processing of the data and provides careful filtering and a diverse set of visualizations to distill sequence reads into a set of clinically relevant structural variants. Additionally, a platform focused on SNP analysis was built. It runs a pipeline conforming to the GATK Best Practices guide and offers extensive filtering options that help users interpret the results and select the most relevant variants.
As the discussion makes clear, several focus areas remain for lifting the performance and flexibility of the analysis tools, and of the researchers using them, to the next level. The most important aspects are: building scalable, cost-effective, dependable storage systems; optimizing cycle times through better infrastructure and efficient tools that improve developer and user productivity; and respecting information security when working with sensitive data. These will be essential drivers of research innovation and of novel clinical applications based on sequencing data.}},
  author       = {{Sante, Tom}},
  language     = {{eng}},
  pages        = {{IV, 120}},
  publisher    = {{Ghent University. Faculty of Medicine and Health Sciences}},
  school       = {{Ghent University}},
  title        = {{Development of a sequence analysis pipeline for variant detection and interpretation}},
  year         = {{2015}},
}