Explicit sequence-culture-strain-taxon links in StrainInfo and their role in quality assessment and assurance

Wim De Smet (UGent)
(2013)
Abstract
The exchange of research material is an important part of scientific tradition in the biological sciences. Only by exchanging the original material can researchers repeat and build on previous work. Microorganisms such as Bacteria and Archaea have been exchanged in this way since their discovery, and the later discovery of methods to grow and store these organisms in a laboratory environment in so-called pure cultures has only strengthened that tradition. Research in Fungi, another large group of microorganisms that includes, among many others, the various yeasts, has also greatly benefited from the existence of collections built by researchers over the years. Much of the existing known diversity is located in private research collections. Growing and distributing biological material, however, requires a significant investment of time and money. As a result, public service collections for microbial material were soon created. These collections are now largely responsible for preserving and making available the known microbial diversity. Such so-called Biological Resource Centers (BRCs) often allow other entities (including researchers and companies) to deposit new cultures of microbial strains into their collection. Each deposit is assigned a standard number, called a strain number. Researchers can at any time obtain subcultures of publicly available microbial strains from a BRC (for a fee), where the strain number serves to uniquely identify the desired strain.
As part of this dissertation, we describe the creation of StrainInfo, a strain database formed by extracting all known strain numbers and explicitly linking them to the information found in other databases. StrainInfo provides a so-called ‘strain passport’, which lists all known strain numbers of the cultures of a strain, along with all sequences and publications known to be derived from it. The extra information is often interesting in its own right and is thus also afforded a ‘publication passport’ (for publications), ‘taxon passport’ (for taxonomic names), or ‘sequence passport’ (for genetic sequences). On each of these pages the associated publications, strains or sequences are shown, providing the microbiologist with a clear, easily navigable overview of strain-related information. In addition to the base database, this work describes the creation of the Genomic Rosetta Stone project, a cooperation between several data providers of genome-related information, integrated into StrainInfo as a ‘genome passport’.
Researchers often need to choose representative 16S rRNA gene sequences for type strains of several taxonomic groups. The final part of this thesis describes the SeqRank workflow, integrated with StrainInfo, which allows them to accomplish this task by applying a ranking algorithm called SeqRank to the 16S rRNA gene sequences of the type strain of each species or subspecies found within a taxonomic group at the rank of genus or above. The workflow can be used to quickly generate or update lists of representative sequences. The explicit connections between sequence, culture, strain and taxon provided by StrainInfo also enable the use of the SeqRank workflow as a tool for finding errors in existing collections of 16S rRNA gene sequences. The SeqRank workflow operates fully automatically, and its different criteria are made explicit and adjustable. These qualities make the workflow inherently scalable, even as the ever-increasing volume of publicly available sequence data eventually precludes the manual execution of such tasks.

Downloads

  • PhD.pdf (full text, open access, PDF, 43.50 MB)

Citation

Please use this url to cite or link to this publication:

MLA
De Smet, Wim. Explicit Sequence-Culture-Strain-Taxon Links in StrainInfo and Their Role in Quality Assessment and Assurance. Ghent University. Faculty of Sciences, 2013.
APA
De Smet, W. (2013). Explicit sequence-culture-strain-taxon links in StrainInfo and their role in quality assessment and assurance. Ghent University. Faculty of Sciences, Ghent, Belgium.
Chicago author-date
De Smet, Wim. 2013. “Explicit Sequence-Culture-Strain-Taxon Links in StrainInfo and Their Role in Quality Assessment and Assurance.” Ghent, Belgium: Ghent University. Faculty of Sciences.
Vancouver
1. De Smet W. Explicit sequence-culture-strain-taxon links in StrainInfo and their role in quality assessment and assurance. [Ghent, Belgium]: Ghent University. Faculty of Sciences; 2013.
IEEE
[1] W. De Smet, “Explicit sequence-culture-strain-taxon links in StrainInfo and their role in quality assessment and assurance,” Ghent University. Faculty of Sciences, Ghent, Belgium, 2013.
@phdthesis{4216826,
  abstract     = {{The exchange of research material is an important part of scientific tradition in the biological sciences. Only by exchanging the original material can researchers repeat and build on previous work. Microorganisms such as Bacteria and Archaea have been exchanged in this way since their discovery, and the later discovery of methods to grow and store these organisms in a laboratory environment in so-called pure cultures has only strengthened that tradition. Research in Fungi, another large group of microorganisms that includes, among many others, the various yeasts, has also greatly benefited from the existence of collections built by researchers over the years. Much of the existing known diversity is located in private research collections. Growing and distributing biological material, however, requires a significant investment of time and money. As a result, public service collections for microbial material were soon created. These collections are now largely responsible for preserving and making available the known microbial diversity. Such so-called Biological Resource Centers (BRCs) often allow other entities (including researchers and companies) to deposit new cultures of microbial strains into their collection. Each deposit is assigned a standard number, called a strain number. Researchers can at any time obtain subcultures of publicly available microbial strains from a BRC (for a fee), where the strain number serves to uniquely identify the desired strain.
As part of this dissertation, we describe the creation of StrainInfo, a strain database formed by extracting all known strain numbers and explicitly linking them to the information found in other databases. StrainInfo provides a so-called ‘strain passport’, which lists all known strain numbers of the cultures of a strain, along with all sequences and publications known to be derived from it. The extra information is often interesting in its own right and is thus also afforded a ‘publication passport’ (for publications), ‘taxon passport’ (for taxonomic names), or ‘sequence passport’ (for genetic sequences). On each of these pages the associated publications, strains or sequences are shown, providing the microbiologist with a clear, easily navigable overview of strain-related information. In addition to the base database, this work describes the creation of the Genomic Rosetta Stone project, a cooperation between several data providers of genome-related information, integrated into StrainInfo as a ‘genome passport’.
Researchers often need to choose representative 16S rRNA gene sequences for type strains of several taxonomic groups. The final part of this thesis describes the SeqRank workflow, integrated with StrainInfo, which allows them to accomplish this task by applying a ranking algorithm called SeqRank to the 16S rRNA gene sequences of the type strain of each species or subspecies found within a taxonomic group at the rank of genus or above. The workflow can be used to quickly generate or update lists of representative sequences. The explicit connections between sequence, culture, strain and taxon provided by StrainInfo also enable the use of the SeqRank workflow as a tool for finding errors in existing collections of 16S rRNA gene sequences. The SeqRank workflow operates fully automatically, and its different criteria are made explicit and adjustable. These qualities make the workflow inherently scalable, even as the ever-increasing volume of publicly available sequence data eventually precludes the manual execution of such tasks.}},
  author       = {{De Smet, Wim}},
  isbn         = {{9789461971623}},
  language     = {{eng}},
  pages        = {{186}},
  publisher    = {{Ghent University. Faculty of Sciences}},
  school       = {{Ghent University}},
  title        = {{Explicit sequence-culture-strain-taxon links in StrainInfo and their role in quality assessment and assurance}},
  year         = {{2013}},
}