Graph-based data selection for the construction of genomic prediction models

Steven Maenhout (UGent), Bernard De Baets (UGent) and Geert Haesaert (UGent)
(2010) GENETICS. 185(4). p.1463-1475
Abstract
Efficient genomic selection in animals or crops requires the accurate prediction of the agronomic performance of individuals from their high-density molecular marker profiles. Using a training data set that contains the genotypic and phenotypic information of a large number of individuals, each marker or marker allele is associated with an estimated effect on the trait under study. These estimated marker effects are subsequently used for making predictions on individuals for which no phenotypic records are available. As most plant and animal breeding programs are currently still phenotype driven, the continuously expanding collection of phenotypic records can only be used to construct a genomic prediction model if a dense molecular marker fingerprint is available for each phenotyped individual. However, as the genotyping budget is generally limited, the genomic prediction model can only be constructed using a subset of the tested individuals and possibly a genome-covering subset of the molecular markers. In this article, we demonstrate how an optimal selection of individuals can be made with respect to the quality of their available phenotypic data. We also demonstrate how the total number of molecular markers can be reduced while a maximum genome coverage is ensured. The third selection problem we tackle is specific to the construction of a genomic prediction model for a hybrid breeding program where only molecular marker fingerprints of the homozygous parents are available. We show how to identify the set of parental inbred lines of a predefined size that has produced the highest number of progeny. These three selection approaches are put into practice in a simulation study where we demonstrate how the trade-off between sample size and sample quality affects the prediction accuracy of genomic prediction models for hybrid maize.
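The third selection problem in the abstract (choosing a fixed-size set of parental inbred lines that has produced the most progeny) can be illustrated with a small brute-force sketch. This is not the authors' algorithm; it is only an exhaustive search over a toy example, and the function name `best_parent_subset`, the line labels, and the data are all hypothetical. It assumes a hybrid counts toward the total only when both of its parents are in the chosen set.

```python
from itertools import combinations


def best_parent_subset(hybrids, k):
    """Return (parents, count) for the k parents that cover the most hybrids.

    hybrids: iterable of (parent_a, parent_b) pairs, one per phenotyped hybrid.
    A hybrid is counted only if both of its parents are in the chosen subset.
    Exhaustive search, so only practical for small numbers of parents.
    """
    # Treat each cross as unordered and ignore duplicate records.
    crosses = {frozenset(pair) for pair in hybrids}
    parents = sorted(set().union(*crosses)) if crosses else []

    best, best_count = (), -1
    for subset in combinations(parents, k):
        chosen = set(subset)
        count = sum(1 for cross in crosses if cross <= chosen)
        if count > best_count:
            best, best_count = subset, count
    return best, best_count


# Hypothetical toy data: inbred lines A-E and the single crosses made between them.
toy_hybrids = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
result = best_parent_subset(toy_hybrids, 3)
print(result)
```

Viewing parents as graph vertices and hybrids as edges, this amounts to finding the k-vertex subgraph with the most edges. Exhaustive enumeration grows combinatorially with the number of parents, so realistic data sets would need a dedicated graph algorithm or heuristic.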
Keywords
VECTOR MACHINE REGRESSION, GENETIC EVALUATION, MAXIMUM CLIQUE PROBLEM, SINGLE-CROSS PERFORMANCE, LINEAR-MODELS, MAIZE, INFORMATION, ALGORITHM, CONNECTEDNESS, PRECISION

Citation

Chicago
Maenhout, Steven, Bernard De Baets, and Geert Haesaert. 2010. “Graph-based Data Selection for the Construction of Genomic Prediction Models.” Genetics 185 (4): 1463–1475.
APA
Maenhout, S., De Baets, B., & Haesaert, G. (2010). Graph-based data selection for the construction of genomic prediction models. GENETICS, 185(4), 1463–1475.
Vancouver
1. Maenhout S, De Baets B, Haesaert G. Graph-based data selection for the construction of genomic prediction models. GENETICS. 2010;185(4):1463–75.
MLA
Maenhout, Steven, Bernard De Baets, and Geert Haesaert. “Graph-based Data Selection for the Construction of Genomic Prediction Models.” GENETICS 185.4 (2010): 1463–1475. Print.
@article{1085848,
  abstract     = {Efficient genomic selection in animals or crops requires the accurate prediction of the agronomic performance of individuals from their high-density molecular marker profiles. Using a training data set that contains the genotypic and phenotypic information of a large number of individuals, each marker or marker allele is associated with an estimated effect on the trait under study. These estimated marker effects are subsequently used for making predictions on individuals for which no phenotypic records are available. As most plant and animal breeding programs are currently still phenotype driven, the continuously expanding collection of phenotypic records can only be used to construct a genomic prediction model if a dense molecular marker fingerprint is available for each phenotyped individual. However, as the genotyping budget is generally limited, the genomic prediction model can only be constructed using a subset of the tested individuals and possibly a genome-covering subset of the molecular markers. In this article, we demonstrate how an optimal selection of individuals can be made with respect to the quality of their available phenotypic data. We also demonstrate how the total number of molecular markers can be reduced while a maximum genome coverage is ensured. The third selection problem we tackle is specific to the construction of a genomic prediction model for a hybrid breeding program where only molecular marker fingerprints of the homozygous parents are available. We show how to identify the set of parental inbred lines of a predefined size that has produced the highest number of progeny. These three selection approaches are put into practice in a simulation study where we demonstrate how the trade-off between sample size and sample quality affects the prediction accuracy of genomic prediction models for hybrid maize.},
  author       = {Maenhout, Steven and De Baets, Bernard and Haesaert, Geert},
  issn         = {0016-6731},
  journal      = {GENETICS},
  keyword      = {VECTOR MACHINE REGRESSION,GENETIC EVALUATION,MAXIMUM CLIQUE PROBLEM,SINGLE-CROSS PERFORMANCE,LINEAR-MODELS,MAIZE,INFORMATION,ALGORITHM,CONNECTEDNESS,PRECISION},
  language     = {eng},
  number       = {4},
  pages        = {1463--1475},
  title        = {Graph-based data selection for the construction of genomic prediction models},
  url          = {http://dx.doi.org/10.1534/genetics.110.116426},
  volume       = {185},
  year         = {2010},
}