
Novel approaches to assess the quality of fertility data stored in dairy herd management software

Kristof Hermans (UGent), Willem Waegeman (UGent), Geert Opsomer (UGent), Bonifacius Van Ranst (UGent), Jenne De Koster (UGent), Mieke Van Eetvelde (UGent) and Miel Hostens (UGent)
(2017) JOURNAL OF DAIRY SCIENCE. 100(5). p.4078-4089
Abstract
Scientific journals and popular press magazines are littered with articles in which the authors use data from dairy herd management software. Almost none of such papers include data cleaning and data quality assessment in their study design despite this being a very critical step during data mining. This paper presents 2 novel data cleaning methods that permit identification of animals with good and bad data quality. The first method is a deterministic or rule-based data cleaning method. Reproduction and mutation or life-changing events such as birth and death were converted to a symbolic (alphabetical letter) representation and split into triplets (3-letter code). The triplets were manually labeled as physiologically correct, suspicious, or impossible. The deterministic data cleaning method was applied to assess the quality of data stored in dairy herd management from 26 farms enrolled in the herd health management program from the Faculty of Veterinary Medicine Ghent University, Belgium. In total, 150,443 triplets were created. 65.4% were labeled as correct, 17.4% as suspicious, and 17.2% as impossible. The second method, a probabilistic method, uses a machine learning algorithm (random forests) to predict the correctness of fertility and mutation events in an early stage of data cleaning. The prediction accuracy of the random forests algorithm was compared with a classical linear statistical method (penalized logistic regression), outperforming the latter substantially, with a superior receiver operating characteristic curve and a higher accuracy (89 vs. 72%). From those results, we conclude that the triplet method can be used to assess the quality of reproduction data stored in dairy herd management software and that a machine learning technique such as random forests is capable of predicting the correctness of fertility data.
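
The deterministic triplet step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event names, the letter codes in EVENT_CODES, the entries in TRIPLET_LABELS, and the use of an overlapping sliding window are all assumptions made for illustration, since the abstract does not specify the alphabet, the label table, or how sequences are split.

```python
from collections import Counter

# Hypothetical event-to-letter mapping (assumption; the abstract does not
# list the actual alphabet used in the paper).
EVENT_CODES = {
    "birth": "B",
    "insemination": "I",
    "calving": "C",
    "death": "D",
}

# Hypothetical manual labels for a handful of triplets (assumption; the
# paper's label table is not reproduced in the abstract).
TRIPLET_LABELS = {
    "BIC": "correct",     # born, inseminated, calved
    "ICI": "correct",     # inseminated, calved, inseminated again
    "BCI": "suspicious",  # calving recorded without a prior insemination
    "DIC": "impossible",  # events recorded after death
}


def encode(events):
    """Convert a chronological list of event names to a letter string."""
    return "".join(EVENT_CODES[event] for event in events)


def triplets(sequence):
    """Split a letter string into overlapping 3-letter codes.

    The overlapping window is an assumption; sequences shorter than
    three letters yield no triplets.
    """
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def label_counts(events):
    """Count triplet labels for one animal; unknown triplets are 'unlabeled'."""
    sequence = encode(events)
    return Counter(TRIPLET_LABELS.get(t, "unlabeled") for t in triplets(sequence))
```

Calling label_counts on one animal's event history returns a per-label tally, which could then be aggregated per farm to obtain proportions of correct, suspicious, and impossible triplets like those reported above.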
Keywords
dairy reproduction, data quality, dairy herd management software, random forests, MACHINE LEARNING ALGORITHMS, SECONDARY DATA SOURCES, REGRESSION TREES, RECORDING-SYSTEM, RANDOM FORESTS, DISEASE, COWS, FRAMEWORK, CLASSIFICATION, PERFORMANCE


Citation


MLA
Hermans, Kristof, et al. “Novel Approaches to Assess the Quality of Fertility Data Stored in Dairy Herd Management Software.” JOURNAL OF DAIRY SCIENCE, vol. 100, no. 5, 2017, pp. 4078–89, doi:10.3168/jds.2016-11896.
APA
Hermans, K., Waegeman, W., Opsomer, G., Van Ranst, B., De Koster, J., Van Eetvelde, M., & Hostens, M. (2017). Novel approaches to assess the quality of fertility data stored in dairy herd management software. JOURNAL OF DAIRY SCIENCE, 100(5), 4078–4089. https://doi.org/10.3168/jds.2016-11896
Chicago author-date
Hermans, Kristof, Willem Waegeman, Geert Opsomer, Bonifacius Van Ranst, Jenne De Koster, Mieke Van Eetvelde, and Miel Hostens. 2017. “Novel Approaches to Assess the Quality of Fertility Data Stored in Dairy Herd Management Software.” JOURNAL OF DAIRY SCIENCE 100 (5): 4078–89. https://doi.org/10.3168/jds.2016-11896.
Chicago author-date (all authors)
Hermans, Kristof, Willem Waegeman, Geert Opsomer, Bonifacius Van Ranst, Jenne De Koster, Mieke Van Eetvelde, and Miel Hostens. 2017. “Novel Approaches to Assess the Quality of Fertility Data Stored in Dairy Herd Management Software.” JOURNAL OF DAIRY SCIENCE 100 (5): 4078–4089. doi:10.3168/jds.2016-11896.
Vancouver
1. Hermans K, Waegeman W, Opsomer G, Van Ranst B, De Koster J, Van Eetvelde M, et al. Novel approaches to assess the quality of fertility data stored in dairy herd management software. JOURNAL OF DAIRY SCIENCE. 2017;100(5):4078–89.
IEEE
[1] K. Hermans et al., “Novel approaches to assess the quality of fertility data stored in dairy herd management software,” JOURNAL OF DAIRY SCIENCE, vol. 100, no. 5, pp. 4078–4089, 2017.
@article{8527176,
  abstract     = {{Scientific journals and popular press magazines are littered with articles in which the authors use data from dairy herd management software. Almost none of such papers include data cleaning and data quality assessment in their study design despite this being a very critical step during data mining. This paper presents 2 novel data cleaning methods that permit identification of animals with good and bad data quality. The first method is a deterministic or rule-based data cleaning method. Reproduction and mutation or life-changing events such as birth and death were converted to a symbolic (alphabetical letter) representation and split into triplets (3-letter code). The triplets were manually labeled as physiologically correct, suspicious, or impossible. The deterministic data cleaning method was applied to assess the quality of data stored in dairy herd management from 26 farms enrolled in the herd health management program from the Faculty of Veterinary Medicine Ghent University, Belgium. In total, 150,443 triplets were created. 65.4% were labeled as correct, 17.4% as suspicious, and 17.2% as impossible. The second method, a probabilistic method, uses a machine learning algorithm (random forests) to predict the correctness of fertility and mutation events in an early stage of data cleaning. The prediction accuracy of the random forests algorithm was compared with a classical linear statistical method (penalized logistic regression), outperforming the latter substantially, with a superior receiver operating characteristic curve and a higher accuracy (89 vs. 72%). From those results, we conclude that the triplet method can be used to assess the quality of reproduction data stored in dairy herd management software and that a machine learning technique such as random forests is capable of predicting the correctness of fertility data.}},
  author       = {{Hermans, Kristof and Waegeman, Willem and Opsomer, Geert and Van Ranst, Bonifacius and De Koster, Jenne and Van Eetvelde, Mieke and Hostens, Miel}},
  issn         = {{0022-0302}},
  journal      = {{JOURNAL OF DAIRY SCIENCE}},
  keywords     = {{dairy reproduction,data quality,dairy herd management software,random forests,MACHINE LEARNING ALGORITHMS,SECONDARY DATA SOURCES,REGRESSION TREES,RECORDING-SYSTEM,RANDOM FORESTS,DISEASE,COWS,FRAMEWORK,CLASSIFICATION,PERFORMANCE}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{4078--4089}},
  title        = {{Novel approaches to assess the quality of fertility data stored in dairy herd management software}},
  url          = {{http://doi.org/10.3168/jds.2016-11896}},
  volume       = {{100}},
  year         = {{2017}},
}
