
The trade-off between quantity and quality : comparing a large web corpus and a small focused corpus for medical terminology extraction

Veronique Hoste (UGent) , Klaar Vanopstal (UGent) , Ayla Rigouts Terryn (UGent) and Els Lefever (UGent)
(2019) ACROSS LANGUAGES AND CULTURES. 20(2). p.197-211
Abstract
We investigate the cost-effectiveness of special-purpose crawled corpora versus more focused corpora for automatic terminology extraction (ATE). Our focus is on medical terminology on heart failure in two languages: English, for which we have more web and specialized resources at our disposal, and the less-resourced Dutch. We show that, although term density is higher in the dedicated corpora for both languages, the potential for term extraction is higher in the crawled corpora than in the dedicated corpora. Furthermore, in a set of experiments in which we evaluate both types of corpora while keeping size constant, we observe that more Gold Standard (GS) terms are covered by the "noisy" crawled corpus than by a dedicated corpus of the same size.
Keywords
terminology, automatic terminology extraction, corpora, medical terminology

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 112.37 KB

Citation

MLA
Hoste, Veronique, et al. “The Trade-off between Quantity and Quality : Comparing a Large Web Corpus and a Small Focused Corpus for Medical Terminology Extraction.” ACROSS LANGUAGES AND CULTURES, vol. 20, no. 2, 2019, pp. 197–211, doi:10.1556/084.2019.20.2.3.
APA
Hoste, V., Vanopstal, K., Rigouts Terryn, A., & Lefever, E. (2019). The trade-off between quantity and quality : comparing a large web corpus and a small focused corpus for medical terminology extraction. ACROSS LANGUAGES AND CULTURES, 20(2), 197–211. https://doi.org/10.1556/084.2019.20.2.3
Chicago author-date
Hoste, Veronique, Klaar Vanopstal, Ayla Rigouts Terryn, and Els Lefever. 2019. “The Trade-off between Quantity and Quality : Comparing a Large Web Corpus and a Small Focused Corpus for Medical Terminology Extraction.” ACROSS LANGUAGES AND CULTURES 20 (2): 197–211. https://doi.org/10.1556/084.2019.20.2.3.
Chicago author-date (all authors)
Hoste, Veronique, Klaar Vanopstal, Ayla Rigouts Terryn, and Els Lefever. 2019. “The Trade-off between Quantity and Quality : Comparing a Large Web Corpus and a Small Focused Corpus for Medical Terminology Extraction.” ACROSS LANGUAGES AND CULTURES 20 (2): 197–211. doi:10.1556/084.2019.20.2.3.
Vancouver
1. Hoste V, Vanopstal K, Rigouts Terryn A, Lefever E. The trade-off between quantity and quality : comparing a large web corpus and a small focused corpus for medical terminology extraction. ACROSS LANGUAGES AND CULTURES. 2019;20(2):197–211.
IEEE
[1] V. Hoste, K. Vanopstal, A. Rigouts Terryn, and E. Lefever, “The trade-off between quantity and quality : comparing a large web corpus and a small focused corpus for medical terminology extraction,” ACROSS LANGUAGES AND CULTURES, vol. 20, no. 2, pp. 197–211, 2019.
BibTeX
@article{8531641,
  abstract     = {{We investigate the cost-effectiveness of special-purpose crawled corpora versus more focused corpora for automatic terminology extraction (ATE). Our focus is on medical terminology on heart failure for two languages, viz. English for which we have more web and specialized resources at our disposal and the less resourced Dutch. We show that, although term density in the dedicated corpora is larger for both languages, the potential for term extraction is higher in the crawled corpora than in the dedicated corpora. Furthermore, in a set of experiments in which we evaluate both types of corpora, while keeping size constant, we observe that more Gold Standard (GS) terms are covered by the "noisy" crawled corpus than with a dedicated corpus of the same size.}},
  author       = {{Hoste, Veronique and Vanopstal, Klaar and Rigouts Terryn, Ayla and Lefever, Els}},
  issn         = {{1585-1923}},
  journal      = {{ACROSS LANGUAGES AND CULTURES}},
  keywords     = {{terminology,automatic terminology extraction,corpora,medical terminology}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{197--211}},
  title        = {{The trade-off between quantity and quality : comparing a large web corpus and a small focused corpus for medical terminology extraction}},
  url          = {{http://dx.doi.org/10.1556/084.2019.20.2.3}},
  volume       = {{20}},
  year         = {{2019}},
}
