Learning-based detection of scientific terms in patient information
- Author
- Veronique Hoste (UGent), Els Lefever (UGent), Klaar Vanopstal (UGent) and Isabelle Delaere (UGent)
- Abstract
- In this paper, we investigate a machine-learning-based approach to the specific problem of detecting scientific terms in patient information. Lacking lexical databases that distinguish scientific from popular medical terms, we used local context, morphosyntactic, morphological and statistical information to design a learner that accurately detects scientific medical terms. This study is the first step towards the automatic replacement of a scientific term by its popular counterpart, which should have a beneficial effect on readability. We report an F-score of 84% for the prediction of scientific terms in an English and Dutch EPAR corpus. Since recasting the term extraction problem as a classification problem yields a strongly skewed data set, we rebalanced the data set by applying some simple TF-IDF-based and log-likelihood-based filters. We show that filtering indeed has a beneficial effect on the learner’s performance. However, the results of the filtering approach combined with the learning-based approach remain below those of the learning-based approach.
- Keywords
- automatic term extraction
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-598390
- MLA
- Hoste, Veronique, et al. “Learning-Based Detection of Scientific Terms in Patient Information.” LREC 2008 : Sixth International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2008, pp. 585–91.
- APA
- Hoste, V., Lefever, E., Vanopstal, K., & Delaere, I. (2008). Learning-based detection of scientific terms in patient information. LREC 2008 : Sixth International Conference on Language Resources and Evaluation, 585–591. Paris, France: European Language Resources Association (ELRA).
- Chicago author-date
- Hoste, Veronique, Els Lefever, Klaar Vanopstal, and Isabelle Delaere. 2008. “Learning-Based Detection of Scientific Terms in Patient Information.” In LREC 2008 : Sixth International Conference on Language Resources and Evaluation, 585–91. Paris, France: European Language Resources Association (ELRA).
- Chicago author-date (all authors)
- Hoste, Veronique, Els Lefever, Klaar Vanopstal, and Isabelle Delaere. 2008. “Learning-Based Detection of Scientific Terms in Patient Information.” In LREC 2008 : Sixth International Conference on Language Resources and Evaluation, 585–591. Paris, France: European Language Resources Association (ELRA).
- Vancouver
- 1. Hoste V, Lefever E, Vanopstal K, Delaere I. Learning-based detection of scientific terms in patient information. In: LREC 2008 : sixth international conference on language resources and evaluation. Paris, France: European Language Resources Association (ELRA); 2008. p. 585–91.
- IEEE
- [1] V. Hoste, E. Lefever, K. Vanopstal, and I. Delaere, “Learning-based detection of scientific terms in patient information,” in LREC 2008 : sixth international conference on language resources and evaluation, Marrakech, Morocco, 2008, pp. 585–591.
@inproceedings{598390,
abstract = {{In this paper, we investigate a machine-learning-based approach to the specific problem of detecting scientific terms in patient information. Lacking lexical databases that distinguish scientific from popular medical terms, we used local context, morphosyntactic, morphological and statistical information to design a learner that accurately detects scientific medical terms. This study is the first step towards the automatic replacement of a scientific term by its popular counterpart, which should have a beneficial effect on readability. We report an F-score of 84\% for the prediction of scientific terms in an English and Dutch EPAR corpus. Since recasting the term extraction problem as a classification problem yields a strongly skewed data set, we rebalanced the data set by applying some simple TF-IDF-based and log-likelihood-based filters. We show that filtering indeed has a beneficial effect on the learner’s performance. However, the results of the filtering approach combined with the learning-based approach remain below those of the learning-based approach.}},
author = {{Hoste, Veronique and Lefever, Els and Vanopstal, Klaar and Delaere, Isabelle}},
booktitle = {{LREC 2008 : Sixth International Conference on Language Resources and Evaluation}},
isbn = {{9782951740846}},
keywords = {{automatic term extraction}},
language = {{eng}},
location = {{Marrakech, Morocco}},
pages = {{585--591}},
publisher = {{European Language Resources Association (ELRA)}},
title = {{Learning-based detection of scientific terms in patient information}},
year = {{2008}},
}