
Multilingual hybrid automatic term extraction : the use case of ebpracticenet
- Author
- Ayla Rigouts Terryn, Veronique Hoste (UGent), Joost Buysschaert (UGent) and Els Lefever (UGent)
- Abstract
- Accurate terminology is essential for professional communication, but also complex and challenging to translate. To improve multilingual communication, tools have been developed that automatically detect terms and their equivalents in other languages from parallel corpora. By means of a use case with data from ebpracticenet, we illustrate how hybrid multilingual automatic term extraction from parallel corpora works and how it can be used in a practical application such as search engine optimisation. The original aim was to use this list to improve the recall of a search engine by allowing multilingual searches (automatically obtaining search results containing both the original search term and the translations of the search term). Two additional possible applications were found when considering the data. The first addition was searching for related forms, using the automatically generated lemmas to group different forms of the same word. Next, it was found that multiple translations for the same source term reveal clusters of strongly semantically related words (e.g. the Dutch word “gif” is translated as “venom”, “toxin” and “poison”), so these can be used to find relevant documents as well. The ebpracticenet use case clearly illustrates the practical use of automatic terminology extraction from parallel corpora and the benefits of real-world applications to provide inspiration for further research.
- Keywords
- lt3, automatic terminology extraction, ATR, terminology
Downloads
- (...).pdf
- full text (Published version), UGent only, 564.31 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8576043
- MLA
- Rigouts Terryn, Ayla, et al. “Multilingual Hybrid Automatic Term Extraction : The Use Case of Ebpracticenet.” Technological Innovation for Specialized Linguistic Domains : Languages for Digital Lives and Cultures, Proceedings of TISLID’18, edited by Timothy Read et al., Editions universitaires européennes, 2018, pp. 151–62.
- APA
- Rigouts Terryn, A., Hoste, V., Buysschaert, J., & Lefever, E. (2018). Multilingual hybrid automatic term extraction : the use case of ebpracticenet. In T. Read, S. Montaner, & B. Sedano (Eds.), Technological innovation for specialized linguistic domains : languages for digital lives and cultures, proceedings of TISLID’18 (pp. 151–162). Editions universitaires européennes.
- Chicago author-date
- Rigouts Terryn, Ayla, Veronique Hoste, Joost Buysschaert, and Els Lefever. 2018. “Multilingual Hybrid Automatic Term Extraction : The Use Case of Ebpracticenet.” In Technological Innovation for Specialized Linguistic Domains : Languages for Digital Lives and Cultures, Proceedings of TISLID’18, edited by Timothy Read, Salvador Montaner, and Beatriz Sedano, 151–62. Editions universitaires européennes.
- Chicago author-date (all authors)
- Rigouts Terryn, Ayla, Veronique Hoste, Joost Buysschaert, and Els Lefever. 2018. “Multilingual Hybrid Automatic Term Extraction : The Use Case of Ebpracticenet.” In Technological Innovation for Specialized Linguistic Domains : Languages for Digital Lives and Cultures, Proceedings of TISLID’18, edited by Timothy Read, Salvador Montaner, and Beatriz Sedano, 151–162. Editions universitaires européennes.
- Vancouver
- 1. Rigouts Terryn A, Hoste V, Buysschaert J, Lefever E. Multilingual hybrid automatic term extraction : the use case of ebpracticenet. In: Read T, Montaner S, Sedano B, editors. Technological innovation for specialized linguistic domains : languages for digital lives and cultures, proceedings of TISLID’18. Editions universitaires européennes; 2018. p. 151–62.
- IEEE
- [1] A. Rigouts Terryn, V. Hoste, J. Buysschaert, and E. Lefever, “Multilingual hybrid automatic term extraction : the use case of ebpracticenet,” in Technological innovation for specialized linguistic domains : languages for digital lives and cultures, proceedings of TISLID’18, Ghent, Belgium, 2018, pp. 151–162.
@inproceedings{8576043,
  abstract     = {{Accurate terminology is essential for professional communication, but also complex and challenging to translate. To improve multilingual communication, tools have been developed that automatically detect terms and their equivalents in other languages from parallel corpora. By means of a use case with data from ebpracticenet, we illustrate how hybrid multilingual automatic term extraction from parallel corpora works and how it can be used in a practical application such as search engine optimisation. The original aim was to use this list to improve the recall of a search engine by allowing multilingual searches (automatically obtaining search results containing both the original search term and the translations of the search term). Two additional possible applications were found when considering the data. The first addition was searching for related forms, using the automatically generated lemmas to group different forms of the same word. Next, it was found that multiple translations for the same source term reveal clusters of strongly semantically related words (e.g. the Dutch word “gif” is translated as “venom”, “toxin” and “poison”), so these can be used to find relevant documents as well. The ebpracticenet use case clearly illustrates the practical use of automatic terminology extraction from parallel corpora and the benefits of real-world applications to provide inspiration for further research.}},
  author       = {{Rigouts Terryn, Ayla and Hoste, Veronique and Buysschaert, Joost and Lefever, Els}},
  booktitle    = {{Technological innovation for specialized linguistic domains : languages for digital lives and cultures, proceedings of TISLID’18}},
  editor       = {{Read, Timothy and Montaner, Salvador and Sedano, Beatriz}},
  isbn         = {{9783841784469}},
  keywords     = {{lt3,automatic terminology extraction,ATR,terminology}},
  language     = {{eng}},
  location     = {{Ghent, Belgium}},
  pages        = {{151--162}},
  publisher    = {{Editions universitaires européennes}},
  title        = {{Multilingual hybrid automatic term extraction : the use case of ebpracticenet}},
  year         = {{2018}},
}