
Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features
- Author
- Kim Steyaert and Ayla Rigouts Terryn
- Abstract
- Most research on bilingual automatic term extraction (ATE) from comparable corpora focuses on both components of the task separately, i.e. monolingual automatic term extraction and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. The aim of this pilot study is to investigate whether using information gathered for the former might be beneficial for the cross-lingual linking as well, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora with re-use of information across the components. To test this hypothesis, an existing dataset was expanded, which covers three languages and four domains. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains.
- Keywords
- LT3, automatic term extraction, terminology, Comparable Corpora
Downloads
- Steyaert Rigouts Terryn 2019 Multilingual Term Extraction from Comparable corpora- Informativeness of Monolingual Term Extraction Features.pdf
- full text | open access | 162.18 KB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8640707
- MLA
- Steyaert, Kim, and Ayla Rigouts Terryn. “Multilingual Term Extraction from Comparable Corpora : Informativeness of Monolingual Term Extraction Features.” Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, edited by Serge Sharoff et al., 2019, pp. 9–18.
- APA
- Steyaert, K., & Rigouts Terryn, A. (2019). Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features. In S. Sharoff, P. Zweigenbaum, & R. Rapp (Eds.), Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019 (pp. 9–18). Varna, Bulgaria.
- Chicago author-date
- Steyaert, Kim, and Ayla Rigouts Terryn. 2019. “Multilingual Term Extraction from Comparable Corpora : Informativeness of Monolingual Term Extraction Features.” In Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, edited by Serge Sharoff, Pierre Zweigenbaum, and Reinhard Rapp, 9–18. Varna, Bulgaria.
- Chicago author-date (all authors)
- Steyaert, Kim, and Ayla Rigouts Terryn. 2019. “Multilingual Term Extraction from Comparable Corpora : Informativeness of Monolingual Term Extraction Features.” In Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, edited by Serge Sharoff, Pierre Zweigenbaum, and Reinhard Rapp, 9–18. Varna, Bulgaria.
- Vancouver
- 1. Steyaert K, Rigouts Terryn A. Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features. In: Sharoff S, Zweigenbaum P, Rapp R, editors. Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019. Varna, Bulgaria; 2019. p. 9–18.
- IEEE
- [1] K. Steyaert and A. Rigouts Terryn, “Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features,” in Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, Varna, Bulgaria, 2019, pp. 9–18.
- BibTeX
@inproceedings{8640707,
  abstract     = {{Most research on bilingual automatic term extraction (ATE) from comparable corpora focuses on both components of the task separately, i.e. monolingual automatic term extraction and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. The aim of this pilot study is to investigate whether using information gathered for the former might be beneficial for the cross-lingual linking as well, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora with re-use of information across the components. To test this hypothesis, an existing dataset was expanded, which covers three languages and four domains. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains.}},
  author       = {{Steyaert, Kim and Rigouts Terryn, Ayla}},
  booktitle    = {{Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019}},
  editor       = {{Sharoff, Serge and Zweigenbaum, Pierre and Rapp, Reinhard}},
  isbn         = {{NA}},
  keywords     = {{LT3,automatic term extraction,terminology,Comparable Corpora}},
  language     = {{eng}},
  location     = {{Varna, Bulgaria}},
  pages        = {{9--18}},
  title        = {{Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features}},
  url          = {{https://comparable.limsi.fr/bucc2019/Steyaert_BUCC2019_paper2.pdf}},
  year         = {{2019}},
}