Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features

Abstract
Most research on bilingual automatic term extraction (ATE) from comparable corpora focuses on both components of the task separately, i.e. monolingual automatic term extraction and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. The aim of this pilot study is to investigate whether using information gathered for the former might be beneficial for the cross-lingual linking as well, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora with re-use of information across the components. To test this hypothesis, an existing dataset was expanded, which covers three languages and four domains. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains.
Keywords
LT3, automatic term extraction, terminology, Comparable Corpora

Downloads

  • Steyaert Rigouts Terryn 2019 Multilingual Term Extraction from Comparable corpora- Informativeness of Monolingual Term Extraction Features.pdf
    • full text | open access | PDF | 162.18 KB

Citation

MLA
Steyaert, Kim, and Ayla Rigouts Terryn. “Multilingual Term Extraction from Comparable Corpora : Informativeness of Monolingual Term Extraction Features.” Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, edited by Serge Sharoff et al., 2019, pp. 9–18.
APA
Steyaert, K., & Rigouts Terryn, A. (2019). Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features. In S. Sharoff, P. Zweigenbaum, & R. Rapp (Eds.), Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019 (pp. 9–18). Varna, Bulgaria.
Chicago author-date
Steyaert, Kim, and Ayla Rigouts Terryn. 2019. “Multilingual Term Extraction from Comparable Corpora : Informativeness of Monolingual Term Extraction Features.” In Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, edited by Serge Sharoff, Pierre Zweigenbaum, and Reinhard Rapp, 9–18. Varna, Bulgaria.
Chicago author-date (all authors)
Steyaert, Kim, and Ayla Rigouts Terryn. 2019. “Multilingual Term Extraction from Comparable Corpora : Informativeness of Monolingual Term Extraction Features.” In Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, edited by Serge Sharoff, Pierre Zweigenbaum, and Reinhard Rapp, 9–18. Varna, Bulgaria.
Vancouver
1. Steyaert K, Rigouts Terryn A. Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features. In: Sharoff S, Zweigenbaum P, Rapp R, editors. Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019. Varna, Bulgaria; 2019. p. 9–18.
IEEE
[1] K. Steyaert and A. Rigouts Terryn, “Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features,” in Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019, Varna, Bulgaria, 2019, pp. 9–18.
BibTeX
@inproceedings{8640707,
  abstract     = {{Most research on bilingual automatic term extraction (ATE) from comparable corpora focuses on both components of the task separately, i.e. monolingual automatic term extraction and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. The aim of this pilot study is to investigate whether using information gathered for the former might be beneficial for the cross-lingual linking as well, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora with re-use of information across the components. To test this hypothesis, an existing dataset was expanded, which covers three languages and four domains. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains.}},
  author       = {{Steyaert, Kim and Rigouts Terryn, Ayla}},
  booktitle    = {{Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019}},
  editor       = {{Sharoff, Serge and Zweigenbaum, Pierre and Rapp, Reinhard}},
  isbn         = {{NA}},
  keywords     = {{LT3,automatic term extraction,terminology,Comparable Corpora}},
  language     = {{eng}},
  location     = {{Varna, Bulgaria}},
  pages        = {{9--18}},
  title        = {{Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features}},
  url          = {{https://comparable.limsi.fr/bucc2019/Steyaert_BUCC2019_paper2.pdf}},
  year         = {{2019}},
}