Ghent University Academic Bibliography


TExSIS: bilingual terminology extraction from parallel corpora using chunk-based alignment

Lieve Macken (UGent), Els Lefever (UGent) and Veronique Hoste (UGent) (2013). Terminology 19(1), p. 1-30.
abstract
We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the candidate terms is determined by combining several statistical filters. Although the set-up of the architecture is largely language-independent, we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us not only to evaluate precision, which is common practice, but also recall. We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction outperform those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages.
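The abstract notes that building gold-standard data sets made it possible to measure recall as well as precision. A minimal sketch of that evaluation follows. It assumes extracted and gold-standard bilingual term pairs are compared by exact match on (source, target) tuples after normalisation. The helper name `evaluate_term_pairs` and the sample French-English pairs are hypothetical, and the paper's actual matching criteria may differ.

```python
from typing import Iterable, Tuple

TermPair = Tuple[str, str]  # (source term, target term)


def evaluate_term_pairs(
    extracted: Iterable[TermPair], gold: Iterable[TermPair]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact matching of term pairs.

    Hypothetical helper: assumes both inputs are already normalised
    (e.g. lowercased, lemmatised) so that exact matching is meaningful.
    """
    extracted_set = set(extracted)
    gold_set = set(gold)
    correct = len(extracted_set & gold_set)
    # Precision: share of extracted pairs that appear in the gold standard.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    # Recall needs the complete gold standard: share of gold pairs that were found.
    recall = correct / len(gold_set) if gold_set else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical French-English pairs, for illustration only.
    gold = [("turbine à gaz", "gas turbine"), ("aube", "blade"), ("compresseur", "compressor")]
    extracted = [("turbine à gaz", "gas turbine"), ("aube", "blade"), ("moteur", "engine")]
    p, r, f = evaluate_term_pairs(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

Without a gold standard, precision can still be estimated by manually judging the extracted pairs, but recall cannot, because it requires knowing every term the system should have found.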
type
journalArticle (original)
publication status
published
keyword
alignment, automatic term extraction, bilingual term extraction, parallel corpora, chunks
journal title
TERMINOLOGY
volume
19
issue
1
pages
1 - 30
Web of Science type
Article
Web of Science id
000322837200001
JCR category
LINGUISTICS
JCR impact factor
0.375 (2013)
JCR rank
105/169 (2013)
JCR quartile
3 (2013)
ISSN
0929-9971
DOI
10.1075/term.19.1.01mac
language
English
UGent publication?
yes
classification
A1
copyright statement
I have transferred the copyright for this publication to the publisher
id
2128573
handle
http://hdl.handle.net/1854/LU-2128573
date created
2012-06-01 11:12:00
date last changed
2014-02-05 13:39:01
@article{2128573,
  abstract     = {We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the candidate terms is determined by combining several statistical filters. Although the set-up of the architecture is largely language-independent, we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us not only to evaluate precision, which is common practice, but also recall. 
We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction outperform those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages.},
  author       = {Macken, Lieve and Lefever, Els and Hoste, Veronique},
  issn         = {0929-9971},
  journal      = {TERMINOLOGY},
  keyword      = {alignment,automatic term extraction,bilingual term extraction,parallel corpora,chunks},
  language     = {eng},
  number       = {1},
  pages        = {1--30},
  title        = {TExSIS: bilingual terminology extraction from parallel corpora using chunk-based alignment},
  url          = {http://dx.doi.org/10.1075/term.19.1.01mac},
  volume       = {19},
  year         = {2013},
}

Chicago
Macken, Lieve, Els Lefever, and Veronique Hoste. 2013. “TExSIS: Bilingual Terminology Extraction from Parallel Corpora Using Chunk-based Alignment.” Terminology 19 (1): 1–30.
APA
Macken, L., Lefever, E., & Hoste, V. (2013). TExSIS: bilingual terminology extraction from parallel corpora using chunk-based alignment. Terminology, 19(1), 1–30.
Vancouver
Macken L, Lefever E, Hoste V. TExSIS: bilingual terminology extraction from parallel corpora using chunk-based alignment. Terminology. 2013;19(1):1–30.
MLA
Macken, Lieve, Els Lefever, and Veronique Hoste. “TExSIS: Bilingual Terminology Extraction from Parallel Corpora Using Chunk-based Alignment.” Terminology 19.1 (2013): 1–30. Print.