Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation
- Author
- Els Lefever (UGent) and Veronique Hoste (UGent)
- Abstract
- We present a multilingual approach to Word Sense Disambiguation (WSD), which automatically assigns the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense-inventory, we use a language-independent framework by deriving the senses of a given word from word alignments on a multilingual parallel corpus, which we made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.
- Keywords
- Word Sense Disambiguation, SemEval, cross-lingual Word Sense Disambiguation, polysemy, parallel corpus, SPECIAL-ISSUE, CORPUS
Downloads
- (...).pdf | full text | UGent only | 960.34 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-5806135
- MLA
- Lefever, Els, and Veronique Hoste. “Parallel Corpora Make Sense: Bypassing the Knowledge Acquisition Bottleneck for Word Sense Disambiguation.” INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS, vol. 19, no. 3, John Benjamins Publishing Company, 2014, pp. 333–67, doi:10.1075/ijcl.19.3.02lef.
- APA
- Lefever, E., & Hoste, V. (2014). Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation. INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS, 19(3), 333–367. https://doi.org/10.1075/ijcl.19.3.02lef
- Chicago author-date
- Lefever, Els, and Veronique Hoste. 2014. “Parallel Corpora Make Sense: Bypassing the Knowledge Acquisition Bottleneck for Word Sense Disambiguation.” INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS 19 (3): 333–67. https://doi.org/10.1075/ijcl.19.3.02lef.
- Chicago author-date (all authors)
- Lefever, Els, and Veronique Hoste. 2014. “Parallel Corpora Make Sense: Bypassing the Knowledge Acquisition Bottleneck for Word Sense Disambiguation.” INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS 19 (3): 333–367. doi:10.1075/ijcl.19.3.02lef.
- Vancouver
- 1. Lefever E, Hoste V. Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation. INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS. 2014;19(3):333–67.
- IEEE
- [1] E. Lefever and V. Hoste, “Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation,” INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS, vol. 19, no. 3, pp. 333–367, 2014.
@article{5806135,
  abstract = {{We present a multilingual approach to Word Sense Disambiguation (WSD), which automatically assigns the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense-inventory, we use a language-independent framework by deriving the senses of a given word from word alignments on a multilingual parallel corpus, which we made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.}},
  author = {{Lefever, Els and Hoste, Veronique}},
  issn = {{1384-6655}},
  journal = {{INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS}},
  keywords = {{Word Sense Disambiguation,SemEval,cross-lingual Word Sense Disambiguation,polysemy,parallel corpus,SPECIAL-ISSUE,CORPUS}},
  language = {{eng}},
  number = {{3}},
  pages = {{333--367}},
  publisher = {{John Benjamins Publishing Company}},
  title = {{Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation}},
  url = {{http://doi.org/10.1075/ijcl.19.3.02lef}},
  volume = {{19}},
  year = {{2014}},
}