
Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation

Els Lefever (UGent) and Veronique Hoste (UGent)
Abstract
We present a multilingual approach to Word Sense Disambiguation (WSD), the task of automatically assigning the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense inventory, we adopt a language-independent framework in which the senses of a given word are derived from word alignments in a multilingual parallel corpus, which we have made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible, language-independent approach that implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.
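The abstract describes senses that are simply translations taken from word alignments, and a classifier that combines local context features with binary translation features. The Python sketch below illustrates that idea under loudly stated assumptions: the toy sentences, the dictionary layout, the ±2 word window, the reading of "binary translation features" as bag-of-words features over the translations in the other languages, and the 1-nearest-neighbour classifier are all illustrative choices of this sketch, not the authors' implementation. Word alignment is assumed to have been done already (the hypothetical `align` field), and only three of the five languages are used for brevity.

```python
from collections import Counter

# Hypothetical toy parallel corpus (3 of the 5 languages, for brevity).
# Assumed layout per instance:
#   "en":    tokenised English sentence containing the ambiguous word
#   "align": word-aligned translation of that word per language (assumed to
#            come from an external word aligner, which is not shown here)
#   "trans": tokenised translation of the whole sentence per language
CORPUS = [
    {"en": "She deposited money at the bank yesterday".split(),
     "align": {"fr": "banque", "nl": "bank", "de": "Bank"},
     "trans": {"fr": "elle a déposé de l'argent à la banque hier".split(),
               "nl": "ze stortte gisteren geld bij de bank".split(),
               "de": "sie hat gestern Geld bei der Bank eingezahlt".split()}},
    {"en": "They walked along the bank of the river".split(),
     "align": {"fr": "rive", "nl": "oever", "de": "Ufer"},
     "trans": {"fr": "ils ont marché le long de la rive du fleuve".split(),
               "nl": "ze liepen langs de oever van de rivier".split(),
               "de": "sie gingen am Ufer des Flusses entlang".split()}},
]


def local_context(tokens, i, window=2):
    """Local context features: the word at each offset within +/-window."""
    feats = set()
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats.add(f"ctx[{offset}]={word}")
    return feats


def translation_features(trans, target_lang):
    """Binary bag-of-words features over the translations in the *other*
    languages; the target language is skipped because it supplies the label."""
    return {f"{lang}:{w.lower()}"
            for lang, tokens in trans.items() if lang != target_lang
            for w in tokens}


def featurize(instance, target_word, target_lang):
    """Union of local context and translation features for one instance."""
    tokens = instance["en"]
    i = [t.lower() for t in tokens].index(target_word)
    return local_context(tokens, i) | translation_features(instance["trans"], target_lang)


def train(corpus, target_word, target_lang):
    """Memory of (features, sense) pairs; the sense is the aligned translation."""
    return [(featurize(inst, target_word, target_lang), inst["align"][target_lang])
            for inst in corpus]


def classify(memory, feats, k=1):
    """k-nearest neighbours, similarity = number of shared binary features."""
    ranked = sorted(memory, key=lambda ex: len(ex[0] & feats), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# A new English sentence; its Dutch and German translations are assumed given.
query = {"en": "He took a loan from the bank".split(),
         "trans": {"nl": "hij nam een lening bij de bank".split(),
                   "de": "er nahm einen Kredit bei der Bank auf".split()}}

memory = train(CORPUS, "bank", "fr")
print(classify(memory, featurize(query, "bank", "fr")))  # prints: banque
```

In this toy run the query shares eight features with the first training sentence (among them "bij"/"de"/"bank" in Dutch and "bei"/"der"/"bank" in German) and only two with the second, so the predicted French "sense" is "banque". Here the Dutch and German translations do much of the disambiguation work, which is the intuition behind the multilingual features.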
Keywords
Word Sense Disambiguation, SemEval, cross-lingual Word Sense Disambiguation, polysemy, parallel corpus, SPECIAL-ISSUE, CORPUS

Downloads

  • (...).pdf | full text | UGent only | PDF | 960.34 KB

Citation

Please use this url to cite or link to this publication:

MLA
Lefever, Els, and Veronique Hoste. “Parallel Corpora Make Sense: Bypassing the Knowledge Acquisition Bottleneck for Word Sense Disambiguation.” International Journal of Corpus Linguistics, vol. 19, no. 3, John Benjamins Publishing Company, 2014, pp. 333–67, doi:10.1075/ijcl.19.3.02lef.
APA
Lefever, E., & Hoste, V. (2014). Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation. International Journal of Corpus Linguistics, 19(3), 333–367. https://doi.org/10.1075/ijcl.19.3.02lef
Chicago author-date
Lefever, Els, and Veronique Hoste. 2014. “Parallel Corpora Make Sense: Bypassing the Knowledge Acquisition Bottleneck for Word Sense Disambiguation.” International Journal of Corpus Linguistics 19 (3): 333–67. https://doi.org/10.1075/ijcl.19.3.02lef.
Chicago author-date (all authors)
Lefever, Els, and Veronique Hoste. 2014. “Parallel Corpora Make Sense: Bypassing the Knowledge Acquisition Bottleneck for Word Sense Disambiguation.” International Journal of Corpus Linguistics 19 (3): 333–367. doi:10.1075/ijcl.19.3.02lef.
Vancouver
1. Lefever E, Hoste V. Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation. International Journal of Corpus Linguistics. 2014;19(3):333–67.
IEEE
[1] E. Lefever and V. Hoste, “Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation,” International Journal of Corpus Linguistics, vol. 19, no. 3, pp. 333–367, 2014.
@article{5806135,
  abstract     = {{We present a multilingual approach to Word Sense Disambiguation (WSD), which automatically assigns the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense-inventory, we use a language-independent framework by deriving the senses of a given word from word alignments on a multilingual parallel corpus, which we made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.}},
  author       = {{Lefever, Els and Hoste, Veronique}},
  issn         = {{1384-6655}},
  journal      = {{International Journal of Corpus Linguistics}},
  keywords     = {{Word Sense Disambiguation,SemEval,cross-lingual Word Sense Disambiguation,polysemy,parallel corpus,SPECIAL-ISSUE,CORPUS}},
  language     = {{eng}},
  number       = {{3}},
  pages        = {{333--367}},
  publisher    = {{John Benjamins Publishing Company}},
  title        = {{Parallel corpora make sense: bypassing the knowledge acquisition bottleneck for word sense disambiguation}},
  url          = {{https://doi.org/10.1075/ijcl.19.3.02lef}},
  volume       = {{19}},
  year         = {{2014}},
}
