Word sense disambiguation for specific purposes: an example sentence-based methodology for Intelligent Computer-Assisted Language Learning

Jasper Degraeuwe (UGent) and Patrick Goethals (UGent)
Abstract
In this poster, we will present the main challenges of word sense disambiguation (WSD) for the specific purpose of Intelligent Computer-Assisted Language Learning, and report the results of a first experiment. The starting point is that applying WSD could considerably enrich existing language learning and teaching resources, for instance by making it possible to query corpora for usage examples of specific semantic uses of vocabulary items. However, most existing WSD methods rely on WordNet and BabelNet sense distinctions, whose very fine-grained nature makes them unsuitable for many NLP applications (Hovy et al., 2013). Moreover, it has been argued that a single set of word senses is unlikely to suit different NLP applications, since “different corpora, and different purposes, will lead to different senses” (Kilgarriff, 1997). In other words, WSD for specific purposes remains an open problem. It requires research in two directions: tailoring the sense inventory to the particularities of the specific purpose, and designing methodologies that require little human-curated input (Degraeuwe et al., in press).

For this latter challenge, word embeddings could be a key factor. The rise of neural networks brought static word embedding models (e.g. word2vec [Mikolov et al., 2013]) and, above all, contextualised ones (e.g. BERT [Devlin et al., 2019]). Together they marked an important breakthrough for the WSD task and substantially raised performance levels (Loureiro et al., 2021). In a first experiment (focused on Spanish as a foreign language), we developed a customised, coarse-grained sense inventory in which each sense is represented by prototypical usage examples. We then used a pretrained BERT model to convert those sentences into “sense embeddings” and predicted the sense of unseen ambiguous instances through cosine similarity calculations. On a 25-item lexical sample, this methodology achieves a promising average F1 score of 0.9.
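The sense-prediction step described above can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions:
- The function names (build_sense_embeddings, predict_sense) are hypothetical.
- Averaging the vectors of a sense's example sentences into a single sense embedding is also an assumption. The abstract only states that the example sentences are converted into "sense embeddings".
- The three-dimensional vectors and the two senses of Spanish "banco" are invented toy data. They stand in for the contextualised vectors that a pretrained BERT model would produce for the target word; that extraction step is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_sense_embeddings(example_vectors: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Collapse each sense's example-sentence vectors into one sense embedding.

    ASSUMPTION: the sense embedding is the element-wise mean of the vectors of
    that sense's prototypical usage examples (each list must be non-empty).
    """
    senses: Dict[str, List[float]] = {}
    for sense, vectors in example_vectors.items():
        dim = len(vectors[0])
        senses[sense] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return senses


def predict_sense(instance: Vector, senses: Dict[str, List[float]]) -> str:
    """Return the sense whose embedding is most cosine-similar to the instance."""
    return max(senses, key=lambda s: cosine_similarity(instance, senses[s]))


# Invented toy vectors for two hypothetical senses of Spanish "banco".
example_vectors = {
    "financial_institution": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "seat": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
senses = build_sense_embeddings(example_vectors)
print(predict_sense([0.85, 0.15, 0.05], senses))
```

In this sketch, prediction is a nearest-neighbour decision over sense embeddings. Extending the inventory therefore only requires a few new example sentences and no model retraining, which fits the stated aim of keeping human-curated input low.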

Downloads

JasperDegraeuwe PatrickGoethals CLIN31 poster.pdf: full text (Author's original), open access, PDF, 145.23 KB

Citation

MLA
Degraeuwe, Jasper, and Patrick Goethals. “Word Sense Disambiguation for Specific Purposes: An Example Sentence-Based Methodology for Intelligent Computer-Assisted Language Learning.” 31st Meeting of Computational Linguistics in The Netherlands, Abstracts, 2021.
APA
Degraeuwe, J., & Goethals, P. (2021). Word sense disambiguation for specific purposes: an example sentence-based methodology for Intelligent Computer-Assisted Language Learning. 31st Meeting of Computational Linguistics in The Netherlands, Abstracts. Presented at the 31st Meeting of Computational Linguistics in the Netherlands (CLIN 31), Ghent, Belgium.
Chicago author-date
Degraeuwe, Jasper, and Patrick Goethals. 2021. “Word Sense Disambiguation for Specific Purposes: An Example Sentence-Based Methodology for Intelligent Computer-Assisted Language Learning.” In 31st Meeting of Computational Linguistics in The Netherlands, Abstracts.
Vancouver
1. Degraeuwe J, Goethals P. Word sense disambiguation for specific purposes: an example sentence-based methodology for Intelligent Computer-Assisted Language Learning. In: 31st Meeting of Computational Linguistics in The Netherlands, Abstracts. 2021.
IEEE
[1] J. Degraeuwe and P. Goethals, “Word sense disambiguation for specific purposes: an example sentence-based methodology for Intelligent Computer-Assisted Language Learning,” in 31st Meeting of Computational Linguistics in The Netherlands, Abstracts, Ghent, Belgium, 2021.
@inproceedings{01JD7FMT8AQC0Z4N4SH98MN8KH,
  author       = {{Degraeuwe, Jasper and Goethals, Patrick}},
  booktitle    = {{31st Meeting of Computational Linguistics in The Netherlands, Abstracts}},
  language     = {{eng}},
  location     = {{Ghent, Belgium}},
  title        = {{Word sense disambiguation for specific purposes: an example sentence-based methodology for Intelligent Computer-Assisted Language Learning}},
  year         = {{2021}},
}