Abstract
We present LexComSpaL2, a novel corpus which can be employed to train personalised word-level difficulty classifiers for learners of Spanish as a foreign/second language (L2). The dataset contains 2,240 in-context target words with the corresponding difficulty judgements of 26 Dutch-speaking students who are learning Spanish as an L2, resulting in a total of 58,240 annotations. The target words are divided over 200 sentences from 4 different domains (economics, health, law, and migration) and have been selected based on their suitability to be included in L2 learning materials. As our annotation scheme, we use a customised version of the 5-point lexical complexity prediction scale (Shardlow et al., 2020), tailored to the vocabulary knowledge continuum (which ranges from no knowledge through receptive mastery to productive mastery; Schmitt, 2019). With LexComSpaL2, we aim to address the lack of relevant data for multi-category difficulty prediction at word level for L2 learners of languages other than English.
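The figures in the abstract can be cross-checked with a few lines of Python. This is only a minimal sketch: the `Annotation` record and its field names are hypothetical and do not reflect the corpus's actual file format, and the 1 to 5 integer coding of the 5-point scale is an assumption.

```python
from dataclasses import dataclass

# Figures quoted in the abstract.
N_TARGET_WORDS = 2_240
N_ANNOTATORS = 26
N_SENTENCES = 200
DOMAINS = ("economics", "health", "law", "migration")

# Assumption: the 5-point scale is coded as the integers 1..5.
SCALE = range(1, 6)


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record layout, NOT the corpus's real schema."""
    annotator_id: int
    sentence_id: int
    target_word: str
    domain: str
    difficulty: int

    def __post_init__(self) -> None:
        # Reject values outside the domains and scale named in the abstract.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if self.difficulty not in SCALE:
            raise ValueError(f"difficulty out of range: {self.difficulty}")


# Every annotator judges every target word once.
total_annotations = N_TARGET_WORDS * N_ANNOTATORS

# Average number of target words per sentence.
words_per_sentence = N_TARGET_WORDS / N_SENTENCES

print(total_annotations)   # 58240, matching the abstract
print(words_per_sentence)  # 11.2
```

The product of target words and annotators reproduces the 58,240 annotations reported, and the corpus averages 11.2 target words per sentence.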
Keywords
Lexical Complexity Prediction, Personalisation, Spanish as a Foreign Language

Downloads

  • Degraeuwe Goethals 2024.pdf
    • full text (Published version) | open access | PDF | 535.53 KB
  • (...).pdf
    • supplementary material | UGent only | PDF | 530.74 KB

Citation

Please use this url to cite or link to this publication:

MLA
Degraeuwe, Jasper, and Patrick Goethals. “LexComSpaL2 : A Lexical Complexity Corpus for Spanish as a Foreign Language.” Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari et al., ELRA, 2024, pp. 10432–47.
APA
Degraeuwe, J., & Goethals, P. (2024). LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 10432–10447). ELRA.
Chicago author-date
Degraeuwe, Jasper, and Patrick Goethals. 2024. “LexComSpaL2 : A Lexical Complexity Corpus for Spanish as a Foreign Language.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 10432–47. ELRA.
Chicago author-date (all authors)
Degraeuwe, Jasper, and Patrick Goethals. 2024. “LexComSpaL2 : A Lexical Complexity Corpus for Spanish as a Foreign Language.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 10432–10447. ELRA.
Vancouver
1. Degraeuwe J, Goethals P. LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language. In: Calzolari N, Kan M-Y, Hoste V, Lenci A, Sakti S, Xue N, editors. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA; 2024. p. 10432–47.
IEEE
[1] J. Degraeuwe and P. Goethals, “LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy, 2024, pp. 10432–10447.
@inproceedings{01HYGN030GVWB5JPJB7406D8X0,
  abstract     = {{We present LexComSpaL2, a novel corpus which can be employed to train personalised word-level difficulty classifiers for learners of Spanish as a foreign/second language (L2). The dataset contains 2,240 in-context target words with the corresponding difficulty judgements of 26 Dutch-speaking students who are learning Spanish as an L2, resulting in a total of 58,240 annotations. The target words are divided over 200 sentences from 4 different domains (economics, health, law, and migration) and have been selected based on their suitability to be included in L2 learning materials. As our annotation scheme, we use a customised version of the 5-point lexical complexity prediction scale (Shardlow et al., 2020), tailored to the vocabulary knowledge continuum (which ranges from no knowledge through receptive mastery to productive mastery; Schmitt, 2019). With LexComSpaL2, we aim to address the lack of relevant data for multi-category difficulty prediction at word level for L2 learners of languages other than English.}},
  author       = {{Degraeuwe, Jasper and Goethals, Patrick}},
  booktitle    = {{Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}},
  editor       = {{Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}},
  isbn         = {{9782493814104}},
  issn         = {{2951-2093}},
  keywords     = {{Lexical Complexity Prediction,Personalisation,Spanish as a Foreign Language}},
  language     = {{eng}},
  location     = {{Turin, Italy}},
  pages        = {{10432--10447}},
  publisher    = {{ELRA}},
  title        = {{LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language}},
  url          = {{https://aclanthology.org/2024.lrec-main.912}},
  year         = {{2024}},
}