LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language
- Author
- Jasper Degraeuwe (UGent) and Patrick Goethals (UGent)
- Abstract
- We present LexComSpaL2, a novel corpus which can be employed to train personalised word-level difficulty classifiers for learners of Spanish as a foreign/second language (L2). The dataset contains 2,240 in-context target words with the corresponding difficulty judgements of 26 Dutch-speaking students who are learning Spanish as an L2, resulting in a total of 58,240 annotations. The target words are divided over 200 sentences from 4 different domains (economics, health, law, and migration) and have been selected based on their suitability to be included in L2 learning materials. As our annotation scheme, we use a customised version of the 5-point lexical complexity prediction scale (Shardlow et al., 2020), tailored to the vocabulary knowledge continuum (which ranges from no knowledge through receptive mastery to productive mastery; Schmitt, 2019). With LexComSpaL2, we aim to address the lack of relevant data for multi-category difficulty prediction at word level for L2 learners of languages other than English.
- Keywords
- Lexical Complexity Prediction, Personalisation, Spanish as a Foreign Language
Downloads
Degraeuwe Goethals 2024.pdf
- full text (Published version)
- open access
- 535.53 KB
(...).pdf
- supplementary material
- UGent only
- 530.74 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01HYGN030GVWB5JPJB7406D8X0
- MLA
- Degraeuwe, Jasper, and Patrick Goethals. “LexComSpaL2 : A Lexical Complexity Corpus for Spanish as a Foreign Language.” Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari et al., ELRA, 2024, pp. 10432–47.
- APA
- Degraeuwe, J., & Goethals, P. (2024). LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 10432–10447). ELRA.
- Chicago author-date
- Degraeuwe, Jasper, and Patrick Goethals. 2024. “LexComSpaL2 : A Lexical Complexity Corpus for Spanish as a Foreign Language.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 10432–47. ELRA.
- Chicago author-date (all authors)
- Degraeuwe, Jasper, and Patrick Goethals. 2024. “LexComSpaL2 : A Lexical Complexity Corpus for Spanish as a Foreign Language.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ed. by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 10432–10447. ELRA.
- Vancouver
- 1. Degraeuwe J, Goethals P. LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language. In: Calzolari N, Kan M-Y, Hoste V, Lenci A, Sakti S, Xue N, editors. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ELRA; 2024. p. 10432–47.
- IEEE
- [1] J. Degraeuwe and P. Goethals, “LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy, 2024, pp. 10432–10447.
@inproceedings{01HYGN030GVWB5JPJB7406D8X0,
  abstract     = {{We present LexComSpaL2, a novel corpus which can be employed to train personalised word-level difficulty classifiers for learners of Spanish as a foreign/second language (L2). The dataset contains 2,240 in-context target words with the corresponding difficulty judgements of 26 Dutch-speaking students who are learning Spanish as an L2, resulting in a total of 58,240 annotations. The target words are divided over 200 sentences from 4 different domains (economics, health, law, and migration) and have been selected based on their suitability to be included in L2 learning materials. As our annotation scheme, we use a customised version of the 5-point lexical complexity prediction scale (Shardlow et al., 2020), tailored to the vocabulary knowledge continuum (which ranges from no knowledge through receptive mastery to productive mastery; Schmitt, 2019). With LexComSpaL2, we aim to address the lack of relevant data for multi-category difficulty prediction at word level for L2 learners of languages other than English.
}},
author = {{Degraeuwe, Jasper and Goethals, Patrick}},
booktitle = {{Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}},
editor = {{Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}},
isbn = {{9782493814104}},
issn = {{2951-2093}},
keywords = {{Lexical Complexity Prediction,Personalisation,Spanish as a Foreign Language}},
language = {{eng}},
location = {{Turin, Italy}},
pages = {{10432--10447}},
publisher = {{ELRA}},
title = {{LexComSpaL2 : a lexical complexity corpus for Spanish as a foreign language}},
url = {{https://aclanthology.org/2024.lrec-main.912}},
year = {{2024}},
}