Challenges with sign language datasets for sign language recognition and translation
- Author
- Mirella De Sisto, Vincent Vandeghinste, Santiago Egea Gomez, Mathieu De Coster (UGent), Dimitar Shterionov and Horacio Saggion
- Abstract
- Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and standardization issues in the available data. The former challenge relates to the volume of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models. In the present paper, we give an overview of these challenges by comparing various SL corpora and SL machine learning datasets. Furthermore, we propose a framework to address the lack of standardization at format level, unify the available resources and facilitate SL research for different languages. Our framework takes ELAN files as inputs and returns textual and visual data ready to train SL recognition and translation models. We present a proof of concept, training neural translation models on the data produced by the proposed framework.
- Keywords
- sign language translation, sign language recognition, sign language corpora, unified data format, machine learning
Downloads
- DS528.pdf: full text (Published version), open access, 1.63 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8756877
- MLA
- De Sisto, Mirella, et al. “Challenges with Sign Language Datasets for Sign Language Recognition and Translation.” PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022), edited by Nicoletta Calzolari et al., European Language Resources Association (ELRA), 2022, pp. 2478–87.
- APA
- De Sisto, M., Vandeghinste, V., Gomez, S. E., De Coster, M., Shterionov, D., & Saggion, H. (2022). Challenges with sign language datasets for sign language recognition and translation. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, … S. Piperidis (Eds.), PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022) (pp. 2478–2487). Marseille, France: European Language Resources Association (ELRA).
- Chicago author-date
- De Sisto, Mirella, Vincent Vandeghinste, Santiago Egea Gomez, Mathieu De Coster, Dimitar Shterionov, and Horacio Saggion. 2022. “Challenges with Sign Language Datasets for Sign Language Recognition and Translation.” In PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022), edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 2478–87. Marseille, France: European Language Resources Association (ELRA).
- Chicago author-date (all authors)
- De Sisto, Mirella, Vincent Vandeghinste, Santiago Egea Gomez, Mathieu De Coster, Dimitar Shterionov, and Horacio Saggion. 2022. “Challenges with Sign Language Datasets for Sign Language Recognition and Translation.” In PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022), edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, and Stelios Piperidis, 2478–2487. Marseille, France: European Language Resources Association (ELRA).
- Vancouver
- 1. De Sisto M, Vandeghinste V, Gomez SE, De Coster M, Shterionov D, Saggion H. Challenges with sign language datasets for sign language recognition and translation. In: Calzolari N, Béchet F, Blache P, Choukri K, Cieri C, Declerck T, et al., editors. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022). Marseille, France: European Language Resources Association (ELRA); 2022. p. 2478–87.
- IEEE
- [1] M. De Sisto, V. Vandeghinste, S. E. Gomez, M. De Coster, D. Shterionov, and H. Saggion, “Challenges with sign language datasets for sign language recognition and translation,” in PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022), Marseille, France, 2022, pp. 2478–2487.
@inproceedings{8756877,
abstract = {{Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and standardization issues in the available data. The former challenge relates to the volume of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models. In the present paper, we give an overview of these challenges by comparing various SL corpora and SL machine learning datasets. Furthermore, we propose a framework to address the lack of standardization at format level, unify the available resources and facilitate SL research for different languages. Our framework takes ELAN files as inputs and returns textual and visual data ready to train SL recognition and translation models. We present a proof of concept, training neural translation models on the data produced by the proposed framework.}},
author = {{De Sisto, Mirella and Vandeghinste, Vincent and Gomez, Santiago Egea and De Coster, Mathieu and Shterionov, Dimitar and Saggion, Horacio}},
booktitle = {{PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2022)}},
editor = {{Calzolari, Nicoletta and Béchet, Frédéric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, Hélène and Odijk, Jan and Piperidis, Stelios}},
isbn = {{9791095546726}},
issn = {{2522-2686}},
keywords = {{sign language translation,sign language recognition,sign language corpora,unified data format,machine learning}},
language = {{eng}},
location = {{Marseille, France}},
pages = {{2478--2487}},
publisher = {{European Language Resources Association (ELRA)}},
title = {{Challenges with sign language datasets for sign language recognition and translation}},
url = {{http://www.lrec-conf.org/proceedings/lrec2022/index.html}},
year = {{2022}},
}