Child speech recognition in human-robot interaction : problem solved?
(2025)
SOCIAL ROBOTICS, ICSR + AI 2024, PT II.
In Lecture Notes in Computer Science
Vol. 15562
pp. 476–486
- Author
- Ruben Janssens (UGent), Eva Verhelst (UGent), Giulio Antonio Abbo (UGent), Qiaoqiao Ren (UGent), Maria Jose Pinto Bernal (UGent) and Tony Belpaeme (UGent)
- Abstract
- Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children’s speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. Performance improves even more in highly structured interactions when priming models with specific phrases. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.
- Keywords
- Child-Robot Interaction, Automatic Speech Recognition, Verbal Interaction, Interaction Design Recommendations
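The headline figures in the abstract (transcription quality per word, 60.3% of sentences recognised correctly) rest on two standard ASR metrics: word error rate and sentence-level accuracy. A minimal illustrative sketch of both follows; this is not the paper's code, just the conventional definitions written out.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

def sentence_accuracy(refs: list[str], hyps: list[str]) -> float:
    """Fraction of sentences whose transcript matches the reference exactly."""
    exact = sum(r == h for r, h in zip(refs, hyps))
    return exact / len(refs)
```

For example, `wer("the cat sat", "the cat sat down")` is 1/3: one inserted word against a three-word reference. Note that the paper's 60.3% figure tolerates small grammatical differences, so it is slightly more lenient than the exact-match accuracy sketched here.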
Downloads
- (...).pdf | full text (Accepted manuscript) | UGent only (changes to open access on 2026-03-26) | 452.03 KB
- (...).pdf | full text (Accepted manuscript) | UGent only | 2.57 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01JSES8A6NTEJNF2934FV34MXF
- MLA
- Janssens, Ruben, et al. “Child Speech Recognition in Human-Robot Interaction : Problem Solved?” SOCIAL ROBOTICS, ICSR + AI 2024, PT II, edited by Oskar [missing] et al., vol. 15562, Springer, 2025, pp. 476–86, doi:10.1007/978-981-96-3519-1_43.
- APA
- Janssens, R., Verhelst, E., Abbo, G. A., Ren, Q., Pinto Bernal, M. J., & Belpaeme, T. (2025). Child speech recognition in human-robot interaction : problem solved? In O. [missing], L. Bodenhagen, J.-J. Cabibihan, K. Fischer, S. Šabanović, K. Winkle, … H. He (Eds.), SOCIAL ROBOTICS, ICSR + AI 2024, PT II (Vol. 15562, pp. 476–486). https://doi.org/10.1007/978-981-96-3519-1_43
- Chicago author-date
- Janssens, Ruben, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, and Tony Belpaeme. 2025. “Child Speech Recognition in Human-Robot Interaction : Problem Solved?” In SOCIAL ROBOTICS, ICSR + AI 2024, PT II, edited by Oskar [missing], Leon Bodenhagen, John-John Cabibihan, Kerstin Fischer, Selma Šabanović, Katie Winkle, Laxmidhar Behera, et al., 15562:476–86. Singapore: Springer. https://doi.org/10.1007/978-981-96-3519-1_43.
- Chicago author-date (all authors)
- Janssens, Ruben, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, and Tony Belpaeme. 2025. “Child Speech Recognition in Human-Robot Interaction : Problem Solved?” In SOCIAL ROBOTICS, ICSR + AI 2024, PT II, edited by Oskar [missing], Leon Bodenhagen, John-John Cabibihan, Kerstin Fischer, Selma Šabanović, Katie Winkle, Laxmidhar Behera, Shuzhi Sam Ge, Dimitrios Chrysostomou, Wanyue Jiang, and Hongsheng He, 15562:476–486. Singapore: Springer. doi:10.1007/978-981-96-3519-1_43.
- Vancouver
- 1.Janssens R, Verhelst E, Abbo GA, Ren Q, Pinto Bernal MJ, Belpaeme T. Child speech recognition in human-robot interaction : problem solved? In: [missing] O, Bodenhagen L, Cabibihan J-J, Fischer K, Šabanović S, Winkle K, et al., editors. SOCIAL ROBOTICS, ICSR + AI 2024, PT II. Singapore: Springer; 2025. p. 476–86.
- IEEE
- [1]R. Janssens, E. Verhelst, G. A. Abbo, Q. Ren, M. J. Pinto Bernal, and T. Belpaeme, “Child speech recognition in human-robot interaction : problem solved?,” in SOCIAL ROBOTICS, ICSR + AI 2024, PT II, Odense, Denmark, 2025, vol. 15562, pp. 476–486.
@inproceedings{01JSES8A6NTEJNF2934FV34MXF,
abstract = {{Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children’s speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. Performance improves even more in highly structured interactions when priming models with specific phrases. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.}},
author = {{Janssens, Ruben and Verhelst, Eva and Abbo, Giulio Antonio and Ren, Qiaoqiao and Pinto Bernal, Maria Jose and Belpaeme, Tony}},
booktitle = {{SOCIAL ROBOTICS, ICSR + AI 2024, PT II}},
editor = {{[missing], Oskar and Bodenhagen, Leon and Cabibihan, John-John and Fischer, Kerstin and Šabanović, Selma and Winkle, Katie and Behera, Laxmidhar and Sam Ge, Shuzhi and Chrysostomou, Dimitrios and Jiang, Wanyue and He, Hongsheng}},
isbn = {{9789819635184}},
issn = {{0302-9743}},
keywords = {{Child-Robot Interaction,Automatic Speech Recognition,Verbal Interaction,Interaction Design Recommendations}},
language = {{eng}},
location = {{Odense, Denmark}},
pages = {{476--486}},
publisher = {{Springer}},
title = {{Child speech recognition in human-robot interaction : problem solved?}},
url = {{https://doi.org/10.1007/978-981-96-3519-1_43}},
volume = {{15562}},
year = {{2025}},
}