Child speech recognition in human-robot interaction : problem solved?

Ruben Janssens (UGent), Eva Verhelst (UGent), Giulio Antonio Abbo (UGent), Qiaoqiao Ren (UGent), Maria Jose Pinto Bernal (UGent) and Tony Belpaeme (UGent)
Abstract
Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.

Citation

MLA
Janssens, Ruben, et al. “Child Speech Recognition in Human-Robot Interaction : Problem Solved?” TAHRI ’24 : Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction, Association for Computing Machinery (ACM), 2024.
APA
Janssens, R., Verhelst, E., Abbo, G. A., Ren, Q., Pinto Bernal, M. J., & Belpaeme, T. (2024). Child speech recognition in human-robot interaction : problem solved? TAHRI ’24 : Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction. Presented at the TAHRI 2024 : 2024 International Symposium on Technological Advances in Human-Robot Interaction, Boulder, Colorado, USA.
Chicago author-date
Janssens, Ruben, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, and Tony Belpaeme. 2024. “Child Speech Recognition in Human-Robot Interaction : Problem Solved?” In TAHRI ’24 : Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction. Association for Computing Machinery (ACM).
Vancouver
1. Janssens R, Verhelst E, Abbo GA, Ren Q, Pinto Bernal MJ, Belpaeme T. Child speech recognition in human-robot interaction : problem solved? In: TAHRI ’24 : Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction. Association for Computing Machinery (ACM); 2024.
IEEE
[1] R. Janssens, E. Verhelst, G. A. Abbo, Q. Ren, M. J. Pinto Bernal, and T. Belpaeme, “Child speech recognition in human-robot interaction : problem solved?,” in TAHRI ’24 : Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction, Boulder, Colorado, USA, 2024.
BibTeX
@inproceedings{01HW8GA0SFMR508R0KQM6PA0ER,
  abstract     = {{Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. While transcription is not perfect yet, the best model recognises 60.3\% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.}},
  author       = {{Janssens, Ruben and Verhelst, Eva and Abbo, Giulio Antonio and Ren, Qiaoqiao and Pinto Bernal, Maria Jose and Belpaeme, Tony}},
  booktitle    = {{TAHRI '24 : Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction}},
  isbn         = {{9798400716614}},
  language     = {{eng}},
  location     = {{Boulder, Colorado, USA}},
  publisher    = {{Association for Computing Machinery (ACM)}},
  title        = {{Child speech recognition in human-robot interaction : problem solved?}},
  url          = {{https://www.tahri.org/}},
  year         = {{2024}},
}