
'Am I listening?' : evaluating the quality of generated data-driven listening motion

Abstract
This paper asks whether recent models for generating co-speech gesticulation may also learn to exhibit listening behaviour. We consider two models from recent gesture-generation challenges and train them on a dataset of audio and 3D motion capture from dyadic conversations. One model is driven by information from both sides of the conversation, whereas the other only uses the character’s own speech. Several user studies are performed to assess the motion generated when the character is speaking actively, versus when the character is the listener in the conversation. We find that participants are reliably able to discern motion associated with listening, whether from motion capture or generated by the models. Both models are thus able to produce distinctive listening behaviour, even though only one model is truly a listener, in the sense that it has access to information from the other party in the conversation. Additional experiments on both natural and model-generated motion find motion associated with listening to be rated as less human-like than motion associated with active speaking.

Downloads

  • icmi submission listening behaviour.pdf: full text (Accepted manuscript) | open access | PDF | 639.33 KB

Citation


MLA
Wolfert, Pieter, et al. “‘Am I Listening?’ : Evaluating the Quality of Generated Data-Driven Listening Motion.” ICMI ’23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2023, pp. 6–10, doi:10.1145/3610661.3617160.
APA
Wolfert, P., Henter, G. E., & Belpaeme, T. (2023). ‘Am I listening?’ : evaluating the quality of generated data-driven listening motion. ICMI ’23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, 6–10. https://doi.org/10.1145/3610661.3617160
Chicago author-date
Wolfert, Pieter, Gustav Eje Henter, and Tony Belpaeme. 2023. “‘Am I Listening?’ : Evaluating the Quality of Generated Data-Driven Listening Motion.” In ICMI ’23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, 6–10. New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/3610661.3617160.
Chicago author-date (all authors)
Wolfert, Pieter, Gustav Eje Henter, and Tony Belpaeme. 2023. “‘Am I Listening?’ : Evaluating the Quality of Generated Data-Driven Listening Motion.” In ICMI ’23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, 6–10. New York: Association for Computing Machinery (ACM). doi:10.1145/3610661.3617160.
Vancouver
1. Wolfert P, Henter GE, Belpaeme T. ‘Am I listening?’ : evaluating the quality of generated data-driven listening motion. In: ICMI ’23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction. New York: Association for Computing Machinery (ACM); 2023. p. 6–10.
IEEE
[1] P. Wolfert, G. E. Henter, and T. Belpaeme, “‘Am I listening?’ : evaluating the quality of generated data-driven listening motion,” in ICMI ’23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, Paris, France, 2023, pp. 6–10.
@inproceedings{01HF4Q7521P5877H6PZFEWYSS3,
  abstract     = {{This paper asks whether recent models for generating co-speech gesticulation may also learn to exhibit listening behaviour. We consider two models from recent gesture-generation challenges and train them on a dataset of audio and 3D motion capture from dyadic conversations. One model is driven by information from both sides of the conversation, whereas the other only uses the character’s own speech. Several user studies are performed to assess the motion generated when the character is speaking actively, versus when the character is the listener in the conversation. We find that participants are reliably able to discern motion associated with listening, whether from motion capture or generated by the models. Both models are thus able to produce distinctive listening behaviour, even though only one model is truly a listener, in the sense that it has access to information from the other party in the conversation. Additional experiments on both natural and model-generated motion find motion associated with listening to be rated as less human-like than motion associated with active speaking.}},
  author       = {{Wolfert, Pieter and Henter, Gustav Eje and Belpaeme, Tony}},
  booktitle    = {{ICMI '23 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction}},
  isbn         = {{9798400703218}},
  language     = {{eng}},
  location     = {{Paris, France}},
  pages        = {{6--10}},
  publisher    = {{Association for Computing Machinery (ACM)}},
  title        = {{'Am I listening?' : evaluating the quality of generated data-driven listening motion}},
  url          = {{https://doi.org/10.1145/3610661.3617160}},
  year         = {{2023}},
}
