'Cool glasses, where did you get them?' : generating visually grounded conversation starters for human-robot dialogue

Ruben Janssens (UGent) , Pieter Wolfert (UGent) , Thomas Demeester (UGent) and Tony Belpaeme (UGent)
Abstract
Visually situated language interaction is an important challenge in multi-modal Human-Robot Interaction (HRI). In this context we present a data-driven method to generate situated conversation starters based on visual context. We take visual data about the interactants and generate appropriate greetings for conversational agents in the context of HRI. For this, we constructed a novel open-source data set consisting of 4000 HRI-oriented images of people facing the camera, each augmented by three conversation-starting questions. We compared a baseline retrieval-based model and a generative model. Human evaluation of the models using crowdsourcing shows that the generative model scores best, specifically at correctly referencing visual features. We also investigated how automated metrics can be used as a proxy for human evaluation and found that common automated metrics are a poor substitute for human judgement. Finally, we provide a proof-of-concept demonstrator through an interaction with a Furhat social robot.
Keywords
natural language processing, human-robot interaction, conversational agent, multi-modal dialogue, natural language generation, situatedness, grounding

Downloads

  • DS506 acc.pdf
    • full text (Accepted manuscript) | open access | PDF | 2.06 MB
  • (...).pdf
    • full text (Published version) | UGent only | PDF | 2.21 MB

Citation

Please use this url to cite or link to this publication:

MLA
Janssens, Ruben, et al. “’Cool Glasses, Where Did You Get Them?’ : Generating Visually Grounded Conversation Starters for Human-Robot Dialogue.” PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), IEEE Press, 2022, pp. 821–25, doi:10.1109/HRI53351.2022.9889489.
APA
Janssens, R., Wolfert, P., Demeester, T., & Belpaeme, T. (2022). ’Cool glasses, where did you get them?’ : generating visually grounded conversation starters for human-robot dialogue. PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), 821–825. https://doi.org/10.1109/HRI53351.2022.9889489
Chicago author-date
Janssens, Ruben, Pieter Wolfert, Thomas Demeester, and Tony Belpaeme. 2022. “’Cool Glasses, Where Did You Get Them?’ : Generating Visually Grounded Conversation Starters for Human-Robot Dialogue.” In PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), 821–25. IEEE Press. https://doi.org/10.1109/HRI53351.2022.9889489.
Chicago author-date (all authors)
Janssens, Ruben, Pieter Wolfert, Thomas Demeester, and Tony Belpaeme. 2022. “’Cool Glasses, Where Did You Get Them?’ : Generating Visually Grounded Conversation Starters for Human-Robot Dialogue.” In PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), 821–825. IEEE Press. doi:10.1109/HRI53351.2022.9889489.
Vancouver
1. Janssens R, Wolfert P, Demeester T, Belpaeme T. ’Cool glasses, where did you get them?’ : generating visually grounded conversation starters for human-robot dialogue. In: PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22). IEEE Press; 2022. p. 821–5.
IEEE
[1] R. Janssens, P. Wolfert, T. Demeester, and T. Belpaeme, “’Cool glasses, where did you get them?’ : generating visually grounded conversation starters for human-robot dialogue,” in PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), Sapporo, Japan, 2022, pp. 821–825.
@inproceedings{8747513,
  abstract     = {{Visually situated language interaction is an important challenge in multi-modal Human-Robot Interaction (HRI). In this context we present a data-driven method to generate situated conversation starters based on visual context. We take visual data about the interactants and generate appropriate greetings for conversational agents in the context of HRI. For this, we constructed a novel open-source data set consisting of 4000 HRI-oriented images of people facing the camera, each augmented by three conversation-starting questions. We compared a baseline retrieval-based model and a generative model. Human evaluation of the models using crowdsourcing shows that the generative model scores best, specifically at correctly referencing visual features. We also investigated how automated metrics can be used as a proxy for human evaluation and found that common automated metrics are a poor substitute for human judgement. Finally, we provide a proof-of-concept demonstrator through an interaction with a Furhat social robot.}},
  author       = {{Janssens, Ruben and Wolfert, Pieter and Demeester, Thomas and Belpaeme, Tony}},
  booktitle    = {{PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '22)}},
  isbn         = {{9781665407311}},
  issn         = {{2167-2121}},
  keywords     = {{natural language processing,human-robot interaction,conversational agent,multi-modal dialogue,natural language generation,situatedness,grounding}},
  language     = {{eng}},
  location     = {{Sapporo, Japan}},
  pages        = {{821--825}},
  publisher    = {{IEEE Press}},
  title        = {{'Cool glasses, where did you get them?' : generating visually grounded conversation starters for human-robot dialogue}},
  url          = {{https://doi.org/10.1109/HRI53351.2022.9889489}},
  year         = {{2022}},
}
