'Cool glasses, where did you get them?' : generating visually grounded conversation starters for human-robot dialogue
- Author
- Ruben Janssens (UGent) , Pieter Wolfert (UGent) , Thomas Demeester (UGent) and Tony Belpaeme (UGent)
- Abstract
- Visually situated language interaction is an important challenge in multi-modal Human-Robot Interaction (HRI). In this context we present a data-driven method to generate situated conversation starters based on visual context. We take visual data about the interactants and generate appropriate greetings for conversational agents in the context of HRI. For this, we constructed a novel open-source data set consisting of 4000 HRI-oriented images of people facing the camera, each augmented by three conversation-starting questions. We compared a baseline retrieval-based model and a generative model. Human evaluation of the models using crowdsourcing shows that the generative model scores best, specifically at correctly referencing visual features. We also investigated how automated metrics can be used as a proxy for human evaluation and found that common automated metrics are a poor substitute for human judgement. Finally, we provide a proof-of-concept demonstrator through an interaction with a Furhat social robot.
- Keywords
- natural language processing, human-robot interaction, conversational agent, multi-modal dialogue, natural language generation, situatedness, grounding
Downloads
- DS506 acc.pdf | full text (Accepted manuscript) | open access | 2.06 MB
- (...).pdf | full text (Published version) | UGent only | 2.21 MB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8747513
- MLA
- Janssens, Ruben, et al. “‘Cool Glasses, Where Did You Get Them?’ : Generating Visually Grounded Conversation Starters for Human-Robot Dialogue.” PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), IEEE Press, 2022, pp. 821–25, doi:10.1109/HRI53351.2022.9889489.
- APA
- Janssens, R., Wolfert, P., Demeester, T., & Belpaeme, T. (2022). ‘Cool glasses, where did you get them?’ : generating visually grounded conversation starters for human-robot dialogue. PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), 821–825. https://doi.org/10.1109/HRI53351.2022.9889489
- Chicago author-date
- Janssens, Ruben, Pieter Wolfert, Thomas Demeester, and Tony Belpaeme. 2022. “‘Cool Glasses, Where Did You Get Them?’ : Generating Visually Grounded Conversation Starters for Human-Robot Dialogue.” In PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), 821–25. IEEE Press. https://doi.org/10.1109/HRI53351.2022.9889489.
- Chicago author-date (all authors)
- Janssens, Ruben, Pieter Wolfert, Thomas Demeester, and Tony Belpaeme. 2022. “‘Cool Glasses, Where Did You Get Them?’ : Generating Visually Grounded Conversation Starters for Human-Robot Dialogue.” In PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), 821–825. IEEE Press. doi:10.1109/HRI53351.2022.9889489.
- Vancouver
- 1. Janssens R, Wolfert P, Demeester T, Belpaeme T. ‘Cool glasses, where did you get them?’ : generating visually grounded conversation starters for human-robot dialogue. In: PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22). IEEE Press; 2022. p. 821–5.
- IEEE
- [1] R. Janssens, P. Wolfert, T. Demeester, and T. Belpaeme, “‘Cool glasses, where did you get them?’ : generating visually grounded conversation starters for human-robot dialogue,” in PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI ’22), Sapporo, Japan, 2022, pp. 821–825.
@inproceedings{8747513,
  abstract     = {{Visually situated language interaction is an important challenge in multi-modal Human-Robot Interaction (HRI). In this context we present a data-driven method to generate situated conversation starters based on visual context. We take visual data about the interactants and generate appropriate greetings for conversational agents in the context of HRI. For this, we constructed a novel open-source data set consisting of 4000 HRI-oriented images of people facing the camera, each augmented by three conversation-starting questions. We compared a baseline retrieval-based model and a generative model. Human evaluation of the models using crowdsourcing shows that the generative model scores best, specifically at correctly referencing visual features. We also investigated how automated metrics can be used as a proxy for human evaluation and found that common automated metrics are a poor substitute for human judgement. Finally, we provide a proof-of-concept demonstrator through an interaction with a Furhat social robot.}},
  author       = {{Janssens, Ruben and Wolfert, Pieter and Demeester, Thomas and Belpaeme, Tony}},
  booktitle    = {{PROCEEDINGS OF THE 2022 17TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '22)}},
  isbn         = {{9781665407311}},
  issn         = {{2167-2121}},
  keywords     = {{natural language processing, human-robot interaction, conversational agent, multi-modal dialogue, natural language generation, situatedness, grounding}},
  language     = {{eng}},
  location     = {{Sapporo, Japan}},
  pages        = {{821--825}},
  publisher    = {{IEEE Press}},
  title        = {{'Cool glasses, where did you get them?' : generating visually grounded conversation starters for human-robot dialogue}},
  url          = {{http://doi.org/10.1109/HRI53351.2022.9889489}},
  year         = {{2022}},
}