I was blind but now I see : implementing vision-enabled dialogue in social robots
- Author
- Giulio Antonio Abbo (UGent) and Tony Belpaeme (UGent)
- Abstract
- In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package.
- Keywords
- Large Language Model, Vision Language Model, Dialogue, HRI, Conversation, Prompt Engineering, ROS
Downloads
- DS917 acc.pdf: full text (Accepted manuscript), open access, 999.58 KB
- (...).pdf: full text (Published version), UGent only, 1.79 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01JY6D484F5MA1X6W21GCXAVCB
- MLA
- Abbo, Giulio Antonio, and Tony Belpaeme. “I Was Blind but Now I See : Implementing Vision-Enabled Dialogue in Social Robots.” 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, IEEE, 2025, pp. 1176–80, doi:10.1109/hri61500.2025.10973830.
- APA
- Abbo, G. A., & Belpaeme, T. (2025). I was blind but now I see : implementing vision-enabled dialogue in social robots. 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, 1176–1180. https://doi.org/10.1109/hri61500.2025.10973830
- Chicago author-date
- Abbo, Giulio Antonio, and Tony Belpaeme. 2025. “I Was Blind but Now I See : Implementing Vision-Enabled Dialogue in Social Robots.” In 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, 1176–80. IEEE. https://doi.org/10.1109/hri61500.2025.10973830.
- Chicago author-date (all authors)
- Abbo, Giulio Antonio, and Tony Belpaeme. 2025. “I Was Blind but Now I See : Implementing Vision-Enabled Dialogue in Social Robots.” In 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, 1176–1180. IEEE. doi:10.1109/hri61500.2025.10973830.
- Vancouver
- 1. Abbo GA, Belpaeme T. I was blind but now I see : implementing vision-enabled dialogue in social robots. In: 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI. IEEE; 2025. p. 1176–80.
- IEEE
- [1] G. A. Abbo and T. Belpaeme, “I was blind but now I see : implementing vision-enabled dialogue in social robots,” in 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, Melbourne, Australia, 2025, pp. 1176–1180.
@inproceedings{01JY6D484F5MA1X6W21GCXAVCB,
abstract = {{In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package.}},
author = {{Abbo, Giulio Antonio and Belpaeme, Tony}},
booktitle = {{2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI}},
isbn = {{9798350378948}},
issn = {{2167-2121}},
keywords = {{Large Language Model,Vision Language Model,Dialogue,HRI,Conversation,Prompt Engineering,ROS}},
language = {{eng}},
location = {{Melbourne, Australia}},
pages = {{1176--1180}},
publisher = {{IEEE}},
title = {{I was blind but now I see : implementing vision-enabled dialogue in social robots}},
url = {{http://doi.org/10.1109/hri61500.2025.10973830}},
year = {{2025}},
}