
I was blind but now I see : implementing vision-enabled dialogue in social robots

Giulio Antonio Abbo (UGent) and Tony Belpaeme (UGent)
Abstract
In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package.
Keywords
Large Language Model, Vision Language Model, Dialogue, HRI, Conversation, Prompt Engineering, ROS

Downloads

  • DS917 acc.pdf
    • full text (Accepted manuscript) | open access | PDF | 999.58 KB
  • (...).pdf
    • full text (Published version) | UGent only | PDF | 1.79 MB

Citation


MLA
Abbo, Giulio Antonio, and Tony Belpaeme. “I Was Blind but Now I See : Implementing Vision-Enabled Dialogue in Social Robots.” 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, IEEE, 2025, pp. 1176–80, doi:10.1109/hri61500.2025.10973830.
APA
Abbo, G. A., & Belpaeme, T. (2025). I was blind but now I see : implementing vision-enabled dialogue in social robots. 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, 1176–1180. https://doi.org/10.1109/hri61500.2025.10973830
Chicago author-date
Abbo, Giulio Antonio, and Tony Belpaeme. 2025. “I Was Blind but Now I See : Implementing Vision-Enabled Dialogue in Social Robots.” In 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, 1176–80. IEEE. https://doi.org/10.1109/hri61500.2025.10973830.
Chicago author-date (all authors)
Abbo, Giulio Antonio, and Tony Belpaeme. 2025. “I Was Blind but Now I See : Implementing Vision-Enabled Dialogue in Social Robots.” In 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, 1176–1180. IEEE. doi:10.1109/hri61500.2025.10973830.
Vancouver
1. Abbo GA, Belpaeme T. I was blind but now I see : implementing vision-enabled dialogue in social robots. In: 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI. IEEE; 2025. p. 1176–80.
IEEE
[1] G. A. Abbo and T. Belpaeme, “I was blind but now I see : implementing vision-enabled dialogue in social robots,” in 2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI, Melbourne, Australia, 2025, pp. 1176–1180.
@inproceedings{01JY6D484F5MA1X6W21GCXAVCB,
  abstract     = {{In the rapidly evolving landscape of human-robot interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents a ready-to-use implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4o mini) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. The system can be customised and is available as a stand-alone application, a Furhat robot implementation, and a ROS2 package.}},
  author       = {{Abbo, Giulio Antonio and Belpaeme, Tony}},
  booktitle    = {{2025 20TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI}},
  isbn         = {{9798350378948}},
  issn         = {{2167-2121}},
  keywords     = {{Large Language Model,Vision Language Model,Dialogue,HRI,Conversation,Prompt Engineering,ROS}},
  language     = {{eng}},
  location     = {{Melbourne, Australia}},
  pages        = {{1176--1180}},
  publisher    = {{IEEE}},
  title        = {{I was blind but now I see : implementing vision-enabled dialogue in social robots}},
  url          = {{http://doi.org/10.1109/hri61500.2025.10973830}},
  year         = {{2025}},
}
