
A deep learning approach for analyzing visual and textual content in Tintin comics

Lukas De Kerpel (UGent) , Toon Van Camp (UGent) , Azat Şaşkal and Dries Benoit (UGent)
Abstract
Comics analysis investigates the interplay of visual and textual elements to uncover narrative structures and storytelling mechanisms. Traditional approaches in comic art studies predominantly rely on manual methods to summarize content through these multimodal cues (Sugishita & Masuda, 2023). However, manual labeling and panel analysis are both time-intensive and laborious, underscoring the need for automated solutions to enable large-scale and efficient analysis. The field of Comic Recognition (CR) has advanced automation in comics analysis by leveraging deep learning techniques to detect panels, characters, speech balloons, and text (Lenadora et al., 2019; Li et al., 2024; Nguyen et al., 2018). Nevertheless, current CR approaches overlook a crucial aspect: they are not explicitly trained to identify the main characters within a comic book series, instead focusing on detecting whether an object is a character in general. Moreover, publicly available datasets in CR literature are typically derived from a mix of comic series and lack annotations specific to individual characters. These limitations hinder the ability to conduct systematic analyses of comics and their underlying narratives.

This study proposes a novel framework for automated extraction and analysis of narrative elements across an entire comic book series, applied to The Adventures of Tintin. The contributions of this research are threefold: (1) comprehensive data collection and segmentation of panel images across the entire series, (2) the development of a deep learning pipeline for detecting main characters, speech balloons, speaker-balloon associations, and textual content, and (3) the use of extracted data in a character network analysis to investigate social and emotional dynamics throughout the series.

The methodology combines edge detection for panel extraction with YOLOv11 models for character and balloon detection. Furthermore, we developed an algorithm to associate speech balloons with the corresponding characters by detecting the tail point, identified as the contour point with the smallest angle. The algorithm then verifies whether the extended lines of this angle intersect the character's bounding box. Transformer-based OCR is employed for text recognition within the balloons. This pipeline generates a structured dataset wherein each panel includes metadata on main characters, associated text, and their interactions. The results of a comparative evaluation against the ground truth show the effectiveness of this framework in accurately extracting visual and textual elements from the panels of the Tintin comics. The extracted dataset is used for a character network analysis, leveraging text recognition, speaker-balloon associations, and character identification to examine narrative dynamics, social structures, and emotional interactions across The Adventures of Tintin. This interdisciplinary framework bridges the gap between machine learning and sequential art studies, offering new insights into the analysis of visual storytelling.
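The tail-point heuristic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the balloon outline is already available as a simplified polygon (e.g., from a contour extractor), and the sampling range of the extended rays is an arbitrary parameter chosen here for illustration.

```python
import math

def interior_angle(prev_pt, pt, next_pt):
    """Angle at `pt` between the edges pt->prev_pt and pt->next_pt, in radians."""
    v1 = (prev_pt[0] - pt[0], prev_pt[1] - pt[1])
    v2 = (next_pt[0] - pt[0], next_pt[1] - pt[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def tail_point(contour):
    """Index of the contour vertex with the smallest interior angle,
    taken here as the balloon's tail tip."""
    n = len(contour)
    return min(
        range(n),
        key=lambda i: interior_angle(contour[i - 1], contour[i], contour[(i + 1) % n]),
    )

def ray_hits_box(origin, direction, box, reach=500):
    """Extend a ray from `origin` along `direction` and test whether any
    sampled point falls inside `box` = (x_min, y_min, x_max, y_max)."""
    ox, oy = origin
    dx, dy = direction
    norm = math.hypot(dx, dy) or 1.0
    dx, dy = dx / norm, dy / norm
    for t in range(1, reach):
        x, y = ox + dx * t, oy + dy * t
        if box[0] <= x <= box[2] and box[1] <= y <= box[3]:
            return True
    return False
```

In this toy setup, extending the two edges that meet at the tail vertex past the tip and testing each extension against every detected character box would yield the speaker candidate, mirroring the association step the abstract describes.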
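The character network analysis can likewise be sketched in its simplest form: counting co-occurrences of main characters across panels. The `characters` field name is a hypothetical stand-in for whatever the extraction pipeline emits; the actual analysis in the paper also draws on speaker-balloon associations and recognized text.

```python
from collections import Counter
from itertools import combinations

def build_network(panels):
    """Count undirected co-occurrence edges between characters.

    `panels` is a list of dicts, each with a `characters` list
    (field name assumed for illustration). Returns a Counter keyed
    by sorted character pairs, usable as weighted edges.
    """
    edges = Counter()
    for panel in panels:
        # sort so each unordered pair maps to one canonical key
        for a, b in combinations(sorted(set(panel["characters"])), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting pair counts can be loaded directly into a graph library for centrality or community analysis of the series' social structure.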
Keywords
comics recognition, comics analysis, deep learning, object detection

Downloads

  • (...).pdf
    • full text (Published version) | UGent only | PDF | 217.44 KB

Citation

Please use this URL to cite or link to this publication:

MLA
De Kerpel, Lukas, et al. “A Deep Learning Approach for Analyzing Visual and Textual Content in Tintin Comics.” Joint ORBEL-NGB Conference, Booklet of Abstracts, Maastricht University and ORBEL, 2025, pp. 108–09.
APA
De Kerpel, L., Van Camp, T., Şaşkal, A., & Benoit, D. (2025). A deep learning approach for analyzing visual and textual content in Tintin comics. Joint ORBEL-NGB Conference, Booklet of Abstracts, 108–109. Maastricht University and ORBEL.
Chicago author-date
De Kerpel, Lukas, Toon Van Camp, Azat Şaşkal, and Dries Benoit. 2025. “A Deep Learning Approach for Analyzing Visual and Textual Content in Tintin Comics.” In Joint ORBEL-NGB Conference, Booklet of Abstracts, 108–9. Maastricht University and ORBEL.
Chicago author-date (all authors)
De Kerpel, Lukas, Toon Van Camp, Azat Şaşkal, and Dries Benoit. 2025. “A Deep Learning Approach for Analyzing Visual and Textual Content in Tintin Comics.” In Joint ORBEL-NGB Conference, Booklet of Abstracts, 108–109. Maastricht University and ORBEL.
Vancouver
1. De Kerpel L, Van Camp T, Şaşkal A, Benoit D. A deep learning approach for analyzing visual and textual content in Tintin comics. In: Joint ORBEL-NGB Conference, Booklet of Abstracts. Maastricht University and ORBEL; 2025. p. 108–9.
IEEE
[1] L. De Kerpel, T. Van Camp, A. Şaşkal, and D. Benoit, “A deep learning approach for analyzing visual and textual content in Tintin comics,” in Joint ORBEL-NGB Conference, Booklet of Abstracts, Maastricht, Netherlands, 2025, pp. 108–109.
@inproceedings{01JP2WWV6SS76BCVY541HC9K59,
  abstract     = {{Comics analysis investigates the interplay of visual and textual elements to uncover narrative structures and storytelling mechanisms. Traditional approaches in comic art studies predominantly rely on manual methods to summarize content through these multimodal cues (Sugishita & Masuda, 2023). However, manual labeling and panel analysis are both time-intensive and laborious, underscoring the need for automated solutions to enable large-scale and efficient analysis. The field of Comic Recognition (CR) has advanced automation in comics analysis by leveraging deep learning techniques to detect panels, characters, speech balloons, and text (Lenadora et al., 2019; Li et al., 2024; Nguyen et al., 2018). Nevertheless, current CR approaches overlook a crucial aspect: they are not explicitly trained to identify the main characters within a comic book series, instead focusing on detecting whether an object is a character in general. Moreover, publicly available datasets in CR literature are typically derived from a mix of comic series and lack annotations specific to individual characters. These limitations hinder the ability to conduct systematic analyses of comics and their underlying narratives.

This study proposes a novel framework for automated extraction and analysis of narrative elements across an entire comic book series, applied to The Adventures of Tintin. The contributions of this research are threefold: (1) comprehensive data collection and segmentation of panel images across the entire series, (2) the development of a deep learning pipeline for detecting main characters, speech balloons, speaker-balloon associations, and textual content, and (3) the use of extracted data in a character network analysis to investigate social and emotional dynamics throughout the series. 

The methodology combines edge detection for panel extraction with YOLOv11 models for character and balloon detection. Furthermore, we developed an algorithm to associate speech balloons with the corresponding characters by detecting the tail point, identified as the contour point with the smallest angle. The algorithm then verifies whether the extended lines of this angle intersect the character's bounding box. Transformer-based OCR is employed for text recognition within the balloons. This pipeline generates a structured dataset wherein each panel includes metadata on main characters, associated text, and their interactions. The results of a comparative evaluation against the ground truth show the effectiveness of this framework in accurately extracting visual and textual elements from the panels of the Tintin comics. The extracted dataset is used for a character network analysis, leveraging text recognition, speaker-balloon associations, and character identification to examine narrative dynamics, social structures, and emotional interactions across The Adventures of Tintin. This interdisciplinary framework bridges the gap between machine learning and sequential art studies, offering new insights into the analysis of visual storytelling.}},
  author       = {{De Kerpel, Lukas and Van Camp, Toon and Şaşkal, Azat and Benoit, Dries}},
  booktitle    = {{Joint ORBEL-NGB Conference, Booklet of Abstracts}},
  keywords     = {{comics recognition, comics analysis, deep learning, object detection}},
  language     = {{eng}},
  location     = {{Maastricht, Netherlands}},
  pages        = {{108--109}},
  publisher    = {{Maastricht University and ORBEL}},
  title        = {{A deep learning approach for analyzing visual and textual content in Tintin comics}},
  year         = {{2025}},
}