Supervisors info:
Vassilis Papavassiliou, Research Associate, ILSP/Athena RC
Vassilis Katsouros, Research Director (Researcher A'), ILSP/Athena RC
Summary:
Comic books, which merge art with narrative, continue to captivate readers, film producers, and collectors, and have remained a cherished form of visual storytelling for decades. Comic image segmentation is a pivotal step in the digital transformation of comics. Using heuristic approaches, neural network-based models (YOLO), and transformer-based architectures (GroundingDINO, SAM), our research aims to automatically segment comic pages into their fundamental components: panels, comic characters, and text areas. To this end, we fine-tuned YOLOv5 and YOLOv8 models to detect these components, while the transformer-based models used prompts to retrieve them. By comparing their performance on established metrics (Precision, Recall, Average Precision) across three well-known datasets (eBDtheque, DCM772, Manga109), complemented by visual inspection, we conclude that pre-trained self-supervised transformer models can outperform state-of-the-art approaches, which often require further fine-tuning to achieve comparable results. We also examined character identification using neural networks and unsupervised learning. A qualitative study indicated that this task does not generalize across different comic books; instead, it should focus on the characters within a single comic book or across volumes of the same series.
Keywords:
Comics, Object Detection, Object Segmentation, Panel Detection, Character Detection, Text Area Detection, Neural Networks, Transformers