Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding

Postgraduate Thesis uoadl:3405329

Unit:
Specialization in Big Data and Artificial Intelligence
Informatics
Deposit date:
2024-07-10
Year:
2024
Author:
Kouletou Eleni-Ioanna
Supervisors info:
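Βασίλης Παπαβασιλείου, Research Associate, Institute for Language and Speech Processing (ILSP), Athena Research Center
Βασίλης Κατσούρος, Researcher (Grade A), Institute for Language and Speech Processing (ILSP), Athena Research Center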
Original Title:
Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding
Languages:
English
Greek
Translated title:
Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding
Summary:
Comic books merge art with narrative and have captivated readers, cinema producers, and collectors for decades as a cherished form of visual storytelling. Comic image segmentation is a pivotal step in the digital transformation of comics. Using heuristic approaches, neural network-based models (YOLO), and transformer-based architectures (GroundingDINO, SAM), our research aims to automatically segment comic pages into their fundamental components: panels, comic characters, and text areas. To this end, we further trained YOLOv5 and YOLOv8 models to identify these components, while the transformer-based models were prompted to retrieve them. Comparing their performance in terms of established metrics (Precision, Recall, Average Precision) across three well-known datasets (eBDtheque, DCM772, Manga109), together with visual inspection, we conclude that pre-trained self-supervised transformer models can outperform state-of-the-art approaches, which often require further fine-tuning to achieve comparable results. We also examined the character identification module using neural networks and unsupervised learning. The qualitative study indicated that this task does not generalize across different comic books; instead, it should be restricted to the characters within a single comic book or across volumes of the same series.
Main subject category:
Technology - Computer science
Keywords:
Comics, Object Detection, Object Segmentation, Panel Detection, Character Detection, Text Area Detection, Neural Networks, Transformers
Index:
Yes
Number of index pages:
2
Contains images:
Yes
Number of references:
44
Number of pages:
65
File:
File access is restricted until 2025-01-10.

MScThesisKouletou.pdf
11 MB