Pergamos - Library and Information Center of National and Kapodistrian University of Athens

Unit:

Specialization in Big Data and Artificial Intelligence
Informatics

Deposit date:

2024-07-10

Year:

2024

Author:

Kouletou Eleni-Ioanna

Supervisors info:

Vassilis Papavassiliou, Research Associate, ILSP/Athena Research Center
Vassilis Katsouros, Researcher A, ILSP/Athena Research Center

Original Title:

Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding

Languages:

English
Greek

Translated title:

Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding

Summary:

Comic books, which merge art with narrative, have captivated readers, film producers, and collectors for decades as a form of visual storytelling. Comic image segmentation is a pivotal step in the digital transformation of comics. Using heuristic approaches, neural network-based models (YOLO), and transformer-based architectures (GroundingDINO, SAM), our research aims to automatically segment comic pages into their fundamental components: panels, comic characters, and text areas. To this end, we fine-tuned YOLOv5 and YOLOv8 models to identify these components, while the transformer-based models used prompts to retrieve them. Comparing their performance on established metrics (Precision, Recall, Average Precision) across three well-known datasets (eBDtheque, DCM772, Manga109), together with visual inspection, we conclude that pre-trained self-supervised transformer models can outperform state-of-the-art approaches, which often require further fine-tuning to achieve comparable results. We also examined character identification using neural networks and unsupervised learning. The qualitative study indicated that this task does not generalize across different comic books; instead, it should concentrate on the characters within a single comic book or across volumes of the same series.
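
As a rough illustration of the detection metrics named in the summary, the sketch below computes Precision and Recall for predicted bounding boxes against ground-truth boxes. The summary does not state the matching rule or overlap threshold used in the thesis, so the greedy one-to-one matching, the IoU threshold of 0.5, and all function names here are assumptions made for illustration only. Average Precision, which also requires confidence scores, is not computed.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    The 0.5 threshold is a common convention, not a value from the thesis.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(p, t)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical panel boxes: one correct detection, one false positive.
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(precision_recall(preds, truths))
```

In this toy case one of the two predictions matches a ground-truth panel and one of the two ground-truth panels is found, so both metrics come out to one half.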

Main subject category:

Technology - Computer science

Keywords:

Comics, Object Detection, Object Segmentation, Panel Detection, Character Detection, Text Area Detection, Neural Networks, Transformers

Index:

Yes

Number of index pages:

Contains images:

Yes

Number of references:

Number of pages:

File:

File access is restricted until 2025-01-10.

Persistent URL:

https://pergamos.lib.uoa.gr/uoa/dl/object/3405329

MScThesisKouletou.pdf
11 MB