On visual explanation of supervised and self-supervised learning

Postgraduate Thesis uoadl:3256221

Unit:
Specialization: Big Data and Artificial Intelligence
Informatics
Deposit date:
2022-12-28
Year:
2022
Author:
Reppas Dimitrios
Supervisors info:
Ioannis Avrithis, Research Director, Athena Research Center
Ioannis Emiris, President and General Director, Athena Research Center
Stephane Ayache, Professor and Researcher, University of Marseille
Original Title:
On visual explanation of supervised and self-supervised learning
Languages:
English
Translated title:
On visual explanation of supervised and self-supervised learning
Summary:
In recent years, the rapid development of Deep Neural Networks (DNNs) has led to remarkable performance in many Computer Vision tasks. The increasing complexity of the models, the available computational power, the amount of available data and the supervision during the training process are the main causes behind this success. As an alternative to supervised representation learning, self-supervised methods are becoming popular because they dispense with the need for carefully labelled datasets.

Undoubtedly, the more complex the models get, the greater the need to understand their predictions. The primary objective of this thesis is to interpret both supervised and self-supervised models, using either convolutional neural networks or vision transformers as a backbone. Variations of visualization methods are used, based on class activation maps (CAM) and attention mechanisms. Given an input image, these methods provide a saliency map that is used to interpret the network prediction; this map indicates the regions of the image to which the model pays the most attention. We evaluate these methods qualitatively and quantitatively. We further propose new alternative or complementary visualization methods, which show where important information can be hidden inside the network and how to reveal it. These new methods further improve the quantitative results. Our study highlights the importance of interpretability, shows common properties of and differences between the ways supervised and self-supervised models make their predictions, and provides valuable information on both the models and the visualization methods.
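To make the idea of a CAM-style saliency map concrete, the following is a minimal Grad-CAM-like sketch in PyTorch. It is illustrative only, not the exact visualization methods evaluated in the thesis; the choice of ResNet-50, the hooked layer (layer4) and the random stand-in input are placeholder assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()
store = {}

def hook(module, inputs, output):
    store["maps"] = output                                  # last conv-block activations
    output.register_hook(lambda g: store.update(grads=g))   # and their gradients

model.layer4.register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)          # stand-in for a preprocessed input image
logits = model(image)
logits[0, logits[0].argmax()].backward()     # backprop the top-class score

weights = store["grads"].mean(dim=(2, 3), keepdim=True)        # per-channel importance
cam = F.relu((weights * store["maps"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # saliency map in [0, 1]
```

The resulting map can be overlaid on the input image to show which regions drove the predicted class, which is the kind of qualitative evidence the thesis compares across supervised and self-supervised backbones.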

Thanks to the knowledge gained from the interpretability study, we further improve self-supervised learning, in particular masked image modeling (MIM). Here, we identify the regions of an image that are most important to hide from a student network and define a more challenging MIM-based self-supervised pretext task. Based on this, we propose new masking strategies that achieve higher k-NN and linear-probing scores and accelerate learning on downstream tasks. Considering the computational-efficiency challenge these methods face, we conduct experiments at different dataset scales and numbers of training epochs and show their impact on the scores. We further visually explain the influence of each masking strategy and dataset scale by using interpretability methods during the learning and evaluation process. Finally, we introduce a new loss function based on contrastive learning and achieve improvements over the baseline when it is combined with the different masking strategies.
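As an illustration of what a saliency-guided masking strategy for MIM might look like, here is a short sketch that hides the patches with the highest per-patch importance scores. The masking ratio, the importance scores (e.g. taken from a CAM or attention map) and the function name importance_mask are hypothetical placeholders, not the thesis's actual strategies.

```python
import torch

def importance_mask(importance: torch.Tensor, ratio: float = 0.4) -> torch.Tensor:
    """Boolean mask of shape (B, N); True marks the `ratio` most important patches to hide."""
    num_masked = int(ratio * importance.shape[1])
    ranks = importance.argsort(dim=1, descending=True).argsort(dim=1)  # 0 = most salient patch
    return ranks < num_masked

# Toy usage: 8 images, 14 x 14 = 196 ViT patches, importance scores from a saliency map
importance = torch.rand(8, 196)
mask = importance_mask(importance, ratio=0.4)
print(mask.float().mean().item())  # fraction of hidden patches, ~0.4
```

Masking the most salient patches rather than random ones makes the reconstruction target harder for the student network, which is the intuition behind defining a more challenging pretext task.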
Main subject category:
Technology - Computer science
Keywords:
Computer Vision, Deep Learning, Interpretability, Self-supervised Learning
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
81
Number of pages:
79
Thesis_Dimitrios_Reppas.pdf (9 MB)