Iterative label cleaning for semi-supervised learning

Postgraduate Thesis uoadl:2958160

Unit:
Electronic Automation (H/A) Specialization
Library of the School of Science
Deposit date:
2021-07-22
Year:
2021
Author:
Bellos Filippos
Supervisors info:
Yannis Avrithis, Researcher, Irisa, Inria Rennes-Bretagne Atlantique
Dionysios Reisis, Professor, Department of Physics, National and Kapodistrian University of Athens
Anna Tzanakaki, Associate Professor, National and Kapodistrian University of Athens
Original Title:
Επαναληπτικός καθαρισμός προβλέψεων για ημι-επιβλεπόμενη μάθηση
Languages:
English
Greek
Translated title:
Iterative label cleaning for semi-supervised learning
Summary:
Deep neural networks have become the de facto model for computer vision applications. Their success is partially attributable to their scalability, i.e., the empirical observation that training them on larger datasets produces better performance. Deep networks often achieve their strong performance through supervised learning, which requires a labeled dataset. The performance benefit conferred by the use of a larger dataset can therefore come at a significant cost since labeling data often requires human labor. This cost can be particularly extreme when labeling must be done by an expert.

A powerful approach for training models on a large amount of data without requiring a large amount of labels is semi-supervised learning (SSL). SSL mitigates the requirement for labeled data by providing a means of leveraging unlabeled data. Since unlabeled data can often be obtained with minimal human labor, any performance boost conferred by SSL often comes with low cost. This has led to a plethora of SSL methods that are designed for deep networks.

In this thesis, we propose two methods that combine successful ideas from problems related to the task at hand. In particular, we propose CleanMatch and WeightMatch, two new semi-supervised learning methods that unify dominant approaches and address their limitations. CleanMatch consists of two stages: (1) iterative selection of the most confident pseudo-labels produced by a combination of consistency regularization and pseudo-labeling following FixMatch, and (2) augmentation of the labeled set with the examples selected in the first stage, followed by FixMatch-based semi-supervised training on the augmented dataset. WeightMatch estimates a weight reflecting the confidence of each labeled example, forcing the model to rely more on the confident examples during training.
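To make the description above concrete, the following is a minimal PyTorch sketch of the two ingredients: FixMatch-style selection of confident pseudo-labels (the building block of CleanMatch's first stage) and a per-example weighted supervised loss in the spirit of WeightMatch. This is a sketch under assumptions of my own; the function names, the threshold tau, and the exact loss form are illustrative and not taken from the thesis.

import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_weak, x_strong, tau=0.95):
    """FixMatch-style consistency term: pseudo-label the weakly augmented view,
    keep only predictions whose confidence exceeds tau, and train the strongly
    augmented view against those hard pseudo-labels."""
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)   # soft predictions on the weak view
        conf, pseudo = probs.max(dim=1)           # confidence and hard pseudo-label
        mask = (conf >= tau).float()              # keep confident examples only
    logits_strong = model(x_strong)
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    # The mask also identifies the examples that a CleanMatch-like first stage
    # could move into the labeled set before retraining.
    return (loss * mask).mean(), mask

def weighted_supervised_loss(model, x, y, weights):
    """WeightMatch-style supervised term: each labeled example contributes in
    proportion to an estimated confidence weight in [0, 1]."""
    loss = F.cross_entropy(model(x), y, reduction="none")
    return (loss * weights).sum() / weights.sum().clamp(min=1e-8)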

Our methods improve the state of the art by a large margin on CIFAR-10, SVHN, and CIFAR-100, especially in settings with few labels.
Main subject category:
Science
Other subject categories:
Technology - Computer science
Keywords:
Semi-supervised learning, Noisy labels
Index:
Yes
Number of index pages:
73
Contains images:
Yes
Number of references:
117
Number of pages:
91
File:
Diploma_thesis_Bellos_Filippos.pdf
4 MB
File access is restricted to the UoA intranet.