Metric Learning: A Deep Dive

Postgraduate Thesis uoadl:2925860

Unit:
Specialization: Big Data and Artificial Intelligence
Informatics
Deposit date:
2020-10-26
Year:
2020
Author:
Psomas Vasileios
Supervisors info:
Yannis Avrithis, Research Scientist, INRIA Rennes-Bretagne Atlantique
Ioannis Emiris, Professor, National and Kapodistrian University of Athens
Vasileios Katsouros, Research Director, Athena Research and Innovation Center
Original Title:
Metric Learning: A Deep Dive
Languages:
English
Greek
Translated title:
Metric Learning: A Deep Dive
Summary:
Metric Learning is an important task in Machine Learning. Its objective is to learn a distance metric that reduces the distance between similar objects and increases the distance between dissimilar ones. Similarity and dissimilarity can be subjective, so some form of supervision is needed to define the ground truth. Learning such a distance metric proves useful for many tasks, such as classification, retrieval and clustering: classification and retrieval reduce to class-level and instance-level nearest neighbor search respectively, while clustering becomes easier given the similarity matrix.
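As a minimal sketch of the reduction described above (the gallery data, labels and function names here are hypothetical, and plain Euclidean distance stands in for a learned metric), classification becomes a class-level nearest neighbor lookup and retrieval a ranking of gallery samples by distance:

```python
import math

def euclidean(x, y):
    # Plain Euclidean distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Toy gallery of embedded samples with class labels (hypothetical data).
gallery = [
    ([0.0, 0.1], "bird"),
    ([0.1, 0.0], "bird"),
    ([1.0, 1.1], "car"),
]

def classify(query):
    # Class-level nearest neighbor: return the label of the closest sample.
    return min(gallery, key=lambda item: euclidean(query, item[0]))[1]

def retrieve(query, k=2):
    # Instance-level nearest neighbor: rank gallery samples by distance.
    return sorted(gallery, key=lambda item: euclidean(query, item[0]))[:k]

print(classify([0.05, 0.05]))  # closest gallery sample is a "bird"
```

A better metric changes only the distance function; the nearest-neighbor machinery stays the same.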
Traditionally, before Deep Learning, Metric Learning approaches were based either on linear transformations using the Mahalanobis and/or Euclidean distance, or on non-linear transformations using kernel-based methods. Both, however, had drawbacks. Linear transformations have a limited ability to capture non-linear feature structure and thus could not achieve high performance on the new representation of the data. Non-linear transformations that carried the problem to a non-linear space could achieve optimal performance, but often suffered from overfitting. In addition, both families of methods were limited in their ability to process raw data, so feature engineering was often needed.
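For concreteness, the Mahalanobis distance underlying the linear approaches is d_M(x, y) = sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M; learning the metric means learning M. A minimal sketch (M and the vectors are illustrative):

```python
import math

def mahalanobis(x, y, M):
    # d_M(x, y) = sqrt((x - y)^T M (x - y)), with M positive semi-definite.
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(di * mdi for di, mdi in zip(d, Md)))

# With M = I the Mahalanobis distance reduces to the Euclidean distance.
I = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis([0.0, 0.0], [3.0, 4.0], I))  # 5.0
```

A non-identity M reweights and correlates the feature dimensions, which is exactly the linear transformation the classical methods learn.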
With the remarkable success of Convolutional Neural Networks, Deep Metric Learning was introduced. In this context, neural networks are discriminatively trained to learn a non-linear mapping from raw input data to a lower-dimensional, semantic embedding. This is usually done in a supervised way: the label annotations are given, and the embeddings are optimized to pull samples with the same class label closer together and push samples with different class labels apart. The whole training process consists of minimizing a loss function with exactly these properties. The great advantage of Deep Metric Learning is that it jointly extracts the features and learns the embedding.
The contribution of this work is threefold. First, we conduct extensive experiments using
the most commonly used architectures (GoogLeNet, BNInception, ResNet50) on the most
commonly used datasets (CUB200-2011, CARS196, Stanford Online Products) using 10
different loss functions (Contrastive, Triplet, LiftedStructure, NPair, ProxyNCA, ArcFace,
Margin, MultiSimilarity, SoftTriple, ProxyAnchor) and four different embedding sizes (64,
128, 512, 1024). We perform an ablation study and draw important conclusions from the results. Second, we introduce and propose a new training setup that uses a fixed validation set, and we compare it experimentally against 10-fold cross-validation. Our setup appears to strike a good balance in the trade-off between computational complexity and retrieval quality. Finally, we design, implement and experiment with a new loss function that is on par with the state of the art.
Main subject category:
Technology - Computer science
Keywords:
Neural Networks, Deep Learning, Computer Vision, Metric Learning
Index:
Yes
Number of index pages:
3
Contains images:
Yes
Number of references:
57
Number of pages:
73
Metric_Learning_A_Deep_Dive_Psomas.pdf (7 MB)