WinnER: A Winner-Take-All Hashing-Based Unsupervised Model for Entity Resolution Problems

Graduate Thesis uoadl:2979645

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2022-03-17
Year:
2022
Author:
NIKOLETOS KONSTANTINOS
Supervisors info:
Alexios Delis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
Vassilios Verykios, Professor, School of Science and Technology, Hellenic Open University
Original Title:
WinnER: A Winner-Take-All Hashing-Based Unsupervised Model for Entity Resolution Problems
Languages:
English
Greek
Translated title:
WinnER: A Winner-Take-All Hashing-Based Unsupervised Model for Entity Resolution Problems
Summary:
In this study, we propose an end-to-end unsupervised learning model for Entity Resolution problems on string data sets. A novel prototype selection algorithm is used to build a rich space that is both Euclidean and a dissimilarity space. Part of this work is a presentation of the theoretical benefits of such a space. We then present an embedding scheme based on rank-ordered vectors that circumvents the curse of dimensionality. The core of our framework is a locality-sensitive hashing algorithm named Winner-Take-All, which reduces our model's run time while maintaining high scores in the similarity checking phase. For the similarity checking phase, we adopt the Kendall Tau rank correlation coefficient, a metric for comparing rankings. Finally, we use two state-of-the-art frameworks to evaluate our methodology consistently on a well-known Entity Resolution data set.
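The two building blocks named in the summary, Winner-Take-All hashing and the Kendall Tau coefficient, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names (`make_permutations`, `wta_hash`, `kendall_tau`), the window size `k`, and the toy vectors are assumptions chosen for illustration, and the Kendall Tau variant shown is the simple tau-a form, which assumes no ties.

```python
import random
from itertools import combinations


def make_permutations(dim, n_hashes, seed=0):
    """Draw n_hashes random permutations of the indices 0..dim-1 (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), dim) for _ in range(n_hashes)]


def wta_hash(vector, permutations, k):
    """Winner-Take-All hash: for each permutation, keep the first k permuted
    entries and record the position (0..k-1) of the largest one."""
    codes = []
    for perm in permutations:
        window = [vector[i] for i in perm[:k]]
        codes.append(max(range(k), key=lambda j: window[j]))
    return codes


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs. Assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy vectors (assumed data): b has the same rank order as a, c the reverse.
a = [0.9, 0.1, 0.5, 0.3, 0.7]
b = [9.0, 1.0, 5.0, 3.0, 7.0]
c = [0.1, 0.9, 0.5, 0.7, 0.3]

perms = make_permutations(dim=len(a), n_hashes=4)
same_codes = wta_hash(a, perms, k=3) == wta_hash(b, perms, k=3)
tau_ab = kendall_tau(a, b)
tau_ac = kendall_tau(a, c)
```

Because the hash depends only on which entry in each window is largest, vectors with the same rank order (`a` and `b`) always receive identical codes, regardless of scale. Kendall Tau then scores agreement between rankings, from 1 for identical order down to -1 for fully reversed order.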
Main subject category:
Technology - Computer science
Keywords:
unsupervised-learning, clustering, entity-resolution, wta-hashing, prototype-selection
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
31
Number of pages:
57
sdi1700104_BSTHESIS.pdf (2 MB)