Dimensionality reduction for approximate near neighbor search in the Manhattan metric

Postgraduate Thesis uoadl:2868157

Unit:
Specialization in Algorithms, Logic and Discrete Mathematics (A.L.MA.)
Informatics
Deposit date:
2019-04-19
Year:
2019
Author:
Margonis Vasileios
Supervisors info:
Ioannis Emiris, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Dimensionality reduction for approximate near neighbor search in the Manhattan metric
Languages:
English
Translated title:
Dimensionality reduction for approximate near neighbor search in the Manhattan metric
Summary:
The approximate nearest neighbor problem is one of the fundamental problems in computational geometry and has received much attention during the past decades. Efficient and practical algorithms are known for data sets of low dimension. However, modern, high-dimensional data cannot be handled by these algorithms because of the so-called "curse of dimensionality". A new theory for approximate nearest neighbors in high dimensions emerged with an influential paper by Indyk and Motwani in 1998, yielding algorithms that depend polynomially on the dimension.

Nevertheless, it has been realized that designing efficient ANN data structures is closely related to dimension-reducing embeddings. One popular dimension reduction technique is randomized projections. Starting with the celebrated Johnson-Lindenstrauss Lemma, such projections have been studied in depth for the Euclidean (L2) metric and, much less, for the Manhattan (L1) metric. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both the L2 and L1 metrics, as well as for doubling subsets of L2.
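The Euclidean case mentioned above can be illustrated with a minimal numerical sketch (not taken from the thesis; all parameter values are illustrative): an i.i.d. Gaussian matrix scaled by 1/sqrt(k), in the style of the Johnson-Lindenstrauss Lemma, approximately preserves all pairwise L2 distances of a point set.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 1000, 200          # points, original dim, target dim (illustrative)

P = rng.normal(size=(n, d))      # a random high-dimensional point set

# Johnson-Lindenstrauss-style projection: i.i.d. Gaussian entries, scaled so
# that squared Euclidean norms are preserved in expectation.
G = rng.normal(size=(k, d)) / np.sqrt(k)
Q = P @ G.T                      # projected point set in k dimensions

def pairwise_l2(X):
    """All pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

mask = ~np.eye(n, dtype=bool)    # ignore zero self-distances
ratio = pairwise_l2(Q)[mask] / pairwise_l2(P)[mask]
print(f"distortion range: [{ratio.min():.3f}, {ratio.max():.3f}]")
```

With these parameters the projected-to-original distance ratios concentrate near 1, which is exactly the bounded-distortion guarantee the embeddings discussed here relax to query-to-point distances only.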

In this thesis, we consider the approximate nearest neighbor problem in doubling subsets of L1. We exploit the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead, and we propose a dimension-reducing, near neighbor-preserving embedding for doubling subsets of L1. Our approach is to represent the point set by a carefully chosen covering set, and then apply a random linear projection to that covering set, using a matrix of Cauchy random variables. We study two cases of covering sets, approximate nets and randomly shifted grids, and we discuss the differences between them in terms of computing time and target dimension, as well as their algorithmic implications.
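The core ingredient, a projection matrix of Cauchy random variables, can be sketched as follows (a hypothetical toy example, not the thesis's actual construction with covering sets): since the Cauchy distribution is 1-stable, each coordinate of Ax is Cauchy-distributed with scale ||x||_1, so a robust statistic such as the median of absolute values recovers L1 distances between a query and the points.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 200          # points, original dim, target dim (illustrative)

P = rng.normal(size=(n, d))      # point set
q = rng.normal(size=d)           # query point

# Projection matrix of i.i.d. standard Cauchy entries (the 1-stable distribution).
A = rng.standard_cauchy(size=(k, d))

def l1_estimate(x, A):
    """Estimate ||x||_1 from the Cauchy sketch A @ x.

    By 1-stability, each coordinate of A @ x is Cauchy-distributed with
    scale ||x||_1; the median absolute value of a standard Cauchy is 1,
    so the sample median of |A @ x| is a robust estimator of the norm.
    """
    return np.median(np.abs(A @ x))

true_dist = np.abs(P - q).sum(axis=1)                    # exact L1 distances
est_dist = np.array([l1_estimate(p - q, A) for p in P])  # from the sketch
ratio = est_dist / true_dist
print(f"distortion range: [{ratio.min():.3f}, {ratio.max():.3f}]")
```

Note that the median estimator is nonlinear, so unlike the Gaussian case this does not directly yield a low-dimensional L1 embedding; this is one reason the decision version (approximate near neighbor) is the natural target for such projections.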
Main subject category:
Science
Keywords:
Approximate nearest neighbor, dimensionality reduction, Manhattan metric, randomized embedding
Index:
Yes
Number of index pages:
1
Contains images:
Yes
Number of references:
39
Number of pages:
45
thesis_margonis.pdf (499 KB)