Postgraduate Thesis uoadl:1708336 365 Read counter

Κατεύθυνση Λογική και Θεωρία Αλγορίθμων και Υπολογισμού

Library of the School of Science

Library of the School of Science

2017-07-06

2017

Avarikioti Georgia

Ιωάννης Ζ. Εμίρης, Καθηγητής, Τμήμα Πληροφορικής & Τηλεπικοινωνιών, Εθνικό και Καποδιστριακό Πανεπιστήμιο Αθηνών

Geometric Proximity Problems in High Dimensions

English

Geometric Proximity Problems in High Dimensions

Geometric proximity problems is a class of problems in computational geometry that involve estimation of distances between geometric objects.

In this work, we focus on two specific problems of this class, the computation of r-nets and the near neighbor decision problem on high dimensional spaces under the Euclidean distance, both of which are powerful tools in computational and metric geometry.

Specifically, we present a new randomized algorithm which efficiently computes high dimensional approximate r-nets with respect to Euclidean distance. For any fixed ε>0, the approximation factor is 1+ε and the complexity is polynomial in the dimension and subquadratic in the number of points; the algorithm succeeds with high probability. We improve upon the best previously known (LSH-based) construction of Eppstein et al. in terms of complexity, by reducing the dependence on ε, provided that εis sufficiently small. Moreover, our method does not require LSH but follows Valiant's approach in designing a sequence of reductions of our problem to other problems in different spaces, under Euclidean distance or inner product, for which r-nets are computed efficiently and the error can be controlled. Our result immediately implies efficient solutions to a number of geometric problems in high dimension, such as finding the (1+ε)-approximate k-th nearest neighbor distance in time subquadratic in the size of the input.

Additionally, we propose a new and simple data structure for the c-approximate near neighbor decision problem in high-dimensional spaces using linear space and sublinear query time for any c>1: given an LSH family of functions for some metric space, we randomly project points to vertices of the Hamming cube in dimension ≤log n, where n is the number of input points. The projected space contains strings which serve as keys for buckets containing the input points. The query algorithm simply projects the query point, then examines points which are assigned to the same or nearby vertices on the Hamming cube. We analyze in detail the query time for some standard LSH families.

In this work, we focus on two specific problems of this class, the computation of r-nets and the near neighbor decision problem on high dimensional spaces under the Euclidean distance, both of which are powerful tools in computational and metric geometry.

Specifically, we present a new randomized algorithm which efficiently computes high dimensional approximate r-nets with respect to Euclidean distance. For any fixed ε>0, the approximation factor is 1+ε and the complexity is polynomial in the dimension and subquadratic in the number of points; the algorithm succeeds with high probability. We improve upon the best previously known (LSH-based) construction of Eppstein et al. in terms of complexity, by reducing the dependence on ε, provided that εis sufficiently small. Moreover, our method does not require LSH but follows Valiant's approach in designing a sequence of reductions of our problem to other problems in different spaces, under Euclidean distance or inner product, for which r-nets are computed efficiently and the error can be controlled. Our result immediately implies efficient solutions to a number of geometric problems in high dimension, such as finding the (1+ε)-approximate k-th nearest neighbor distance in time subquadratic in the size of the input.

Additionally, we propose a new and simple data structure for the c-approximate near neighbor decision problem in high-dimensional spaces using linear space and sublinear query time for any c>1: given an LSH family of functions for some metric space, we randomly project points to vertices of the Hamming cube in dimension ≤log n, where n is the number of input points. The projected space contains strings which serve as keys for buckets containing the input points. The query algorithm simply projects the query point, then examines points which are assigned to the same or nearby vertices on the Hamming cube. We analyze in detail the query time for some standard LSH families.

Science

Algorithms and Theory of Computation

computational geometry, metric geometry, general dimension, approximation algorithm, r-nets, Locality Sensitive Hashing, Near Neighbor, high dimension, linear storage, sublinear query, random projection

Yes

2

Yes

34

68