Randomized Embeddings with Slack and High-Dimensional Approximate Nearest Neighbor

Επιστημονική δημοσίευση - Άρθρο Περιοδικού uoadl:3180254 13 Αναγνώσεις

Μονάδα:
Ερευνητικό υλικό ΕΚΠΑ
Τίτλος:
Randomized Embeddings with Slack and High-Dimensional Approximate
Nearest Neighbor
Γλώσσες Τεκμηρίου:
Αγγλικά
Περίληψη:
Approximate nearest neighbor search (epsilon-ANN) in high dimensions has
been mainly addressed by Locality Sensitive Hashing (LSH), which has
complexity with polynomial dependence in dimension, sublinear query
time, but subquadratic space requirement. We introduce a new
“low-quality” embedding for metric spaces requiring that, for some
query, there exists an approximate nearest neighbor among the pre-images
of its k > 1 approximate nearest neighbors in the target space. In
Euclidean spaces, we employ random projections to a dimension inversely
proportional to k.
Our approach extends to the decision problem with witness of checking
whether there exists an approximate near neighbor; this also implies a
solution for epsilon-ANN. After dimension reduction, we store points in
a uniform grid of side length epsilon/root d’, where d’ is the reduced
dimension. Given a query, we explore cells intersecting the unit ball
around the query. This data structure requires linear space and query
time in O(dn(rho)), rho approximate to 1 - epsilon(2)/ log(1/epsilon);
where n denotes input cardinality and d space dimension. Bounds are
improved for doubling subsets via r-nets.
We present our implementation for epsilon-ANN in C++ and experiments for
d <= 960, n <= 10(6), using synthetic and real datasets, which confirm
the theoretical analysis and, typically, yield better practical
performance. We compare to FALCONN, the state-of-the-art implementation
of rnulti-probe LSH: our prototype software is essentially comparable in
terms of preprocessing, query time, and storage usage.
Έτος δημοσίευσης:
2018
Συγγραφείς:
Anagnostopoulos, Evangelos
Emiris, Ioannis Z.
Psarros, Ioannis
Περιοδικό:
ACM Transactions on Algorithms (TALG)
Εκδότης:
ASSOCIATION FOR COMPUTING MACHINERY
Τόμος:
14
Αριθμός / τεύχος:
2
Λέξεις-κλειδιά:
Approximate nearest neighbor; curse of dimensionality;
Johnson-Lindenstrauss Lemma; randomized embeddings; doubling dimension;
experimental study
Επίσημο URL (Εκδότης):
DOI:
10.1115/3178590
Το ψηφιακό υλικό του τεκμηρίου δεν είναι διαθέσιμο.