Περίληψη:
Approximate nearest neighbor search (epsilon-ANN) in high dimensions has
been mainly addressed by Locality Sensitive Hashing (LSH), which has
complexity with polynomial dependence in dimension, sublinear query
time, but subquadratic space requirement. We introduce a new
“low-quality” embedding for metric spaces requiring that, for some
query, there exists an approximate nearest neighbor among the pre-images
of its k > 1 approximate nearest neighbors in the target space. In
Euclidean spaces, we employ random projections to a dimension inversely
proportional to k.
Our approach extends to the decision problem with witness of checking
whether there exists an approximate near neighbor; this also implies a
solution for epsilon-ANN. After dimension reduction, we store points in
a uniform grid of side length epsilon/root d’, where d’ is the reduced
dimension. Given a query, we explore cells intersecting the unit ball
around the query. This data structure requires linear space and query
time in O(dn(rho)), rho approximate to 1 - epsilon(2)/ log(1/epsilon);
where n denotes input cardinality and d space dimension. Bounds are
improved for doubling subsets via r-nets.
We present our implementation for epsilon-ANN in C++ and experiments for
d <= 960, n <= 10(6), using synthetic and real datasets, which confirm
the theoretical analysis and, typically, yield better practical
performance. We compare to FALCONN, the state-of-the-art implementation
of rnulti-probe LSH: our prototype software is essentially comparable in
terms of preprocessing, query time, and storage usage.
Συγγραφείς:
Anagnostopoulos, Evangelos
Emiris, Ioannis Z.
Psarros, Ioannis