A package for precise approximate nearest neighbor search in more than just Euclidean space.
Its only exported function, `find_knn`, computes the `k` nearest neighbors of the rows of the `query` matrix in the `data` matrix. If no `query` matrix is passed, the nearest neighbors for all rows in the data will be returned (i.e. `data` will be used as `query`).

    find_knn(
        data, k, ...,
        query = NULL,
        distance = c("euclidean", "cosine", "rankcor"),
        sym = TRUE)

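
A minimal usage sketch of the signature above. It assumes the package is installed under the name `knn.covertree` (the name is not stated in this text, so treat it as an assumption); the random matrices are purely illustrative. The calls are guarded so the snippet does nothing when the package is unavailable.

```r
# Assumption: the package is installed as "knn.covertree".
if (requireNamespace("knn.covertree", quietly = TRUE)) {
  find_knn <- knn.covertree::find_knn

  set.seed(1)
  data  <- matrix(rnorm(200), nrow = 50)  # 50 illustrative points in 4 dimensions
  query <- matrix(rnorm(20),  nrow = 5)   # 5 illustrative query points

  # No query given: neighbors are searched for every row of `data` itself.
  knn_self <- find_knn(data, k = 5)

  # Separate query matrix, cosine distance instead of the default.
  knn_query <- find_knn(data, k = 5, query = query, distance = "cosine")

  # Show the top-level structure of the returned list.
  str(knn_query, max.level = 1)
}
```

Passing `distance = "rankcor"` in the same way should select the rank correlation distance listed in the signature.
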
The result will be a list containing:

- `index`, an `nrow(query)` × `k` integer matrix containing the row indices into `data` that are the nearest neighbors.
- `dist`, an `nrow(query)` × `k` double matrix containing the distances to those neighbors.
- `dist_mat`, an `nrow(query)` × `nrow(data)` `Matrix::dSparseMatrix`, generic if `!sym` or `!is.null(query)`, and symmetric if `sym` and `is.null(query)`. Zeros in this matrix mean “not a kNN”, and if `sym` is set, the matrix will be post-processed to be symmetric. (Without post-processing, the matrix will likely be asymmetric, as r1 ∈ kNN(r2) does not imply r2 ∈ kNN(r1).)
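
To illustrate why a kNN distance matrix is usually asymmetric, here is a small brute-force sketch in base R. It does not use the package. The three points, the exclusion of each point from its own neighbor list, and the elementwise-maximum symmetrization are all assumptions made for illustration; the package's actual post-processing may differ.

```r
# Three points on a line; brute-force 1-nearest-neighbor search in base R.
data <- matrix(c(0, 1, 3), ncol = 1)
k <- 1

d <- as.matrix(dist(data))  # full Euclidean distance matrix
diag(d) <- Inf              # assumption: a point is not its own neighbor

# knn_mat[i, j] holds the distance if row j is among the k nearest
# neighbors of row i, and 0 otherwise ("not a kNN").
knn_mat <- matrix(0, nrow(data), nrow(data))
for (i in seq_len(nrow(data))) {
  nn <- order(d[i, ])[seq_len(k)]
  knn_mat[i, nn] <- d[i, nn]
}

# Row 2 is the nearest neighbor of row 3, but row 3 is not the nearest
# neighbor of row 2, so the matrix is not symmetric.
isSymmetric(knn_mat)

# Assumed symmetrization: keep an entry if either direction is a kNN.
knn_sym <- pmax(knn_mat, t(knn_mat))
isSymmetric(knn_sym)
```

In this sketch the point at 3 has the point at 1 as its nearest neighbor, while the point at 1 is closer to the point at 0, which is exactly the situation described above.
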
This package was separated from destiny as it might prove helpful in other contexts. It provides more distance metrics than FNN and is more precise than RcppHNSW, but slower than both.
If anyone knows a faster and similarly precise kNN search in cosine (=rank correlation) space, please tell me!