Implementation of the Frequent-Directions algorithm for efficient matrix sketching [E. Liberty, SIGKDD 2013]
# Not yet on CRAN
install.packages("frequentdirections")
# Or the development version from GitHub:
install.packages("devtools")
devtools::install_github("shinichi-takayanagi/frequentdirections")
Here, we use the USPS handwritten digits dataset as sample data. The
following example assumes that you have saved the dataset to the /tmp
directory as usps.h5.
The dataset contains 7291 training and 2007 test images in h5
format. Each image is 16 x 16 grayscale pixels.
library("h5")
file <- h5file("/tmp/usps.h5")
x <- file["train/data"][]
y <- file["train/target"][]
str(x)
#> num [1:7291, 1:256] 0 0 0 0 0 0 0 0 0 0 ...
An example image of the digit 8:
image(matrix(x[338,], nrow=16, byrow = FALSE))
Plot the original data on the plane spanned by the first and second singular vectors.
x <- scale(x)
frequentdirections::plot_svd(x, y)
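For reference, a projection onto the first two right singular vectors can be sketched in base R. This is a minimal illustration on synthetic data, not the internals of plot_svd, which are not shown here; the helper name project_top2 and the toy data x_toy and y_toy are assumptions made for this example.

```r
# Hypothetical helper (not part of frequentdirections): project the rows of
# `x` onto the plane spanned by its first two right singular vectors.
project_top2 <- function(x) {
  s <- svd(x, nu = 0, nv = 2)  # compute only the first two right singular vectors
  x %*% s$v                    # n x 2 matrix of coordinates in that plane
}

set.seed(42)
x_toy <- scale(matrix(rnorm(100 * 5), nrow = 100))  # synthetic stand-in data
y_toy <- rep(0:1, each = 50)                        # synthetic class labels
p <- project_top2(x_toy)

# Color each point by its label, as a scatter plot in the singular vector plane.
plot(p, col = y_toy + 1,
     xlab = "1st singular vector", ylab = "2nd singular vector")
```

The column norms of p equal the two largest singular values of x_toy, which is a quick way to check that the projection uses the leading directions.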
eps <- 10^(-8)
# 7291 x 256 -> 8 * 256 matrix
b <- frequentdirections::sketching(x, 8, eps)
frequentdirections::plot_svd(x, y, b)
# 7291 x 256 -> 32 * 256 matrix
b <- frequentdirections::sketching(x, 32, eps)
frequentdirections::plot_svd(x, y, b)
# 7291 x 256 -> 128 * 256 matrix
b <- frequentdirections::sketching(x, 128, eps)
frequentdirections::plot_svd(x, y, b)
With 128 rows, the result is almost the same as the SVD plot of the
original data. In other words, a sketch of only 128 rows captures most of
the structure of the original 7291 rows that is visible in this projection.
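To make the idea behind the sketch concrete, here is a minimal base-R version of the Frequent-Directions update as described in Liberty's paper. It is an illustration under stated assumptions, not the implementation behind frequentdirections::sketching: the function name fd_sketch is hypothetical, it requires l <= ncol(a), and it does not model the eps argument.

```r
# Hypothetical illustration, not the package's code: sketch an n x m matrix
# `a` into an l x m matrix `b` (assumes l <= m).
fd_sketch <- function(a, l) {
  stopifnot(l >= 1, l <= ncol(a))
  b <- matrix(0, nrow = l, ncol = ncol(a))
  next_zero <- 1                 # index of the first all-zero row of b
  half <- ceiling(l / 2)
  for (i in seq_len(nrow(a))) {
    b[next_zero, ] <- a[i, ]     # insert the next row of a into a zero row
    next_zero <- next_zero + 1
    if (next_zero > l) {         # b is full: shrink its singular values
      s <- svd(b, nu = 0)
      delta <- s$d[half]^2
      d_new <- sqrt(pmax(s$d^2 - delta, 0))
      b <- d_new * t(s$v)        # row j becomes d_new[j] * (j-th right singular vector)
      next_zero <- half          # rows half..l are zero again
    }
  }
  b
}

set.seed(1)
a <- matrix(rnorm(200 * 20), nrow = 200)  # synthetic data, not the USPS set
b <- fd_sketch(a, 8)

# Spectral norm of A'A - B'B, compared with the paper's bound 2 * ||A||_F^2 / l.
err <- norm(crossprod(a) - crossprod(b), type = "2")
bound <- 2 * sum(a^2) / 8
err <= bound
```

Each shrink step zeroes at least half of the rows of b, so the SVD runs only once every l/2 insertions or so rather than once per input row.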