logisticPCA is an R package for dimensionality reduction of binary data. Please note that it is still in the very early stages of development and the conventions may change in the future. A manuscript describing logistic PCA can be found here.
To install R, visit r-project.org/.
The package can be installed from CRAN:
install.packages("logisticPCA")
To install the development version, first install devtools from CRAN. Then run the following commands.
# install.packages("devtools")
library("devtools")
install_github("andland/logisticPCA")
Three types of dimensionality reduction are provided. For all of the functions, the user must supply the desired dimension k. The data must be an n x d matrix of binary variables (i.e. all 0s and 1s).
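The input requirement above can be checked before fitting. The following is a minimal base R sketch on simulated data; the helper name is_binary_matrix is hypothetical and is not part of the package.

```r
# Hypothetical example data: 100 observations of 8 binary variables.
set.seed(1)
n <- 100
d <- 8
x <- matrix(rbinom(n * d, size = 1, prob = 0.4), nrow = n, ncol = d)

# Illustrative helper (not from the package): TRUE only for a numeric
# matrix whose entries are all 0 or 1. Missing values fail the check.
is_binary_matrix <- function(x) {
  is.matrix(x) && is.numeric(x) && all(x %in% c(0, 1))
}

stopifnot(is_binary_matrix(x))
dim(x)  # n rows (observations) by d columns (variables)
```

Data stored as logicals or factors would need to be converted to a 0/1 numeric matrix first for this check to pass.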
logisticPCA() estimates the natural parameters of a Bernoulli distribution in a lower dimensional space. This is done by projecting the natural parameters from the saturated model. A rank-k projection matrix, or equivalently a d x k orthogonal matrix U, is found that minimizes the Bernoulli deviance. Since the natural parameters from the saturated model are either negative or positive infinity, an additional tuning parameter m is needed to approximate them. You can use cv.lpca() to select m by cross validation. Typical values are in the range of 3 to 10.
mu is a main effects vector of length d and U is the d x k loadings matrix.
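To make the projection concrete, here is a base R sketch that evaluates the Bernoulli deviance for one candidate U. It makes two assumptions that the text above does not state: the saturated natural parameters are approximated by m * (2 * x - 1), and the fitted parameters take the form 1 mu^T + (theta_sat - 1 mu^T) U U^T. The choice of mu and the random U are purely illustrative; the package solves for them, which this sketch does not attempt.

```r
set.seed(1)
n <- 50; d <- 6; k <- 2; m <- 4
x <- matrix(rbinom(n * d, 1, 0.5), n, d)

# Assumed approximation of the saturated natural parameters:
# +m where x is 1 and -m where x is 0.
theta_sat <- m * (2 * x - 1)

# Main effects: the logit of the column means (an illustrative choice).
mu <- qlogis(colMeans(x))

# Any d x k matrix with orthonormal columns defines a rank-k projection U U^T.
U <- qr.Q(qr(matrix(rnorm(d * k), d, k)))

# Center by mu, project onto the column space of U, then add mu back.
centered  <- sweep(theta_sat, 2, mu)
theta_hat <- sweep(centered %*% U %*% t(U), 2, mu, "+")

# Bernoulli deviance: -2 times the log-likelihood at the given parameters.
bernoulli_deviance <- function(x, theta) {
  -2 * sum(x * plogis(theta, log.p = TRUE) +
           (1 - x) * plogis(-theta, log.p = TRUE))
}

bernoulli_deviance(x, theta_hat)
```

Fitting amounts to searching over U (and mu) for the smallest value of this deviance; a larger m pushes theta_sat closer to the infinite saturated values.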
logisticSVD() estimates the natural parameters by a matrix factorization. mu is a main effects vector of length d, B is the d x k loadings matrix, and A is the n x k principal component score matrix.
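The shapes involved in the factorization can be sketched in base R. This assumes the natural parameter matrix has the form 1 mu^T + A B^T, which is consistent with the dimensions given above but is not spelled out in the text. The values of mu, A, and B are random placeholders, not fitted estimates.

```r
set.seed(1)
n <- 50; d <- 6; k <- 2

mu <- rnorm(d)                      # main effects, length d
A  <- matrix(rnorm(n * k), n, k)    # n x k principal component scores
B  <- matrix(rnorm(d * k), d, k)    # d x k loadings

# Assumed form of the n x d natural parameter matrix: add mu to each row of A B^T.
low_rank <- A %*% t(B)
theta <- sweep(low_rank, 2, mu, "+")

# Implied Bernoulli probabilities via the inverse logit.
prob <- plogis(theta)

dim(theta)
range(prob)
```

Unlike the projection approach, the scores A are free parameters here, one row per observation.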
convexLogisticPCA() relaxes the problem of solving for a projection matrix to solving for a matrix in the k-dimensional Fantope, which is the convex hull of rank-k projection matrices. This has the advantage that the global minimum can be obtained efficiently. The disadvantage is that the k-dimensional Fantope solution may have a rank much larger than k, which reduces interpretability. It is also necessary to specify m in this function.
mu is a main effects vector of length d, H is the d x d Fantope matrix, and U is the d x k loadings matrix, whose columns are the first k eigenvectors of H.
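The relationship between H and U, and the rank issue mentioned above, can be illustrated in base R. The sketch below builds a Fantope matrix by hand as a convex combination of two random rank-k projection matrices; it does not call the package, and random_projection is a hypothetical helper.

```r
set.seed(1)
d <- 5; k <- 2

# Illustrative helper: a random d x d projection matrix of rank k.
random_projection <- function(d, k) {
  Q <- qr.Q(qr(matrix(rnorm(d * k), d, k)))
  Q %*% t(Q)
}

# A convex combination of rank-k projections lies in the k-dimensional Fantope.
H <- 0.5 * random_projection(d, k) + 0.5 * random_projection(d, k)

eig <- eigen(H, symmetric = TRUE)
sum(eig$values)           # the trace of H equals k
range(eig$values)         # eigenvalues lie between 0 and 1
sum(eig$values > 1e-8)    # numerical rank of H, which can exceed k

# Loadings: the first k eigenvectors of H.
U <- eig$vectors[, 1:k, drop = FALSE]
```

Because H can have more than k nonzero eigenvalues, the d x k matrix U only summarizes its leading directions.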
Each of the classes has associated methods to make data analysis easier.
- print(): Prints a summary of the fitted model.
- fitted(): Returns the fitted low dimensional matrix of either natural parameters or probabilities.
- predict(): Predicts the PCs on new data. Can also predict the low dimensional matrix of natural parameters or probabilities on new data.
- plot(): Plots either the deviance trace, the first two PC loadings, or the first two PC scores using the package ggplot2.

In addition, there are functions for performing cross validation.
- cv.lpca(), cv.lsvd(), cv.clpca(): Run cross validation over the rows of the matrix to assess the fit of m and/or k.
- plot.cv(): Plots the results of the cv() method.
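A possible workflow combining cross validation with the methods listed above is sketched below on simulated data. The argument names (ks, ms, k, m, type) and the type values "response" and "PCs" reflect my understanding of the package interface and should be checked against the package documentation, especially since the conventions may change. The code is guarded so that it only calls the package if it is installed.

```r
set.seed(1)
n <- 100; d <- 10

# Simulated binary data, used only for illustration.
x     <- matrix(rbinom(n * d, 1, 0.5), n, d)
x_new <- matrix(rbinom(20 * d, 1, 0.5), 20, d)

fit <- NULL
if (requireNamespace("logisticPCA", quietly = TRUE)) {
  library(logisticPCA)

  # Cross validate over a few values of m for a fixed k (assumed arguments).
  ms <- c(3, 5, 7)
  cv_res <- cv.lpca(x, ks = 2, ms = ms)

  # With a single k, the result holds one score per m; lower is assumed better.
  best_m <- ms[which.min(as.numeric(cv_res))]

  fit <- logisticPCA(x, k = 2, m = best_m)
  print(fit)

  probs  <- fitted(fit, type = "response")     # fitted probabilities
  scores <- predict(fit, x_new, type = "PCs")  # PC scores for the new rows
}
```

The same pattern should apply to logisticSVD() with cv.lsvd() and to convexLogisticPCA() with cv.clpca(), with the caveat that only the functions described above as needing m take that argument.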