# Logistic PCA

`logisticPCA` is an R package for dimensionality reduction of binary data. It is still in the very early stages of development, so its conventions may change in the future. A manuscript describing logistic PCA can be found here.

## Installation

To install R, visit r-project.org/. Then install `logisticPCA` from CRAN:

```r
install.packages("logisticPCA")
```

To install the development version, first install `devtools` from CRAN. Then run the following commands.

```r
# install.packages("devtools")
library("devtools")
install_github("andland/logisticPCA")
```

## Classes

The package provides three types of dimensionality reduction. For every function, the user must supply the desired dimension `k`. The data must be an `n x d` matrix of binary variables, i.e., every entry is `0` or `1`.
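The input requirement above can be checked in base R before fitting. `is_binary_matrix()` is a hypothetical helper for illustration, not a function exported by `logisticPCA`.

```r
# Hypothetical helper (not part of logisticPCA): TRUE only when x is a
# numeric matrix whose entries are all 0 or 1.
is_binary_matrix <- function(x) {
  is.matrix(x) && is.numeric(x) && all(x %in% c(0, 1))
}

set.seed(1)
x <- matrix(rbinom(20 * 5, size = 1, prob = 0.4), nrow = 20, ncol = 5)
is_binary_matrix(x)         # TRUE
is_binary_matrix(x + 0.5)   # FALSE: entries are 0.5 and 1.5
```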

### Logistic PCA

`logisticPCA()` estimates the natural parameters of a Bernoulli distribution in a lower-dimensional space by projecting the natural parameters of the saturated model. It solves for a rank-`k` projection matrix, or equivalently a `d x k` matrix `U` with orthonormal columns, that minimizes the Bernoulli deviance. Because the saturated model's natural parameters are either negative or positive infinity, an additional tuning parameter `m` is needed to approximate them. You can select `m` by cross validation with `cv.lpca()`; typical values range from `3` to `10`.

`mu` is a main effects vector of length `d` and `U` is the `d x k` loadings matrix.
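The objective can be sketched in base R without the package. This minimal sketch assumes that the saturated model's natural parameters are approximated by `m` for 1's and `-m` for 0's; `bernoulli_deviance()` and `lpca_theta()` are hypothetical helpers, and `mu` and `U` are fixed illustrative values rather than the estimates `logisticPCA()` would compute.

```r
# Hypothetical helper: Bernoulli deviance of a 0/1 matrix x given natural
# parameters theta, -2 * sum(x * theta - log(1 + exp(theta))), computed stably.
bernoulli_deviance <- function(x, theta) {
  log1pexp <- pmax(theta, 0) + log1p(exp(-abs(theta)))
  -2 * sum(x * theta - log1pexp)
}

# Hypothetical helper: natural parameters from projecting the approximated
# saturated parameters (assumed to be +/- m) onto span(U) after centering by mu.
lpca_theta <- function(x, mu, U, m) {
  theta_sat <- m * (2 * x - 1)          # m for 1's, -m for 0's
  centered <- sweep(theta_sat, 2, mu)   # subtract mu from every row
  proj <- centered %*% U %*% t(U)       # apply the projection U U^T
  sweep(proj, 2, mu, FUN = "+")         # add the main effects back
}

set.seed(42)
n <- 50; d <- 6; k <- 2; m <- 4
x <- matrix(rbinom(n * d, size = 1, prob = 0.3), n, d)
mu <- colMeans(m * (2 * x - 1))                  # illustrative main effects
U <- qr.Q(qr(matrix(rnorm(d * k), d, k)))        # random d x k, orthonormal columns

theta_hat <- lpca_theta(x, mu, U, m)
bernoulli_deviance(x, theta_hat)
```

With `U` set to the `d x d` identity, the projection returns the approximated saturated parameters unchanged and the deviance equals `2 * n * d * log(1 + exp(-m))`, which shrinks toward zero as `m` grows.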

### Logistic SVD

`logisticSVD()` estimates the natural parameters by a rank-`k` matrix factorization plus main effects. `mu` is a main effects vector of length `d`, `B` is the `d x k` loadings matrix, and `A` is the `n x k` principal component score matrix.
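A minimal sketch of the factorization, assuming the natural parameter matrix has the form `1 mu^T + A B^T` (main effects plus a rank-`k` product). The values are random placeholders rather than estimates, and `lsvd_theta()` is a hypothetical helper; `logisticSVD()` itself fits `mu`, `A`, and `B` by minimizing the Bernoulli deviance.

```r
# Hypothetical helper: natural parameters 1 mu^T + A B^T.
lsvd_theta <- function(mu, A, B) {
  n <- nrow(A)
  outer(rep(1, n), mu) + A %*% t(B)
}

set.seed(7)
n <- 30; d <- 5; k <- 2
mu <- rnorm(d)                     # placeholder main effects
A <- matrix(rnorm(n * k), n, k)    # placeholder n x k scores
B <- matrix(rnorm(d * k), d, k)    # placeholder d x k loadings

theta <- lsvd_theta(mu, A, B)
prob <- plogis(theta)              # inverse logit: probability of a 1
```

Because the scores `A` are free parameters rather than projections of the data, this formulation does not need the tuning parameter `m`.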

### Convex Logistic PCA

`convexLogisticPCA()` relaxes the problem of solving for a projection matrix to one of solving for a matrix in the `k`-dimensional Fantope, the convex hull of rank-`k` projection matrices. Because the relaxed problem is convex, its global minimum can be found efficiently. The drawback is that the Fantope solution may have rank much larger than `k`, which reduces interpretability. This function also requires `m`.

`mu` is a main effects vector of length `d`, `H` is the `d x d` Fantope matrix, and `U` is the `d x k` loadings matrix, whose columns are the first `k` eigenvectors of `H` (those with the largest eigenvalues).
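A symmetric matrix lies in the `k`-dimensional Fantope exactly when its eigenvalues are all between 0 and 1 and sum to `k`. This sketch uses hypothetical helpers (`in_fantope()`, `fantope_loadings()`) to show that an average of two rank-`k` projection matrices stays in the Fantope but generally has rank above `k`, and how `U` is read off from the leading eigenvectors.

```r
# Hypothetical helper: membership test for the k-dimensional Fantope,
# {H symmetric : 0 <= eigenvalues <= 1, sum of eigenvalues = k}.
in_fantope <- function(H, k, tol = 1e-8) {
  if (!isSymmetric(H, tol = tol)) return(FALSE)
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol) && all(ev <= 1 + tol) && abs(sum(ev) - k) < tol
}

# Hypothetical helper: loadings are the k eigenvectors with the largest
# eigenvalues (eigen() returns them in decreasing order).
fantope_loadings <- function(H, k) {
  eigen(H, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
}

set.seed(3)
d <- 5; k <- 2
proj <- function(U) U %*% t(U)                  # rank-k projection matrix
U1 <- qr.Q(qr(matrix(rnorm(d * k), d, k)))
U2 <- qr.Q(qr(matrix(rnorm(d * k), d, k)))
H <- 0.5 * proj(U1) + 0.5 * proj(U2)            # convex combination

in_fantope(H, k)                                # TRUE
rank_H <- sum(eigen(H, symmetric = TRUE)$values > 1e-8)   # generally > k
U <- fantope_loadings(H, k)
```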

## Methods

Each of the classes has associated methods to make data analysis easier.

- `print()`: Prints a summary of the fitted model.
- `fitted()`: Returns the fitted low-dimensional matrix of either natural parameters or probabilities.
- `predict()`: Computes the PC scores for new data. It can also predict the low-dimensional matrix of natural parameters or probabilities for new data.
- `plot()`: Plots the deviance trace, the first two PC loadings, or the first two PC scores, using the `ggplot2` package.
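For logistic PCA, the quantities these methods report can be sketched from `mu`, `U`, and `m`. This assumes that PC scores are the centered approximated saturated parameters (`+/- m`) projected onto `U`, consistent with the projection described for `logisticPCA()`; the helpers are hypothetical, and on a real fit you would call `fitted()` and `predict()` instead.

```r
# Hypothetical helpers (not package functions), assuming the logistic PCA
# parameterization with main effects mu, loadings U, and tuning parameter m.
lpca_scores <- function(x, mu, U, m) {
  sweep(m * (2 * x - 1), 2, mu) %*% U          # n x k PC scores
}
lpca_link <- function(x, mu, U, m) {
  # natural parameters: scores mapped back through U, plus main effects
  sweep(lpca_scores(x, mu, U, m) %*% t(U), 2, mu, FUN = "+")
}
lpca_response <- function(x, mu, U, m) {
  plogis(lpca_link(x, mu, U, m))               # probabilities via inverse logit
}

set.seed(11)
d <- 4; k <- 2; m <- 5
mu <- rnorm(d)                                 # illustrative main effects
U <- qr.Q(qr(matrix(rnorm(d * k), d, k)))      # illustrative loadings
newdata <- matrix(rbinom(3 * d, size = 1, prob = 0.5), 3, d)

scores <- lpca_scores(newdata, mu, U, m)       # 3 x 2
probs <- lpca_response(newdata, mu, U, m)      # 3 x 4, entries in (0, 1)
```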

In addition, there are functions for performing cross validation.

- `cv.lpca()`, `cv.lsvd()`, `cv.clpca()`: Run cross validation over the rows of the matrix to assess choices of `k` and, where applicable, `m`.
- `plot.cv()`: Plots the results returned by these cross-validation functions.
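Cross validation over rows holds out whole observations, fits on the remaining rows, and scores the held-out rows. This sketch assumes held-out Bernoulli deviance as the score and, so that it runs without the package, swaps in a main-effects-only model in place of a logistic PCA fit; `row_folds()` and `cv_main_effects()` are hypothetical helpers. In the package functions, the fit inside the loop would be the model being tuned, refit for each candidate `k` (and `m`).

```r
# Hypothetical helper: randomly assign each of n rows to one of `folds` folds.
row_folds <- function(n, folds = 5) {
  sample(rep_len(seq_len(folds), n))
}

# Bernoulli deviance, computed stably.
bernoulli_deviance <- function(x, theta) {
  -2 * sum(x * theta - (pmax(theta, 0) + log1p(exp(-abs(theta)))))
}

# Hypothetical helper: row-wise cross validation of a stand-in model with
# main effects only (assumption: not the logistic PCA fit itself).
cv_main_effects <- function(x, folds = 5, eps = 1e-3) {
  fold_id <- row_folds(nrow(x), folds)
  total <- 0
  for (f in seq_len(folds)) {
    train <- x[fold_id != f, , drop = FALSE]
    test <- x[fold_id == f, , drop = FALSE]
    p <- pmin(pmax(colMeans(train), eps), 1 - eps)   # keep log-odds finite
    theta <- matrix(qlogis(p), nrow(test), ncol(x), byrow = TRUE)
    total <- total + bernoulli_deviance(test, theta)
  }
  total   # summed held-out deviance over all folds
}

set.seed(5)
x <- matrix(rbinom(100 * 6, size = 1, prob = 0.3), 100, 6)
cv_main_effects(x, folds = 5)
```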