folda

The folda package is an R modeling tool designed for fitting Forward Stepwise Linear Discriminant Analysis (LDA) and Uncorrelated Linear Discriminant Analysis (ULDA). If you’re unfamiliar with stepwise LDA or ULDA, please refer to the following resources:

Installation

You can install the released version of folda from CRAN with:

install.packages("folda")

You can install the development version of folda from GitHub with:

# install.packages("devtools")
devtools::install_github("Moran79/folda")

Overview

If you’ve ever been frustrated by the warnings and errors from MASS::lda(), you will appreciate the ULDA implementation in folda(). It offers several key improvements:

For the forward LDA implementation, folda offers the following advantages over the classical framework:

Basic Usage

library(folda)
mpg <- as.data.frame(ggplot2::mpg) # Prepare the data
datX <- mpg[, -5] # All predictors without Y
response <- mpg[, 5] # The response is "cyl" (number of cylinders)

Build a ULDA model with all variables:

fit <- folda(datX = datX, response = response, subsetMethod = "all")

Build a ULDA model with forward selection via Pillai’s trace:

fit <- folda(datX = datX, response = response, subsetMethod = "forward", testStat = "Pillai")
print(fit) # 6 out of 11 variables are selected, displ is the most important among them
#> 
#> Overall Pillai's trace: 1.325
#> Associated p-value: 4.636e-74
#> 
#> Prediction Results on Training Data:
#> Refitting Accuracy: 0.9188
#> Gini Index: 0.7004
#> 
#> Confusion Matrix:
#>          Actual
#> Predicted  4  5  6  8
#>         4 69  0  3  0
#>         5  8  4  2  0
#>         6  4  0 74  2
#>         8  0  0  0 68
#> 
#> Group means of LD scores:
#>           LD1         LD2        LD3
#> 4  3.05298379  0.02700248 -0.3555829
#> 5  1.87744449 -4.45014946  0.8156167
#> 6  0.06757888  0.28356907  0.5911862
#> 8 -3.71628852 -0.09697943 -0.3023424
#> 
#> Forward Selection Results:
#>                var statOverall   statDiff  threshold
#> 1            displ    0.873393 0.87339300 0.06545381
#> 2  modelnew beetle    1.029931 0.15653777 0.05673510
#> 3       modeljetta    1.141651 0.11172064 0.05496185
#> 4 modelcaravan 2wd    1.210165 0.06851331 0.05363507
#> 5     classmidsize    1.263449 0.05328468 0.05276500
#> 6              cty    1.325255 0.06180560 0.05194279
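
For readers who want a concrete handle on the statistic used above, here is a minimal base-R sketch of the textbook definition of Pillai's trace, tr(H (H + E)^-1), where H and E are the between-group and within-group sum-of-squares-and-cross-products matrices. It uses the built-in iris data rather than mpg, and all variable names are illustrative. It is not folda's internal code and makes no claim about how folda evaluates the statistic during forward selection.

```r
# Pillai's trace computed by hand on the built-in iris data (base R only).
# Illustrative sketch; not taken from the folda source code.
X <- as.matrix(iris[, 1:4])   # numeric predictors
g <- iris$Species             # grouping factor (3 classes)

# Total SSCP matrix: cross-product of the column-centered data
centered <- scale(X, center = TRUE, scale = FALSE)
totalSS <- crossprod(centered)

# Within-group SSCP matrix (E): center each column by its group mean
groupMeans <- apply(X, 2, function(col) ave(col, g))
withinSS <- crossprod(X - groupMeans)

# Between-group SSCP matrix (H) is the remainder
betweenSS <- totalSS - withinSS

# Pillai's trace: tr(H (H + E)^-1), and H + E equals the total SSCP matrix
pillaiManual <- sum(diag(betweenSS %*% solve(totalSS)))

# Cross-check against stats::manova()
pillaiManova <- summary(manova(X ~ g), test = "Pillai")$stats[1, "Pillai"]
all.equal(pillaiManual, unname(pillaiManova))
```

The statistic is bounded above by the smaller of the number of predictors and the number of groups minus one, which is why values above 1 (such as the overall 1.325 in the output above) are possible with four classes.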

Plot the results:

plot(fit, datX = datX, response = response)

One-dimensional plot:

# A 1D plot is created when there is only one feature 
# or for binary classification problems.
mpgSmall <- mpg[, c("cyl", "displ")]
fitSmall <- folda(mpgSmall[, -1, drop = FALSE], mpgSmall[, 1])
plot(fitSmall, mpgSmall, mpgSmall[, 1])

Make predictions:

head(predict(fit, datX, type = "response"))
#> [1] "4" "4" "4" "4" "6" "4"
head(predict(fit, datX, type = "prob")) # Posterior probabilities
#>           4            5            6            8
#> 1 0.9966769 7.475058e-08 0.0033230408 7.023764e-12
#> 2 0.9994438 1.401133e-08 0.0005562131 5.338710e-13
#> 3 0.9970911 3.835722e-08 0.0029088506 1.738154e-11
#> 4 0.9983963 2.196016e-08 0.0016037009 7.365641e-12
#> 5 0.3122116 6.809673e-07 0.6877815595 6.173116e-06
#> 6 0.5995781 4.275271e-07 0.4004193019 2.123291e-06
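
The two prediction types appear to be related in the usual way: in the output above, each predicted label is the class with the largest posterior probability. The sketch below illustrates that relationship in base R, using two rows copied from the posterior table above (rows 1 and 5). That `type = "response"` is exactly the argmax of `type = "prob"` is an assumption based on this output, not a documented guarantee; the object names are illustrative.

```r
# Posterior probabilities for observations 1 and 5, copied from the output above
post <- matrix(
  c(0.9966769, 7.475058e-08, 0.0033230408, 7.023764e-12,
    0.3122116, 6.809673e-07, 0.6877815595, 6.173116e-06),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "5"), c("4", "5", "6", "8"))
)

# Pick, for each row, the column (class) with the highest posterior
labels <- colnames(post)[max.col(post, ties.method = "first")]
labels
#> [1] "4" "6"
```

These labels agree with the first and fifth entries of the `type = "response"` output shown above.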

More examples can be found in the vignette.

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.