This is an R-Java implementation of iBART, introduced in Ye, Senftle, & Li, "Operator-induced structural variable selection for identifying materials genes". The package depends heavily on the R package bartMachine
for its BART-G.SE variable selection implementation.
Before installing the iBART package in R, you first need to install the Java JDK and the rJava R package.
Download Java 17 JDK or above and install it. Then run R CMD javareconf
from the command line to configure Java in R. iBART requires bartMachine and rJava, both of which require the Java JDK; the Java JRE alone won’t work!
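Because a JRE alone is not enough, it can help to confirm that a JDK is visible before installing rJava. The following is a minimal sketch in base R, assuming that finding the Java compiler `javac` on the `PATH` is a reasonable proxy for an installed JDK; the variable names `javac_path` and `has_jdk` are illustrative only. It does not verify which Java installation R itself was configured with by R CMD javareconf.

```r
# A JDK ships the Java compiler (javac); a JRE does not.
# Sys.which() returns "" when the executable is not found on the PATH.
javac_path <- unname(Sys.which("javac"))
has_jdk <- nzchar(javac_path)

if (has_jdk) {
  message("Found javac at: ", javac_path)
} else {
  message("javac not found on the PATH; install a JDK and rerun R CMD javareconf.")
}
```

This is only a heuristic: a JDK installed outside the `PATH` will be reported as missing even though R may still be configured to use it.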
Run install.packages("rJava", INSTALL_opts = "--no-multiarch")
within R. To reproduce the results in the paper, please install rJava 1.0-4.
Run install.packages("bartMachine", INSTALL_opts = "--no-multiarch")
within R. To reproduce the results in the paper, please install bartMachineJARs 1.1
and bartMachine 1.2.6. If you encounter an error, please see the bartMachine repo for detailed instructions.
Run install.packages("glmnet")
within R. To reproduce the results in the paper, please install glmnet 4.1-1.
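To see whether the installed dependencies match the versions listed above, the versions can be compared programmatically. The sketch below uses base R only; `check_versions` and `paper_versions` are hypothetical names introduced for illustration and are not part of iBART. It reads each package's installed version without loading the package, so it does not start a JVM.

```r
# Versions this README lists for reproducing the paper's results.
paper_versions <- c(rJava = "1.0-4", bartMachineJARs = "1.1",
                    bartMachine = "1.2.6", glmnet = "4.1-1")

# Hypothetical helper: report whether each package is installed and whether
# its version equals the expected one ("1.0-4" and "1.0.4" compare as equal).
check_versions <- function(expected) {
  status <- vapply(names(expected), function(pkg) {
    if (!nzchar(system.file(package = pkg))) {
      return("not installed")
    }
    if (utils::packageVersion(pkg) == package_version(expected[[pkg]])) {
      "match"
    } else {
      "differs"
    }
  }, character(1))
  data.frame(package = names(expected),
             expected = unname(expected),
             status = unname(status),
             stringsAsFactors = FALSE)
}

print(check_versions(paper_versions))
```

A "differs" status does not mean iBART will fail; it only signals that results may not match the paper exactly.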
Run devtools::install_github("mattsheng/iBART", INSTALL_opts = "--no-multiarch")
within R, or run devtools::install_github("mattsheng/iBART", INSTALL_opts = "--no-multiarch", build_vignettes = TRUE)
if you want to build the vignettes; this will take a while.
We use the simulation model in Section 3.4 of our paper to demonstrate the usage of iBART. Vignettes for the real data application and the simulation are also available.
set.seed(123)
options(java.parameters = "-Xmx10g") # Allocate 10GB of memory for Java
library(iBART)
n <- 250
p <- 10
X <- matrix(runif(n * p, min = -1, max = 1), nrow = n, ncol = p)
colnames(X) <- paste("x.", seq(from = 1, to = p, by = 1), sep = "")
# Keep the trailing "+" so that R continues the expression and adds the noise term
y <- 15*(exp(X[,1])-exp(X[,2]))^2 + 20*sin(pi*X[,3]*X[,4]) +
  rnorm(n, mean = 0, sd = 0.5)
iBART_results <- iBART(X = X, y = y,
                       head = colnames(X),
                       unit = NULL,                         # no unit information for simulation
                       opt = c("unary", "binary", "unary"), # unary operator first
                       sin_cos = TRUE,                      # add sin and cos to operator set
                       apply_pos_opt_on_neg_x = FALSE,      # e.g. do not apply log() on negative x
                       Lzero = TRUE,                        # best subset selection
                       K = 4,                               # at most 4 predictors in best subset model
                       standardize = FALSE,                 # don't standardize input matrix X
                       seed = 99)
# > Start iBART descriptor generation and selection...
# > Iteration 1
# > iBART descriptor selection...
# > avg..........null..................................................
# > Constructing descriptors using unary operators...
# > Iteration 2
# > iBART descriptor selection...
# > avg..........null..................................................
# > Constructing descriptors using binary operators...
# > Iteration 3
# > iBART descriptor selection...
# > avg..........null..................................................
# > Constructing descriptors using unary operators...
# > BART iteration done!
# > LASSO descriptor selection...
# > L-zero regression...
# > Total time: 261.336249113083 secs
# Correct descriptor names are (exp(x.1)-exp(x.2))^2 and sin(pi*x.3*x.4)
iBART_results$descriptor_names
# > [1] "(exp(x.1)-exp(x.2))^2" "sin(pi*(x.3*x.4))"
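As a sanity check on the simulation model, the two true descriptors can be built by hand and regressed against y. This is a minimal sketch using base R only; it does not call iBART or inspect `iBART_results`, and the names `d1`, `d2`, and `fit` are illustrative. If the data were generated as described, the fitted coefficients should land close to the true values 15 and 20, with an intercept near 0.

```r
# Regenerate the simulated data exactly as in the example above.
set.seed(123)
n <- 250
p <- 10
X <- matrix(runif(n * p, min = -1, max = 1), nrow = n, ncol = p)
colnames(X) <- paste("x.", seq_len(p), sep = "")
y <- 15 * (exp(X[, 1]) - exp(X[, 2]))^2 + 20 * sin(pi * X[, 3] * X[, 4]) +
  rnorm(n, mean = 0, sd = 0.5)

# Construct the two true descriptors by hand.
d1 <- (exp(X[, "x.1"]) - exp(X[, "x.2"]))^2
d2 <- sin(pi * X[, "x.3"] * X[, "x.4"])

# Ordinary least squares on the true descriptors.
fit <- lm(y ~ d1 + d2)
print(round(coef(fit), 2))
```

The same check can be applied to whatever descriptors a selection procedure returns, which makes it easy to see how well a selected model recovers the generating coefficients.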
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_4.1-1 Matrix_1.3-4 bartMachine_1.2.6
[4] missForest_1.4 itertools_0.1-3 iterators_1.0.13
[7] foreach_1.5.1 randomForest_4.6-14 bartMachineJARs_1.1
[10] rJava_1.0-4
loaded via a namespace (and not attached):
[1] lattice_0.20-44 codetools_0.2-18 grid_4.0.5 splines_4.0.5
[5] tools_4.0.5 survival_3.2-11 parallel_4.0.5 compiler_4.0.5
[9] shape_1.4.6