xtune
This vignette is a tutorial on how to use the xtune package to fit feature-specific regularized regression models based on external information. It covers the xtune model, four worked examples, empirical Bayes tuning of a single penalty parameter, and the estimation of an individual penalty for each coefficient.
Overview
The main usage of xtune is to tune multiple shrinkage parameters in regularized regressions (Lasso, Ridge, and Elastic-net) based on external information.
Classical penalized regression uses a single penalty parameter \(\lambda\), applied equally to all regression coefficients, to control the amount of regularization in the model. This single penalty parameter is typically tuned by cross-validation.
Here, we instead apply an individual shrinkage parameter \(\lambda_j\) to each regression coefficient \(\beta_j\), and the vector of shrinkage parameters \(\boldsymbol{\lambda} = (\lambda_1,...,\lambda_p)\) is guided by external information \(Z\). Specifically, the \(\lambda_j\) are modeled as a log-linear function of \(Z\). Allowing individual shrinkage of each regression coefficient based on external information may achieve better prediction accuracy than a single shared penalty.
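Concretely, the log-linear model takes the form below, where \(\alpha_0, \alpha_1, ..., \alpha_q\) are second-level coefficients estimated from the data (the notation is illustrative, following the description above):
\[ \log(\lambda_j) = \alpha_0 + Z_{j1}\alpha_1 + \dots + Z_{jq}\alpha_q, \quad j = 1, \dots, p. \]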
To tune the differential shrinkage parameter vector \(\boldsymbol{\lambda} = (\lambda_1,...,\lambda_p)\), we employ an empirical Bayes approach based on the random-effects formulation of the Elastic-net. Once the tuning parameters are estimated, and therefore the penalties are known, the regression coefficients are obtained using the glmnet package.
The response variable can be either quantitative or categorical. Utilities for carrying out post-fitting summary and prediction are also provided.
Here, we use four simulated examples to illustrate the usage and syntax of xtune. The first example gives users a general sense of the data structure and the model-fitting process. The second and third examples use simulated data in concrete scenarios to illustrate the usage of the package. In the second example, diet, we provide simulated data to mimic the dietary example described in this paper:
Witte, J.S., Greenland, S., Haile, R.W., & Bird, C.L. (1994). Hierarchical Regression Analysis Applied to a Study of Multiple Dietary Exposures and Breast Cancer. Epidemiology, 5(6), 612-621. doi:10.1097/00001648-199411000-00009.
In the third example, gene, we provide simulated data to mimic the bone density data published in the European Bioinformatics Institute (EMBL-EBI) ArrayExpress repository, ID: E-MEXP-1618. In the fourth example, we simulate data with a three-level categorical outcome to illustrate multi-class classification with xtune.
In the first example, \(Y\) is an \(n = 100\)-dimensional continuous observed outcome vector, \(X\) is a matrix of \(p = 300\) potential predictors observed on the \(n\) observations, and \(Z\) is a set of \(q = 4\) external features available for the \(p\) predictors.
library(xtune)
data("example")
X <- example$X; Y <- example$Y; Z <- example$Z
dim(X);dim(Z)
#> [1] 100 300
#> [1] 300 4
Each column of Z contains information about the predictors in the design matrix X. The number of rows in Z equals the number of predictors in X.
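Since each row of Z describes one predictor, a quick sanity check of the inputs might look like this:
stopifnot(ncol(X) == nrow(Z))  # one row of Z per predictor in X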
X[1:3,1:10]
#> Predictor_1 Predictor_2 Predictor_3 Predictor_4 Predictor_5
#> Observation_1 -0.7667960 0.9212806 2.0149030 0.79004563 -1.4244699
#> Observation_2 -0.8164583 -0.3144157 -0.2253684 0.08712746 -1.0296026
#> Observation_3 -0.1415352 0.6623149 -1.0398456 1.87611212 0.7340254
#> Predictor_6 Predictor_7 Predictor_8 Predictor_9 Predictor_10
#> Observation_1 -0.9529327 -0.9344928 -0.32964818 0.4486023 -0.70894600
#> Observation_2 2.2546851 0.2732793 -0.03852896 1.3830463 0.03070716
#> Observation_3 1.7534320 0.3263808 0.09564893 -2.2104531 0.22615224
The external information is encoded as follows:
Z[1:10,]
#> External_variable_1 External_variable_2 External_variable_3
#> Predictor_1 1 0 0
#> Predictor_2 1 0 0
#> Predictor_3 0 1 0
#> Predictor_4 0 1 0
#> Predictor_5 0 0 1
#> Predictor_6 0 0 1
#> Predictor_7 0 0 0
#> Predictor_8 0 0 0
#> Predictor_9 0 0 0
#> Predictor_10 0 0 0
#> External_variable_4
#> Predictor_1 0
#> Predictor_2 0
#> Predictor_3 0
#> Predictor_4 0
#> Predictor_5 0
#> Predictor_6 0
#> Predictor_7 1
#> Predictor_8 1
#> Predictor_9 1
#> Predictor_10 1
Here, each variable in Z is binary. \(Z_{jk}\) indicates whether \(Predictor_j\) has \(ExternalVariable_k\) or not. This Z is an example of a (non-overlapping) grouping of the predictors: predictors 1 and 2 belong to group 1, predictors 3 and 4 belong to group 2, predictors 5 and 6 belong to group 3, and the remaining predictors belong to group 4.
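For reference, a grouping matrix of this form can be built from a vector of group labels; a minimal sketch, using a hypothetical label vector groups:
# hypothetical group labels: two predictors in each of groups 1-3, the rest in group 4
groups <- rep(1:4, times = c(2, 2, 2, ncol(X) - 6))
Z_group <- model.matrix(~ 0 + factor(groups))  # 300 x 4 indicator matrix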
To fit a differential-shrinkage lasso model to this data:
fit.example1 <- xtune(X,Y,Z, family = "linear", c = 1)
#> Z provided, start estimating individual tuning parameters
#> Start estimating alpha:
#> Done!
Here, we specify a linear model via family = "linear" and a Lasso penalty by setting \(c = 1\). The individual penalty parameters are returned by
fit.example1$penalty.vector
In this example, predictors within the same group share an estimated penalty parameter, and the penalties differ across groups.
unique(fit.example1$penalty.vector)
#> [,1]
#> Predictor_1 7.861552e-03
#> Predictor_3 1.434484e-02
#> Predictor_5 3.337497e-02
#> Predictor_7 1.804549e+02
Coefficient estimates and predicted values can be obtained via coef_xtune and predict_xtune:
coef_xtune(fit.example1)
predict_xtune(fit.example1, newX = X)
The mse function can be used to compute the mean squared error (MSE) between predicted and true values.
mse(predict_xtune(fit.example1, newX = X), Y)
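Equivalently, the MSE can be computed by hand:
mean((predict_xtune(fit.example1, newX = X) - Y)^2)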
Suppose we want to predict a person’s weight loss (binary outcome) using his/her weekly dietary intake. Our external information Z could incorporate information about the levels of relevant food constituents in the dietary items.
data(diet)
head(diet$DietItems)
#> Milk Margarine Eggs Apples Lettuce Celery Hot dogs Liver Dark bread Pasta
#> [1,] 1 1 4 1 2 1 1 0 0 1
#> [2,] 1 0 0 0 2 4 0 0 2 0
#> [3,] 0 1 2 3 1 3 0 0 4 0
#> [4,] 0 2 0 1 0 1 1 0 3 1
#> [5,] 0 1 1 3 1 1 0 2 2 2
#> [6,] 2 1 2 1 3 0 1 2 2 1
#> Beer Liquor Cookies Bran
#> [1,] 0 2 3 4
#> [2,] 2 1 0 3
#> [3,] 1 2 1 1
#> [4,] 1 1 1 2
#> [5,] 0 0 0 3
#> [6,] 1 0 2 1
head(diet$weightloss)
#> [1] 0 1 1 1 1 1
The external information Z in this example is:
head(diet$NuitritionFact)
In this example, Z is not a grouping of the predictors. The idea is that the nutrition facts about the dietary items might give us some information on the importance of each predictor in the model.
Similar to the previous example, the xtune model can be fit by:
fit.diet = xtune(X = diet$DietItems,Y=diet$weightloss,Z = diet$NuitritionFact, family="binary", c = 0)
#> Z provided, start estimating individual tuning parameters
#> Start estimating alpha:
Here, we fit a Ridge model by setting \(c = 0\). Each dietary predictor receives an individual tuning parameter estimated from its nutrition facts.
To make predictions using the trained model:
predict_xtune(fit.diet,newX = diet$DietItems)
The above code returns the predicted probabilities (scores). To make a class prediction, use the type = "class"
option.
pred_class <- predict_xtune(fit.diet,newX = diet$DietItems,type = "class")
The misclassification()
function can be used to extract the misclassification rate. The prediction AUC can be calculated using the auc() function from the AUC package.
misclassification(pred_class,true = diet$weightloss)
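A minimal sketch of the AUC calculation, assuming the AUC package is installed and reusing the predicted probabilities from above:
library(AUC)
pred_prob <- predict_xtune(fit.diet, newX = diet$DietItems)  # predicted probabilities
auc(roc(pred_prob, as.factor(diet$weightloss)))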
The gene data contains simulated gene expression data of dimension \(50 \times 200\). The outcome Y is continuous (bone mineral density). The external information comes from four previous studies that identified the biological importance of genes: \(Z_{jk} = 1\) means that gene \(j\) was identified as important by previous study \(k\), and \(Z_{jk} = 0\) means it was not.
data(gene)
gene$GeneExpression[1:3,1:5]
#> Gene_1 Gene_2 Gene_3 Gene_4 Gene_5
#> [1,] -0.7667960 1.7520578 0.9212806 -0.6273008 2.0149030
#> [2,] -0.8164583 -0.5477714 -0.3144157 -0.8796116 -0.2253684
#> [3,] -0.1415352 -0.8585257 0.6623149 -0.3053110 -1.0398456
gene$PreviousStudy[1:5,]
#> Identified by previous study 1 Identified by previous study 2
#> Gene_1 0 0
#> Gene_2 0 0
#> Gene_3 0 0
#> Gene_4 0 0
#> Gene_5 0 0
#> Identified by previous study 3 Identified by previous study 4
#> Gene_1 0 0
#> Gene_2 0 0
#> Gene_3 0 0
#> Gene_4 0 0
#> Gene_5 0 0
A gene can be identified as important by several previous studies, therefore the external information Z in this example can be seen as an overlapping grouping of variables.
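For instance, the number of studies identifying each gene can be tabulated; a count above one indicates membership in more than one group:
table(rowSums(gene$PreviousStudy))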
Model fitting:
fit.gene = xtune(X = gene$GeneExpression,Y=gene$bonedensity,Z = gene$PreviousStudy, family = "linear", c = 0.5)
We use the Elastic-net model by specifying \(c = 0.5\) (\(c\) can be any value between 0 and 1). The rest of the steps are the same as in the previous two examples.
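For instance, in-sample predictions and their MSE can be obtained the same way (a sketch):
pred.gene <- predict_xtune(fit.gene, newX = gene$GeneExpression)
mse(pred.gene, gene$bonedensity)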
data("example.multiclass")
dim(example.multiclass$X); dim(example.multiclass$Y); dim(example.multiclass$Z)
#> [1] 600 800
#> [1] 600 1
#> [1] 800 5
head(example.multiclass$X)[,1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.7445670 1.47901632 -0.1556682 -0.64053491 -1.2694581
#> [2,] 0.2170827 0.08309395 0.8571237 -0.12263159 0.4443480
#> [3,] 0.7843483 2.07273526 0.5046653 -0.56627993 -0.3385034
#> [4,] 0.4509514 -0.34583708 -0.5824597 -0.71907762 -0.5209697
#> [5,] 1.2444328 2.14042805 0.4639056 -0.01205312 1.2121137
#> [6,] -1.2819254 -0.83835437 0.6999223 0.04828028 1.0246388
head(example.multiclass$Y)
#> [,1]
#> [1,] 3
#> [2,] 2
#> [3,] 1
#> [4,] 2
#> [5,] 3
#> [6,] 2
head(example.multiclass$Z)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 0 0 0 0
#> [2,] 0 0 0 0 1
#> [3,] 0 0 0 0 0
#> [4,] 0 0 0 0 0
#> [5,] 0 0 0 1 0
#> [6,] 0 0 0 0 0
Model fitting. Here, U is an additional matrix of covariates that enter the model unpenalized (note the trailing zero entries in the penalty vector below):
fit.multiclass = xtune(X = example.multiclass$X,Y=example.multiclass$Y,Z = example.multiclass$Z, U = example.multiclass$U, family = "multiclass", c = 0.5)
#> Z provided, start estimating individual tuning parameters
#> Start estimating alpha:
#> Done!
# check the tuning parameter
fit.multiclass$penalty.vector
#> [1] 0.019033238 0.020019923 0.019033238 0.019033238 0.019070537 0.019033238
#> [7] 0.019406624 0.019033238 0.016390346 0.019033238 0.019406624 0.020019923
#> [13] 0.015582545 0.019033238 0.020019923 0.019033238 0.011414296 0.019033238
#> [19] 0.011414296 0.020019923 0.011414296 0.019033238 0.015582545 0.019033238
#> [25] 0.019033238 0.015582545 0.019033238 0.020019923 0.019033238 0.019033238
#> [31] 0.015582545 0.019033238 0.019033238 0.011414296 0.019033238 0.019406624
#> [37] 0.019033238 0.019033238 0.019033238 0.019033238 0.019406624 0.012006014
#> [43] 0.015582545 0.015582545 0.019033238 0.019406624 0.019033238 0.019033238
#> [49] 0.015582545 0.019033238 0.020019923 0.019033238 0.015582545 0.019033238
#> [55] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.011414296
#> [61] 0.019033238 0.011414296 0.019033238 0.019033238 0.019033238 0.019033238
#> [67] 0.019033238 0.015582545 0.019033238 0.016390346 0.019070537 0.019033238
#> [73] 0.019033238 0.011414296 0.019033238 0.020412666 0.019033238 0.019033238
#> [79] 0.011414296 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [85] 0.019406624 0.019033238 0.019033238 0.011414296 0.015582545 0.019033238
#> [91] 0.011414296 0.019033238 0.019033238 0.019033238 0.011414296 0.020059155
#> [97] 0.019033238 0.019033238 0.020019923 0.019033238 0.015582545 0.019033238
#> [103] 0.015888237 0.019033238 0.019070537 0.019033238 0.019406624 0.019033238
#> [109] 0.019033238 0.020019923 0.019070537 0.019033238 0.019033238 0.019033238
#> [115] 0.019033238 0.015582545 0.015582545 0.020019923 0.019033238 0.019406624
#> [121] 0.015582545 0.019033238 0.011414296 0.019070537 0.019033238 0.019033238
#> [127] 0.019033238 0.015582545 0.019033238 0.019033238 0.020019923 0.019033238
#> [133] 0.019033238 0.019033238 0.019070537 0.019033238 0.019444655 0.019033238
#> [139] 0.011414296 0.019033238 0.019033238 0.011414296 0.019033238 0.019033238
#> [145] 0.020019923 0.011414296 0.009344904 0.011414296 0.019033238 0.019033238
#> [151] 0.019033238 0.019033238 0.019033238 0.019033238 0.019406624 0.019033238
#> [157] 0.019033238 0.015582545 0.020019923 0.019406624 0.019033238 0.019033238
#> [163] 0.019406624 0.019033238 0.019070537 0.019033238 0.020019923 0.019406624
#> [169] 0.019070537 0.019070537 0.019033238 0.019070537 0.019406624 0.019033238
#> [175] 0.019033238 0.019033238 0.019033238 0.019070537 0.019070537 0.019033238
#> [181] 0.011414296 0.019033238 0.019033238 0.015582545 0.016390346 0.019033238
#> [187] 0.015582545 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [193] 0.019406624 0.019033238 0.009528229 0.019033238 0.019070537 0.019033238
#> [199] 0.019070537 0.019033238 0.019033238 0.011414296 0.019070537 0.011414296
#> [205] 0.019033238 0.015582545 0.011414296 0.019033238 0.011414296 0.015582545
#> [211] 0.019033238 0.019444655 0.015613082 0.019033238 0.019033238 0.019033238
#> [217] 0.019033238 0.015582545 0.015582545 0.015582545 0.020019923 0.019033238
#> [223] 0.015582545 0.011414296 0.016390346 0.019033238 0.019033238 0.020019923
#> [229] 0.019033238 0.020412666 0.019033238 0.019070537 0.019033238 0.015582545
#> [235] 0.020019923 0.019033238 0.019033238 0.015582545 0.019070537 0.019033238
#> [241] 0.019033238 0.019033238 0.019033238 0.019033238 0.019406624 0.015582545
#> [247] 0.019033238 0.019033238 0.019033238 0.015582545 0.019033238 0.019070537
#> [253] 0.016390346 0.020019923 0.020019923 0.015582545 0.011414296 0.016390346
#> [259] 0.019033238 0.015888237 0.019033238 0.019070537 0.019033238 0.019070537
#> [265] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.020019923
#> [271] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [277] 0.019070537 0.020019923 0.019033238 0.019070537 0.019070537 0.019033238
#> [283] 0.020412666 0.019033238 0.012006014 0.019406624 0.015888237 0.011414296
#> [289] 0.019033238 0.019033238 0.011414296 0.019033238 0.019033238 0.019033238
#> [295] 0.019033238 0.019033238 0.015582545 0.019033238 0.012006014 0.019033238
#> [301] 0.019033238 0.011414296 0.019033238 0.019033238 0.020412666 0.015582545
#> [307] 0.019033238 0.015582545 0.019033238 0.015582545 0.019033238 0.020019923
#> [313] 0.015582545 0.019033238 0.019070537 0.019033238 0.019033238 0.019033238
#> [319] 0.020019923 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [325] 0.020019923 0.019033238 0.019033238 0.011414296 0.019033238 0.019033238
#> [331] 0.019033238 0.019406624 0.019033238 0.020019923 0.019033238 0.019033238
#> [337] 0.020019923 0.015582545 0.019070537 0.019033238 0.019033238 0.015582545
#> [343] 0.011414296 0.019406624 0.011414296 0.011414296 0.019406624 0.011414296
#> [349] 0.015582545 0.019033238 0.019406624 0.019033238 0.019033238 0.020019923
#> [355] 0.019444655 0.019070537 0.019033238 0.019033238 0.019033238 0.016390346
#> [361] 0.019033238 0.015582545 0.019033238 0.019033238 0.011638217 0.009344904
#> [367] 0.020019923 0.019033238 0.019033238 0.019033238 0.019070537 0.019033238
#> [373] 0.009363217 0.019033238 0.019070537 0.019033238 0.019070537 0.019033238
#> [379] 0.019070537 0.015582545 0.019033238 0.020019923 0.019033238 0.019033238
#> [385] 0.015582545 0.019033238 0.019033238 0.016390346 0.019033238 0.011414296
#> [391] 0.011414296 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [397] 0.019033238 0.015582545 0.019033238 0.020019923 0.015582545 0.019033238
#> [403] 0.019033238 0.019033238 0.015582545 0.011414296 0.019070537 0.020019923
#> [409] 0.019033238 0.019033238 0.019070537 0.020059155 0.019033238 0.019070537
#> [415] 0.011436664 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [421] 0.011414296 0.019033238 0.019033238 0.019033238 0.019033238 0.015613082
#> [427] 0.015582545 0.019033238 0.019406624 0.019033238 0.019033238 0.019033238
#> [433] 0.019033238 0.020059155 0.019070537 0.019033238 0.019033238 0.019033238
#> [439] 0.019406624 0.011414296 0.015888237 0.019033238 0.019033238 0.019033238
#> [445] 0.011414296 0.019033238 0.020019923 0.019033238 0.020019923 0.015582545
#> [451] 0.020059155 0.019033238 0.019033238 0.019033238 0.019033238 0.011414296
#> [457] 0.019070537 0.019033238 0.019033238 0.020412666 0.019070537 0.015582545
#> [463] 0.019033238 0.019070537 0.019033238 0.019033238 0.015582545 0.019033238
#> [469] 0.019070537 0.019070537 0.019033238 0.019406624 0.019033238 0.019033238
#> [475] 0.019033238 0.019406624 0.019033238 0.019033238 0.019033238 0.019070537
#> [481] 0.019033238 0.019033238 0.019033238 0.015582545 0.020019923 0.019033238
#> [487] 0.019033238 0.019033238 0.019033238 0.019406624 0.019406624 0.019033238
#> [493] 0.015582545 0.011414296 0.019033238 0.019033238 0.011414296 0.019033238
#> [499] 0.016390346 0.019033238 0.019070537 0.019406624 0.015582545 0.019033238
#> [505] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.015582545
#> [511] 0.011414296 0.019033238 0.011414296 0.011414296 0.019033238 0.019070537
#> [517] 0.019070537 0.019033238 0.019033238 0.020019923 0.020019923 0.019033238
#> [523] 0.019033238 0.019070537 0.019033238 0.019406624 0.019033238 0.020019923
#> [529] 0.015582545 0.019406624 0.019033238 0.016390346 0.019033238 0.019033238
#> [535] 0.019033238 0.011414296 0.020019923 0.015582545 0.019033238 0.019033238
#> [541] 0.009528229 0.019033238 0.019033238 0.019406624 0.019070537 0.019033238
#> [547] 0.019033238 0.020019923 0.019033238 0.019033238 0.019033238 0.019033238
#> [553] 0.019033238 0.019033238 0.019033238 0.019033238 0.019070537 0.019033238
#> [559] 0.015582545 0.015582545 0.019033238 0.019033238 0.011414296 0.019033238
#> [565] 0.019033238 0.012006014 0.019033238 0.019033238 0.019033238 0.019033238
#> [571] 0.019444655 0.019033238 0.019033238 0.019033238 0.019406624 0.019070537
#> [577] 0.019033238 0.011414296 0.019033238 0.019033238 0.019033238 0.019033238
#> [583] 0.015582545 0.019033238 0.019033238 0.019444655 0.019033238 0.019033238
#> [589] 0.020019923 0.019406624 0.019406624 0.020019923 0.019033238 0.019033238
#> [595] 0.019033238 0.009344904 0.019444655 0.011414296 0.019070537 0.019033238
#> [601] 0.019033238 0.015582545 0.019033238 0.019033238 0.019406624 0.016422466
#> [607] 0.019033238 0.019033238 0.019033238 0.020412666 0.009344904 0.019444655
#> [613] 0.015582545 0.019033238 0.015582545 0.019033238 0.019406624 0.019033238
#> [619] 0.019033238 0.019406624 0.019033238 0.019033238 0.019033238 0.015582545
#> [625] 0.019033238 0.019406624 0.020019923 0.019033238 0.019033238 0.015888237
#> [631] 0.019033238 0.019033238 0.019033238 0.020019923 0.019033238 0.019033238
#> [637] 0.020019923 0.019033238 0.019033238 0.019070537 0.011414296 0.019033238
#> [643] 0.019070537 0.019033238 0.019406624 0.019070537 0.019033238 0.019406624
#> [649] 0.012006014 0.019033238 0.019033238 0.019033238 0.011414296 0.019033238
#> [655] 0.019070537 0.019033238 0.019070537 0.019070537 0.019033238 0.019033238
#> [661] 0.019033238 0.019033238 0.019033238 0.019033238 0.019070537 0.019033238
#> [667] 0.019444655 0.019033238 0.015888237 0.019033238 0.020019923 0.011414296
#> [673] 0.019406624 0.019033238 0.019033238 0.019033238 0.019033238 0.019406624
#> [679] 0.020019923 0.019033238 0.015582545 0.015582545 0.019070537 0.019033238
#> [685] 0.019033238 0.019033238 0.019406624 0.019406624 0.019033238 0.009528229
#> [691] 0.019033238 0.019070537 0.015582545 0.019033238 0.019033238 0.019033238
#> [697] 0.019033238 0.019033238 0.019033238 0.015582545 0.020019923 0.015582545
#> [703] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [709] 0.019033238 0.020019923 0.019444655 0.012006014 0.019406624 0.019033238
#> [715] 0.019033238 0.015582545 0.015582545 0.020019923 0.019033238 0.019033238
#> [721] 0.015582545 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238
#> [727] 0.019406624 0.019406624 0.019033238 0.019033238 0.019033238 0.020019923
#> [733] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.011414296
#> [739] 0.019033238 0.019033238 0.019033238 0.019070537 0.019070537 0.020019923
#> [745] 0.015888237 0.011436664 0.019033238 0.019033238 0.015582545 0.012006014
#> [751] 0.020019923 0.019033238 0.019033238 0.019070537 0.019033238 0.020019923
#> [757] 0.015582545 0.020019923 0.019033238 0.019033238 0.019033238 0.019033238
#> [763] 0.019406624 0.019033238 0.015888237 0.019070537 0.019033238 0.015582545
#> [769] 0.019033238 0.019033238 0.019033238 0.019033238 0.019070537 0.015582545
#> [775] 0.019033238 0.011414296 0.019406624 0.019033238 0.019033238 0.019033238
#> [781] 0.019033238 0.019406624 0.019033238 0.019033238 0.019033238 0.019033238
#> [787] 0.019033238 0.019033238 0.019033238 0.019033238 0.019033238 0.020452668
#> [793] 0.019033238 0.019033238 0.019033238 0.019033238 0.020019923 0.015582545
#> [799] 0.019033238 0.019406624 0.000000000 0.000000000 0.000000000 0.000000000
#> [805] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
To make predictions using the trained model, newX must also include the unpenalized covariates U:
pred.prob = predict_xtune(fit.multiclass,newX = cbind(example.multiclass$X, example.multiclass$U))
head(pred.prob)
#> 1.1 2.1 3.1
#> 1 0.152714695 0.04651872 0.80076659
#> 2 0.001784845 0.96370971 0.03450544
#> 3 0.594903813 0.14663364 0.25846255
#> 4 0.040101522 0.84879438 0.11110409
#> 5 0.084372394 0.17279569 0.74283192
#> 6 0.012523512 0.89878034 0.08869615
The above code returns the predicted probabilities (scores) for each class. To make a class prediction, specify the argument type = "class"
.
pred.class <- predict_xtune(fit.multiclass,newX = cbind(example.multiclass$X, example.multiclass$U), type = "class")
head(pred.class)
#> [1] "3" "2" "1" "2" "3" "2"
The misclassification()
function can be used to extract the misclassification rate. The multiclass AUC can be calculated using the multiclass.roc
function from the pROC
package.
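For example, a sketch reusing the class predictions from above (the commented pROC call assumes the probability columns can be matched to the class levels):
misclassification(pred.class, true = example.multiclass$Y)
# multiclass AUC (column naming may need adjustment):
# pROC::multiclass.roc(response = example.multiclass$Y, predictor = pred.prob)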
If you just want to tune a single penalty parameter using empirical Bayes tuning, simply do not provide Z in the xtune() function. When no external information Z is provided, the function estimates the single penalty parameter of the penalized regression by empirical Bayes, as an alternative to cross-validation. For example:
fit.eb <- xtune(X,Y, family = "linear", c = 0.5)
#> No Z matrix provided, only a single tuning parameter will be estimated using empirical Bayes tuning
#> Start estimating alpha:
#> Done!
The estimated tuning parameter is:
fit.eb$lambda
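For comparison, a sketch of the conventional cross-validated choice using glmnet (the two values are selected under different criteria, so they need not coincide):
library(glmnet)
cv.fit <- cv.glmnet(X, Y, alpha = 0.5)
cv.fit$lambda.min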
If you provide an identity matrix as external information Z to xtune()
, the function will estimate a separate tuning parameter \(\lambda_j\) for each regression coefficient \(\beta_j\). Note that this is not advised when the number of predictors \(p\) is very large.
Using the dietary example, the following code would estimate a separate penalty parameter for each coefficient.
Z_iden = diag(ncol(diet$DietItems))
fit.diet.identity = xtune(diet$DietItems,diet$weightloss,Z_iden, family = "binary", c = 0.5)
#> Z provided, start estimating individual tuning parameters
#> Start estimating alpha:
fit.diet.identity$penalty.vector
#> [,1]
#> [1,] 1.619150e+02
#> [2,] 2.667150e+02
#> [3,] 8.667794e-02
#> [4,] 1.160597e-02
#> [5,] 1.514302e+02
#> [6,] 2.667643e+02
#> [7,] 2.585258e+02
#> [8,] 1.315713e-01
#> [9,] 2.784969e-03
#> [10,] 4.277238e+02
#> [11,] 5.663159e+02
#> [12,] 2.438229e+02
#> [13,] 1.000000e+03
#> [14,] 6.516246e+02
A predictor is excluded from the model (its regression coefficient equals zero) if its corresponding penalty parameter is estimated to be infinite.
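The resulting exclusions can be checked directly from the coefficient estimates:
coef_xtune(fit.diet.identity)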
We have presented the main usage of the xtune package. For more details about each function, please check the package documentation. If you would like to give feedback or report an issue, please open an issue on GitHub.