The \(\texttt{studyStrap}\) package implements multi-study learning algorithms such as Merging, Study-Specific Ensembling (the Trained-on-Observed-Studies Ensemble), the Study Strap, and the Covariate-Matched Study Strap. It calculates and applies Covariate Profile Similarity and stacking weights. Because models are trained within the \(\texttt{caret}\) ecosystem, the package can flexibly apply different methods (e.g., random forests, linear regression, neural networks) as single-study learners within the multi-study ensembling framework. The package allows for multiple single-study learners per study as well as custom functions for Covariate Profile Similarity weighting and for the accept/reject step used in the Covariate-Matched Study Strap. The prediction function allows this framework to be used without manually ensembling and weighting model predictions.
Below we offer a few basic examples using the core functions of the package. We begin by simulating a multi-study prediction setting.
set.seed(1)
library(studyStrap)
# create half of training dataset from 1 distribution
X1 <- matrix(rnorm(2000), ncol = 2) # design matrix - 2 covariates
B1 <- c(5, 10, 15) # true beta coefficients
y1 <- cbind(1, X1) %*% B1
# create 2nd half of training dataset from another distribution
X2 <- matrix(rnorm(2000, 1,2), ncol = 2) # design matrix - 2 covariates
B2 <- c(10, 5, 0) # true beta coefficients
y2 <- cbind(1, X2) %*% B2
X <- rbind(X1, X2)
y <- c(y1, y2)
study <- sample.int(10, 2000, replace = TRUE) # 10 studies
data <- data.frame( Study = study, Y = y, V1 = X[,1], V2 = X[,2] )
# create target study design matrix for covariate profile similarity weighting and
# accept/reject algorithm (Covariate-matched study strap)
target <- matrix(rnorm(1000, 3, 5), ncol = 2) # design matrix
colnames(target) <- c("V1", "V2")
We have 10 studies (combined into a single data frame), each with an outcome vector \(\mathbf{Y}\) and two covariates, \(V1\) and \(V2\).
head(data)
## Study Y V1 V2
## 1 6 15.759938 -0.6264538 1.13496509
## 2 1 23.515411 0.1836433 1.11193185
## 3 10 -16.417951 -0.8356286 -0.87077763
## 4 6 24.113782 1.5952808 0.21073159
## 5 7 9.336012 0.3295078 0.06939565
## 6 6 -28.144417 -0.8204684 -1.66264885
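Study membership was assigned at random, so each of the 10 studies contains roughly 200 observations; a quick check of the per-study sample sizes:
table(data$Study) # number of observations in each study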
We begin with the basic ensembling setting (the Study-Specific Ensemble, or Trained-on-Observed-Studies Ensemble), where we train one or more models on each study and then ensemble the models.
Here we use a single single-study learner: PCR. We assume the model has already been tuned and that the tuning parameters are specified as they would be in caret. We also show an example of a custom function for Covariate Profile Similarity weighting, although providing one is optional.
In addition, we specify a target study to allow for Covariate Profile Similarity weighting; this too is optional, and we show an example without it below.
# custom function
fn1 <- function(x1, x2){
  return( abs( cor( colMeans(x1), colMeans(x2) )) )
}

sseMod1 <- sse(formula = Y ~ .,
               data = data,
               target.study = target,
               ssl.method = list("pcr"),
               ssl.tuneGrid = list(data.frame("ncomp" = 1)),
               customFNs = list(fn1) )
preds <- studyStrap.predict(sseMod1, target)
head(preds)[1:3,]
## Avg standard_Stacking customFn_1
## [1,] 0.1774518 -13.653802 0.1774518
## [2,] 5.9290711 -4.472652 5.9290711
## [3,] 30.8083742 35.331259 30.8083742
The predictions are returned as a matrix because a column of predictions is produced for each weighting scheme: the default Covariate Profile Similarity measures, the stacking weights, and the custom weighting function we supplied. Notice that, by design, the custom weights are identical to the “Mean Corr” weights. The first column is a simple average of the predictions from all of the models.
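Individual weighting schemes can be extracted from this matrix by column name for downstream use; a minimal sketch (column names taken from the output above):
# pull out the simple-average and stacking-weighted predictions
avg.preds <- preds[, "Avg"]
stack.preds <- preds[, "standard_Stacking"]
head( cbind(Average = avg.preds, Stacking = stack.preds) )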
As above, we run the same algorithm, but for each study we now train two models: one with linear regression and one with PCR.
# custom function
fn1 <- function(x1, x2){
  return( abs( cor( colMeans(x1), colMeans(x2) )) )
}

sseMod2 <- sse(formula = Y ~ .,
               data = data,
               target.study = target,
               ssl.method = list("lm", "pcr"),
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)),
               customFNs = list(fn1) )
Making predictions is identical and produces output with the same structure. The function automatically accounts for the fact that each study now has both a linear regression model and a PCR model. Covariate Profile Similarity weights handle this by assigning equal weight to the two models trained on the same study.
preds <- studyStrap.predict(sseMod2, target)
head(preds)[1:3,]
## Avg standard_Stacking customFn_1
## [1,] -13.995010 -14.066971 -13.995010
## [2,] -4.730832 -4.792895 -4.730832
## [3,] 35.433599 35.415095 35.433599
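The stored models mirror this structure: with two single-study learners, the fitted object keeps one list of models per learner. A quick (hedged) peek, using the $models element described in the model-object section at the end of this vignette:
length(sseMod2$models) # one element per single-study learner (lm and pcr)
length(sseMod2$models[[1]]) # one model per training study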
Now let us assume we do not have a target study to generate Covariate Profile Similarity weights.
sseMod3 <- sse(formula = Y ~ .,
               data = data,
               ssl.method = list("pcr"),
               ssl.tuneGrid = list(data.frame("ncomp" = 1)),
               sim.mets = FALSE)
preds <- studyStrap.predict(sseMod3, target)
head(preds)[1:3,]
## Avg standard_Stacking
## [1,] 0.1774518 -13.653802
## [2,] 5.9290711 -4.472652
## [3,] 30.8083742 35.331259
Since we do not have a target study, we cannot generate Covariate Profile Similarity weights, and predictions are produced only for simple averaging and stacking.
Now let us move on to another standard multi-study learning method, Merging:
# 1 SSL
mrgMod1 <- merged(formula = Y ~ .,
                  data = data,
                  ssl.method = list("pcr"),
                  ssl.tuneGrid = list(data.frame("ncomp" = 2)))

# 2 SSLs
mrgMod2 <- merged(formula = Y ~ .,
                  data = data,
                  ssl.method = list("lm", "pcr"),
                  ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)))
Prediction produces only a single vector of predictions, listed under Avg; since only one model is fit, no stacking weights are produced and that column is NA.
preds <- studyStrap.predict(mrgMod2, target)
head(preds)
## Avg NA_Stacking
## [1,] -14.066971 NA
## [2,] -4.792895 NA
## [3,] 35.415095 NA
## [4,] 53.204946 NA
## [5,] 60.384128 NA
## [6,] 29.725637 NA
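If outcomes were available for the target study, the merged and ensembled approaches could be compared directly. As an illustration only, suppose the target study followed the second training distribution, so that hypothetical outcomes can be simulated from B2:
# hypothetical target outcomes (illustration only)
target.Y <- cbind(1, target) %*% B2
rmse <- function(truth, est) sqrt( mean( (truth - est)^2 ) )
c( Merged = rmse(target.Y, studyStrap.predict(mrgMod2, target)[, "Avg"]),
   SSE = rmse(target.Y, studyStrap.predict(sseMod2, target)[, "standard_Stacking"]) )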
We now demonstrate the use of the Study Strap with 10 straps and all available weighting schemes.
# custom function
fn1 <- function(x1, x2){
  return( abs( cor( colMeans(x1), colMeans(x2) )) )
}

# 1 SSL
ssMod1 <- ss(formula = Y ~ .,
             data = data,
             target.study = target,
             bag.size = length(unique(data$Study)),
             straps = 10,
             stack = "standard",
             sim.covs = NA,
             ssl.method = list("pcr"),
             ssl.tuneGrid = list(data.frame("ncomp" = 2)),
             sim.mets = TRUE,
             model = TRUE,
             customFNs = list(fn1) )

# 2 SSLs
ssMod2 <- ss(formula = Y ~ .,
             data = data,
             target.study = target,
             bag.size = length(unique(data$Study)),
             straps = 10,
             stack = "standard",
             sim.covs = NA,
             ssl.method = list("lm", "pcr"),
             ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)),
             sim.mets = TRUE,
             model = TRUE,
             customFNs = list(fn1) )
Predictions have the same structure as the Study-Specific Ensemble.
preds <- studyStrap.predict(ssMod2, target)
head(preds)[1:3,]
## Avg standard_Stacking Matcor Diag Matcor Sum Matcor Sum Abs
## [1,] -14.627272 -14.066971 -14.284018 -14.452172 -14.33067
## [2,] -5.138826 -4.792895 -4.828864 -4.823478 -4.86358
## [3,] 35.998061 35.415095 36.161704 36.918680 36.17842
## |rho| rho sq UV rho sq UV cov sq UV rho UV cov
## [1,] -14.211107 -14.099586 -14.740194 -14.822641 -14.956572 -15.330968
## [2,] -4.776769 -4.663318 -5.269537 -5.334426 -5.438017 -5.734821
## [3,] 36.123186 36.243555 35.789701 35.801652 35.830304 35.874078
## diag UV rho sq diag UV cov diag UV cov sq Mean Corr SMI RV
## [1,] -14.740194 -14.828585 -14.638752 -14.627272 -14.112789 -14.197656
## [2,] -5.269537 -5.320772 -5.189034 -5.138826 -4.673056 -4.735643
## [3,] 35.789701 35.900237 35.777924 35.998061 36.248897 36.283369
## RV2 RVadj PSI r1 r2 r3
## [1,] -15.360730 -15.353452 -14.239038 -13.350311 -13.117915 -13.273435
## [2,] -5.662882 -5.660354 -4.799337 -3.997512 -3.938409 -3.938728
## [3,] 36.383972 36.366180 36.124056 36.543014 35.852250 36.522881
## r4 GCD customFn_1
## [1,] -13.018795 -14.112789 -14.627272
## [2,] -3.861005 -4.673056 -5.138826
## [3,] 35.834974 36.248897 35.998061
Now suppose we do not want to use the default Covariate Profile Similarity measures. We can turn these off (sim.mets = FALSE), which significantly reduces the time it takes to fit the models and alters the structure of the prediction output. We must also specify the bag size; the default is the number of training studies, but this should be tuned for optimal performance.
# custom function
fn1 <- function(x1, x2){
  return( abs( cor( colMeans(x1), colMeans(x2) )) )
}

ssMod3 <- ss(formula = Y ~ .,
             data = data,
             target.study = target,
             bag.size = length(unique(data$Study)),
             straps = 10,
             sim.covs = NA,
             ssl.method = list("pcr"),
             ssl.tuneGrid = list(data.frame("ncomp" = 2)),
             sim.mets = FALSE,
             customFNs = list(fn1) )
preds <- studyStrap.predict(ssMod3, target)
head(preds)[1:3,]
## Avg standard_Stacking customFn_1
## [1,] -12.97376 -14.066971 -12.97376
## [2,] -4.03119 -4.792895 -4.03119
## [3,] 34.73606 35.415095 34.73606
Now let’s deal with the case where we do not have a target study at all. We can simply omit this argument, and our predictions will be limited to a simple average and stacking weights.
ssMod4 <- ss(formula = Y ~ .,
             data = data,
             bag.size = length(unique(data$Study)),
             straps = 10,
             sim.covs = NA,
             ssl.method = list("pcr"),
             ssl.tuneGrid = list(data.frame("ncomp" = 2)),
             sim.mets = FALSE)
preds <- studyStrap.predict(ssMod4, target)
head(preds)[1:3,]
## Avg standard_Stacking
## [1,] -12.27087 -14.066971
## [2,] -3.45521 -4.792895
## [3,] 34.75931 35.415095
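As noted above, the bag size defaults to the number of training studies but should generally be tuned. A minimal sketch of one way to do so, holding out a single training study as a pseudo-target (the grid of bag sizes and the choice of held-out study are arbitrary and for illustration only):
# hold out study 10 and tune bag.size over a small grid
held.out <- data[data$Study == 10, ]
train.dat <- data[data$Study != 10, ]
bag.grid <- c(3, 6, 9)
rmse.bag <- numeric(length(bag.grid))
for (i in seq_along(bag.grid)) {
  fit <- ss(formula = Y ~ .,
            data = train.dat,
            bag.size = bag.grid[i],
            straps = 10,
            sim.covs = NA,
            ssl.method = list("pcr"),
            ssl.tuneGrid = list(data.frame("ncomp" = 2)),
            sim.mets = FALSE)
  p <- studyStrap.predict(fit, held.out[, c("V1", "V2")])
  rmse.bag[i] <- sqrt( mean( (held.out$Y - p[, "Avg"])^2 ) )
}
bag.grid[which.min(rmse.bag)] # bag size with the lowest held-out RMSE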
Now we turn to the accept/reject algorithm (the Covariate-Matched Study Strap). Here we must specify a target study. We also need to specify the number of paths (we recommend 5) and the convergence limit (the number of consecutive rejected study straps required to declare convergence). The appropriate limit depends on the computational budget, but we would recommend at least 1000, and more is better; here we choose a low number for demonstration purposes. We can supply a custom function (sim.fn) for the accept/reject step or use the default, \(\left|\operatorname{cor}\left(\bar{x}^{(r)}, \bar{x}_{\text{target}}\right)\right|\). Similarly, we can provide custom functions for weighting as above. We also specify the maximum total number of study straps allowed in case many are accepted without convergence. We recommend 50 straps per path to be safe, but this is application specific and depends on the distribution of the covariates.
We can use one SSL or multiple SSLs as above. As in the Study Strap algorithm, we need to specify the bag size; the default is the number of training studies, but this should be tuned for optimal performance.
# 1 SSL
arMod1 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 2,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("pcr"),
               ssl.tuneGrid = list(data.frame("ncomp" = 2)))

# 2 SSLs
arMod2 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 2,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("lm", "pcr"),
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)))
preds <- studyStrap.predict(arMod2, target)
head(preds)[1:3,]
## Avg standard_Stacking Matcor Diag Matcor Sum Matcor Sum Abs
## [1,] -14.040566 -13.760921 -13.920651 -13.889937 -14.145602
## [2,] -4.813971 -4.555755 -4.730099 -4.683699 -4.897403
## [3,] 35.192579 35.352613 35.119938 35.232755 35.203636
## |rho| rho sq UV rho sq UV cov sq UV rho UV cov
## [1,] -14.169412 -14.249983 -14.442813 -14.443724 -14.407886 -13.747953
## [2,] -4.914306 -4.980559 -5.146644 -5.147101 -5.113404 -4.595961
## [3,] 35.216743 35.213340 35.165515 35.167016 35.190825 35.085728
## diag UV rho sq diag UV cov diag UV cov sq Mean Corr SMI RV
## [1,] -14.442813 -13.719055 -14.001783 -14.040566 -14.253161 -14.245844
## [2,] -5.146644 -4.615958 -4.836169 -4.813971 -4.984143 -4.974931
## [3,] 35.165515 34.856149 34.908938 35.192579 35.208081 35.225245
## RV2 RVadj PSI r1 r2 r3
## [1,] -14.882410 -14.652337 -14.160104 -13.704759 -13.743827 -13.709580
## [2,] -5.425419 -5.242989 -4.905941 -4.572621 -4.592698 -4.573953
## [3,] 35.583131 35.557210 35.220889 35.023320 35.085220 35.037006
## r4 GCD
## [1,] -13.744843 -14.253161
## [2,] -4.591946 -4.984143
## [3,] 35.093549 35.208081
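Because the number of accepted study straps is data driven, it is worth checking how many were ultimately fit; the relevant bookkeeping is stored in the modelInfo element (field names as in the model object printed later in this vignette):
arMod2$modelInfo$numStraps # number of study straps fit
arMod2$modelInfo$numPaths # number of paths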
Now let us base the accept/reject step on our own custom function (sim.fn). We turn off the default Covariate Profile Similarity weights to speed up runtime (sim.mets = FALSE) but provide two custom functions of our own for Covariate Profile Similarity weighting.
# custom function for CPS weighting
fn1 <- function(x1, x2){
  return( abs( cor( colMeans(x1), colMeans(x2) )) )
}

# custom function for the accept/reject step criterion
fn2 <- function(x1, x2){
  return( sum( ( colMeans(x1) - colMeans(x2) )^2 ) )
}

# 1 SSL
arMod3 <- cmss(formula = Y ~ .,
               data = data,
               target.study = target,
               converge.lim = 2,
               bag.size = length(unique(data$Study)),
               max.straps = 50,
               paths = 2,
               ssl.method = list("pcr"),
               ssl.tuneGrid = list(data.frame("ncomp" = 2)),
               sim.mets = FALSE,
               sim.fn = fn2,
               customFNs = list(fn1, fn2))
preds <- studyStrap.predict(arMod3, target)
head(preds)[1:3,]
## Avg standard_Stacking customFn_1 customFn_2
## [1,] -13.963306 -14.066971 -13.963306 -14.026422
## [2,] -4.777185 -4.792895 -4.777185 -4.816016
## [3,] 35.052555 35.415095 35.052555 35.118968
Now that we understand how to fit models, let us take a moment to explore the model object the package produces. The model objects are S3 objects; that is, they are functionally lists.
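We can check the class and the named components directly; printing the object (below) then displays everything it stores:
class(sseMod1)
names(sseMod1)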
sseMod1
## $models
## $models[[1]]
## $models[[1]][[1]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[2]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[3]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[4]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[5]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[6]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[7]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[8]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[9]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## $models[[1]][[10]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
##
##
## $data
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## list()
##
## $dataInfo
## $dataInfo$studyNames
## [1] 6 1 10 7 4 8 3 2 9 5
##
## $dataInfo$sampleSizes
## [1] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000
##
##
## $modelInfo
## $modelInfo$sampling
## [1] "studySpecificEnsemble"
##
## $modelInfo$numStraps
## [1] 10
##
## $modelInfo$SSL
## $modelInfo$SSL[[1]]
## [1] "pcr"
##
##
## $modelInfo$ssl.tuneGrid
## NULL
##
## $modelInfo$numPaths
## [1] NA
##
## $modelInfo$convg.vec
## NULL
##
## $modelInfo$convgCritera
## [1] NA
##
## $modelInfo$meanSamp
## [1] NA
##
## $modelInfo$stack.type
## [1] "standard"
##
## $modelInfo$custFNs
## $modelInfo$custFNs[[1]]
## function(x1,x2){
## return( abs( cor( colMeans(x1), colMeans(x2) )) )
## }
## <bytecode: 0x7fb8361996e0>
##
##
## $modelInfo$bagSize
## [1] 1
##
##
## [[7]]
## NULL
##
## $simMat
## customFn_1
## [1,] 0.1
## [2,] 0.1
## [3,] 0.1
## [4,] 0.1
## [5,] 0.1
## [6,] 0.1
## [7,] 0.1
## [8,] 0.1
## [9,] 0.1
## [10,] 0.1
##
## $stack.coefs
## [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.84196123 0.00000000
## [7] 0.00000000 0.09382132 0.00000000 0.00000000 0.00000000
##
## $strapRows
## $strapRows$length
## [1] 1 4 6 10 22 43 49 53 55 87 100 106 108 128 140
## [16] 149 154 176 180 190 191 203 214 250 253 259 262 273 290 308
## [31] 315 321 329 332 344 355 368 374 396 399 409 414 415 419 429
## [46] 433 443 444 464 466 473 485 491 492 495 503 534 558 561 598
## [61] 605 619 633 642 646 650 656 667 671 684 697 713 740 752 753
## [76] 776 791 810 811 825 842 845 848 857 863 866 921 936 948 974
## [91] 979 993 996 1014 1023 1038 1041 1048 1052 1064 1065 1067 1080 1086 1097
## [106] 1120 1130 1137 1159 1193 1199 1208 1214 1220 1225 1232 1234 1242 1250 1253
## [121] 1275 1286 1304 1305 1358 1360 1362 1368 1378 1406 1428 1430 1433 1453 1469
## [136] 1488 1508 1509 1519 1537 1539 1553 1573 1582 1587 1593 1600 1602 1625 1643
## [151] 1649 1651 1685 1696 1707 1720 1724 1737 1756 1779 1818 1836 1878 1887 1888
## [166] 1913 1917 1919 1934 1946 1956 1969 1974 1975 1976 1978
##
## $strapRows[[2]]
## [1] 2 13 21 29 33 34 44 47 52 57 59 61 76 104 109
## [16] 118 121 123 135 137 158 179 200 217 233 237 256 274 279 280
## [31] 288 314 317 324 328 333 349 354 366 370 375 376 389 397 398
## [46] 400 436 452 467 493 508 516 520 523 535 549 554 567 584 586
## [61] 592 595 604 625 661 665 685 687 692 696 701 703 722 734 745
## [76] 748 757 764 768 778 781 789 815 818 833 838 840 876 888 889
## [91] 911 916 917 919 930 931 951 953 957 964 969 981 985 999 1000
## [106] 1007 1021 1037 1049 1070 1071 1077 1091 1092 1096 1099 1100 1105 1124 1125
## [121] 1148 1162 1165 1167 1170 1182 1186 1198 1200 1210 1222 1244 1247 1251 1254
## [136] 1264 1280 1326 1331 1347 1353 1359 1366 1367 1372 1376 1391 1398 1400 1405
## [151] 1414 1422 1429 1441 1449 1451 1459 1474 1483 1484 1493 1498 1500 1501 1504
## [166] 1510 1533 1556 1561 1571 1576 1577 1588 1603 1627 1641 1642 1670 1671 1682
## [181] 1683 1689 1691 1706 1709 1712 1715 1717 1722 1740 1759 1762 1787 1792 1809
## [196] 1813 1821 1826 1835 1843 1845 1846 1852 1855 1866 1869 1891 1898 1900 1905
## [211] 1943 1945 1948 1953 1967 1968 1998 1999
##
## $strapRows[[3]]
## [1] 3 11 16 18 35 51 71 75 92 95 99 116 117 127 133
## [16] 165 168 170 178 215 236 245 248 254 272 316 320 323 331 360
## [31] 363 379 388 431 437 446 454 461 471 483 515 518 521 530 551
## [46] 559 560 563 575 577 581 589 623 629 635 652 653 664 678 682
## [61] 702 712 731 732 733 756 761 779 828 852 855 868 874 904 906
## [76] 908 925 926 937 939 942 965 991 994 1001 1002 1024 1028 1029 1030
## [91] 1036 1047 1055 1076 1082 1085 1088 1108 1112 1134 1142 1161 1180 1212 1213
## [106] 1233 1241 1246 1252 1255 1262 1283 1292 1332 1343 1351 1357 1375 1384 1403
## [121] 1418 1423 1432 1447 1456 1495 1514 1518 1522 1525 1527 1530 1542 1549 1552
## [136] 1575 1580 1585 1586 1598 1610 1619 1621 1628 1653 1657 1666 1693 1708 1728
## [151] 1738 1751 1757 1764 1766 1767 1775 1776 1777 1780 1782 1786 1808 1825 1830
## [166] 1834 1838 1851 1854 1858 1868 1870 1871 1880 1894 1901 1910 1918 1922 1923
## [181] 1925 1933 1937 1942 1957 1960 1965 1979 1980 1991
##
## $strapRows[[4]]
## [1] 5 41 42 62 67 68 74 102 105 111 113 120 124 148 157
## [16] 160 166 171 184 185 195 207 209 211 213 230 232 234 252 255
## [31] 265 289 291 306 325 338 345 353 356 382 387 391 392 394 402
## [46] 405 418 430 432 435 440 445 449 450 469 472 476 477 480 489
## [61] 538 540 541 544 553 569 585 639 647 648 663 666 670 672 673
## [76] 695 708 715 718 723 728 738 741 762 772 775 777 783 792 794
## [91] 795 797 800 803 808 823 824 841 860 865 872 885 890 894 899
## [106] 905 910 912 918 924 933 945 977 984 1013 1015 1017 1019 1022 1032
## [121] 1033 1035 1039 1051 1059 1069 1079 1087 1113 1135 1143 1145 1158 1189 1201
## [136] 1205 1209 1219 1223 1231 1235 1267 1268 1272 1306 1307 1316 1320 1325 1330
## [151] 1335 1365 1385 1402 1411 1412 1439 1448 1452 1457 1468 1478 1487 1489 1497
## [166] 1499 1505 1507 1538 1543 1566 1569 1570 1599 1601 1611 1624 1631 1634 1636
## [181] 1645 1648 1654 1655 1660 1672 1694 1700 1704 1729 1731 1749 1771 1781 1785
## [196] 1804 1806 1814 1817 1837 1847 1856 1862 1865 1875 1882 1885 1897 1899 1911
## [211] 1926 1931 1947 1950 1951 1963 1977 1981 1986 1994 2000
##
## $strapRows[[5]]
## [1] 7 23 26 31 50 54 66 72 73 83 90 91 93 98 115
## [16] 136 142 159 162 163 169 187 198 205 218 226 227 239 257 261
## [31] 263 266 281 285 301 327 336 343 347 352 365 377 381 383 393
## [46] 406 412 413 427 434 439 478 484 488 497 502 504 510 525 533
## [61] 555 571 579 580 583 596 597 601 607 608 614 618 624 626 627
## [76] 630 640 660 674 679 691 693 694 707 709 714 729 739 742 750
## [91] 755 769 774 782 784 786 788 796 831 846 850 877 882 896 900
## [106] 909 920 922 932 941 943 950 954 961 972 992 1016 1026 1034 1040
## [121] 1043 1058 1068 1101 1109 1110 1117 1121 1131 1147 1173 1175 1185 1204 1207
## [136] 1211 1216 1227 1228 1257 1273 1295 1296 1301 1311 1312 1322 1338 1381 1382
## [151] 1386 1394 1396 1401 1413 1417 1435 1467 1475 1476 1494 1502 1512 1520 1521
## [166] 1528 1534 1544 1545 1546 1594 1617 1626 1632 1635 1647 1650 1652 1658 1687
## [181] 1697 1711 1714 1719 1733 1746 1753 1774 1788 1791 1795 1805 1857 1859 1883
## [196] 1912 1914 1915 1916 1924 1927 1938 1939 1966 1970 1972 1983 1984 1996
##
## $strapRows[[6]]
## [1] 8 12 40 65 69 89 96 119 131 161 173 193 199 223 224
## [16] 225 228 240 244 283 293 297 300 304 311 348 373 384 395 403
## [31] 428 438 441 442 451 456 459 462 470 475 499 511 528 536 543
## [46] 548 556 557 582 599 600 606 609 610 613 655 669 686 705 743
## [61] 758 759 780 787 802 805 809 812 814 820 834 839 843 851 853
## [76] 856 870 887 895 914 923 927 928 947 952 968 975 976 990 997
## [91] 1003 1004 1009 1020 1027 1031 1042 1054 1074 1083 1093 1094 1104 1166 1169
## [106] 1172 1179 1183 1188 1192 1243 1245 1256 1259 1261 1266 1269 1271 1278 1288
## [121] 1294 1297 1302 1308 1310 1323 1324 1349 1352 1355 1369 1374 1380 1388 1389
## [136] 1392 1409 1416 1420 1424 1426 1437 1443 1444 1446 1479 1513 1515 1532 1540
## [151] 1541 1554 1558 1560 1574 1589 1604 1608 1612 1620 1623 1638 1639 1640 1644
## [166] 1662 1676 1681 1692 1702 1703 1705 1725 1734 1736 1739 1754 1755 1760 1769
## [181] 1810 1816 1819 1822 1831 1839 1853 1861 1867 1874 1892 1902 1909 1940 1941
## [196] 1955 1971 1995
##
## $strapRows[[7]]
## [1] 9 27 28 45 46 56 82 86 103 126 129 138 147 156 164
## [16] 172 189 202 206 208 210 220 221 229 235 258 268 270 275 276
## [31] 307 310 313 318 335 337 341 390 404 407 424 425 460 463 481
## [46] 490 501 522 524 532 539 542 588 591 615 616 621 628 631 662
## [61] 706 716 717 746 766 771 785 790 798 804 827 847 849 858 873
## [76] 875 884 902 944 946 960 967 995 998 1006 1018 1025 1046 1050 1063
## [91] 1098 1102 1114 1122 1127 1129 1139 1141 1149 1153 1155 1156 1160 1163 1184
## [106] 1190 1191 1195 1218 1226 1239 1240 1263 1265 1274 1276 1285 1298 1303 1313
## [121] 1328 1333 1336 1337 1340 1345 1346 1356 1371 1373 1379 1390 1395 1404 1434
## [136] 1442 1460 1461 1463 1466 1472 1473 1481 1503 1516 1517 1548 1564 1567 1578
## [151] 1579 1595 1606 1607 1618 1622 1675 1679 1695 1713 1718 1721 1723 1726 1745
## [166] 1750 1758 1793 1820 1829 1840 1842 1848 1864 1872 1873 1877 1881 1893 1895
## [181] 1921 1928 1932 1952 1987 1989 1993 1997
##
## $strapRows[[8]]
## [1] 14 24 36 48 60 70 79 80 85 88 97 101 112 122 146
## [16] 174 181 182 183 192 194 196 222 238 251 271 277 278 286 292
## [31] 298 299 305 350 357 358 359 367 372 378 380 411 417 420 423
## [46] 448 482 486 487 527 531 547 550 552 572 573 593 603 612 620
## [61] 632 641 644 651 654 658 668 675 677 681 683 688 689 720 724
## [76] 727 736 754 760 767 770 806 816 821 826 832 862 869 883 901
## [91] 903 929 934 935 955 971 978 980 982 988 1010 1011 1044 1045 1056
## [106] 1073 1081 1089 1090 1107 1111 1115 1118 1119 1126 1128 1132 1151 1152 1168
## [121] 1177 1202 1203 1279 1281 1289 1291 1293 1300 1309 1319 1327 1329 1341 1348
## [136] 1350 1354 1364 1399 1415 1419 1421 1440 1458 1471 1482 1485 1486 1490 1491
## [151] 1511 1531 1535 1547 1551 1559 1565 1581 1592 1616 1630 1633 1659 1667 1668
## [166] 1669 1684 1698 1744 1761 1790 1815 1827 1860 1886 1889 1908 1935 1954 1959
## [181] 1962 1982 1990 1992
##
## $strapRows[[9]]
## [1] 15 17 19 20 30 37 58 63 64 77 78 81 94 107 114
## [16] 130 141 143 144 145 150 151 152 153 167 177 197 201 204 219
## [31] 231 243 247 249 260 282 287 294 295 296 302 303 309 312 319
## [46] 326 330 342 346 362 371 401 410 416 426 447 465 474 479 494
## [61] 498 500 505 506 507 509 512 517 519 526 529 537 546 562 564
## [76] 565 568 587 590 594 602 617 622 637 643 645 657 676 680 690
## [91] 700 704 711 725 726 730 735 737 744 747 763 765 773 793 801
## [106] 819 822 829 830 836 837 844 854 861 867 871 879 893 897 949
## [121] 958 959 970 973 986 987 1005 1008 1053 1066 1078 1103 1116 1133 1144
## [136] 1146 1174 1176 1181 1196 1206 1215 1229 1230 1260 1277 1290 1299 1315 1317
## [151] 1318 1321 1334 1339 1361 1383 1436 1445 1454 1464 1465 1477 1492 1496 1506
## [166] 1524 1529 1550 1562 1563 1583 1584 1591 1614 1615 1646 1664 1673 1678 1680
## [181] 1701 1710 1727 1735 1741 1743 1747 1763 1765 1768 1772 1784 1797 1798 1800
## [196] 1803 1807 1812 1824 1833 1841 1844 1863 1879 1890 1896 1904 1906 1907 1936
## [211] 1944 1949 1958 1964
##
## $strapRows[[10]]
## [1] 25 32 38 39 84 110 125 132 134 139 155 175 186 188 212
## [16] 216 241 242 246 264 267 269 284 322 334 339 340 351 361 364
## [31] 369 385 386 408 421 422 453 455 457 458 468 496 513 514 545
## [46] 566 570 574 576 578 611 634 636 638 649 659 698 699 710 719
## [61] 721 749 751 799 807 813 817 835 859 864 878 880 881 886 891
## [76] 892 898 907 913 915 938 940 956 962 963 966 983 989 1012 1057
## [91] 1060 1061 1062 1072 1075 1084 1095 1106 1123 1136 1138 1140 1150 1154 1157
## [106] 1164 1171 1178 1187 1194 1197 1217 1221 1224 1236 1237 1238 1248 1249 1258
## [121] 1270 1282 1284 1287 1314 1342 1344 1363 1370 1377 1387 1393 1397 1407 1408
## [136] 1410 1425 1427 1431 1438 1450 1455 1462 1470 1480 1523 1526 1536 1555 1557
## [151] 1568 1572 1590 1596 1597 1605 1609 1613 1629 1637 1656 1661 1663 1665 1674
## [166] 1677 1686 1688 1690 1699 1716 1730 1732 1742 1748 1752 1770 1773 1778 1783
## [181] 1789 1794 1796 1799 1801 1802 1811 1823 1828 1832 1849 1850 1876 1884 1903
## [196] 1920 1929 1930 1961 1973 1985 1988
##
##
## attr(,"class")
## [1] "ss"
Let us begin by exploring how the models are stored.
sseMod1$models
## [[1]]
## [[1]][[1]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[2]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[3]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[4]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[5]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[6]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[7]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[8]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[9]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
##
## [[1]][[10]]
## Principal Component Analysis
##
## No pre-processing
## Resampling: None
Models are organized as a list of lists. Each element of the outer list corresponds to one single-study learner (e.g., lm, random forests) and is itself a list of the models trained with that learner. Each element of that inner list is a model trained on one study (or study strap). Here we have only one single-study learner (PCR), so the outer list has a single element containing 10 models, one per study.
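An individual model can therefore be pulled out by indexing into this structure and used directly with caret’s predict method. A minimal sketch, assuming (as studyStrap.predict suggests) that the study indicator itself is not used as a predictor:
# the PCR model trained on the first training study
mod.1 <- sseMod1$models[[1]][[1]]
head( predict(mod.1, newdata = data.frame(target)) )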
modelInfo provides information about how the models were fit; its entries are populated from the arguments supplied when fitting the model.
names(sseMod1$modelInfo)
## [1] "sampling" "numStraps" "SSL" "ssl.tuneGrid" "numPaths"
## [6] "convg.vec" "convgCritera" "meanSamp" "stack.type" "custFNs"
## [11] "bagSize"
dataInfo provides information about the raw data that was fed to the model-fitting functions. The original data itself is stored only if model = TRUE is specified.
names(sseMod1$dataInfo)
## [1] "studyNames" "sampleSizes"
simMat provides the similarity matrix that is used for Covariate Profile Similarity weights.
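One simple way to interpret these values is to normalize a column so that its entries sum to one, which gives the relative weight each model would receive under that similarity measure; a brief sketch using the custom-function column:
w <- sseMod1$simMat[, "customFn_1"]
w / sum(w) # relative weights implied by the custom similarity measure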