To install the package from CRAN, type:
install.packages("bcROCsurface")
Next, load the package.
library(bcROCsurface)
## Loading required package: nnet
## Loading required package: rgl
## Loading required package: boot
## Loading required package: parallel
To illustrate the use of the package bcROCsurface
, we
consisder an example, which presents the evaluation process for biomaker
CA125
in the diagnosis of epithelial ovarian cancer
(EOC).
data(EOC)
The data have 278 observations on the following 6 variables:
head(EOC)
## D.full V D CA125 CA153 Age
## 1 3 1 3 3.304971965 1.42822875 41
## 2 1 0 NA 0.112479444 0.11665310 52
## 3 2 1 2 2.375011262 -0.04096794 50
## 4 1 0 NA -0.001545381 0.32111633 66
## 5 1 0 NA 0.278200345 -0.14283052 52
## 6 2 0 NA 0.167645382 0.81470563 50
In data set, CA125
and CA153
are two
biomarkers, Age
is the age of the patients. The variable
V
is the verification status; 1 and 0 indicates verified
and non-verified subject, respectively. D.full
is disease
status, which consist of three classes, says, 1, 2, 3. These levels
correspond to benign disease, early stage (I and II) and late stage (III
and IV). On the other hand, D
is missing disease
status.
The ROC surface and VUS are only applied when an monotone increasing
ordering is of interest. Thus, before estimate ROC and also VUS, we have
to be sure that the ordering of disease classes is monotone incresasing
(with respect to the diagnostic test values). In order to do that, the
function pre_data()
is usefull.
dise_full <- pre_data(EOC$D.full, EOC$CA125)
## The sample means of diagostic test based on three classes.
## ( 1 ) 1 : 0.192
## ( 2 ) 2 : 1.81
## ( 3 ) 3 : 3.214
## The sample median of diagostic test based on three classes.
## ( 1 ) 1 : 0.014
## ( 2 ) 2 : 1.499
## ( 3 ) 3 : 3.381
## The ordering based on median: 1 < 2 < 3
On the other hand, we describe the full disease status
dise_full
as the binary matrix having three columns,
corresponding to three classes of the disease status. Each row
corresponds to a trinomial vector, in which, 1 indicates the subject
belongs to class j with j = 1,2,3. The function pre_data()
also done this work.
head(dise_full$dise)
## [1] 3 1 2 1 1 2
## Levels: 1 2 3
head(dise_full$dise_vec)
## D1 D2 D3
## [1,] 0 0 1
## [2,] 1 0 0
## [3,] 0 1 0
## [4,] 1 0 0
## [5,] 1 0 0
## [6,] 0 1 0
We construct the ROC surface of full data, and estimate the VUS.
dise_vec_full <- dise_full$dise_vec
rocs(method = "full", diag_test = EOC$CA125, dise_vec = dise_vec_full,
ncp = 30, ellipsoid = TRUE, cpst = c(-0.56, 2.31))
## Hmm, look likes the full data
## Number of observation: 278
## The verification status is not available
## You are working on FULL or Complete Case approach
## The diagnostic test: CA125
## Processing ....
## DONE
## ===============================================================
## Some values of TCFs:
## TCF1 TCF2 TCF3
## (0.4 , 2.627) 0.694 0.493 0.688
## (0.718 , 2.627) 0.769 0.418 0.688
## (0.4 , 2.945) 0.694 0.537 0.636
## (0.718 , 2.945) 0.769 0.463 0.636
## (0.4 , 1.991) 0.694 0.343 0.805
## (0.718 , 1.991) 0.769 0.269 0.805
##
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95
## TCFs at (-0.56, 2.31) are:
## TCF1 TCF2 TCF3
## 0.209 0.642 0.727
## ===============================================================
Here, we consider the full data, so we only need to put the arguments
diag_test
and dise_vec
, and method is
full
.
The FULL estimator of VUS is obtained by the following command:
vus_mar("full", diag_test = EOC$CA125, dise_vec = dise_vec_full, ci = TRUE)
## Hmm, look likes the full data
## The verification status is not available
## You are working on FULL or Complete Case approach
## Number of observation: 278
## The diagnostic test: CA125
## Processing ....
## DONE
##
## CALL: vus_mar(method = "full", diag_test = EOC$CA125, dise_vec = dise_vec_full,
## ci = TRUE)
##
## Estimate of VUS: 0.5663
## Standard error: 0.0377
##
## Intervals :
## Level Normal Logit
## 95% ( 0.4924, 0.6402 ) ( 0.4914, 0.6382 )
## Estimation of Standard Error and Intervals are based on Jackknife approach
##
## Testing the null hypothesis H0: VUS = 1/6
## Test Statistic P-value
## Normal-test 10.597 < 2.2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now, we compute the FI and MSI estimator with missing data. First, we
need to estimate the disease probabilities by using multinomial logistic
model. In bcROCsurface package, this work is done by using
rho_mlogit()
function.
dise_na <- pre_data(EOC$D, EOC$CA125)
## There are missing disease status.
## The sample means of diagostic test based on three classes.
## ( 1 ) 1 : 0.555
## ( 2 ) 2 : 2.112
## ( 3 ) 3 : 3.347
## The sample median of diagostic test based on three classes.
## ( 1 ) 1 : 0.286
## ( 2 ) 2 : 1.972
## ( 3 ) 3 : 3.439
## The ordering based on median: 1 < 2 < 3
dise_vec_na <- dise_na$dise_vec
dise_fact_na <- dise_na$dise
rho_out <- rho_mlogit(dise_fact_na ~ CA125 + CA153 + Age, data = EOC,
test = TRUE)
## ====================================================================
## The p-value calculation for the regression coefficients:
## 1 2
## (Intercept) 1.708e-05 0.0008529
## CA125 1.303e-08 0.0417717
## CA153 6.543e-02 0.0216723
## Age 1.635e-03 0.0031442
## ====================================================================
The following command provides the ROC surface by means of FI esimator:
rocs(method = "fi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_out, ncp = 30, ellipsoid = TRUE,
cpst = c(-0.56, 2.31))
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate ROC surface using FI approach
## The diagnostic test: CA125
## The ellipsoidal CR for TCFs are also constructed
## Processing ....
## DONE
## ===============================================================
## Some values of TCFs:
## TCF1 TCF2 TCF3
## (1.036 , 2.945) 0.820 0.362 0.606
## (0.718 , 2.945) 0.747 0.433 0.606
## (1.354 , 2.945) 0.876 0.291 0.606
## (1.036 , 2.627) 0.820 0.312 0.637
## (0.4 , 2.945) 0.670 0.492 0.606
## (0.718 , 2.627) 0.747 0.383 0.637
##
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95
## TCFs at (-0.56, 2.31) are:
## TCF1 TCF2 TCF3
## 0.195 0.586 0.680
## ===============================================================
And, for VUS:
vus_mar(method = "fi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_out, ci = TRUE)
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate VUS using FI approach
## The diagnostic test: CA125
## Processing ....
## DONE
##
## CALL: vus_mar(method = "fi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
## veri_stat = EOC$V, rho_est = rho_out, ci = TRUE)
##
## Estimate of VUS: 0.515
## Standard error: 0.0404
##
## Intervals :
## Level Normal Logit
## 95% ( 0.4357, 0.5942 ) ( 0.4360, 0.5932 )
## Estimation of Standard Error and Intervals are based on Asymptotic Theory
##
## Testing the null hypothesis H0: VUS = 1/6
## Test Statistic P-value
## Normal-test 8.6168 < 2.2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For MSI estimator, we could do that to plot ROC surface
rocs(method = "msi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_out, ncp = 30,
ellipsoid = TRUE, cpst = c(-0.56, 2.31))
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate ROC surface using MSI approach
## The diagnostic test: CA125
## The ellipsoidal CR for TCFs are also constructed
## Processing ....
## DONE
## ===============================================================
## Some values of TCFs:
## TCF1 TCF2 TCF3
## (0.4 , 2.945) 0.676 0.504 0.612
## (0.4 , 2.627) 0.676 0.455 0.655
## (0.718 , 2.945) 0.741 0.419 0.612
## (0.718 , 2.627) 0.741 0.370 0.655
## (0.081 , 2.945) 0.544 0.608 0.612
## (0.4 , 1.991) 0.676 0.315 0.769
##
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95
## TCFs at (-0.56, 2.31) are:
## TCF1 TCF2 TCF3
## 0.200 0.609 0.688
## ===============================================================
and compute VUS
vus_mar(method = "msi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_out, ci = TRUE)
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate VUS using MSI approach
## The diagnostic test: CA125
## Processing ....
## DONE
##
## CALL: vus_mar(method = "msi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
## veri_stat = EOC$V, rho_est = rho_out, ci = TRUE)
##
## Estimate of VUS: 0.5183
## Standard error: 0.0415
##
## Intervals :
## Level Normal Logit
## 95% ( 0.4368, 0.5997 ) ( 0.4371, 0.5985 )
## Estimation of Standard Error and Intervals are based on Asymptotic Theory
##
## Testing the null hypothesis H0: VUS = 1/6
## Test Statistic P-value
## Normal-test 8.4644 < 2.2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The IPW and SPE estimator require the estimation for verification
probabilities. In order to do that we can employ some regression model
for binary response, i.e. logistic, probit and may be threshold. Here,
we consider the implementation of logistic model. In this package, the
function psglm()
is used to fit the verification model.
pi_out <- psglm(V ~ CA125 + CA153 + Age, data = EOC, model = "logit",
test = TRUE, trace = TRUE)
## Fitting the verification model by using "logit" regression.
## FORMULAR: Verification ~ CA125 + CA153 + Age
##
## =================================================================
## The p-value calculation for the regression coefficients:
## z value Pr(>|z|)
## (Intercept) -2.302816 2.128917e-02
## CA125 4.418434 9.941869e-06
## CA153 3.781993 1.555775e-04
## Age 2.154662 3.118828e-02
## =================================================================
To plot the ROC surface and estimate VUS, we do
rocs(method = "ipw", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, pi_est = pi_out, ncp = 30,
ellipsoid = TRUE, cpst = c(-0.56, 2.31))
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate ROC surface using IPW approach
## The diagnostic test: CA125
## The ellipsoidal CR for TCFs are also constructed
## Processing ....
## DONE
## ===============================================================
## Some values of TCFs:
## TCF1 TCF2 TCF3
## (0.4 , 1.991) 0.618 0.466 0.793
## (0.4 , 2.627) 0.618 0.574 0.677
## (0.081 , 1.991) 0.510 0.565 0.793
## (0.4 , 2.945) 0.618 0.610 0.633
## (0.081 , 2.627) 0.510 0.673 0.677
## (0.4 , 1.672) 0.618 0.429 0.810
##
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95
## TCFs at (-0.56, 2.31) are:
## TCF1 TCF2 TCF3
## 0.205 0.699 0.704
## ===============================================================
vus_mar(method = "ipw", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, pi_est = pi_out, ci = TRUE)
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate VUS using IPW approach
## The diagnostic test: CA125
## Processing ....
## DONE
##
## CALL: vus_mar(method = "ipw", diag_test = EOC$CA125, dise_vec = dise_vec_na,
## veri_stat = EOC$V, pi_est = pi_out, ci = TRUE)
##
## Estimate of VUS: 0.55
## Standard error: 0.0416
##
## Intervals :
## Level Normal Logit
## 95% ( 0.4685, 0.6314 ) ( 0.4679, 0.6294 )
## Estimation of Standard Error and Intervals are based on Asymptotic Theory
##
## Testing the null hypothesis H0: VUS = 1/6
## Test Statistic P-value
## Normal-test 9.2212 < 2.2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For SPE estimator, we type
rocs(method = "spe", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_out,
pi_est = pi_out, ncp = 30, ellipsoid = TRUE, cpst = c(-0.56, 2.31))
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate ROC surface using SPE approach
## The diagnostic test: CA125
## The ellipsoidal CR for TCFs are also constructed
## Processing ....
## DONE
## ===============================================================
## Some values of TCFs:
## TCF1 TCF2 TCF3
## (0.4 , 2.627) 0.707 0.516 0.679
## (0.4 , 2.945) 0.707 0.562 0.633
## (0.4 , 1.991) 0.707 0.383 0.799
## (0.081 , 2.627) 0.578 0.629 0.679
## (0.4 , 1.672) 0.707 0.372 0.808
## (0.081 , 2.945) 0.578 0.674 0.633
##
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95
## TCFs at (-0.56, 2.31) are:
## TCF1 TCF2 TCF3
## 0.214 0.656 0.710
## ===============================================================
to provide the plot of ROC surface, and
vus_mar(method = "spe", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_out,
pi_est = pi_out, ci = TRUE)
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate VUS using SPE approach
## The diagnostic test: CA125
## Processing ....
## DONE
##
## CALL: vus_mar(method = "spe", diag_test = EOC$CA125, dise_vec = dise_vec_na,
## veri_stat = EOC$V, rho_est = rho_out, pi_est = pi_out, ci = TRUE)
##
## Estimate of VUS: 0.5581
## Standard error: 0.0443
##
## Intervals :
## Level Normal Logit
## 95% ( 0.4712, 0.6450 ) ( 0.4703, 0.6424 )
## Estimation of Standard Error and Intervals are based on Asymptotic Theory
##
## Testing the null hypothesis H0: VUS = 1/6
## Test Statistic P-value
## Normal-test 8.827 < 2.2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
for VUS.
Note that, the asymptotic variances of bias-corrected estimators (FI,
MSI, IPW, SPE) of the VUS presented in previous parts could be obtained
by using the bootstrap process. This is done in a simple way, in fact,
the users only need to use the option boot = TRUE
and
select the number of bootstrap replication n_boot
(default
250). In addition, to save the computation time, the option of parallel
computing could be allowed by parallel = TRUE
. In this
case, the users may design the number of cpus by option
ncpus
, however, if this argument is ignored, then the
defaut (a half of available cpus) will be supplied.
Like the MSI estimator, the KNN approach is based on the estimate of disease probabilities. However, in this framework, we will use K nearest-neighbor estimators.
x_mat <- cbind(EOC$CA125, EOC$CA153, EOC$Age)
rho_1nn <- rho_knn(x_mat, dise_vec = dise_vec_na, veri_stat = EOC$V, k = 1,
type = "mahala")
rocs("knn", diag_test = EOC$CA125, dise_vec_na, veri_stat = EOC$V,
rho_est = rho_1nn, ncp = 30, ellipsoid = TRUE,
cpst = c(-0.56, 2.31))
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate ROC surface using KNN approach
## The diagnostic test: CA125
## The ellipsoidal CR for TCFs are also constructed
## Processing ....
## DONE
## ===============================================================
## Some values of TCFs:
## TCF1 TCF2 TCF3
## (0.4 , 2.945) 0.688 0.486 0.625
## (0.081 , 2.945) 0.555 0.600 0.625
## (0.4 , 2.627) 0.688 0.429 0.663
## (1.354 , 2.945) 0.867 0.286 0.625
## (1.991 , 2.945) 0.945 0.200 0.625
## (0.718 , 2.945) 0.742 0.400 0.625
##
## Some information for Ellipsoidal Confidence Region(s):
## Confidence level: 0.95
## TCFs at (-0.56, 2.31) are:
## TCF1 TCF2 TCF3
## 0.188 0.586 0.688
## ===============================================================
vus_mar(method = "knn", diag_test = EOC$CA125, dise_vec = dise_vec_na,
veri_stat = EOC$V, rho_est = rho_1nn, ci = TRUE,
parallel = TRUE)
## Hmm, look likes the incomplete data
## Number of observation: 278
## 64% of the subjects receive disease verification.
## You required estimate VUS using KNN approach
## The diagnostic test: CA125
## Processing ....
## DONE
##
## CALL: vus_mar(method = "knn", diag_test = EOC$CA125, dise_vec = dise_vec_na,
## veri_stat = EOC$V, rho_est = rho_1nn, ci = TRUE, parallel = TRUE)
##
## Estimate of VUS: 0.5123
## Standard error: 0.0468
##
## Intervals :
## Level Normal Logit
## 95% ( 0.4205, 0.6040 ) ( 0.4212, 0.6025 )
## Estimation of Standard Error and Intervals are based on Bootstrap with 250 replicates
##
## Testing the null hypothesis H0: VUS = 1/6
## Test Statistic P-value
## Normal-test 7.3856 7.583e-14 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To find the good choice for the number of nearest-neighbor, we use the following command:
cv_knn(x_mat, dise_vec_na, EOC$V, type = "mahala", plot = TRUE)
## [1] 1