library(curesurv)
#> Le chargement a nécessité le package : stringr
#> Le chargement a nécessité le package : survival
When the cause of death is unknown, the most common method to estimate the cancer-related survival is net survival. Its estimation assumes that the observed hazard \(\lambda_{obs}\) is equal to the sum of the known background mortality hazard in the general population \(\lambda_{pop}\) (obtained from national Statistic Institutes such as INSEE in France) and the excess hazard (due to cancer) \(\lambda_{exc}\). For one individual \(i\), this relation can be expressed as:
\[ \lambda_{obs}(t_i|z_i) = \lambda_{pop}(t_i+a_i|z_{pop,i}) + \lambda_{exc}(t_i|z_i) \] where \(Z_{pop}\subset Z\) are the variables of the mortality table and of the model respectively.
The cumulative observed hazard can be written as: \[ \Lambda_{obs}(t|z) = \Lambda_{pop}(t+a|z) + \Lambda_{exc}(t|z) \] and the net survival is obtained as following: \[ S_{n}(t|z) = \exp(-\Lambda_{exc}(t|z)) \]
The TNEH model is a relatively recent excess hazard model developed by Boussari et al. \(\\\)
The particularity of this model is that it enables the estimation, at the same time as the classical parameters of a model of the excess rate, of a quantity which is obtained by post-estimation by the classical models: it concerns the time after which the excess rate becomes null i.e. the cure point.
The excess hazard proposed can be expressed as following:
\[ \lambda_{exc}(t|z;\theta) = \left(\dfrac{t}{\tau(z;\tau*)}\right)^{\alpha(z;\alpha*)-1} \left(1 - \dfrac{t}{\tau(z;\tau*)}\right)^{\beta-1} 1_{\left\{0 \le t \le \tau(z;\tau*)\right\}} \]
where : \(\\\)
\(\tau(z;\tau*) > 0\) is the time to cure, depends on covariates z and vector of parameters \(\tau*\). It corresponds to the vector of parameters fitting the time-to-null excess hazard. \(\\\)
\(\alpha(z;\alpha*) > 0\) and \(\beta > 1\) are shape parameters. With \(\beta>1\), the excess hazard is forced to be null and continuous in \(\tau(z;\tau*)\). \(\\\)
The vector of parameters to be estimated is \(\theta = (\alpha*, \beta, \tau*)\) with \(\alpha(z;\alpha*) > 0\) .
\[ \Lambda_{exc}(t|z;\theta) = \tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right) F_{Be} \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*) , \beta \right) \]
where
B is the beta function \(\\\) \(F_{Be}\) is the cumulative distribution function (cdf) of the beta distribution
\[ S_n(t|z) = \exp(-\Lambda_{exc}(t|z)) = \exp\left(-\tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right) F_{Be} \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*),\beta \right)\right) \]
The cure fraction corresponds to the net survival at \(t = \tau\) in TNEH model. It can be expressed as:
\[ \pi(z|\theta) = \exp\left(-\Lambda_{exc}(\tau(z;\tau*)|z)\right) = \exp\left(-\tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right)\right) \]
This quantity corresponds to the probability P(t) of being cured at a given time t after diagnosis knowing that he/she was alive up to time t. It can be expressed as following:
\[ P(t|z) = \dfrac{\pi(z|\theta)}{S_n(t|z)} = \exp \left( \tau(z;\tau*) \left( B \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*) , \beta \right) - B(\alpha, \beta) \right) \right) \] To calculates the confidence intervals of \(P(t|z)\), can be obtained using the delta method. The application of this method requires the partial derivatives of \(P(t|z)\) with respect of the parameters of the model. This can be written as:
\[ \dfrac{\partial P(t|z)}{\partial \theta} = \dfrac{1}{S_n(t|z)^2} \left( \dfrac{\partial \pi(z|\theta)}{\partial \theta} S_n(t|z) - \dfrac{\partial S_n(t|z)}{\partial \theta} \pi(z|\theta) \right) \]
There are no covariables acting on parameters \(\tau\) (\(\tau=\tau_0\)) and \(\alpha\) (\(\alpha=\alpha_0\))
fit_ad_tneh_nocov <- curesurv(Surv(time_obs, event) ~ 1,
pophaz = "ehazard",
cumpophaz = "cumehazard",
model = "nmixture", dist = "tneh",
link_tau = "linear",
data = testiscancer,
method_opt = "L-BFGS-B")
summary(fit_ad_tneh_nocov)
#> Call:
#> curesurv(formula = Surv(time_obs, event) ~ 1, data = testiscancer,
#> pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture",
#> dist = "tneh", link_tau = "linear", method_opt = "L-BFGS-B")
#>
#> coef se(coef) z p
#> alpha0 2.1841 0.1032 21.166 <2e-16
#> beta 4.4413 0.5178 8.577 <2e-16
#> tau0 5.1018 0.5397 9.452 <2e-16
#>
#> Estimates and their 95% CI after back-transformation
#> estimates LCI UCI
#> alpha0 2.184 1.982 2.386
#> beta 4.441 3.426 5.456
#> tau0 5.102 4.044 6.160
#>
#> Cured proportion exp[-ζ0* B((α0+α*Z)β)]
#> For the reference individual (each Z at 0)
#> [1] 0.8474
#>
#> log-likelihood: -2633.903 (for 3 degree(s) of freedom)
#> AIC: 5273.806
#>
#> n= 2000 , number of events= 949
We can see that the time-to-cure \(\tau=\tau_0\) is estimated at 5.102 years, and the cure proportion is estimated at 84.74%.
We predict for varying time after diagnosis
newdata1 <- with(testiscancer,
expand.grid(event = 0, time_obs = seq(0.1,10,0.1)))
p_28 <- predict(object = fit_ad_tneh_nocov, newdata = newdata1)
plot(p_28)
The confidence intervals at \(1-\alpha\) level for the cure fraction \(\pi\) can be written as:
\[ \left[\hat{\pi} \pm z_{1 - \alpha / 2} \sqrt{Var(\hat{\hat{\pi}})}\right] \] where \[ Var(\hat{\pi}) = \dfrac{\partial \hat{\pi}}{\partial \theta} Var(\theta) \left(\dfrac{\partial \hat{\pi}}{\partial \theta}\right)^T \]
p_28[1,c("lower_bound_pi_tneh","cured","upper_bound_pi_tneh")]
#> lower_bound_pi_tneh cured upper_bound_pi_tneh
#> 1 0.8253941 0.8473745 0.8693549
We search the time \(\text{t}=\text{TTC}_i\) from which \(\text{P}_i(t) = 1-\epsilon\). \(\epsilon\) can be fixed to 0.95.
The variance formula can be expressed as:
\[ Var(TTC) = Var(g(\theta;z_i)) \simeq \left(\dfrac{\partial P(t|z_i;\theta)}{\partial t}_{|t = TTC}\right)^{-2} Var(P(t|z_i;\theta))_{|t=TTC} \]
p_28[1,c("lower_bound_TTC_tneh","time_to_cure_ttc","upper_bound_TTC_tneh")]
#> lower_bound_TTC_tneh time_to_cure_ttc upper_bound_TTC_tneh
#> 1 1.825476 2.083375 2.341273
We create a new age variable : age_crmin the reduced age and “centered” around the age of the youngest patient
testiscancer$age_crmin <- (testiscancer$age- min(testiscancer$age)) /
sd(testiscancer$age)
This model has the variable age_crmin acting on both \(\alpha\) and \(\tau\) : (\(\alpha=\alpha_0+Z_{age\_crmin}\times\alpha_1\) et \(\tau=\tau_0+Z_{age\_crmin}\times\tau_1\))
fit_m1_ad_tneh <- curesurv(Surv(time_obs, event) ~ z_tau(age_crmin) +
z_alpha(age_crmin),
pophaz = "ehazard",
cumpophaz = "cumehazard",
model = "nmixture", dist = "tneh",
link_tau = "linear",
data = testiscancer,
method_opt = "L-BFGS-B")
summary(fit_m1_ad_tneh)
#> Call:
#> curesurv(formula = Surv(time_obs, event) ~ z_tau(age_crmin) +
#> z_alpha(age_crmin), data = testiscancer, pophaz = "ehazard",
#> cumpophaz = "cumehazard", model = "nmixture", dist = "tneh",
#> link_tau = "linear", method_opt = "L-BFGS-B")
#>
#> coef se(coef) z p
#> alpha0 2.87737 0.24078 11.950 < 2e-16
#> alpha_1_(age_crmin) -0.50128 0.07498 -6.685 2.30e-11
#> beta 5.15005 1.04335 4.936 7.97e-07
#> tau0 3.25831 0.55340 5.888 3.91e-09
#> tau_1_(age_crmin) 3.46300 1.23871 2.796 0.00518
#>
#> Estimates and their 95% CI after back-transformation
#> estimates LCI UCI
#> alpha0 2.877 2.405 3.349
#> alpha_1_(age_crmin) 2.376 2.229 2.523
#> beta 5.150 3.105 7.195
#> tau0 3.258 2.174 4.343
#> tau_1_(age_crmin) 6.721 4.293 9.149
#>
#> Cured proportion exp[-(ζ0+ζ*Z)* B((α0+α*Z)β)] and its 95% CI
#> For the reference individual (each Z at 0)
#> [1] 0.9675
#>
#> log-likelihood: -2544.1 (for 5 degree(s) of freedom)
#> AIC: 5098.2
#>
#> n= 2000 , number of events= 949
For the reference individual (that is the patient with age_crmin=0, so the youngest patient, approximately 20y old at diagnosis), the cure proportion is estimated at 96,75% and the time to null excess hazard at 3.258 year. For an individual whose age_ is_crmin is 1 (that is they are aged the standard deviation more than the youngest, that is approximately 39y old at diagnosis), the time to null excess hazard is 6.721 year.
With varying time for patient of mean age
#time varying prediction for patient with age mean
newdata1 <- with(testiscancer,
expand.grid(event = 0,
age_crmin = mean(age_crmin),
time_obs = seq(0.001,10,0.1)))
pred_agemean <- predict(object = fit_m1_ad_tneh, newdata = newdata1)
With varying time for oldest patient
#time varying prediction for patient with age max
newdata2 <- with(testiscancer,
expand.grid(event = 0,
age_crmin = max(age_crmin),
time_obs = seq(0.001,10,0.1)))
pred_agemax <- predict(object = fit_m1_ad_tneh, newdata = newdata2)
At 2 years after diagnostic for patients of different ages
# predictions at time 2 years with varying age
newdata3 <- with(testiscancer,
expand.grid(event = 0,
age_crmin = seq(min(testiscancer$age_crmin),
max(testiscancer$age_crmin), 0.1),
time_obs = 2))
pred_age_val <- predict(object = fit_m1_ad_tneh, newdata = newdata3)
val_age <- seq(min(testiscancer$age_crmin),
max(testiscancer$age_crmin),
0.1) * sd(testiscancer$age) + min(testiscancer$age)
par(mfrow = c(2, 2),cex = 1.0)
plot(pred_agemax$time,pred_agemax$ex_haz,type = "l",lty = 1,lwd = 2,
xlab = "Time since diagnosis", ylab = "excess hazard")
lines(pred_agemean$time,pred_agemean$ex_haz,type = "l",lty = 2,lwd = 2)
legend("topright",horiz = FALSE,
legend = c("hE(t) age.max = 79.9", "hE(t) age.mean = 50.8"),
col = c("black", "black"),lty = c(1, 2, 1, 1, 2, 2))
grid()
plot(pred_agemax$time,pred_agemax$netsurv,type = "l",lty = 1,lwd = 2,
ylim = c(0, 1),xlab = "Time since diagnosis",ylab = "net survival")
lines(pred_agemean$time,pred_agemean$netsurv,type = "l",lty = 2,lwd = 2)
legend("bottomleft",horiz = FALSE,
legend = c("Sn(t) age.max = 79.9", "Sn(t) age.mean = 50.8"),
col = c("black", "black"),lty = c(1, 2, 1, 1, 2, 2))
grid()
plot(pred_agemax$time,pred_agemax$pt_cure,type = "l",lty = 1,lwd = 2,ylim = c(0, 1),
xlab = "Time since diagnosis",ylab = "probability of being cured P(t)")
lines(pred_agemean$time,pred_agemean$pt_cure,type = "l",lty = 2,lwd = 2)
abline(v = pred_agemean$tau[1],lty = 2,lwd = 2,col = "blue")
abline(v = pred_agemean$time_to_cure_ttc[1],lty = 2,lwd = 2,col = "red")
abline(v = pred_agemax$tau[1],lty = 1,lwd = 2,col = "blue")
abline(v = pred_agemax$time_to_cure_ttc[1],lty = 1,lwd = 2,col = "red")
grid()
legend("bottomright",horiz = FALSE,
legend = c("P(t) age.max = 79.9","P(t) age.mean = 50.8",
"TNEH age.max = 79.9","TTC age.max = 79.9",
"TNEH age.mean = 50.8","TTC age.mean = 50.8"),
col = c("black", "black", "blue", "red", "blue", "red"),
lty = c(1, 2, 1, 1, 2, 2))
par(oldpar)
par(mfrow=c(2,2))
plot(val_age,pred_age_val$ex_haz,
type = "l",lty=1, lwd=2,
xlab = "age",ylab = "excess hazard 2y after diagnosis")
grid()
plot(val_age,pred_age_val$netsurv,
type = "l", lty=1,lwd=2,ylim=c(0,1),
xlab = "age", ylab = "net survival 2y after diagnosis")
grid()
plot(val_age,pred_age_val$pt_cure, type = "l", lty=1, lwd=2,ylim=c(0,1),
xlab = "age",ylab = "P(t) 2y after diagnosis")
grid()
plot(val_age,pred_age_val$cured, type = "l", lty=1, lwd=2,ylim=c(0,1),
xlab = "age", ylab = "cure proportion")
grid()
par(oldpar)
Effects of centered and reduced age on \(\tau\) (\(tau=\tau_0+Z_{age\_cr}\times\tau_1\), \(\alpha=\alpha_0\))
#| echo: true
#| label: withtauonly
#| warning: false
#| message: false
fit_ad_tneh_covtau <- curesurv(
Surv(time_obs, event) ~ z_tau(age_cr),
pophaz = "ehazard",
cumpophaz = "cumehazard",
model = "nmixture",
dist = "tneh",
link_tau = "linear",
data = testiscancer,
method_opt = "L-BFGS-B"
)
#> Warning in diag(varcov_star): NAs introduits lors de la conversion automatique
#> Warning in diag(varcov): NAs introduits lors de la conversion automatique
#> Warning in diag(varcov_star): NAs introduits lors de la conversion automatique
#> Warning in diag(varcov): NAs introduits lors de la conversion automatique
#> Warning in par(oldpar): appel de par(new=TRUE) sans graphe
#> Warning in diag(varcov_star): NAs introduits lors de la conversion automatique
#> Warning in diag(varcov): NAs introduits lors de la conversion automatique
#> Warning in par(oldpar): appel de par(new=TRUE) sans graphe
#> Warning in diag(varcov_star): NAs introduits lors de la conversion automatique
#> Warning in diag(varcov): NAs introduits lors de la conversion automatique
#> Warning in par(oldpar): appel de par(new=TRUE) sans graphe
#> Warning in par(oldpar): appel de par(new=TRUE) sans graphe
summary(fit_ad_tneh_covtau)
#> Call:
#> curesurv(formula = Surv(time_obs, event) ~ z_tau(age_cr), data = testiscancer,
#> pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture",
#> dist = "tneh", link_tau = "linear", method_opt = "L-BFGS-B")
#>
#> coef se(coef) z p
#> alpha0 1.9753 0.1299 15.206 < 2e-16
#> beta 5.3066 1.1286 4.702 2.58e-06
#> tau0 7.4380 1.8109 4.107 4.00e-05
#> tau_1_(age_cr) 2.3159 0.8529 2.715 0.00662
#>
#> Estimates and their 95% CI after back-transformation
#> estimates LCI UCI
#> alpha0 1.975 1.721 2.230
#> beta 5.307 3.095 7.519
#> tau0 7.438 3.889 10.987
#> tau_1_(age_cr) 9.754 8.082 11.426
#>
#> Cured proportion exp[-(ζ0+ζ*Z)* B((α0+α*Z)β)] and its 95% CI
#> For the reference individual (each Z at 0)
#> [1] 0.794
#>
#> log-likelihood: -2610.768 (for 4 degree(s) of freedom)
#> AIC: 5229.537
#>
#> n= 2000 , number of events= 949
This model estimates a cure proportion of 79.4% for individuals of mean age (51.05 year old at diagnosis) with a time to null excess hazard 7.44 year. For an individual of age at diagnosis 70.01 (age_cr=1), the time to null excess hazard is estimated at 9.754. This model estimates that the time to null excess hazard increases by 2.3159 when age_cr increases by (that is, when age at diagnosis increases by 18.96)
summary(testiscancer$age_cr)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.6362 -0.9358 0.0276 0.0000 0.9525 1.5240
summary(testiscancer$age)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 20.02 33.30 51.57 51.05 69.11 79.95
newdata2 <- with(testiscancer,
expand.grid(event = 0,
time_obs = seq(0.1, 10, 0.1),
age_cr = c(-0.9358, 0.0276, 0.9525) ))
newdata2_1stqu <- newdata2[newdata2$age_cr==-0.9358,]
newdata2_2rdqu <- newdata2[newdata2$age_cr==0.0276,]
newdata2_3rdqu <- newdata2[newdata2$age_cr==0.9525,]
p1stqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_1stqu)
p2rdqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_2rdqu)
p3rdqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_3rdqu)
par(mfrow=c(2,2))
plot(p1stqu,
main = "Excess hazard for age 33.3",
fun = "haz")
plot(p2rdqu,
fun = "haz",
main = "Excess hazard for age 51.57")
plot(p3rdqu,
fun = "haz",
main = "Excess hazard for age 69.11")
par(oldpar)
Effect of age_cr on only \(\alpha\) (\(\alpha=\alpha_0+Z_{age\_cr}*\alpha_1\), \(\tau=\tau_0\))
#| echo: true
#| label: only_covariate_on_alpha
#| message: false
#| warning: false
fit_ad_tneh_covalpha <-
curesurv(
Surv(time_obs, event) ~ z_alpha(age_cr),
pophaz = "ehazard",
cumpophaz = "cumehazard",
model = "nmixture",
dist = "tneh",
link_tau = "linear",
data = testiscancer,
method_opt = "L-BFGS-B"
)
summary(fit_ad_tneh_covalpha)
#> Call:
#> curesurv(formula = Surv(time_obs, event) ~ z_alpha(age_cr), data = testiscancer,
#> pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture",
#> dist = "tneh", link_tau = "linear", method_opt = "L-BFGS-B")
#>
#> coef se(coef) z p
#> alpha0 2.06862 0.11152 18.550 < 2e-16
#> alpha_1_(age_cr) -0.46785 0.06331 -7.389 1.48e-13
#> beta 4.77703 0.81573 5.856 4.74e-09
#> tau0 6.09881 1.11797 5.455 4.89e-08
#>
#> Estimates and their 95% CI after back-transformation
#> estimates LCI UCI
#> alpha0 2.069 1.850 2.287
#> alpha_1_(age_cr) 1.601 1.477 1.725
#> beta 4.777 3.178 6.376
#> tau0 6.099 3.908 8.290
#>
#> Cured proportion exp[-ζ0* B((α0+α*Z)β)]
#> For the reference individual (each Z at 0)
#> [1] 0.8181
#>
#> log-likelihood: -2586.138 (for 4 degree(s) of freedom)
#> AIC: 5180.275
#>
#> n= 2000 , number of events= 949
The time to null excess hazard is estimated at 6.099 year for all individuals, and the cure proportion is estimated at 81.81% for patient of mean age (51.05 yo at diagnosis)
p4_33.3 <- predict(object = fit_ad_tneh_covalpha,
newdata = newdata2_1stqu)
p4_51.6 <- predict(object = fit_ad_tneh_covalpha,
newdata = newdata2_2rdqu)
p4_69.1 <- predict(object = fit_ad_tneh_covalpha,
newdata = newdata2_3rdqu)
par(mfrow=c(2,2))
plot(p4_33.3,
main = "Pt_cure for age 33.3",
fun = "pt_cure")
plot(p4_51.6,
fun = "pt_cure",
main = "Pt_cure for age 51.6")
plot(p4_69.1,
fun = "pt_cure",
main = "Pt_cure for age 69.1")
par(oldpar)
We fitted 4 models and to compare them we can use the AIC criteria
AIC(fit_ad_tneh_nocov,fit_ad_tneh_covalpha,fit_ad_tneh_covtau,fit_m1_ad_tneh)
#> fit_ad_tneh_nocov fit_ad_tneh_covalpha fit_ad_tneh_covtau fit_m1_ad_tneh
#> 1 5273.806 5180.275 5229.537 5098.2
The best model is the one with the lowest AIC : it’s the one with an effect of age on both $ $ and \(\alpha\).
date()
#> [1] "Tue Sep 10 15:06:51 2024"
sessionInfo()
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=French_France.utf8
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
#> [5] LC_TIME=French_France.utf8
#>
#> time zone: Europe/Paris
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] curesurv_0.1.1 survival_3.5-8 stringr_1.5.1
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.2 knitr_1.45
#> [4] rlang_1.1.3 xfun_0.42 Formula_1.2-5
#> [7] highr_0.10 stringi_1.8.3 jsonlite_1.8.8
#> [10] glue_1.7.0 htmltools_0.5.7 sass_0.4.9
#> [13] rmarkdown_2.26 grid_4.3.3 evaluate_0.23
#> [16] jquerylib_0.1.4 fastmap_1.1.1 numDeriv_2016.8-1.1
#> [19] yaml_2.3.8 lifecycle_1.0.4 compiler_4.3.3
#> [22] rngWELL_0.10-9 randtoolbox_2.0.4 rstudioapi_0.16.0
#> [25] lattice_0.22-5 digest_0.6.35 R6_2.5.1
#> [28] splines_4.3.3 magrittr_2.0.3 bslib_0.6.2
#> [31] Matrix_1.6-5 tools_4.3.3 cachem_1.0.8