xhaz is an R package for fitting excess hazard regression models, with or without the assumption of proportional hazards. In the presence of hierarchical data, inter-cluster heterogeneity can affect excess hazard estimates. To address this issue, the mexhaz R package implements multilevel excess hazard regression models. If the user wants to check the validity of the non-comparability bias, an easy way is to use the mexhazLT function of the xhaz R package. The user can also specify whether the framework is consistent with classical excess hazard modeling, i.e., assuming that the expected mortality of the individuals studied is appropriate, using the pophaz argument is equivalent to classical. The user can also consider a different framework by specifying that pophaz equals rescaled: the expected mortality available in the life table is not accurate, i.e., there is a non-comparability source of bias with respect to the expected mortality of the study population Goungounga et al. (2023).
(Here’s the abstract from Goungounga et al. (2023) paper: In the presence of competing causes of event occurrence (e.g., death), the interest might not only be in the overall survival but also in the so-called net survival, that is, the hypothetical survival that would be observed if the disease under study were the only possible cause of death. Net survival estimation is commonly based on the excess hazard approach in which the hazard rate of individuals is assumed to be the sum of a disease-specific and expected hazard rate, supposed to be correctly approximated by the mortality rates obtained from general population life tables…
A related software package can be found at a gitlab webpage or at https://CRAN.R-project.org/package=xhaz.
The most recent version of xhaz
can be installed
directly from the cran repository using
install.packages("xhaz")
xhaz
depends on the stats
,
survival
, optimParallel
,
numDeriv
, statmod
, gtools
and
splines
packages which can be installed directly from
CRAN.
It also utilizes survexp.fr
, the R package containing
the French life table. For example, to install survexp.fr
follow the instructions available at the RStudio page on R
and survexp.fr.
First, install the R package via github.
devtools::install_github("rstudio/survexp.fr")
Then, when these other packages are installed, please load the xhaz R package.
For this illustration, we used a simulated dataset from the original
paper by Goungounga et al. (2023). This dataset consists of 4,978
patients with information on their treatment arm and clinical center ID,
as this information may affect the excess mortality rate. The US life
table can be used to estimate the model parameters. mexhazLT() function
will be used with argument (`pophaz = classic
).
data("breast", package = "xhaz")
head(breast)
#> temps statut age agecr date SEX armt hosp dept
#> 3 5.050550 1 30.95628 -2.213589 1985-12-23 2 0 3 09
#> 8 1.415929 1 35.20680 -1.788538 1985-04-30 2 1 8 81
#> 11 1.904082 1 31.80505 -2.128713 1985-10-15 2 0 11 06
#> 16 15.000000 0 42.52067 -1.057150 1985-09-16 2 1 16 10
#> 17 2.504911 0 32.01450 -2.107767 1985-01-29 2 0 17 94
#> 18 5.319706 0 31.77941 -2.131276 1985-06-04 2 1 18 55
# The life table to be used is survexp.us. Note that SEX is coded 2 instead of female in survexp.us.
breast$sexe <- "female"
fit.haz <- exphaz(
formula = Surv(temps, statut) ~ 1,
data = breast, ratetable = survexp.us,
only_ehazard = FALSE,
rmap = list(age = 'age', sex = 'sexe',
year = 'date'))
breast$expected <- fit.haz$ehazard
breast$expectedCum <- fit.haz$ehazardInt
qknots <- quantile(breast[breast$statut==1,]$temps, probs=c(1:2/3))
mod.bs <- mexhazLT(
formula = Surv(temps, statut) ~ agecr + armt,
data = breast, ratetable = survexp.us, degree = 3,
knots = qknots, expected = "expected",
expectedCum = "expectedCum",
base = "exp.bs", pophaz = "classic")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1 -1 -1 -1 -1 -1 0 0
#> Function Value
#> [1] 11032
#> Gradient:
#> [1] -1281.60395 -490.49700 -810.60363 -112.90836 72.86473 103.05848
#> [7] -1994.09940 -415.61972
#>
#> iteration = 39
#> Parameter:
#> [1] -1.5083019 0.8413747 0.5236963 -0.2330524 -0.3193139 -1.1142807 0.3759031
#> [8] -0.3239171
#> Function Value
#> [1] 9784.333
#> Gradient:
#> [1] -0.0003943571 0.0013787940 -0.0010131771 -0.0001036824 -0.0007712515
#> [6] 0.0007721412 -0.0000509317 0.0001582521
#>
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#>
#>
#> Data
#> Name N.Obs.Tot N.Obs N.Events N.Clust
#> data 4978 4978 4363 1
#>
#> Details
#> Iter Eval Base Nb.Leg Nb.Aghq Optim Method Code LogLik Total.Time
#> 39 289 exp.bs 20 10 nlm --- 1 -9784.333 2.13
mod.bs
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast,
#> expected = "expected", expectedCum = "expectedCum", pophaz = "classic",
#> base = "exp.bs", degree = 3, knots = qknots, ratetable = survexp.us)
#>
#> Coefficients:
#> Estimate StdErr z.value p.value
#> Intercept -1.508302 0.088648 -17.0145 < 2.2e-16 ***
#> BS3.1 0.841375 0.131143 6.4157 1.402e-10 ***
#> BS3.2 0.523696 0.090615 5.7793 7.499e-09 ***
#> BS3.3 -0.233052 0.188542 -1.2361 0.2164
#> BS3.4 -0.319314 0.255005 -1.2522 0.2105
#> BS3.5 -1.114281 0.248570 -4.4828 7.368e-06 ***
#> agecr 0.375903 0.013603 27.6334 < 2.2e-16 ***
#> armt -0.323917 0.031372 -10.3249 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> log-likelihood: -9784.3328 (for 8 degree(s) of freedom)
#>
#> number of observations: 4978, number of events: 4363
The new parameter to be added to mexhazLT() function is pophaz=“rescaled”: it allows to rescale the life table available for the estimation of the excess hazard parameters. This model concerns that proposed by Goungounga et al (2016).
mod.bs2 <- mexhazLT(
formula = Surv(temps, statut) ~ agecr + armt,
data = breast, ratetable = survexp.us, degree = 3,
knots = qknots, expected = "expected",
expectedCum = "expectedCum",
base = "exp.bs", pophaz = "rescaled")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1 -1 -1 -1 -1 -1 0 0 0
#> Function Value
#> [1] 11032
#> Gradient:
#> [1] -1281.60395 -490.49700 -810.60363 -112.90836 72.86473 103.05848
#> [7] -1994.09940 -415.61972 -140.82773
#>
#> iteration = 51
#> Parameter:
#> [1] -1.5349736 0.8585781 0.5389037 -0.2782919 -0.3520908 -1.3452699 0.3591276
#> [8] -0.3373832 0.8067923
#> Function Value
#> [1] 9783.523
#> Gradient:
#> [1] -0.0025525541 -0.0004893081 -0.0006475602 -0.0002819434 -0.0002892193
#> [6] -0.0005962925 -0.0007585186 -0.0011386874 -0.0002582965
#>
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#>
#>
#> Data
#> Name N.Obs.Tot N.Obs N.Events N.Clust
#> data 4978 4978 4363 1
#>
#> Details
#> Iter Eval Base Nb.Leg Nb.Aghq Optim Method Code LogLik Total.Time
#> 51 401 exp.bs 20 10 nlm --- 1 -9783.523 2.93
mod.bs2
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast,
#> expected = "expected", expectedCum = "expectedCum", pophaz = "rescaled",
#> base = "exp.bs", degree = 3, knots = qknots, ratetable = survexp.us)
#>
#> Coefficients:
#> Estimate StdErr z.value p.value
#> Intercept -1.534974 0.094208 -16.2935 < 2.2e-16 ***
#> BS3.1 0.858578 0.136119 6.3076 2.835e-10 ***
#> BS3.2 0.538904 0.094530 5.7009 1.192e-08 ***
#> BS3.3 -0.278292 0.202270 -1.3758 0.16887
#> BS3.4 -0.352091 0.281798 -1.2494 0.21150
#> BS3.5 -1.345270 0.342630 -3.9263 8.626e-05 ***
#> agecr 0.359128 0.019167 18.7363 < 2.2e-16 ***
#> armt -0.337383 0.034268 -9.8455 < 2.2e-16 ***
#> log(Alpha) 0.806792 0.434871 1.8552 0.06356 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> log-likelihood: -9783.523 (for 9 degree(s) of freedom)
#>
#> number of observations: 4978, number of events: 4363
As in mexhaz, the new parameter to be added to the mexhazLT() function is random=“hosp”: it allows to specify the variable indicating the cluster levels. However, pophaz is set to be equal to “classic”. This excess hazard regression model is the one proposed by Charvat et al (2016). In the output, loglik corresponds to the total log-likelihood including the sum of the cumulative expected hazards, which is often removed in classical excess hazard regression models because it is considered a nuisance parameter.
mod.bs3 <- mexhazLT(
formula = Surv(temps, statut) ~ agecr + armt,
data = breast, ratetable = survexp.us, degree = 3,
knots = qknots, expected = "expected",
expectedCum = "expectedCum",
base = "exp.bs", pophaz = "classic", random = "hosp")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 0.0 0.0 0.1
#> Function Value
#> [1] 10129.43
#> Gradient:
#> [1] -39.69633 -148.95592 -395.16138 18.72774 118.58665 119.00834
#> [7] -2896.28313 329.60901 39.47875
#>
#> iteration = 53
#> Parameter:
#> [1] -1.90783127 1.00354112 1.11855695 0.90280950 0.65681302 0.05890742
#> [7] 0.49124655 -0.45036907 -0.20709299
#> Function Value
#> [1] 8998.729
#> Gradient:
#> [1] 0.0026219409 0.0022965273 0.0006163271 0.0006457412 -0.0012478267
#> [6] 0.0002801244 -0.0007330527 -0.0002019078 0.0019390427
#>
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#>
#> Computation of the covariance matrix of the shrinkage estimators
#>
#> Data
#> Name N.Obs.Tot N.Obs N.Events N.Clust
#> data 4978 4978 4363 100
#>
#> Details
#> Iter Eval Base Nb.Leg Nb.Aghq Optim Method Code LogLik Total.Time
#> 53 422 exp.bs 20 10 nlm --- 1 -8998.729 17.36
mod.bs3
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast,
#> expected = "expected", expectedCum = "expectedCum", pophaz = "classic",
#> base = "exp.bs", degree = 3, knots = qknots, random = "hosp",
#> ratetable = survexp.us)
#>
#> Coefficients:
#> Estimate StdErr z.value p.value
#> Intercept -1.907831 0.122002 -15.6376 < 2.2e-16 ***
#> BS3.1 1.003541 0.131641 7.6233 2.465e-14 ***
#> BS3.2 1.118557 0.093232 11.9976 < 2.2e-16 ***
#> BS3.3 0.902810 0.192646 4.6864 2.781e-06 ***
#> BS3.4 0.656813 0.257079 2.5549 0.010622 *
#> BS3.5 0.058907 0.251562 0.2342 0.814855
#> agecr 0.491247 0.014198 34.5990 < 2.2e-16 ***
#> armt -0.450369 0.032262 -13.9596 < 2.2e-16 ***
#> hosp [log(sd)] -0.207093 0.074504 -2.7796 0.005442 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> log-likelihood: -8998.7292 (for 9 degree(s) of freedom)
#>
#> number of observations: 4978, number of events: 4363
As in mexhaz, the new parameter to be added to the mexhazLT()
function is random=“hosp”: it allows to specify the variable indicating
the cluster levels. However, pophaz is set to be equal to “rescaled”
(`pophaz = "rescaled"
). This excess hazard regression model
is the one proposed by Goungounga et al (2023).
mod.bs4 <- mexhazLT(
formula = Surv(temps, statut) ~ agecr + armt,
data = breast, ratetable = survexp.us, degree = 3,
knots = qknots, expected = "expected",
expectedCum = "expectedCum",
base = "exp.bs", pophaz = "rescaled", random = "hosp")
#> iteration = 0
#> Step:
#> [1] 0 0 0 0 0 0 0 0 0 0
#> Parameter:
#> [1] -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 0.0 0.0 0.0 0.1
#> Function Value
#> [1] 10129.43
#> Gradient:
#> [1] -39.69633 -148.95592 -395.16138 18.72774 118.58665 119.00834
#> [7] -2896.28313 329.60901 -83.32170 39.47875
#>
#> iteration = 63
#> Parameter:
#> [1] -1.9735698 1.0336144 1.1548583 0.8953165 0.6287633 -0.1876885
#> [7] 0.4726014 -0.4743774 1.0226142 -0.1557342
#> Function Value
#> [1] 8995.933
#> Gradient:
#> [1] 3.686699e-06 1.091097e-04 5.260753e-04 3.165042e-04 -5.493348e-04
#> [6] 2.619345e-04 1.091394e-04 -7.566996e-04 -4.126733e-04 -7.275958e-05
#>
#> Gradient relatif proche de zéro.
#> L'itération courante est probablement la solution.
#>
#> Computation of the covariance matrix of the shrinkage estimators
#>
#> Data
#> Name N.Obs.Tot N.Obs N.Events N.Clust
#> data 4978 4978 4363 100
#>
#> Details
#> Iter Eval Base Nb.Leg Nb.Aghq Optim Method Code LogLik Total.Time
#> 63 553 exp.bs 20 10 nlm --- 1 -8995.933 22.91
mod.bs4
#> Call:
#> mexhazLT(formula = Surv(temps, statut) ~ agecr + armt, data = breast,
#> expected = "expected", expectedCum = "expectedCum", pophaz = "rescaled",
#> base = "exp.bs", degree = 3, knots = qknots, random = "hosp",
#> ratetable = survexp.us)
#>
#> Coefficients:
#> Estimate StdErr z.value p.value
#> Intercept -1.973570 0.131548 -15.0027 < 2.2e-16 ***
#> BS3.1 1.033614 0.138130 7.4829 7.261e-14 ***
#> BS3.2 1.154858 0.098864 11.6813 < 2.2e-16 ***
#> BS3.3 0.895317 0.207525 4.3143 1.601e-05 ***
#> BS3.4 0.628763 0.291275 2.1587 0.0308767 *
#> BS3.5 -0.187689 0.317852 -0.5905 0.5548619
#> agecr 0.472601 0.016678 28.3364 < 2.2e-16 ***
#> armt -0.474377 0.035524 -13.3537 < 2.2e-16 ***
#> log(Alpha) 1.022614 0.273560 3.7382 0.0001854 ***
#> hosp [log(sd)] -0.155734 0.078446 -1.9853 0.0471164 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> log-likelihood: -8995.9329 (for 10 degree(s) of freedom)
#>
#> number of observations: 4978, number of events: 4363
We can compare the output of these two models using AIC or BIC criteria.
compared_models <- list(mod.bs,mod.bs2, mod.bs3, mod.bs4)
names(compared_models) <- c("mod.bs","mod.bs2", "mod.bs3", "mod.bs4")
sapply(compared_models, function(i) {
AIC(i)
})
#> mod.bs mod.bs2 mod.bs3 mod.bs4
#> 19584.67 19585.05 18015.46 18011.87
A statistical comparison between two nested models can be performed with a likelihood ratio test calculated by function anova method implemented in xhaz.
For example, suppose we want to test whether we can drop the rescaling parameter between the different excess hazard regression models.
As in survival package, we compare the models using anova(), i.e.,
anova(mod.bs,mod.bs2)
#> Assumption: Model 1 nested within Model 2
#>
#> Likelihood ratio test
#> Model 1:
#> Surv(temps, statut) ~ agecr + armt
#> Model 2:
#> Surv(temps, statut) ~ agecr + armt
#> Model.df loglik df Chisq Pr(>Chisq)
#> [1,] 8.0 -9784.3 NA NA NA
#> [2,] 9.0 -9783.5 1.0 0.81 0.2031
anova(mod.bs3,mod.bs4)
#> Assumption: Model 1 nested within Model 2
#>
#> Likelihood ratio test
#> Model 1:
#> Surv(temps, statut) ~ agecr + armt
#> Model 2:
#> Surv(temps, statut) ~ agecr + armt
#> Model.df loglik df Chisq Pr(>Chisq)
#> [1,] 9.0 -8998.7 NA NA NA
#> [2,] 10.0 -8995.9 1.0 2.796 0.01804 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that it is the user’s responsibility to provide properly nested hazard models for the LRT to be valid. The results suggest that by correcting for between-cluster heterogeneity and non-comparability bias with a rescaled random effects excess hazard regression model, the user will be able to provide more accurate estimates of net survival.
One could be interested in the prediction of net survival and excess hazard for the individual with the same characteristics as individual 1 in the breast dataset (age 30.95, armt equals 0) as performed in mexhaz.
predict_mod <- predict(mod.bs,
time.pts=seq(0.1,10,by=0.1),
data.val=data.frame(agecr = breast[1,]$agecr,
armt = breast[1,]$armt))
predict_mod2 <- predict(mod.bs2,
time.pts=seq(0.1,10,by=0.1),
data.val=data.frame(agecr = breast[1,]$agecr,
armt = breast[1,]$armt))
predict_mod3 <- predict(mod.bs3,
time.pts=seq(0.1,10,by=0.1),
data.val=data.frame(agecr = breast[1,]$agecr,
armt = breast[1,]$armt,
hosp = breast[1,]$hosp))
predict_mod4 <- predict(mod.bs4,
time.pts=seq(0.1,10,by=0.1),
data.val=data.frame(agecr = breast[1,]$agecr,
armt = breast[1,]$armt,
hosp = breast[1,]$hosp))
old.par <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plot(predict_mod$results$time.pts, predict_mod$results$hazard,
type = "l", lwd = 2, xlab = 'Time (years)',
ylab = "excess hazard")
lines(predict_mod2$results$time.pts, predict_mod2$results$hazard,
type = "l", lwd = 2, col = "blue", lty = 2)
lines(predict_mod3$results$time.pts, predict_mod3$results$hazard,
type = "l", lwd = 2, col = "red")
lines(predict_mod4$results$time.pts, predict_mod4$results$hazard,
type = "l", lwd = 2, col = "green", lty = 2)
plot(predict_mod$results$time.pts, predict_mod$results$surv,
type = "l", lwd = 2, xlab = 'Time (years)',
ylab = "Net survival", ylim = c(0,1))
lines(predict_mod2$results$time.pts, predict_mod2$results$surv,
type = "l", lwd = 2, col = "blue", lty = 2)
lines(predict_mod3$results$time.pts, predict_mod3$results$surv,
type = "l", lwd = 2, col = "red")
lines(predict_mod4$results$time.pts, predict_mod4$results$surv,
type = "l", lwd = 2, col = "green",lty = 2)
legend(
"topright",
legend = c("mod.bs",
"mod.bs2",
"mod.bs3",
"mod.bs4"),
lty = c(1, 2 , 1, 2),
lwd = c(2, 2 , 2, 2),
col = c("black", "blue", "red", "green")
)
GPL 3.0, for academic use.
We are grateful to the members of the CENSUR Survival Group for their helpful comments.