Introduction

The interactionRCS package is designed to facilitate interpretation and presentation of results from a regression model (linear, logistic, Cox) where an interaction between the main predictor of interest \(X\) (binary or continuous) and another continuous covariate \(Z\) has been specified. Specifically, the package will provide point estimates of the main effect of \(X\) over levels of \(Z\), allowing for settings where \(Z\) is flexibly modeled with restricted cubic splines, and provide a graphical display of this interaction. Two methods for deriving and plotting confidence intervals are also implemented, including the delta method and bootstrap.

How to use the interactionRCS package

Functions within the interactionRCS package require that a regression model has already been estimated and model results be provided as an object. Models with linear and log-linear interactions can be run with standard function (i.e. glm for linear and logistic, coxph for survival). To model interactions with restricted cubic splines, however, interactionRCS requires models to be run with more flexible functions from the rms package (specifically Glm, lrm, and cph).

The main function of interactionRCS is intEST, which provides point estimates and confidence intervals for the effect of \(X\) over levels of \(Z\) when an interaction is included in the model, either as a (log-)linear term or flexibly modeled with rcs. The following options must be specified:

  • model: the model previously run (Glm,lrm, cph, or glm, coxph from models without splines)
  • var1: the name of the main predictor of interest (\(X\))
  • var2: the name of the continuous predictor interacting with var1 (\(Z\))
  • var2values: the values of \(var2\) for which the HR of \(var1\) should be calculated

Additional options include:

  • data: the same dataset used for fitting the model (only used for bootstrap CIs). If data=NULL, we will search for model$x
  • ci (default TRUE) : whether a confidence interval for each effect estimate should also be provided
  • ci.method: either "delta" or "bootstrap". Default "delta"
  • ci.boot.method (default= “percentile” - only if method="bootstrap") : see boot.ci type parameter
  • R (default=100 - only if method="bootstrap") : number of bootstrap iterations
  • parallel (default “multicore” - only if method="bootstrap"): see boot.ci reference

An additional function plotINT is also implemented to provide a graphical display of the results and only require the rcs-, loglin-, or lin- results as object.

Example 1

The first example is based on a study on drug relapse among 575 patients enrolled in a clinical trial of residential treatment for drug abuse. The main exposure of interest is the binary indicator of assigned treatment (0/1) and a treatment*age interaction is specified.

The following code provides an estimate of the treatment HR at different ages, when age is modeled with restricted cubic splines. The model is further adjusted for tumor site, race, and previous use of IV drug. It is recommended to check the distribution of \(Z\) (here, age) to define a realistic range of var2values. Finally, the code is replicated by using the delta method or bootstrap for obtaining confidence intervals. Note that when ci.method = "bootstrap" is specified, additional options can be specified.

myformula <- Surv(time, censor) ~ treat*rcs(age, 3) + site + nonwhite + ivdrug
model_rcs <- cph(myformula , data = umaru , x = TRUE , y=TRUE)

HR_rcs_delta <- intEST( var2values = c(20:50)
                     , model = model_rcs , data = umaru , var1 ="treat", var2="age" ,ci.method = "delta")

plotINT(HR_rcs_delta , xlab = "Age")

# note that the user could directly specify the function of interest (here rcsHR) and obtain the same results
HR_rcs_delta <- rcsHR( var2values = c(20:50)
                     , model = model_rcs , data = umaru , var1 ="treat", var2="age" ,ci.method = "delta")


HR_rcs_boot <- intEST( var2values = c(20:50)
                      , model = model_rcs , data = umaru , var1 ="treat", var2="age" , ci.method = "bootstrap"
                      , ci.boot.method = "norm" , R = 500 , parallel = "multicore")

plotINT(HR_rcs_boot , xlab = "Age")

The following code replicates the same analysis in the setting where the interaction model was specified without modeling the continuous covariate \(Z\) with restricted cubic splines (log-linear interaction model). The loglinHR function used in this context requires the same options of rcsHR

myformula <- Surv(time, censor) ~ treat*age + site + nonwhite + ivdrug
model_loglin <- cph(myformula , data = umaru , x = TRUE , y=TRUE)

HR_loglin_delta <- intEST( var2values = c(20:50)
                     , model = model_loglin , data = umaru , var1 ="treat", var2="age", ci.method = "delta")

plotINT(HR_loglin_delta , xlab = "Age")

HR_loglin_boot <- intEST( var2values = c(20:50)
                      , model = model_loglin , data = umaru , var1 ="treat", var2="age", ci.method = "bootstrap"
                      , ci.boot.method = "norm" , R = 500 , parallel = "multicore")

plotINT(HR_loglin_boot , xlab = "Age")

In this example the log-linear and spline interaction models provide similar results, with the treatment HR decreasing over levels of age (note that, in regular practice, a thorough model comparison to decide whether the log-linear or spline interaction model better fits the data should be conducted before using interactionRCS).

Example 2

This second example is based on a larger dataset of 17549 individuals from a population study of non-alcoholic fatty liver disease (NAFLD). Here the main exposure \(X\) of interest is age, and an interaction with BMI is specified. The code will be the same in the presence of a continuous covariate, but the interpretation will be slightly different as the functions will estimate the HR for a unit increase in \(X\) (age) over levels of BMI. The following code will provide estimates of the HR for a unit increase of age over levels of BMI, modeled with restricted cubic spline.

myformula <- Surv(futime, status) ~ age*rcs(bmi, 3) + male
model2_rcs <- cph(myformula , data = nafld1 , x = TRUE , y=TRUE)


HR2_rcs_delta <- intEST( var2values = c(15:50)
                     , model = model2_rcs , data = nafld1 , var1 ="age", var2="bmi" , ci=TRUE , conf = 0.95 , ci.method = "delta")

plotINT(HR2_rcs_delta , xlab = "BMI")

If the user is interested in a different HR (e.g. the HR for a 10-unit increase, or the HR for a 1-sd unit increase), the predictor should be standardized accordingly before fitting the model. For example, here we estimate the HR for a 5-years increase in age

nafld1$age5<-nafld1$age/5

myformula <- Surv(futime, status) ~ age5*rcs(bmi, 3) + male
model3_rcs <- cph(myformula , data = nafld1 , x = TRUE , y=TRUE)


HR3_rcs_delta <- intEST( var2values = c(15:50)
                     , model = model3_rcs , data = nafld1 , var1 ="age5", var2="bmi" , ci=TRUE , conf = 0.95 , ci.method = "delta")

plotINT(HR3_rcs_delta , xlab = "BMI")

The effect of age changes over levels of BMI in a quadratic form, with a lower effect for both those with increasingly higher and lower BMI. Including BMI without any spline transformation (i.e. the log-linear interaction model) would fail to capture this functional form, as shown from the following code. A comparison of the models through the use of AIC would also confirm that the spline interaction model better fits the data

myformula <- Surv(futime, status) ~ age*bmi + male
model2_loglin <- cph(myformula , data = nafld1 , x = TRUE , y=TRUE)

HR2_loglin_delta <- intEST( var2values = c(15:50)
                     , model = model2_loglin , data = nafld1 , var1 ="age", var2="bmi" , ci=TRUE , conf = 0.95 , ci.method = "delta")

plotINT(HR2_loglin_delta , xlab = "BMI")

AIC(model2_loglin)
## [1] 15979.93
AIC(model2_rcs)
## [1] 15924.4

Appendix - theoretical background

interactionRCS requires results from a regression model where an interaction between a main predictor (binary or continuous) \(X\) and a continuous predictor \(Z\) has been specified. This interaction can be included as a simple product term between the 2 predictors, or by flexibly modeling \(Z\) with restricted cubic splines. For both interaction settings, the main exposure of interest \(X\) has to either be binary or continuous.

Log-linear interaction model

Cox PH regression

A basic Cox model with 2 predictors and their interaction takes the form:

\(h(t|x,z)=h_0\cdot\exp(\beta_1x+\beta_2z+\beta_3x\cdot z)\)

After estimation of the model, we want to predict the HR for \(X\) over levels of \(Z\). This is given by

\(HR_{10}=\frac{h(t|x=1,z)}{h(t|x=0,z)}=\frac{h_0\cdot\exp(\beta_1x+\beta_2z+\beta_3x\cdot z)}{h_0\cdot\exp(\beta_1x+\beta_2z+\beta_3x\cdot z)}=\frac{\exp(\beta_1+\beta_2z+\beta_3z)}{\exp(\beta_2z)}\) ,

which will be plotted against \(Z\)

To estimate \(95\%\) confidence intervals \(SE(HR_{10})\) is required. This is obtained by focusing on \(\log(HR_{10})\) and calculating lower and upper bounds for this quantity, which are then exponentiated.

\(SE(\log(HR_{10}))=SE(\log(\frac{\exp(\beta_1+\beta_2z+\beta_3z)}{\exp(\beta_2z)}))=SE(\log(\exp(\beta_1+\beta_2z+\beta_3z)-\log(\exp(\beta_2z)))=SE(\beta_1+\beta_3z)\)

This is calculated by using the delta method. Upper and lower bounds are then derived with \(\exp(\log(HR_{10})\pm1.96\cdot SE(log(HR_{10})))\)

Logistic regression

A logistic regression model with 2 predictors and their interaction takes the form:

\(logit(p|x,z)=\beta_0+\beta_1x+\beta_2z+\beta_3x\cdot z\)

After estimation of the model, we want to predict the OR for \(X\) over levels of \(Z\). This is given by

\(OR_{10}=\frac{odds(p|x=1,z)}{odds(p|x=0,z)}=\frac{\exp(\beta_0+\beta_1x+\beta_2z+\beta_3x\cdot z)}{\exp(\beta_0+\beta_1x+\beta_2z+\beta_3x\cdot z)}=\frac{\exp(\beta_0+\beta_1+\beta_2z+\beta_3z)}{\exp(\beta_0+\beta_2z)}\) ,

which will be plotted against \(Z\)

To estimate \(95\%\) confidence intervals \(SE(OR_{10})\) is required. This is obtained by focusing on \(\log(OR_{10})\) and calculating lower and upper bounds for this quantity, which are then exponentiated.

\(SE(\log(HR_{10}))=SE(\log(\frac{\exp(\beta_0+\beta_1+\beta_2z+\beta_3z)}{\exp(\beta_0+\beta_2z)}))=SE(\log(\exp(\beta_0+\beta_1+\beta_2z+\beta_3z)-\log(\exp(\beta_0+\beta_2z)))=SE(\beta_1+\beta_3z)\)

This is calculated by using the delta method. Upper and lower bounds are then derived with \(\exp(\log(OR_{10})\pm1.96\cdot SE(log(OR_{10})))\)

Linear regression

A linear regression model with 2 predictors and their interaction takes the form:

\(E[Y|x,z]=\beta_0+\beta_1x+\beta_2z+\beta_3x\cdot z\)

After estimation of the model, we want to predict the effect for \(X\) over levels of \(Z\). This is given by

\(E[Y|x=1,z]-E[Y=0|x,z]=(\beta_0+\beta_1+\beta_2z+\beta_3\cdot z) -(\beta_0+\beta_2z)=\beta_1+\beta_3\cdot z\) ,

which will be plotted against \(Z\)

To estimate \(95\%\) confidence intervals we need the basic linear combination of standard errors calculated by \(SE(\beta_1+\beta_3z)=\sqrt{SE(\beta_1)^2+SE(\beta_3z)^2+2cov(\beta_1,\beta_3)z}\),

or by using the Delta method

Restricted cubic splines interaction model

interactionRCS allows the continuous covariate \(Z\) to be flexibly modeled with restricted cubic splines, with any given of number (\(p\)) of knots (\(t_1, t_2, t_3, \dots, t_p\)).

Cox PH regression

The interaction model, in this setting, takes the form:

\(h(t|x,z,c)=h_0\cdot\exp(\beta_1x+sp(z)+sp_2(z\cdot x))\)

If 3 knots are specified:

\(sp(z)=\alpha_1z+\alpha_2\cdot[\frac{(z-t_1)^3_+-(z-t_2)^3_+\cdot\frac{ t_3-t_1}{ t_3-t_2} +(z-t_3)^3_+\cdot\frac{t_2-t_1}{t_3-t_2}}{(t_3-t_1)^2}]\)

and

\(sp_2(z\cdot x)=\gamma_1x+\gamma_2x\cdot[\frac{(z-t_1)^3_+-(z-t_2)^3_+\cdot\frac{ t_3-t_1}{ t_3-t_2} +(z-t_3)^3_+\cdot\frac{t_2-t_1}{t_3-t_2}}{(t_3-t_1)^2}]\)

After estimation of the model, we want to predict the HR for \(X\) over levels of \(Z\). This is given by

\(HR_{10}=\frac{h(t|X=x_1,Z)}{h(t|X=x_2,Z)}=\frac{h_0\cdot\exp(\beta_1x_1+sp(z)+sp_2(z)\cdot x_1)}{h_0\cdot\exp(\beta_1x_2+sp(z)+sp_2(z)\cdot x_2)}=\frac{h(t|X=x_1,Z)}{h(t|X=x_2,Z)}=\frac{\exp(\beta_1+sp(z)+sp_2(z))}{\exp(sp(z))}\) ,

which will be plotted against \(Z\)

Similarly to the previous situation, the \(SE\) can be derived by using the delta method to calculate \(SE(\beta_1+sp_2(z))\)

Logistic regression

The interaction model, in this setting, takes the form:

\(logit(p|x,z)=\beta_0+\beta_1x+sp(z)+sp_2(z\cdot x)\)

After estimation of the model, we want to predict the HR for \(X\) over levels of \(Z\). This is given by

\(OR_{10}=\frac{\exp(\beta_0+\beta_1+sp(z)+sp_2(z))}{\exp(\beta_0+sp(z))}=\exp(\beta_1+sp_2(z))\) ,

which will be plotted against \(Z\)

Similarly to the previous situation, the \(SE\) can be derived by using the delta method to calculate \(SE(\beta_1+sp_2(z))\)

Linear regression

The interaction model, in this setting, takes the form:

\(E[Y|x,z]=\beta_0+\beta_1x+sp(z)+sp_2(z\cdot x)\)

After estimation of the model, we want to predict

\(E[Y|x=1,z]-E[Y=0|x,z]=(\beta_0+\beta_1+sp(z)+sp_2(z)) -(\beta_0+sp(z))=\beta_1+sp_2(z)\) ,

which will be plotted against \(Z\)

The \(SE\) can be derived by using the delta method to calculate \(SE(\beta_1+sp_2(z))\)

More than 3 knots

Based on Harrell’s book (chapter 2-23), we derive equations for more than 3 knots

The general splines function for \(k\) knots is:

\(f(X)=\beta_0+\beta_1\cdot X_1+\beta_2\cdot X_2+\dots+\beta_{k-1}\cdot X_{k-1}\)

where \(X_1=X\) and for \(j=1,\dots,k-2\),

\(X_{j+1}=\frac{(z-t_j)^3_+-(z-t_{k-1})^3_+\cdot\frac{ t_k-t_j}{t_k-t_{k-1}} +(z-t_k)^3_+\cdot\frac{t_{k-1}-t_j}{t_k-t_{k-1}}}{(t_k-t_1)^2}\)

The Cox model with \(k=4\) will therefore be:

\(h(t|x,z,c)=h_0\cdot\exp(\beta_1x+sp(z)+sp_2(z\cdot x))\)

where

\(sp(z)=\alpha_1z+\alpha_2\cdot[\frac{(z-t_1)^3_+-(z-t_3)^3_+\cdot\frac{ t_4-t_1}{ t_4-t_3} +(z-t_3)^3_+\cdot\frac{t_3-t_1}{t_4-t_3}}{(t_4-t_1)^2}]+\alpha_3\cdot[\frac{(z-t_2)^3_+-(z-t_3)^3_+\cdot\frac{ t_4-t_2}{ t_4-t_3} +(z-t_4)^3_+\cdot\frac{t_3-t_2}{t_4-t_3}}{(t_4-t_1)^2}]\)

and

\(sp_2(z\cdot x)=\gamma_1x+\gamma_2x\cdot[\frac{(z-t_1)^3_+-(z-t_3)^3_+\cdot\frac{ t_4-t_1}{ t_4-t_3} +(z-t_4)^3_+\cdot\frac{t_3-t_2}{t_4-t_3}}{(t_3-t_1)^2}]+\gamma_3x\cdot[\frac{(z-t_2)^3_+-(z-t_3)^3_+\cdot\frac{ t_4-t_2}{ t_4-t_3} +(z-t_4)^3_+\cdot\frac{t_3-t_2}{t_4-t_3}}{(t_4-t_1)^2}]\)

The generic Cox model will be:

\(h(t|x,z,c)=h_0\cdot\exp(\beta_1x+sp(z)+sp_2(z\cdot x))\)

where

\(sp(z)=\alpha_1z+\alpha_2\cdot[\frac{(z-t_j)^3_+-(z-t_{k-1})^3_+\cdot\frac{ t_k-t_j}{t_k-t_{k-1}} +(z-t_k)^3_+\cdot\frac{t_{k-1}-t_j}{t_k-t_{k-1}}}{(t_k-t_1)^2}]+\alpha_3\cdot[\frac{(z-t_j)^3_+-(z-t_{k-1})^3_+\cdot\frac{ t_k-t_j}{t_k-t_{k-1}} +(z-t_k)^3_+\cdot\frac{t_{k-1}-t_j}{t_k-t_{k-1}}}{(t_k-t_1)^2}]\)

and

\(sp_2(z\cdot x)=\gamma_1x+\gamma_2x\cdot[\frac{(z-t_j)^3_+-(z-t_{k-1})^3_+\cdot\frac{ t_k-t_j}{t_k-t_{k-1}} +(z-t_k)^3_+\cdot\frac{t_{k-1}-t_j}{t_k-t_{k-1}}}{(t_k-t_1)^2}]+\gamma_3x\cdot[\frac{(z-t_j)^3_+-(z-t_{k-1})^3_+\cdot\frac{ t_k-t_j}{t_k-t_{k-1}} +(z-t_k)^3_+\cdot\frac{t_{k-1}-t_j}{t_k-t_{k-1}}}{(t_k-t_1)^2}]\)