Models and model comparisons

Implemented model types

The risks package implements all major regression models that have been proposed for relative risks and risk differences. By default (approach = "auto"), riskratio() and riskdiff() estimate the most efficient valid model that converges; in more numerically challenging cases, they default to marginal standardization, which does not return parameters for covariates.

The following models are implemented in the risks package:

#1 approach = RR RD Model Reference
1 glm riskratio riskdiff Binomial model with a log or identity link Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. Am J Epidemiol 1986;123:174-184.
2 glm_startp riskratio riskdiff Binomial model with a log or identity link, convergence-assisted by starting values from Poisson model Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol 2005;162:199-200.
3 margstd_delta riskratio riskdiff Marginally standardized estimates using binomial models with a logit link (logistic model) with standard errors calculated via the delta method. This package.
4 margstd_boot riskratio riskdiff Marginally standardized estimates using binomial models with a logit link (logistic model) with bias-corrected accelerated (BCa) confidence intervals from parametric bootstrapping (see Marginal standardization). This package.  For marginal standardization with nonparametric bootstrapping, see: Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. J Clin Epidemiol 2007;60(9):874-82.
glm_cem riskratio Binomial model with log-link fitted via combinatorial expectation maximization instead of Fisher scoring Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. J Stat Softw 2018;86(9).
glm_cem riskdiff Additive binomial model (identity link) fitted via combinatorial expectation maximization instead of Fisher scoring Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. Comput Stat Data Anal 2014;80:184-96.
robpoisson riskratio riskdiff Log-linear (Poisson) model with robust/sandwich/empirical standard errors Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159(7):702-6
duplicate riskratio Case-duplication approach, fitting a logistic model with cluster-robust standard errors Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. Stat Med 1993;12:1733–45.
glm_startd riskratio Binomial model with a log link, convergence-assisted by starting values from case-duplication logistic model This package.
logistic riskratio, for comparison only Binomial model with logit link (i.e., the logistic model), returning odds ratios Included for comparison purposes only.

1 Indicates the priority with which the legacy modelling strategy (approach = "legacy") attempts model fitting (glm_startp: only for RR).

Which model was fitted is always indicated in the first line of the output of summary() and in the model column of tidy(). In methods sections of manuscripts, the approach can be described in detail as follows: “Risk ratios (or risk differences) were obtained via (method listed in the first line of model summary.risks(...)) using the risks R package (reference to this package and/or the article listed in the column ‘reference’).”

For example: “Risk ratios were obtained from binomial models with a log link, convergence-assisted by Poisson models (ref. Spiegelman and Hertzmark, AJE 2005), using the risks R package (https://stopsack.github.io/risks/).”

Model choice

By default, automatic model fitting (approach = "auto") reports results from marginal standardization using a logistic model with delta method standard errors (equivalent to approach = "margstd_delta"). An exception is made if interaction terms between exposure and confounders are included. This case, confidence intervals are calculated using bootstrapping (equivalent to requesting approach = "margstd_boot"). Alternatively, any of the options listed under approach = in the table can be requested directly. However, unlike with approach = "auto" (the default) or approach = "legacy", the selected model may not converge.

We load the same example data as in the Get Started vignette.

library(risks)  # provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries
data(breastcancer)

We then select a binomial model with starting values from the Poisson model:

riskratio(formula = death ~ stage + receptor, 
          data = breastcancer, 
          approach = "glm_startp")
#> 
#> Risk ratio model
#> Call:  stats::glm(formula = death ~ stage + receptor, family = stats::binomial(link = "log"), 
#>     data = breastcancer, start = "(from Poisson model)")
#> 
#> Coefficients:
#>    (Intercept)   stageStage II  stageStage III     receptorLow  
#>        -2.3521          0.9314          1.7695          0.4436  
#> 
#> Degrees of Freedom: 191 Total (i.e. Null);  188 Residual
#> Null Deviance:       228.1 
#> Residual Deviance: 185.9     AIC: 193.9


However, the binomial model without starting values does not converge:

riskratio(formula = death ~ stage + receptor, 
          data = breastcancer, 
          approach = "glm")
#> Error: no valid set of coefficients has been found: please supply starting values

Model comparisons

With approach = "all", all model types listed in the tables are fitted. The fitted object, e.g., fit, is one of the converged models. A summary of the convergence status of all models is displayed at the beginning of summary(fit):

fit_all <- riskratio(formula = death ~ stage + receptor, 
                     data = breastcancer, 
                     approach = "all")
#> Loading required package: doParallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
#> Loading required package: numDeriv
#> Loading required package: quantreg
#> Loading required package: SparseM
#> 
#> Attaching package: 'SparseM'
#> The following object is masked from 'package:base':
#> 
#>     backsolve
#> 
#> Attaching package: 'turboEM'
#> The following objects are masked from 'package:numDeriv':
#> 
#>     grad, hessian
summary(fit_all)
#> 
#> All fitted models:
#>            Model Converged Max.prob.
#> 1     robpoisson      TRUE 0.9052473
#> 2            glm     FALSE        NA
#> 3     glm_startp      TRUE 0.8702313
#> 4         logbin      TRUE 0.8702292
#> 5   logbin_start      TRUE 0.8702294
#> 6   margstd_boot      TRUE 0.8158560
#> 7  margstd_delta      TRUE 0.8158560
#> 8       logistic      TRUE 0.8158560
#> 9      duplicate      TRUE 0.4785245
#> 10    glm_startd      TRUE 0.8702318
#> 
#> Risk ratio model, fitted as Poisson model with robust covariance (robpoisson).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = poisson(link = "log"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -1.34555  -0.68794  -0.43330   0.09792   1.70862  
#> 
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)     -2.3658     0.3602  -6.569 5.08e-11 ***
#> stageStage II    0.9246     0.3932   2.351   0.0187 *  
#> stageStage III   1.7772     0.3855   4.610 4.03e-06 ***
#> receptorLow      0.4891     0.2129   2.297   0.0216 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 137.00  on 191  degrees of freedom
#> Residual deviance: 110.29  on 188  degrees of freedom
#> AIC: 226.29
#> 
#> Number of Fisher Scoring iterations: 6
#> 
#> Confidence intervals for coefficients: (robust)
#>                      2.5 %     97.5 %
#> (Intercept)    -3.07172916 -1.6599084
#> stageStage II   0.15386831  1.6952382
#> stageStage III  1.02161187  2.5328206
#> receptorLow     0.07183369  0.9062773


Individual models can be accessed as fit$all_models[[1]] through fit$all_models[[6]] (or [[7]] if fitting a risk ratio model). tidy() shows coefficients and confidence intervals from all models that converged:

tidy(fit_all) %>%
  select(-statistic, -p.value) %>%
  print(n = 50)
#> # A tibble: 34 × 6
#>    term            estimate std.error conf.low conf.high model        
#>    <chr>          <dbl[1d]>     <dbl>    <dbl>     <dbl> <chr>        
#>  1 (Intercept)       -2.37      0.360  -3.07      -1.66  robpoisson   
#>  2 stageStage II      0.925     0.393   0.154      1.70  robpoisson   
#>  3 stageStage III     1.78      0.386   1.02       2.53  robpoisson   
#>  4 receptorLow        0.489     0.213   0.0718     0.906 robpoisson   
#>  5 (Intercept)       -2.35      0.360  -3.06      -1.65  glm_startp   
#>  6 stageStage II      0.931     0.393   0.161      1.70  glm_startp   
#>  7 stageStage III     1.77      0.385   1.01       2.52  glm_startp   
#>  8 receptorLow        0.444     0.197   0.0578     0.829 glm_startp   
#>  9 (Intercept)       -2.35      0.360  -3.06      -1.65  logbin       
#> 10 stageStage II      0.931     0.393   0.161      1.70  logbin       
#> 11 stageStage III     1.77      0.385   1.01       2.52  logbin       
#> 12 receptorLow        0.444     0.197   0.0578     0.829 logbin       
#> 13 (Intercept)       -2.35      0.360  -3.06      -1.65  logbin_start 
#> 14 stageStage II      0.931     0.393   0.161      1.70  logbin_start 
#> 15 stageStage III     1.77      0.385   1.01       2.52  logbin_start 
#> 16 receptorLow        0.444     0.197   0.0578     0.829 logbin_start 
#> 17 stageStage I       0         0      NA         NA     margstd_boot 
#> 18 stageStage II      0.899     0.887   0.264      1.94  margstd_boot 
#> 19 stageStage III     1.81      0.885   1.11       2.62  margstd_boot 
#> 20 stageStage I       0         0       0          0     margstd_delta
#> 21 stageStage II      0.899     0.387   0.140      1.66  margstd_delta
#> 22 stageStage III     1.81      0.378   1.07       2.55  margstd_delta
#> 23 (Intercept)       -2.37      0.420  -3.19      -1.54  logistic     
#> 24 stageStage II      1.13      0.466   0.222      2.05  logistic     
#> 25 stageStage III     2.94      0.586   1.79       4.08  logistic     
#> 26 receptorLow        0.920     0.395   0.145      1.69  logistic     
#> 27 (Intercept)       -2.37      0.362  -3.08      -1.66  duplicate    
#> 28 stageStage II      0.923     0.395   0.149      1.70  duplicate    
#> 29 stageStage III     1.78      0.385   1.03       2.54  duplicate    
#> 30 receptorLow        0.504     0.224   0.0647     0.944 duplicate    
#> 31 (Intercept)       -2.35      0.360  -3.06      -1.65  glm_startd   
#> 32 stageStage II      0.931     0.393   0.161      1.70  glm_startd   
#> 33 stageStage III     1.77      0.385   1.01       2.52  glm_startd   
#> 34 receptorLow        0.444     0.197   0.0578     0.829 glm_startd