Reproducing Goeyvaerts et al. (2018) using ergm.multi

Pavel N. Krivitsky

ergm.multi version 0.2.1.1 (2024-11-05)

library(ergm.multi)
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)

Obtaining data

The list of networks studied by Goeyvaerts et al. (2018) is included in this package:

data(Goeyvaerts)
length(Goeyvaerts)
## [1] 318

An explanation of the networks, including a list of their network (%n%) and vertex (%v%) attributes, can be obtained via ?Goeyvaerts. A total of 318 complete networks were collected, then two were excluded due to “nonstandard” family composition:

Goeyvaerts %>% discard(`%n%`, "included") %>% map(as_tibble, unit="vertices")
## [[1]]
## # A tibble: 4 × 5
##   vertex.names   age gender na    role       
##          <int> <int> <chr>  <lgl> <chr>      
## 1            1    32 F      FALSE Mother     
## 2            2    48 F      FALSE Grandmother
## 3            3    32 M      FALSE Father     
## 4            4    10 F      FALSE Child      
## 
## [[2]]
## # A tibble: 3 × 5
##   vertex.names   age gender na    role  
##          <int> <int> <chr>  <lgl> <chr> 
## 1            1    29 F      FALSE Mother
## 2            2    28 F      FALSE Mother
## 3            3     0 F      FALSE Child

To reproduce the analysis, exclude them as well:

G <- Goeyvaerts %>% keep(`%n%`, "included")

Data summaries

Obtain weekday indicator, network size, and density for each network, and summarize them as in Goeyvaerts et al. (2018) Table 1:

G %>% map(~list(weekday = . %n% "weekday",
                n = network.size(.),
                d = network.density(.))) %>% bind_rows() %>%
  group_by(weekday, n = cut(n, c(1,2,3,4,5,9))) %>%
  summarize(nnets = n(), p1 = mean(d==1), m = mean(d)) %>% kable()
weekday n nnets p1 m
FALSE (1,2] 3 1.0000000 1.0000000
FALSE (2,3] 19 0.7368421 0.8771930
FALSE (3,4] 48 0.8541667 0.9618056
FALSE (4,5] 18 0.7777778 0.9500000
FALSE (5,9] 3 1.0000000 1.0000000
TRUE (1,2] 9 1.0000000 1.0000000
TRUE (2,3] 53 0.9056604 0.9622642
TRUE (3,4] 111 0.7747748 0.9279279
TRUE (4,5] 39 0.6410256 0.8974359
TRUE (5,9] 13 0.4615385 0.8454212

Reproducing ERGM fits

We now reproduce the ERGM fits. First, we extract the weekday networks:

G.wd <- G %>% keep(`%n%`, "weekday")
length(G.wd)
## [1] 225

Next, we specify the multi-network model using the N(formula, lm) operator. This operator will evaluate the ergm formula formula on each network, weighted by the predictors passed in the one-sided lm formula, which is interpreted the same way as that passed to the built-in lm() function, with its “data” being the table of network attributes.

Since different networks may have different compositions, to have a consistent model, we specify a consistent list of family roles.

roleset <- sort(unique(unlist(lapply(G.wd, `%v%`, "role"))))

We now construct the formula object, which will be passed directly to ergm():

# Networks() function tells ergm() to model these networks jointly.
f.wd <- Networks(G.wd) ~
  # This N() operator adds three edge counts:
  N(~edges,
    ~ # one total for all networks  (intercept implicit as in lm),
      I(n<=3)+ # one total for only small households, and
      I(n>=5) # one total for only large households.
    ) +

  # This N() construct evaluates each of its terms on each network,
  # then sums each statistic over the networks:
  N(
      # First, mixing statistics among household roles, including only
      # father-mother, father-child, and mother-child counts.
      # Since tail < head in an undirected network, in the
      # levels2 specification, it is important that tail levels (rows)
      # come before head levels (columns). In this case, since
      # "Child" < "Father" < "Mother" in alphabetical order, the
      # row= and col= categories must be sorted accordingly.
    ~mm("role", levels = I(roleset),
        levels2=~.%in%list(list(row="Father",col="Mother"),
                           list(row="Child",col="Father"),
                           list(row="Child",col="Mother"))) +
      # Second, the nodal covariate effect of age, but only for
      # edges between children.
      F(~nodecov("age"), ~nodematch("role", levels=I("Child"))) +
      # Third, 2-stars.
      kstar(2)
  ) +
  
  # This N() adds one triangle count, totalled over all households
  # with at least 6 members.
  N(~triangles, ~I(n>=6))

See ergmTerm?mm for documentation on the mm term used above. Now, we can fit the model:

# (Set seed for predictable run time.)
fit.wd <- ergm(f.wd, control=snctrl(seed=123))
summary(fit.wd)
## Call:
## ergm(formula = f.wd, control = snctrl(seed = 123))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                                         Estimate Std. Error MCMC % z value Pr(>|z|)    
## N(1)~edges                                               0.84220    0.52964      0   1.590 0.111803    
## N(I(n <= 3)TRUE)~edges                                   1.49133    0.44005      0   3.389 0.000701 ***
## N(I(n >= 5)TRUE)~edges                                  -0.78558    0.20964      0  -3.747 0.000179 ***
## N(1)~mm[role=Child,role=Father]                         -0.62197    0.48459      0  -1.284 0.199316    
## N(1)~mm[role=Child,role=Mother]                          0.14934    0.52593      0   0.284 0.776441    
## N(1)~mm[role=Father,role=Mother]                         0.26480    0.60298      0   0.439 0.660552    
## N(1)~F(nodematch("role",levels=I("Child")))~nodecov.age -0.07144    0.01675      0  -4.265  < 1e-04 ***
## N(1)~kstar2                                             -0.26908    0.21292      0  -1.264 0.206320    
## N(1)~triangle                                            2.07121    0.31160      0   6.647  < 1e-04 ***
## N(I(n >= 6)TRUE)~triangle                               -0.28152    0.11012      0  -2.557 0.010572 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1975.5  on 1425  degrees of freedom
##  Residual Deviance:  611.8  on 1415  degrees of freedom
##  
## AIC: 631.8  BIC: 684.4  (Smaller is better. MC Std. Err. = 0.9136)

Similarly, we can extract the weekend network, and fit it to a smaller model. We only need one N() operator, since all statistics are applied to the same set of networks, namely, all of them.

G.we <- G %>% discard(`%n%`, "weekday")
fit.we <- ergm(Networks(G.we) ~
                 N(~edges +
                     mm("role", levels=I(roleset),
                        levels2=~.%in%list(list(row="Father",col="Mother"),
                                           list(row="Child",col="Father"),
                                           list(row="Child",col="Mother"))) +
                     F(~nodecov("age"), ~nodematch("role", levels=I("Child"))) +
                     kstar(2) +
                     triangles), control=snctrl(seed=123))
summary(fit.we)
## Call:
## ergm(formula = Networks(G.we) ~ N(~edges + mm("role", levels = I(roleset), 
##     levels2 = ~. %in% list(list(row = "Father", col = "Mother"), 
##         list(row = "Child", col = "Father"), list(row = "Child", 
##             col = "Mother"))) + F(~nodecov("age"), ~nodematch("role", 
##     levels = I("Child"))) + kstar(2) + triangles), control = snctrl(seed = 123))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                                         Estimate Std. Error MCMC % z value Pr(>|z|)    
## N(1)~edges                                               1.98705    1.51689      0   1.310  0.19021    
## N(1)~mm[role=Child,role=Father]                         -0.92832    1.50464      0  -0.617  0.53726    
## N(1)~mm[role=Child,role=Mother]                          0.38258    1.69718      0   0.225  0.82165    
## N(1)~mm[role=Father,role=Mother]                        -0.66702    1.61341      0  -0.413  0.67930    
## N(1)~F(nodematch("role",levels=I("Child")))~nodecov.age -0.16899    0.05499      0  -3.073  0.00212 ** 
## N(1)~kstar2                                             -0.90774    0.35253      0  -2.575  0.01003 *  
## N(1)~triangle                                            3.66289    0.70949      0   5.163  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 802.7  on 579  degrees of freedom
##  Residual Deviance: 130.6  on 572  degrees of freedom
##  
## AIC: 144.6  BIC: 175.1  (Smaller is better. MC Std. Err. = 1.016)

Diagnostics

Perform diagnostic simulation (Krivitsky, Coletti, and Hens 2023), summarize the residuals, and make residuals vs. fitted and scale-location plots:

gof.wd <- gofN(fit.wd, GOF = ~ edges + kstar(2) + triangles)
summary(gof.wd)
## $`Observed/Imputed values`
##      edges           kstar2        triangle    
##  Min.   : 1.00   Min.   : 0.0   Min.   : 0.00  
##  1st Qu.: 3.00   1st Qu.: 3.0   1st Qu.: 1.00  
##  Median : 6.00   Median :12.0   Median : 4.00  
##  Mean   : 5.79   Mean   :13.6   Mean   : 4.34  
##  3rd Qu.: 6.00   3rd Qu.:12.0   3rd Qu.: 4.00  
##  Max.   :18.00   Max.   :78.0   Max.   :23.00  
##  NA's   :1       NA's   :10     NA's   :10     
## 
## $`Fitted values`
##      edges            kstar2         triangle     
##  Min.   : 0.780   Min.   : 2.53   Min.   : 0.710  
##  1st Qu.: 2.958   1st Qu.: 8.45   1st Qu.: 2.500  
##  Median : 5.595   Median :10.63   Median : 3.370  
##  Mean   : 5.805   Mean   :13.70   Mean   : 4.377  
##  3rd Qu.: 5.862   3rd Qu.:11.68   3rd Qu.: 3.840  
##  Max.   :18.040   Max.   :81.75   Max.   :25.930  
##  NA's   :1        NA's   :10      NA's   :10      
## 
## $`Pearson residuals`
##      edges               kstar2            triangle       
##  Min.   :-12.33649   Min.   :-9.72194   Min.   :-7.33588  
##  1st Qu.:  0.18088   1st Qu.: 0.18157   1st Qu.: 0.15856  
##  Median :  0.36742   Median : 0.38421   Median : 0.38754  
##  Mean   : -0.05665   Mean   :-0.05005   Mean   :-0.03546  
##  3rd Qu.:  0.49195   3rd Qu.: 0.52812   3rd Qu.: 0.55634  
##  Max.   :  2.01982   Max.   : 2.39751   Max.   : 2.60399  
##  NA's   :1           NA's   :10         NA's   :10        
## 
## $`Variance of Pearson residuals`
## $`Variance of Pearson residuals`$edges
## [1] 1.811466
## 
## $`Variance of Pearson residuals`$kstar2
## [1] 1.557271
## 
## $`Variance of Pearson residuals`$triangle
## [1] 1.369088
## 
## 
## $`Std. dev. of Pearson residuals`
## $`Std. dev. of Pearson residuals`$edges
## [1] 1.345907
## 
## $`Std. dev. of Pearson residuals`$kstar2
## [1] 1.247907
## 
## $`Std. dev. of Pearson residuals`$triangle
## [1] 1.17008

Variances of Pearson residuals substantially greater than 1 suggest unaccounted-for heterogeneity.

autoplot(gof.wd)
## [[1]]
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[2]]
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[3]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[4]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[5]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[6]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

The plots don’t look unreasonable.

Also make plots of residuals vs. square root of fitted and vs. network size:

autoplot(gof.wd, against=sqrt(.fitted))
## [[1]]
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[2]]
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[3]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[4]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[5]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[6]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

autoplot(gof.wd, against=ordered(n))
## [[1]]
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 0.975
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 2.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 6.2178e-17
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[2]]
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 0.975
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 2.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 6.2178e-17
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[3]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[4]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[5]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[6]]
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Removed 10 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 10 rows containing missing values or values outside the scale range (`geom_text_repel()`).

It looks like network-size effects are probably accounted for.

References

Goeyvaerts, Nele, Eva Santermans, Gail Potter, Andrea Torneri, Kim Van Kerckhove, Lander Willem, Marc Aerts, Philippe Beutels, and Niel Hens. 2018. “Household Members Do Not Contact Each Other at Random: Implications for Infectious Disease Modelling.” Proceedings of the Royal Society B: Biological Sciences 285 (1893): 20182201. https://doi.org/10.1098/rspb.2018.2201.
Krivitsky, Pavel N., Pietro Coletti, and Niel Hens. 2023. “A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks.” Journal of the American Statistical Association 118 (544): 2213–24. https://doi.org/10.1080/01621459.2023.2242627.