mvord: Additional examples

Rainer Hirk, Kurt Hornik and Laura Vana

2024-08-30

Load package mvord

library(mvord)

Load data set

data("data_cr", package = "mvord")
str(data_cr, vec.len = 3)
## 'data.frame':    690 obs. of  10 variables:
##  $ rater1 : Ord.factor w/ 5 levels "A"<"B"<"C"<"D"<..: 2 3 3 2 5 4 3 4 ...
##  $ rater2 : Ord.factor w/ 5 levels "A"<"B"<"C"<"D"<..: 2 4 4 2 5 NA 3 4 ...
##  $ rater3 : Ord.factor w/ 6 levels "F"<"G"<"H"<"I"<..: 3 NA NA NA 6 NA 2 5 ...
##  $ rater4 : Ord.factor w/ 2 levels "L"<"M": 1 2 2 1 2 2 2 2 ...
##  $ firm_id: int  1 2 3 4 5 6 7 8 ...
##  $ LR     : num  1.72 1.84 2.64 1.31 ...
##  $ LEV    : num  2.114 0.883 2.3 2.638 ...
##  $ PR     : num  0.3779 -0.1503 -0.0521 0.3289 ...
##  $ RSIZE  : num  -6.37 -7.84 -7.98 -5.86 ...
##  $ BETA   : num  0.836 0.49 0.802 1.137 ...
head(data_cr, n = 3)
##   rater1 rater2 rater3 rater4 firm_id       LR       LEV          PR     RSIZE
## 1      B      B      H      L       1 1.720041 2.1144513  0.37792213 -6.365053
## 2      C      D   <NA>      M       2 1.836574 0.8826725 -0.15032402 -7.839813
## 3      C      D   <NA>      M       3 2.638177 2.2997237 -0.05205389 -7.976650
##        BETA
## 1 0.8358773
## 2 0.4895358
## 3 0.8022900

Example 1 - A simple example with 2 raters

We introduce a simple example with 2 raters (rater1 and rater2) rating a set of firms. The data set data_cr has a wide format and hence, we apply the multiple measurement object MMO2 on the left-hand side of the formula object. The firm-level covariates LR, LEV, PR, RSIZE, BETA are passed on the right-hand side of the formula. The thresholds are set equal by the argument thresholds.constraints.

res_cor_probit_2raters <- mvord(formula = MMO2(rater1, rater2) ~ 0 + LR + LEV + PR + RSIZE + BETA,
                               threshold.constraints = c(1, 1),
                               data = data_cr)

The results are displayed by:

summary(res_cor_probit_2raters)
## 
## Call: mvord(formula = MMO2(rater1, rater2) ~ 0 + LR + LEV + PR + RSIZE + 
##     BETA, data = data_cr, threshold.constraints = c(1, 1))
## 
## Formula: MMO2(rater1, rater2) ~ 0 + LR + LEV + PR + RSIZE + BETA
## 
##     link threshold nsubjects ndim   logPL   CLAIC   CLBIC fevals
## mvprobit  flexible       690    2 -799.84 1630.35 1699.91   4476
## 
## Thresholds:
##            Estimate Std. Error z value  Pr(>|z|)    
## rater1 A|B  8.40533    0.39600  21.225 < 2.2e-16 ***
## rater1 B|C  9.87574    0.42661  23.149 < 2.2e-16 ***
## rater1 C|D 11.68478    0.46471  25.145 < 2.2e-16 ***
## rater1 D|E 14.00771    0.53773  26.050 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##          Estimate Std. Error  z value  Pr(>|z|)    
## LR 1     0.191816   0.059864   3.2042  0.001354 ** 
## LR 2     0.089982   0.069426   1.2961  0.194945    
## LEV 1    0.456824   0.039714  11.5027 < 2.2e-16 ***
## LEV 2    0.441115   0.044323   9.9522 < 2.2e-16 ***
## PR 1    -2.626680   0.174938 -15.0149 < 2.2e-16 ***
## PR 2    -2.761407   0.203692 -13.5567 < 2.2e-16 ***
## RSIZE 1 -1.166859   0.050982 -22.8877 < 2.2e-16 ***
## RSIZE 2 -1.184520   0.050259 -23.5685 < 2.2e-16 ***
## BETA 1   1.644008   0.100520  16.3550 < 2.2e-16 ***
## BETA 2   1.741845   0.128536  13.5515 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##                    Estimate Std. Error z value  Pr(>|z|)    
## corr rater1 rater2 0.872025   0.025461   34.25 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients can be extracted by:

coef(res_cor_probit_2raters)
##        LR 1        LR 2       LEV 1       LEV 2        PR 1        PR 2 
##  0.19181645  0.08998178  0.45682446  0.44111540 -2.62668045 -2.76140750 
##     RSIZE 1     RSIZE 2      BETA 1      BETA 2 
## -1.16685941 -1.18452031  1.64400778  1.74184468

Example 2 - A simple example with 3 raters

A simple example with 3 raters (rater1, rater2 and rater3) is presented. The regression coefficients are set equal by the argument coef.constraints.

res_cor_logit_3raters <- mvord(formula = MMO2(rater1, rater2, rater3) ~ 0 + 
                                 LR + LEV + PR + RSIZE + BETA,
                               coef.constraints = c(1, 1, 1),
                               data = data_cr,
                               link = mvlogit())

The results are displayed by:

summary(res_cor_logit_3raters)
## 
## Call: mvord(formula = MMO2(rater1, rater2, rater3) ~ 0 + LR + LEV + 
##     PR + RSIZE + BETA, data = data_cr, link = mvlogit(), coef.constraints = c(1, 
##     1, 1))
## 
## Formula: MMO2(rater1, rater2, rater3) ~ 0 + LR + LEV + PR + RSIZE + BETA
## 
##    link threshold nsubjects ndim   logPL   CLAIC   CLBIC fevals
## mvlogit  flexible       690    3 -1561.4 3194.64 3357.58   4323
## 
## Thresholds:
##            Estimate Std. Error z value  Pr(>|z|)    
## rater1 A|B 14.71274    0.84962  17.317 < 2.2e-16 ***
## rater1 B|C 17.46132    0.92699  18.837 < 2.2e-16 ***
## rater1 C|D 20.65829    1.03917  19.880 < 2.2e-16 ***
## rater1 D|E 24.68646    1.20698  20.453 < 2.2e-16 ***
## rater2 A|B 14.87020    0.85299  17.433 < 2.2e-16 ***
## rater2 B|C 17.43897    0.93096  18.732 < 2.2e-16 ***
## rater2 C|D 20.47189    1.04358  19.617 < 2.2e-16 ***
## rater2 D|E 24.80045    1.22174  20.299 < 2.2e-16 ***
## rater3 F|G 14.66330    0.86375  16.976 < 2.2e-16 ***
## rater3 G|H 17.35099    0.92572  18.743 < 2.2e-16 ***
## rater3 H|I 20.73810    1.04700  19.807 < 2.2e-16 ***
## rater3 I|J 23.25686    1.15044  20.216 < 2.2e-16 ***
## rater3 J|K 25.33849    1.22854  20.625 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##          Estimate Std. Error  z value  Pr(>|z|)    
## LR 1     0.333598   0.112624   2.9621  0.003056 ** 
## LEV 1    0.746847   0.076165   9.8057 < 2.2e-16 ***
## PR 1    -4.930269   0.356518 -13.8289 < 2.2e-16 ***
## RSIZE 1 -2.073426   0.108014 -19.1958 < 2.2e-16 ***
## BETA 1   3.021154   0.222735  13.5639 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##                    Estimate Std. Error z value  Pr(>|z|)    
## corr rater1 rater2 0.866603   0.027446  31.575 < 2.2e-16 ***
## corr rater1 rater3 0.903773   0.024725  36.554 < 2.2e-16 ***
## corr rater2 rater3 0.822655   0.045874  17.933 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The threshold parameters are presented by:

thresholds(res_cor_logit_3raters)
## $rater1
##      A|B      B|C      C|D      D|E 
## 14.71274 17.46132 20.65829 24.68646 
## 
## $rater2
##      A|B      B|C      C|D      D|E 
## 14.87020 17.43897 20.47189 24.80045 
## 
## $rater3
##      F|G      G|H      H|I      I|J      J|K 
## 14.66330 17.35099 20.73810 23.25686 25.33849

Example 3a - A model of firm ratings assigned by multiple raters and an investment-speculative grade indicator

This example presents a four dimensional ordinal regression model with probit link and a general correlation error structure cor_general(~ 1). The fourth rater only rates the firms on a binary scale: investment grade vs. speculative grade.

res_cor_probit_simple <- mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE + BETA, data = data_cr)

The results are displayed by:

summary(res_cor_probit_simple)
## 
## Call: mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + 
##     LEV + PR + RSIZE + BETA, data = data_cr)
## 
## Formula: MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE + 
##     BETA
## 
##     link threshold nsubjects ndim    logPL   CLAIC   CLBIC fevals
## mvprobit  flexible       690    4 -2925.83 6037.38 6458.64  15701
## 
## Thresholds:
##            Estimate Std. Error z value  Pr(>|z|)    
## rater1 A|B  8.04920    0.44306  18.167 < 2.2e-16 ***
## rater1 B|C  9.56783    0.47381  20.194 < 2.2e-16 ***
## rater1 C|D 11.34991    0.51747  21.933 < 2.2e-16 ***
## rater1 D|E 13.51638    0.60129  22.479 < 2.2e-16 ***
## rater2 A|B  8.59513    0.49834  17.247 < 2.2e-16 ***
## rater2 B|C 10.05454    0.53941  18.640 < 2.2e-16 ***
## rater2 C|D 11.85857    0.59749  19.847 < 2.2e-16 ***
## rater2 D|E 14.33336    0.70092  20.450 < 2.2e-16 ***
## rater3 F|G  8.24303    0.51775  15.921 < 2.2e-16 ***
## rater3 G|H  9.77659    0.55593  17.586 < 2.2e-16 ***
## rater3 H|I 11.70601    0.62343  18.777 < 2.2e-16 ***
## rater3 I|J 13.09380    0.68810  19.029 < 2.2e-16 ***
## rater3 J|K 14.17279    0.72153  19.643 < 2.2e-16 ***
## rater4 L|M 13.52691    1.00085  13.515 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##          Estimate Std. Error  z value  Pr(>|z|)    
## LR 1     0.208652   0.068020   3.0675  0.002159 ** 
## LR 2     0.153212   0.073420   2.0868  0.036906 *  
## LR 3     0.180531   0.078643   2.2956  0.021700 *  
## LR 4     0.150887   0.111750   1.3502  0.176944    
## LEV 1    0.430213   0.043774   9.8281 < 2.2e-16 ***
## LEV 2    0.432842   0.050129   8.6346 < 2.2e-16 ***
## LEV 3    0.399459   0.050826   7.8594 3.861e-15 ***
## LEV 4    0.624967   0.074146   8.4289 < 2.2e-16 ***
## PR 1    -2.573616   0.194046 -13.2629 < 2.2e-16 ***
## PR 2    -2.827056   0.217061 -13.0242 < 2.2e-16 ***
## PR 3    -2.678170   0.222705 -12.0257 < 2.2e-16 ***
## PR 4    -2.793063   0.279732  -9.9848 < 2.2e-16 ***
## RSIZE 1 -1.129967   0.056379 -20.0424 < 2.2e-16 ***
## RSIZE 2 -1.196432   0.061776 -19.3671 < 2.2e-16 ***
## RSIZE 3 -1.196481   0.066503 -17.9914 < 2.2e-16 ***
## RSIZE 4 -1.565903   0.115749 -13.5285 < 2.2e-16 ***
## BETA 1   1.602187   0.110868  14.4513 < 2.2e-16 ***
## BETA 2   1.801526   0.140123  12.8567 < 2.2e-16 ***
## BETA 3   1.517633   0.139319  10.8932 < 2.2e-16 ***
## BETA 4   1.988644   0.203338   9.7800 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##                    Estimate Std. Error z value  Pr(>|z|)    
## corr rater1 rater2 0.875996   0.024554  35.677 < 2.2e-16 ***
## corr rater1 rater3 0.913588   0.023462  38.940 < 2.2e-16 ***
## corr rater1 rater4 0.896170   0.033495  26.755 < 2.2e-16 ***
## corr rater2 rater3 0.831679   0.042740  19.459 < 2.2e-16 ***
## corr rater2 rater4 0.926139   0.031808  29.117 < 2.2e-16 ***
## corr rater3 rater4 0.866521   0.051285  16.896 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The threshold coefficients can be extracted by:

thresholds(res_cor_probit_simple)
## $rater1
##       A|B       B|C       C|D       D|E 
##  8.049203  9.567831 11.349908 13.516375 
## 
## $rater2
##       A|B       B|C       C|D       D|E 
##  8.595126 10.054536 11.858567 14.333362 
## 
## $rater3
##       F|G       G|H       H|I       I|J       J|K 
##  8.243027  9.776591 11.706012 13.093799 14.172787 
## 
## $rater4
##      L|M 
## 13.52691

The regression coefficients are obtained by:

coef(res_cor_probit_simple)
##       LR 1       LR 2       LR 3       LR 4      LEV 1      LEV 2      LEV 3 
##  0.2086517  0.1532119  0.1805309  0.1508872  0.4302126  0.4328417  0.3994591 
##      LEV 4       PR 1       PR 2       PR 3       PR 4    RSIZE 1    RSIZE 2 
##  0.6249669 -2.5736158 -2.8270564 -2.6781697 -2.7930627 -1.1299673 -1.1964316 
##    RSIZE 3    RSIZE 4     BETA 1     BETA 2     BETA 3     BETA 4 
## -1.1964814 -1.5659034  1.6021871  1.8015260  1.5176327  1.9886441

The error structure for firm with firm_id = 11 is displayed by:

error_structure(res_cor_probit_simple)[[11]]
##           rater1    rater2    rater3    rater4
## rater1 1.0000000 0.8759965 0.9135876 0.8961703
## rater2 0.8759965 1.0000000 0.8316790 0.9261393
## rater3 0.9135876 0.8316790 1.0000000 0.8665213
## rater4 0.8961703 0.9261393 0.8665213 1.0000000

Example 3b - A more elaborate model of ratings assigned by multiple raters

In this example, we extend the setting of the above example by imposing constraints on the regression as well as on the threshold parameters and changing the link function to the multivariate logit link.

res_cor_logit <- mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~
    0 + LR + LEV + PR + RSIZE + BETA, data = data_cr, link = mvlogit(),
    coef.constraints = cbind(LR = c(1, 1, 1, 1), 
                             LEV = c(1, 2, 3, 4),
                             PR = c(1, 1, 1, 1), 
                             RSIZE = c(1, 1, 1, 2), 
                             BETA = c(1, 1, 2, 3)),
    threshold.constraints = c(1, 1, 2, 3))

The results are displayed by:

summary(res_cor_logit)
## 
## Call: mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + 
##     LEV + PR + RSIZE + BETA, data = data_cr, link = mvlogit(), 
##     coef.constraints = cbind(c(1, 1, 1, 1), c(1, 2, 3, 4), c(1, 
##         1, 1, 1), c(1, 1, 1, 2), c(1, 1, 2, 3)), threshold.constraints = c(1, 
##         1, 2, 3))
## 
## Formula: MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE + 
##     BETA
## 
##    link threshold nsubjects ndim    logPL   CLAIC   CLBIC fevals
## mvlogit  flexible       690    4 -2926.42 5987.81 6293.98   9338
## 
## Thresholds:
##            Estimate Std. Error z value  Pr(>|z|)    
## rater1 A|B 15.04918    0.82409  18.262 < 2.2e-16 ***
## rater1 B|C 17.75218    0.89728  19.785 < 2.2e-16 ***
## rater1 C|D 20.97821    1.00773  20.817 < 2.2e-16 ***
## rater1 D|E 25.13047    1.17487  21.390 < 2.2e-16 ***
## rater3 F|G 14.47061    0.83922  17.243 < 2.2e-16 ***
## rater3 G|H 17.17327    0.89515  19.185 < 2.2e-16 ***
## rater3 H|I 20.56634    1.01119  20.339 < 2.2e-16 ***
## rater3 I|J 23.00523    1.11045  20.717 < 2.2e-16 ***
## rater3 J|K 24.97258    1.18725  21.034 < 2.2e-16 ***
## rater4 L|M 23.92770    1.63196  14.662 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##          Estimate Std. Error  z value  Pr(>|z|)    
## LR 1     0.340210   0.110547   3.0775  0.002087 ** 
## LEV 1    0.784294   0.075977  10.3228 < 2.2e-16 ***
## LEV 2    0.779694   0.078364   9.9496 < 2.2e-16 ***
## LEV 3    0.718329   0.093425   7.6888 1.485e-14 ***
## LEV 4    1.107836   0.123681   8.9572 < 2.2e-16 ***
## PR 1    -4.917963   0.343464 -14.3187 < 2.2e-16 ***
## RSIZE 1 -2.093378   0.103690 -20.1889 < 2.2e-16 ***
## RSIZE 2 -2.746163   0.188731 -14.5507 < 2.2e-16 ***
## BETA 1   3.135692   0.221944  14.1283 < 2.2e-16 ***
## BETA 2   2.733088   0.252960  10.8044 < 2.2e-16 ***
## BETA 3   3.572691   0.349493  10.2225 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##                    Estimate Std. Error z value  Pr(>|z|)    
## corr rater1 rater2 0.859776   0.027907  30.809 < 2.2e-16 ***
## corr rater1 rater3 0.908834   0.024636  36.891 < 2.2e-16 ***
## corr rater1 rater4 0.903956   0.031858  28.374 < 2.2e-16 ***
## corr rater2 rater3 0.834904   0.044259  18.864 < 2.2e-16 ***
## corr rater2 rater4 0.932242   0.032173  28.976 < 2.2e-16 ***
## corr rater3 rater4 0.856234   0.058392  14.664 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The constraints are displayed by:

constraints(res_cor_logit)
## $LR
##     LR 1
## A|B    1
## B|C    1
## C|D    1
## D|E    1
## A|B    1
## B|C    1
## C|D    1
## D|E    1
## F|G    1
## G|H    1
## H|I    1
## I|J    1
## J|K    1
## L|M    1
## 
## $LEV
##     LEV 1 LEV 2 LEV 3 LEV 4
## A|B     1     0     0     0
## B|C     1     0     0     0
## C|D     1     0     0     0
## D|E     1     0     0     0
## A|B     0     1     0     0
## B|C     0     1     0     0
## C|D     0     1     0     0
## D|E     0     1     0     0
## F|G     0     0     1     0
## G|H     0     0     1     0
## H|I     0     0     1     0
## I|J     0     0     1     0
## J|K     0     0     1     0
## L|M     0     0     0     1
## 
## $PR
##     PR 1
## A|B    1
## B|C    1
## C|D    1
## D|E    1
## A|B    1
## B|C    1
## C|D    1
## D|E    1
## F|G    1
## G|H    1
## H|I    1
## I|J    1
## J|K    1
## L|M    1
## 
## $RSIZE
##     RSIZE 1 RSIZE 2
## A|B       1       0
## B|C       1       0
## C|D       1       0
## D|E       1       0
## A|B       1       0
## B|C       1       0
## C|D       1       0
## D|E       1       0
## F|G       1       0
## G|H       1       0
## H|I       1       0
## I|J       1       0
## J|K       1       0
## L|M       0       1
## 
## $BETA
##     BETA 1 BETA 2 BETA 3
## A|B      1      0      0
## B|C      1      0      0
## C|D      1      0      0
## D|E      1      0      0
## A|B      1      0      0
## B|C      1      0      0
## C|D      1      0      0
## D|E      1      0      0
## F|G      0      1      0
## G|H      0      1      0
## H|I      0      1      0
## I|J      0      1      0
## J|K      0      1      0
## L|M      0      0      1

Note that the composite likelihood information criteria can be used for model comparison. For objects of class mvord the composite likelihood AIC and BIC are computed by AIC() and BIC(), respectively. The model fits of examples one and two are compared by means of BIC and AIC. We observe that the model of Example 3b has a lower BIC and AIC indicating a better model fit:

BIC(res_cor_probit_simple, res_cor_logit)
##                             df      BIC
## res_cor_probit_simple 92.85637 6458.641
## res_cor_logit         67.48687 6293.977
AIC(res_cor_probit_simple)
## [1] 6037.38
AIC(res_cor_logit)
## [1] 5987.81

The value of the pairwise log-likelihood of the two models can be extracted by:

logLik(res_cor_probit_simple)
## 'log Lik.' -2925.834 (df=92.85637)
logLik(res_cor_logit)
## 'log Lik.' -2926.418 (df=67.48687)

Marginal predictions are obtained by:

mp <- marginal_predict(res_cor_logit, type = "class")
table(data_cr$rater1, mp$rater1)
##    
##       A   B   C   D   E
##   A  47  23   4   0   0
##   B  11  65  44   3   0
##   C   0  28 135  38   0
##   D   1   2  44 116  17
##   E   0   0   0  30  47
table(data_cr$rater2, mp$rater2)
##    
##       A   B   C   D   E
##   A  29  20   8   0   0
##   B  12  41  22   2   0
##   C   1  18 103  24   0
##   D   0   0  35 109   7
##   E   0   0   0  13  39
table(data_cr$rater3, mp$rater3)
##    
##      F  G  H  I  J  K
##   F 25  8  2  0  0  0
##   G  4 33 23  1  0  0
##   H  1 18 85 13  0  0
##   I  0  0 28 30  3  2
##   J  0  0  2 13 10  7
##   K  0  0  0  6  8 23
table(data_cr$rater4, mp$rater4)
##    
##       L   M
##   L 195  40
##   M  30 425

The joint probabilities of observing the in-sample responses are obtained by:

jp <- predict(res_cor_logit, type = "prob")
head(jp)
##          1          2          3          4          5          6 
## 0.03600335 0.06936164 0.06286522 0.46162968 0.95174028 0.54502333

If the probabilities for different class combinations than the observed ones should be computed, function joint_probabilities() can be used:

jp_AAFL <- joint_probabilities(res_cor_logit, 
                          response.cat = c("A", "A", "F", "L"),  
                          type = "prob")
head(jp_AAFL)
##            1            2            3            4            5            6 
## 1.288053e-01 2.438076e-03 2.736990e-04 9.255466e-02 3.067605e-07 6.807510e-04

For example, it might be interesting to look at the probabilities of rater 1 and rater 2 assigning the same rating when rater 3 assigns an “F” and rater 4 an “L”. This can be obtained by :

nc <- 5 # number of classes for rater 1 and 2
jp_mat <- matrix(nrow = nrow(data_cr), ncol = nc)
for (cl in 1:nc) {
  jp_mat[, cl] <- joint_probabilities(
    res_cor_logit, 
    response.cat = c(LETTERS[cl], LETTERS[cl], "F", "L"),  
    type = "prob")
}
jp2 <- rowSums(jp_mat)
head(jp2)
## [1] 1.528101e-01 3.344406e-03 4.275890e-04 1.208787e-01 3.939843e-07
## [6] 9.857712e-04

In order to predict the most likely combination of ratings based on the model, the predict() function with type = "class" can be used. * Note that this is take considerable time to run, as all combinations of responses (in this example 300=5532) for the four responses should be considered.

jpred <- predict(res_cor_logit, type = "class")
table(data_cr$rater1, jpred$rater1)
##    
##       A   B   C   D   E
##   A  50  20   4   0   0
##   B  11  68  41   3   0
##   C   0  36 130  35   0
##   D   1   2  47 112  18
##   E   0   0   0  25  52
table(data_cr$rater2, jpred$rater2)
##    
##       A   B   C   D   E
##   A  33  17   7   0   0
##   B  12  46  17   2   0
##   C   1  24  99  22   0
##   D   0   0  39 103   9
##   E   0   0   0  10  42
table(data_cr$rater3, jpred$rater3)
##    
##      F  G  H  I  J  K
##   F 27  7  1  0  0  0
##   G  6 33 21  1  0  0
##   H  1 24 77 15  0  0
##   I  0  0 25 35  1  2
##   J  0  0  2 17  3 10
##   K  0  0  0  9  1 27
table(data_cr$rater4, jpred$rater4)
##    
##       L   M
##   L 175  60
##   M  22 433

Example 4 - A longitudinal model

In this example, we present a longitudinal multivariate ordinal probit regression model with a covariate dependent AR(1) error structure using the data set data_cr_panel:

data("data_cr_panel", package = "mvord")
str(data_cr_panel, vec.len = 3)
## 'data.frame':    11320 obs. of  9 variables:
##  $ rating : Ord.factor w/ 5 levels "A"<"B"<"C"<"D"<..: 5 3 3 3 3 1 3 3 ...
##  $ firm_id: int  1 2 3 4 5 6 7 8 ...
##  $ year   : Factor w/ 8 levels "year1","year2",..: 1 1 1 1 1 1 1 1 ...
##  $ LR     : num  572.86 1.38 7.46 10.9 ...
##  $ LEV    : num  1.2008 0.0302 0.1517 0.5485 ...
##  $ PR     : num  0.1459 -0.0396 0.0508 0.1889 ...
##  $ RSIZE  : num  1.423 -1.944 2.024 -0.433 ...
##  $ BETA   : num  1.148 1.693 0.761 2.24 ...
##  $ BSEC   : Factor w/ 8 levels "BSEC1","BSEC2",..: 3 6 3 7 6 7 7 7 ...
head(data_cr_panel, n = 3)
##   rating firm_id  year         LR        LEV          PR     RSIZE      BETA
## 1      E       1 year1 572.864658 1.20084294  0.14585117  1.422948 1.1481020
## 2      C       2 year1   1.379547 0.03022761 -0.03962597 -1.944265 1.6926956
## 3      C       3 year1   7.462706 0.15170420  0.05083517  2.024098 0.7610057
##    BSEC
## 1 BSEC3
## 2 BSEC6
## 3 BSEC3
res_AR1_probit <- mvord(formula = MMO(rating, firm_id, year) ~ LR + LEV +
  PR + RSIZE + BETA, 
  error.structure = cor_ar1(~ BSEC), link = mvprobit(),
  data = data_cr_panel, 
  coef.constraints = c(rep(1, 4), rep(2, 4)),
  threshold.constraints = rep(1, 8), 
  threshold.values = rep(list(c(0, NA, NA, NA)),8),
  control = mvord.control(solver = "BFGS"))

The results of the model can be presented by

summary(res_AR1_probit, short = TRUE, call = FALSE)
## Formula: MMO(rating, firm_id, year) ~ LR + LEV + PR + RSIZE + BETA
## 
##     link threshold nsubjects ndim     logPL     CLAIC     CLBIC fevals
## mvprobit fix1first      1415    8 -74805.56 150098.19 151377.93    181
## 
## Thresholds:
##           Estimate Std. Error z value  Pr(>|z|)    
## year1 A|B 0.000000   0.000000      NA        NA    
## year1 B|C 1.016137   0.026379  38.521 < 2.2e-16 ***
## year1 C|D 2.444887   0.041170  59.385 < 2.2e-16 ***
## year1 D|E 3.891073   0.058843  66.127 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##                 Estimate Std. Error  z value  Pr(>|z|)    
## (Intercept) 1  1.4190682  0.0532243  26.6620 < 2.2e-16 ***
## (Intercept) 2  1.4975500  0.0494038  30.3124 < 2.2e-16 ***
## LR 1           0.0193546  0.0006938  27.8966 < 2.2e-16 ***
## LR 2           0.0323230  0.0010102  31.9955 < 2.2e-16 ***
## LEV 1          0.0268125  0.0019263  13.9191 < 2.2e-16 ***
## LEV 2          0.0180050  0.0013143  13.6993 < 2.2e-16 ***
## PR 1          -1.0930080  0.0427073 -25.5930 < 2.2e-16 ***
## PR 2          -0.7411681  0.0372295 -19.9081 < 2.2e-16 ***
## RSIZE 1       -0.3676665  0.0114038 -32.2407 < 2.2e-16 ***
## RSIZE 2       -0.3608979  0.0115297 -31.3016 < 2.2e-16 ***
## BETA 1         0.0541624  0.0296028   1.8296    0.0673 .  
## BETA 2         0.1099998  0.0236754   4.6462 3.382e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##              Estimate Std. Error  z value  Pr(>|z|)    
## (Intercept)  1.490395   0.051540  28.9173 < 2.2e-16 ***
## BSECBSEC2   -0.552317   0.069241  -7.9767 1.503e-15 ***
## BSECBSEC3   -0.062028   0.066553  -0.9320    0.3513    
## BSECBSEC4   -0.085592   0.065126  -1.3142    0.1888    
## BSECBSEC5   -0.066598   0.085692  -0.7772    0.4371    
## BSECBSEC6   -0.683429   0.069184  -9.8785 < 2.2e-16 ***
## BSECBSEC7   -0.863911   0.064855 -13.3206 < 2.2e-16 ***
## BSECBSEC8   -0.757997   0.078506  -9.6553 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The default error structure output for AR(1) models:

error_structure(res_AR1_probit)
##         V16         V17         V18         V19         V20         V21 
##  1.49039489 -0.55231725 -0.06202790 -0.08559155 -0.06659849 -0.68342920 
##         V22         V23 
## -0.86391149 -0.75799729

Correlation parameters \(\rho_i\) for each firm are obtained by choosing type = "corr" in error_structure():

head(error_structure(res_AR1_probit, type = "corr"), n = 3)
##   Correlation
## 1   0.8913315
## 2   0.6679130
## 3   0.8913315

Correlation matrices for each specific firm are obtained by choosing type = "sigmas":

head(error_structure(res_AR1_probit, type = "sigmas"), n = 1)
## $`1`
##           year1     year2     year3     year4     year5     year6     year7
## year1 1.0000000 0.8913315 0.7944718 0.7081377 0.6311854 0.5625954 0.5014590
## year2 0.8913315 1.0000000 0.8913315 0.7944718 0.7081377 0.6311854 0.5625954
## year3 0.7944718 0.8913315 1.0000000 0.8913315 0.7944718 0.7081377 0.6311854
## year4 0.7081377 0.7944718 0.8913315 1.0000000 0.8913315 0.7944718 0.7081377
## year5 0.6311854 0.7081377 0.7944718 0.8913315 1.0000000 0.8913315 0.7944718
## year6 0.5625954 0.6311854 0.7081377 0.7944718 0.8913315 1.0000000 0.8913315
## year7 0.5014590 0.5625954 0.6311854 0.7081377 0.7944718 0.8913315 1.0000000
## year8 0.4469662 0.5014590 0.5625954 0.6311854 0.7081377 0.7944718 0.8913315
##           year8
## year1 0.4469662
## year2 0.5014590
## year3 0.5625954
## year4 0.6311854
## year5 0.7081377
## year6 0.7944718
## year7 0.8913315
## year8 1.0000000

Example 5 - Self-reported health status

In this example we analyze longitudinal data on self-reported health status measured on a five point ordinal scale (5 = poor, 4 = fair, 3 = good, 2 = very good and 1 = excellent) derived from the Health and Retirement Study conducted by the University of Michigan which is available in the LMest package:

data(data_SRHS_long, package = "LMest")
str(data_SRHS_long)
## 'data.frame':    56592 obs. of  7 variables:
##  $ t        : int  1 2 3 4 5 6 7 8 1 2 ...
##  $ id       : num  1 1 1 1 1 1 1 1 2 2 ...
##  $ gender   : num  1 1 1 1 1 1 1 1 2 2 ...
##  $ race     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ education: num  3 3 3 3 3 3 3 3 5 5 ...
##  $ age      : num  56 58 60 62 64 66 68 70 54 55 ...
##  $ srhs     : num  4 4 3 3 4 3 3 3 3 3 ...

The dataset contains a self-reported health status srhs together with covariates such as ethnicity (race coded as 1 = white, 2 = black, 3 = others), gender (coded as 1 = male, 2 = female), education level (coded as 1 = high school, 2 = general educational diploma, 3 = high school graduate, 4 = some college, 5 = college and above) and age for 7074 subjects on 8 different (approximately equally spaced) time occasions `t``.

We estimate a multivariate ordinal logit model similar to the one of the models employed by Bartolucci, Pandolfi, and Pennoni (2017), where for every subject in the sample the errors follow an \(AR(1)\) process. Moreover, the threshold and regression coefficients are equal across all years. In order to reduce the computational burden we only consider pairs of observations not more than two time points appart by setting PL.lag = 2.

res_srhs <- mvord(formula = MMO(srhs, id, t) ~ 0 + factor(gender) +
    factor(race) + factor(education) + age, 
    data = data_SRHS_long,
    threshold.constraints = rep(1, 8), 
    coef.constraints = rep(1, 8),
    error.structure = cor_ar1(~ 1), link = mvlogit(), 
    PL.lag = 2)

(runtime 12 minutes).

The persistence in the reported health status is high. The correlation parameter in the cor_ar1 error structure is

unique(error_structure(res_srhs, type = "corr"))
##   Correlation
## 1   0.8820163

The estimated parameters are shown by the function summary():

summary(res_srhs, call = FALSE)
## Formula: MMO(srhs, id, time) ~ 0 + factor(gender) + factor(race) + factor(education) + 
##     age
## 
##    link threshold nsubjects ndim      logPL     CLAIC     CLBIC fevals
## mvlogit  flexible      7074    8 -239860.22 479933.65 480665.42    404
## 
## Thresholds:
##       Estimate Std. Error  z value Pr(>|z|)    
## 1 1|2 -1.78119    0.11595 -15.3621   <2e-16 ***
## 1 2|3 -0.11563    0.11679  -0.9901   0.3221    
## 1 3|4  1.40676    0.11857  11.8644   <2e-16 ***
## 1 4|5  3.03358    0.12245  24.7741   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##                        Estimate Std. Error  z value  Pr(>|z|)    
## factor(gender)2 1    -0.0059763  0.0382602  -0.1562    0.8759    
## factor(race)2 1       0.6083505  0.0560734  10.8492 < 2.2e-16 ***
## factor(race)3 1       0.5050706  0.0994654   5.0779 3.817e-07 ***
## factor(education)2 1 -0.5344365  0.0889470  -6.0085 1.873e-09 ***
## factor(education)3 1 -1.0114449  0.0516341 -19.5887 < 2.2e-16 ***
## factor(education)4 1 -1.3037171  0.0583832 -22.3304 < 2.2e-16 ***
## factor(education)5 1 -1.7340257  0.0603490 -28.7333 < 2.2e-16 ***
## age 1                 0.0161467  0.0016881   9.5649 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) 1.3847764  0.0088376  156.69 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the logit model the coefficients can be interpreted in terms of log-odds ratios. The results suggest that being non-white increases the chances of reporting a worse health status while higher people with education levels tend to report better health. Moreover, every additional year increases the odds of reporting a worse health status by 1.64% (\(exp(0.0162628)=1.016396\)).

Example 6 - Multirater agreement data

The mvord package can also be employed to measure agreement among several rating sources either with or without additional relevant covariate information.

We use the multirater agreement data from Chapter 5 in Johnson and Albert (1999). These data consist of grades assigned to 198 essays by 5 experts, each of whom rated all essays on a 10-point scale. A score of 10 indicates an excellent essay. In addition, the average word length is also available as an essay characteristic. The data set can be loaded from the package:

data("essay_data", package = "mvord")

The traditional measure of agreement between raters in the social sciences is the polychoric correlation. The polychoric correlation can be assessed using the mvord package by estimating of a model with no covariates and probit link. The correlation parameters of the cor_general error structure are to be interpreted as the measure of agreement.

head(essay_data)
##   Judge1 Judge2 Judge3 Judge4 Judge5    wl
## 1      8      6      9     10      5 4.760
## 2      7      5      3      5      3 4.244
## 3      2      1      1      1      1 4.087
## 4      5      3      4      5      3 4.355
## 5      7      2      7      7      7 4.306
## 6     10      6      8     10      6 4.508

The data is in the wide format (each ordinal response is in one column) so the MMO2 object will be used in the formula object:

res_essay_0 <- mvord(
  formula = MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ -1,
  data = essay_data, threshold.constraints = rep(1, 5),
  coef.constraints = rep(1, 5))

(runtime 14 seconds).

summary(res_essay_0, call = FALSE)
## Formula: MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ -1
## 
##     link threshold nsubjects ndim    logPL    CLAIC    CLBIC fevals
## mvprobit  flexible       198    5 -8528.42 17172.01 17361.38   1443
## 
## Thresholds:
##               Estimate Std. Error  z value  Pr(>|z|)    
## Judge1 1|2  -1.2448340  0.0795063 -15.6571 < 2.2e-16 ***
## Judge1 2|3  -0.6634138  0.0676890  -9.8009 < 2.2e-16 ***
## Judge1 3|4  -0.3297115  0.0651692  -5.0593 4.208e-07 ***
## Judge1 4|5   0.0001387  0.0632803   0.0022    0.9983    
## Judge1 5|6   0.3496646  0.0654689   5.3409 9.247e-08 ***
## Judge1 6|7   0.6352872  0.0668804   9.4989 < 2.2e-16 ***
## Judge1 7|8   0.9937520  0.0740965  13.4116 < 2.2e-16 ***
## Judge1 8|9   1.3344835  0.0806365  16.5494 < 2.2e-16 ***
## Judge1 9|10  1.9196999  0.1064901  18.0270 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##      Estimate Std. Error z value Pr(>|z|)
## 
## Error Structure:
##                    Estimate Std. Error z value  Pr(>|z|)    
## corr Judge1 Judge2 0.476080   0.061835  7.6992 1.369e-14 ***
## corr Judge1 Judge3 0.569194   0.046682 12.1930 < 2.2e-16 ***
## corr Judge1 Judge4 0.645198   0.046595 13.8468 < 2.2e-16 ***
## corr Judge1 Judge5 0.322528   0.090319  3.5710 0.0003556 ***
## corr Judge2 Judge3 0.396377   0.056243  7.0475 1.821e-12 ***
## corr Judge2 Judge4 0.634422   0.046930 13.5186 < 2.2e-16 ***
## corr Judge2 Judge5 0.621900   0.041815 14.8725 < 2.2e-16 ***
## corr Judge3 Judge4 0.391911   0.072272  5.4227 5.871e-08 ***
## corr Judge3 Judge5 0.382999   0.066020  5.8013 6.582e-09 ***
## corr Judge4 Judge5 0.693040   0.041727 16.6087 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The word length can be included as a covariate in the model:

res_essay_wl <- mvord(
  formula = MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ 0 + wl,
  data = essay_data, threshold.constraints = rep(1, 5),
  coef.constraints = rep(1, 5))

(runtime 0 minutes).

summary(res_essay_wl, call = FALSE)
## Formula: MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ 0 + wl
## 
##     link threshold nsubjects ndim    logPL    CLAIC    CLBIC fevals
## mvprobit  flexible       198    5 -8488.74 17110.73 17329.81   2161
## 
## Thresholds:
##             Estimate Std. Error z value  Pr(>|z|)    
## Judge1 1|2   1.09799    0.78039  1.4070 0.1594326    
## Judge1 2|3   1.68728    0.78221  2.1571 0.0309994 *  
## Judge1 3|4   2.02523    0.78287  2.5869 0.0096838 ** 
## Judge1 4|5   2.36059    0.78294  3.0150 0.0025694 ** 
## Judge1 5|6   2.71667    0.78472  3.4620 0.0005362 ***
## Judge1 6|7   3.00649    0.78550  3.8275 0.0001295 ***
## Judge1 7|8   3.36963    0.78891  4.2712 1.944e-05 ***
## Judge1 8|9   3.71636    0.79304  4.6862 2.783e-06 ***
## Judge1 9|10  4.31272    0.79820  5.4031 6.551e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Coefficients:
##      Estimate Std. Error z value Pr(>|z|)   
## wl 1  0.54193    0.17837  3.0382  0.00238 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error Structure:
##                    Estimate Std. Error z value  Pr(>|z|)    
## corr Judge1 Judge2 0.459154   0.062913  7.2983 2.915e-13 ***
## corr Judge1 Judge3 0.558919   0.047414 11.7881 < 2.2e-16 ***
## corr Judge1 Judge4 0.631977   0.048175 13.1182 < 2.2e-16 ***
## corr Judge1 Judge5 0.295951   0.089837  3.2943 0.0009866 ***
## corr Judge2 Judge3 0.382554   0.056902  6.7230 1.780e-11 ***
## corr Judge2 Judge4 0.621086   0.048802 12.7266 < 2.2e-16 ***
## corr Judge2 Judge5 0.608996   0.043499 14.0004 < 2.2e-16 ***
## corr Judge3 Judge4 0.374731   0.073984  5.0651 4.083e-07 ***
## corr Judge3 Judge5 0.368174   0.067220  5.4772 4.322e-08 ***
## corr Judge4 Judge5 0.681338   0.044116 15.4442 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The probabilities of agreement among all five judges which implied by the model can be computed by the function joint_probabilities():

agree_prob_list <- lapply(1:10, function(i)
  joint_probabilities(res_essay_wl, rep(i, 5)))
agree_prob <- Reduce("+", agree_prob_list)
summary(agree_prob)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.005150 0.006445 0.008724 0.009701 0.011795 0.023869

In order to assess the relationship between the average word length and the agreement probabilities, we plot the probabilities of agreement implied by the model against the word length.

plot(essay_data$wl, agree_prob,
  xlab = "word length", ylab = "probability of agreement")

The graphic suggests that the judges tend to agree more on the quality of essays with lower average word length than on the essays with larger average word length.

References

Bartolucci, Francesco, Silvia Pandolfi, and Fulvia Pennoni. 2017. LMest: An R Package for Latent Markov Models for Longitudinal Categorical Data.” Journal of Statistical Software 81 (4): 1–38. https://doi.org/10.18637/jss.v081.i04.
Johnson, Valen E, and J Albert. 1999. Ordinal Data Modeling. Statistics for Social Science and Public Policy. Springer-Verlag New York Incorporated.