Panel Count Models with Random Effects and Sample Selection

1. Introduction

Panel count data are ubiquitous; examples include monthly product sales and daily video views. Modeling such data raises two common issues:

Repeated observations. The observations on the same individuals are unlikely to be independent because of individual level unobserved effects.
Sample selection. The counts are often observed only for a non-random subset of individuals. For example, our data may include only a subset of products or videos that were not randomly selected from the population.

The PanelCount package implements multiple models to address both issues. Specifically, it supports the estimation of the following models:

PoissonRE: Poisson model with individual level random effects
PLN_RE: Poisson log-normal model with individual level random effects. That is, a Poisson model with random effects at both the individual and individual-time levels
ProbitRE: Probit model with individual level random effects
ProbitRE_PoissonRE: ProbitRE and PoissonRE models with correlated individual level random effects
ProbitRE_PLNRE: ProbitRE and PLN_RE models with correlated individual level random effects and correlated individual-time level error terms

2. Models

2.1. PoissonRE: Poisson model with individual level Random Effects

Let \(i\) and \(t\) index individual and time, respectively. The conditional mean of the PoissonRE model is specified as follows:

\[E[y_{it}|x_{it},v_i] = exp(\boldsymbol{\beta}\mathbf{x_{it}}' + \sigma v_i)\]

where \(x_{it}\) represents the set of covariates influencing the outcome \(y_{it}\), and \(v_i\) denotes the individual level random effects and is assumed to follow the standard normal distribution. \(\sigma^2\) represents the variance of the random effect.
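As a minimal sketch of this specification, the base R snippet below simulates counts from a PoissonRE-type process and checks the implied marginal mean. The parameter values and variable names are hypothetical and are not taken from the package; the only fact relied on is that \(E[exp(\sigma v_i)] = exp(\sigma^2/2)\) when \(v_i\) is standard normal.

```r
# Hypothetical parameter values, chosen for illustration only
set.seed(42)
n_id   = 2000   # number of individuals
n_time = 5      # periods per individual
beta0  = -1     # intercept
beta1  = 0.5    # coefficient of x
sigma  = 0.8    # scale of the individual level random effect

id = rep(seq_len(n_id), each = n_time)
x  = rnorm(n_id * n_time)
v  = rep(rnorm(n_id), each = n_time)   # v_i ~ N(0, 1), constant within individual

mu = exp(beta0 + beta1 * x + sigma * v)   # conditional mean given x_it and v_i
y  = rpois(length(mu), mu)

# Averaging over v_i shifts the log-mean by sigma^2 / 2
marginal_mean = exp(beta0 + beta1 * x + sigma^2 / 2)
ratio = mean(y) / mean(marginal_mean)     # should be close to 1
```

Because \(v_i\) is shared by all periods of an individual, the simulated counts are correlated within `id`, which is the dependence the random effect is meant to capture.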

2.2. PLN_RE: Poisson LogNormal model with individual level Random Effects

The conditional mean of the PLN_RE model is specified as follows:

\[E[y_{it}|x_{it},v_i,\epsilon_{it}] = exp( \boldsymbol{\beta}\mathbf{x_{it}}' + \sigma v_i + \gamma \epsilon_{it})\]

where \(v_i\) represents individual random effects and \(\epsilon_{it}\) represents individual-time level random effects. Both are assumed to follow a standard normal distribution. \(\sigma^2\) and \(\gamma^2\) represent the variances of the individual and individual-time level random effects, respectively.
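One consequence worth spelling out, assuming \(y_{it}\) is Poisson conditional on \((x_{it}, v_i, \epsilon_{it})\): integrating out \(\epsilon_{it}\) shifts the mean and adds overdispersion. Writing \(\mu_{it}\) for the mean conditional on \(x_{it}\) and \(v_i\) only (a symbol introduced here for illustration),

\[\mu_{it} = E[y_{it}|x_{it},v_i] = exp(\boldsymbol{\beta}\mathbf{x_{it}}' + \sigma v_i + \gamma^2/2)\]

\[Var[y_{it}|x_{it},v_i] = \mu_{it} + (exp(\gamma^2)-1)\,\mu_{it}^2\]

When \(\gamma = 0\) the variance equals the mean, as in the PoissonRE model; for \(\gamma > 0\) the variance exceeds the mean.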

2.3. ProbitRE: Probit model with individual level Random Effects

The specification of the ProbitRE model is given by: \[z_{it}=1(\boldsymbol{\alpha}\mathbf{w_{it}}'+\delta u_i+\xi_{it} > 0)\]

where \(w_{it}\) represents the set of covariates influencing individual \(i\)’s decision in period \(t\), and \(u_i\) is the individual level random effect, assumed to follow the standard normal distribution, so that \(\delta^2\) is the variance of the random effect. The variance of the individual-time level random shock \(\xi_{it}\) is normalized to 1 for identification.
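The following base R sketch illustrates the specification in a covariate-free case. All values are hypothetical. It relies on two standard implications of the model: the latent index \(\delta u_i + \xi_{it}\) has variance \(1+\delta^2\), so the marginal selection probability is \(\Phi(\alpha_0/\sqrt{1+\delta^2})\) for an intercept \(\alpha_0\), and the share of latent variance due to the individual effect is \(\delta^2/(1+\delta^2)\).

```r
# Hypothetical values; intercept-only selection equation for simplicity
set.seed(7)
n_id   = 5000
n_time = 4
alpha0 = 0.5    # intercept of the selection equation
delta  = 1.2    # scale of the individual level random effect

u  = rep(rnorm(n_id), each = n_time)   # u_i ~ N(0, 1), constant within individual
xi = rnorm(n_id * n_time)              # xi_it ~ N(0, 1), variance normalized to 1
z  = as.numeric(alpha0 + delta * u + xi > 0)

p_sim    = mean(z)                               # simulated share with z = 1
p_theory = pnorm(alpha0 / sqrt(1 + delta^2))     # marginal probability
icc      = delta^2 / (1 + delta^2)               # latent intra-individual correlation
```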

2.4. ProbitRE_PoissonRE

This model estimates the following selection and outcome equations jointly, allowing the random effects at the individual level to be correlated.

Selection Equation (ProbitRE): \[z_{it}=1(\boldsymbol{\alpha}\mathbf{w_{it}}'+\delta u_i+\xi_{it} > 0)\]

Outcome Equation (PoissonRE): \[E[y_{it}|x_{it},v_i] = exp(\boldsymbol{\beta}\mathbf{x_{it}}' + \sigma v_i)\]

Sample Selection at individual level: \[\begin{pmatrix} u_i \\ v_i \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right). \]
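To illustrate why \(\rho\) matters, the base R sketch below draws correlated \((u_i, v_i)\) and compares the mean of \(v_i\) in the full population with its mean among selected units. The setup is deliberately simplified (one period, no covariates, zero intercept) and all values are hypothetical. Under these assumptions the mean of \(v_i\) among the selected has the closed form \(\rho\,\delta/\sqrt{1+\delta^2}\cdot\sqrt{2/\pi}\).

```r
# Hypothetical values; one period per individual and no covariates
set.seed(123)
n     = 20000   # number of individuals
rho   = 0.5     # correlation between u_i and v_i
delta = 1       # scale of u_i in the selection equation

u  = rnorm(n)
v  = rho * u + sqrt(1 - rho^2) * rnorm(n)   # (u, v) standard bivariate normal, corr = rho
xi = rnorm(n)
z  = as.numeric(delta * u + xi > 0)         # selection indicator

mean_v_all      = mean(v)           # close to 0 in the full population
mean_v_selected = mean(v[z == 1])   # shifted away from 0 when rho != 0

# Closed form for this covariate-free, zero-intercept case
mean_v_theory = rho * delta / sqrt(1 + delta^2) * sqrt(2 / pi)
```

A nonzero mean of \(v_i\) in the selected sample is what an outcome model fitted only to observed counts cannot account for on its own.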

2.5. ProbitRE_PLNRE

This model estimates the following selection and outcome equations jointly, allowing the random effects at the individual level to be correlated with each other and the error terms at the individual-time level to be correlated with each other.

Selection Equation (ProbitRE): \[z_{it}=1(\boldsymbol{\alpha}\mathbf{w_{it}}'+\delta u_i+\xi_{it} > 0)\]

Outcome Equation (PLN_RE): \[E[y_{it}|x_{it},v_i,\epsilon_{it}] = exp(\boldsymbol{\beta}\mathbf{x_{it}}' + \sigma v_i + \gamma \epsilon_{it})\]

Sample Selection at individual level: \[\begin{pmatrix} u_i \\ v_i \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right). \]

Sample Selection at individual-time level: \[\begin{pmatrix} \xi_{it} \\ \epsilon_{it} \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & \tau \\ \tau & 1 \end{pmatrix}\right). \]
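One way to read the individual-time level correlation, offered as an equivalent representation and not as a description of the package's internal parameterization, is to write the outcome shock in terms of the selection shock and an auxiliary independent term \(\eta_{it}\) (a symbol introduced here):

\[\epsilon_{it} = \tau\,\xi_{it} + \sqrt{1-\tau^2}\,\eta_{it}, \quad \eta_{it}\sim N(0,1) \text{ independent of } \xi_{it}\]

so that \(E[\epsilon_{it}|\xi_{it}] = \tau\,\xi_{it}\). Since \(y_{it}\) is observed only when \(z_{it}=1\), which is more likely for large \(\xi_{it}\), the observed outcomes carry a nonzero average \(\epsilon_{it}\) whenever \(\tau \neq 0\).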

3. Examples

We begin by simulating a dataset with 200 individuals and 10 periods using the following data generating process (DGP):

\[z_{it}=1(1+x_{it}+w_{it}+u_i+\xi_{it} > 0)\]

\[E[y_{it}|x_{it},v_i,\epsilon_{it}] = exp(-1+x_{it} + v_i + \epsilon_{it})\]

\[\begin{pmatrix} u_i \\ v_i \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & 0.25 \\ 0.25 & 1 \end{pmatrix}\right). \]

\[\begin{pmatrix} \xi_{it} \\ \epsilon_{it} \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}\right). \]

library(MASS)
library(PanelCount)
set.seed(1)
N = 200
periods = 10
rho = 0.25
tau = 0.5

id = rep(1:N, each=periods)
time = rep(1:periods, N)
x = rnorm(N*periods)
w = rnorm(N*periods)

# correlated random effects at the individual level
r = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
r1 = rep(r[,1], each=periods)
r2 = rep(r[,2], each=periods)

# correlated error terms at the individual-time level
e = mvrnorm(N*periods, mu=c(0,0), Sigma=matrix(c(1,tau,tau,1), nrow=2))
e1 = e[,1]
e2 = e[,2]

# selection
z = as.numeric(1+x+w+r1+e1>0)
# outcome
y = rpois(N*periods, exp(-1+x+r2+e2))
y[z==0] = NA
sim = data.frame(id,time,x,w,z,y)
head(sim)
#>   id time          x           w z  y
#> 1  1    1 -0.6264538 -0.88614959 0 NA
#> 2  1    2  0.1836433 -1.92225490 0 NA
#> 3  1    3 -0.8356286  1.61970074 1  0
#> 4  1    4  1.5952808  0.51926990 1  0
#> 5  1    5  0.3295078 -0.05584993 1  0
#> 6  1    6 -0.8204684  0.69641761 1  0
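Before fitting any model, it can help to check how much of the panel is actually observed. The helper below is a hypothetical convenience function, not part of PanelCount; it assumes a data frame with columns `id`, `z`, and `y` laid out as in `sim`, and is demonstrated on a small hand-made data frame so that it runs on its own.

```r
# Hypothetical helper (not part of PanelCount): basic selection diagnostics
selection_summary = function(d) {
  list(
    n_individuals   = length(unique(d$id)),    # number of distinct individuals
    share_selected  = mean(d$z == 1),          # share of rows with an observed count
    mean_y_selected = mean(d$y, na.rm = TRUE)  # average count among observed rows
  )
}

# Toy data with the same column layout as sim
toy = data.frame(id   = c(1, 1, 2, 2),
                 time = c(1, 2, 1, 2),
                 z    = c(0, 1, 1, 1),
                 y    = c(NA, 0, 3, 1))
s = selection_summary(toy)
```

Calling `selection_summary(sim)` applies the same checks to the simulated panel.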

Next, we estimate the parameters of the above DGP using each of the models. In particular, we examine whether each model recovers the true coefficient of x (equal to 1) in the second stage, that is, the outcome equation.

3.1. PoissonRE

m1 = PoissonRE(y~x, data=sim[!is.na(sim$y), ], id.name='id', verbose=-1)
round(m1$estimates, digits=3)
#>             estimate    se      z p    lci    uci
#> (Intercept)   -0.498 0.091 -5.496 0 -0.675 -0.320
#> x              0.887 0.024 36.800 0  0.840  0.934
#> sigma          1.125 0.066 17.065 0  1.003  1.262

The estimate of x is biased because the above model fails to consider the individual-time level random effects and the sample selection issue in the true DGP.
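A back-of-the-envelope calculation, which ignores sample selection, suggests where the intercept of a model without the individual-time term would be expected to land. Integrating \(\epsilon_{it}\) out of the DGP's outcome equation, where its coefficient is 1:

\[E[y_{it}|x_{it},v_i] = exp(-1+x_{it}+v_i)\,E[exp(\epsilon_{it})] = exp(-1+1/2+x_{it}+v_i) = exp(-0.5+x_{it}+v_i)\]

The PoissonRE intercept estimate of -0.498 is in line with this shift. Selection also affects the estimates, so the match should be read as suggestive only.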

3.2. PLN_RE

m2 = PLN_RE(y~x, data=sim[!is.na(sim$y), ], id.name='id', verbose=-1)
round(m2$estimates, digits=3)
#>             estimate    se      z p    lci    uci
#> (Intercept)   -0.921 0.100 -9.204 0 -1.117 -0.725
#> x              0.932 0.052 17.964 0  0.830  1.034
#> sigma          1.056 0.078 13.519 0  0.914  1.221
#> gamma          0.951 0.048 19.721 0  0.861  1.050

The estimate of x is still biased because the above model fails to consider the sample selection issue in the true DGP.

3.3. ProbitRE

m3 = ProbitRE(z~x+w, data=sim, id.name='id', verbose=-1)
round(m3$estimates, digits=3)
#>             estimate    se      z p   lci   uci
#> (Intercept)    0.985 0.086 11.401 0 0.816 1.154
#> x              0.991 0.058 17.013 0 0.877 1.105
#> w              1.041 0.060 17.220 0 0.923 1.160
#> delta          0.937 0.084 11.130 0 0.786 1.117

The specification of this model is consistent with the DGP of the first stage. Therefore, it can produce consistent estimates of the parameters in the first stage.

3.4. ProbitRE_PoissonRE

m4 = ProbitRE_PoissonRE(z~x+w, y~x, data=sim, id.name='id', verbose=-1)
round(m4$estimates, digits=3)
#>             estimate    se       z     p    lci    uci
#> (Intercept)    0.962 0.090  10.683 0.000  0.786  1.139
#> x              0.994 0.060  16.683 0.000  0.877  1.110
#> w              1.042 0.062  16.896 0.000  0.921  1.163
#> (Intercept)   -0.542 0.041 -13.185 0.000 -0.623 -0.462
#> x              0.884 0.010  92.126 0.000  0.865  0.903
#> delta          0.923 0.080  11.511 0.000  0.778  1.094
#> sigma          0.741 0.015  48.099 0.000  0.711  0.771
#> rho            0.213 0.071   2.993 0.003  0.070  0.347

The results above the second “(Intercept)” are for the first stage. After accounting for self-selection at the individual level, the estimate of x in the second stage is still biased because the true DGP also includes self-selection at the individual-time level.

3.5. ProbitRE_PLNRE

# The estimation may take up to 1 minute
m5 = ProbitRE_PLNRE(z~x+w, y~x, data=sim, id.name='id', verbose=-1)
round(m5$estimates, digits=3)
#>             estimate    se      z p    lci    uci
#> (Intercept)    1.013 0.092 10.986 0  0.832  1.193
#> x              0.995 0.058 17.111 0  0.881  1.109
#> w              1.049 0.061 17.240 0  0.930  1.169
#> (Intercept)   -1.021 0.116 -8.776 0 -1.249 -0.793
#> x              1.028 0.043 23.678 0  0.943  1.113
#> delta          0.947 0.083 11.458 0  0.798  1.123
#> sigma          1.170 0.055 21.271 0  1.067  1.283
#> gamma          0.995 0.037 26.556 0  0.924  1.071
#> rho            0.338 0.094  3.609 0  0.144  0.507
#> tau            0.598 0.139  4.318 0  0.261  0.805

The results above the second “(Intercept)” are for the first stage. The specification of this model is consistent with the true DGP, and hence the second-stage estimate of x (1.028) is very close to its true value of 1.
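To compare the four outcome models side by side, the sketch below collects the second-stage estimates of x and their confidence limits from the output reported in Sections 3.1, 3.2, 3.4, and 3.5. The numbers are typed in by hand so that the snippet runs without refitting the models; the column names `gap` and `covers_truth` are introduced here for illustration.

```r
# Second-stage (outcome equation) estimates of x, copied from the reported output
x_est = data.frame(
  model    = c("PoissonRE", "PLN_RE", "ProbitRE_PoissonRE", "ProbitRE_PLNRE"),
  estimate = c(0.887, 0.932, 0.884, 1.028),
  lci      = c(0.840, 0.830, 0.865, 0.943),
  uci      = c(0.934, 1.034, 0.903, 1.113)
)

true_beta = 1   # coefficient of x in the DGP's outcome equation

x_est$gap          = x_est$estimate - true_beta                       # distance from the truth
x_est$covers_truth = x_est$lci <= true_beta & x_est$uci >= true_beta  # does the interval include 1?
print(x_est)
```

In this single simulated sample, the intervals from PoissonRE and ProbitRE_PoissonRE exclude 1, while those from PLN_RE and ProbitRE_PLNRE include it. One dataset cannot establish bias on its own; repeating the simulation over many seeds would give a clearer picture.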

The estimation of ProbitRE_PoissonRE and ProbitRE_PLNRE does not require a variable like w that influences only the first-stage (selection) outcome, that is, an exclusion restriction, but identification is stronger with such a variable.