Multiple estimations can be a chore to set up. Varying
left-hand-sides (LHS), right-hand-sides (RHS) or samples require either
many lines of code, or loops with formula/data manipulation. The package
fixest
simplifies multiple estimations by providing an
optimized procedure along with a clear and concise syntax.
On the one hand, stepwise functions facilitate the sequential inclusion of variables in the RHS or in the fixed-effects part of the formula. On the other hand, intuitive methods are introduced to manipulate the results from multiple estimations, making it easy to visualize or export any wanted set of results.
What does a multiple estimation look like? Let’s give it a try:
base = iris
names(base) = c("y1", "y2", "x1", "x2", "species")
res_multi = feols(c(y1, y2) ~ x1 + csw(x2, x2^2) | sw0(species), base, fsplit = ~species)
With the previous line of code (90 characters long), we have just performed 32 estimations: eight different functional forms on four different samples.
The previous code leads to the following results:
## sample fixef lhs rhs (Intercept) x1
## 1 Full sample 1 y1 x1 + x2 4.19*** (0.104) 0.542*** (0.076)
## 2 Full sample 1 y1 x1 + x2 + I(x2^2) 4.27*** (0.101) 0.719*** (0.082)
## 3 Full sample 1 y2 x1 + x2 3.59*** (0.103) -0.257*** (0.066)
## 4 Full sample 1 y2 x1 + x2 + I(x2^2) 3.68*** (0.097) -0.030 (0.078)
## 5 Full sample species y1 x1 + x2 0.906*** (0.076)
## 6 Full sample species y1 x1 + x2 + I(x2^2) 0.900*** (0.077)
## 7 Full sample species y2 x1 + x2 0.155* (0.073)
## 8 Full sample species y2 x1 + x2 + I(x2^2) 0.148. (0.075)
## 9 setosa 1 y1 x1 + x2 4.25*** (0.474) 0.399 (0.325)
## 10 setosa 1 y1 x1 + x2 + I(x2^2) 4.00*** (0.504) 0.405 (0.325)
## 11 setosa 1 y2 x1 + x2 2.89*** (0.416) 0.247 (0.305)
## 12 setosa 1 y2 x1 + x2 + I(x2^2) 2.82*** (0.423) 0.248 (0.304)
## 13 setosa species y1 x1 + x2 0.399 (0.325)
## 14 setosa species y1 x1 + x2 + I(x2^2) 0.405 (0.325)
## 15 setosa species y2 x1 + x2 0.247 (0.305)
## 16 setosa species y2 x1 + x2 + I(x2^2) 0.248 (0.304)
## 17 versicolor 1 y1 x1 + x2 2.38*** (0.423) 0.934*** (0.166)
## 18 versicolor 1 y1 x1 + x2 + I(x2^2) 0.323 (1.44) 0.901*** (0.164)
## 19 versicolor 1 y2 x1 + x2 1.25*** (0.275) 0.067 (0.095)
## 20 versicolor 1 y2 x1 + x2 + I(x2^2) 0.097 (1.01) 0.048 (0.099)
## 21 versicolor species y1 x1 + x2 0.934*** (0.166)
## 22 versicolor species y1 x1 + x2 + I(x2^2) 0.901*** (0.164)
## 23 versicolor species y2 x1 + x2 0.067 (0.095)
## 24 versicolor species y2 x1 + x2 + I(x2^2) 0.048 (0.099)
## 25 virginica 1 y1 x1 + x2 1.05. (0.539) 0.995*** (0.090)
## 26 virginica 1 y1 x1 + x2 + I(x2^2) -2.39 (2.04) 0.994*** (0.088)
## 27 virginica 1 y2 x1 + x2 1.06. (0.572) 0.149 (0.107)
## 28 virginica 1 y2 x1 + x2 + I(x2^2) 1.10 (1.76) 0.149 (0.108)
## 29 virginica species y1 x1 + x2 0.995*** (0.090)
## 30 virginica species y1 x1 + x2 + I(x2^2) 0.994*** (0.088)
## 31 virginica species y2 x1 + x2 0.149 (0.107)
## 32 virginica species y2 x1 + x2 + I(x2^2) 0.149 (0.108)
## x2 I(x2^2)
## 1 -0.320. (0.170)
## 2 -1.52*** (0.307) 0.348*** (0.075)
## 3 0.364* (0.142)
## 4 -1.18*** (0.313) 0.446*** (0.074)
## 5 -0.006 (0.163)
## 6 0.290 (0.408) -0.088 (0.117)
## 7 0.623*** (0.114)
## 8 0.951* (0.472) -0.097 (0.125)
## 9 0.712. (0.418)
## 10 2.51. (1.47) -2.91 (2.10)
## 11 0.702 (0.560)
## 12 1.27 (2.39) -0.911 (3.28)
## 13 0.712. (0.418)
## 14 2.51. (1.47) -2.91 (2.10)
## 15 0.702 (0.560)
## 16 1.27 (2.39) -0.911 (3.28)
## 17 -0.320 (0.364)
## 18 3.01 (2.31) -1.24 (0.841)
## 19 0.929*** (0.244)
## 20 2.80. (1.65) -0.695 (0.583)
## 21 -0.320 (0.364)
## 22 3.01 (2.31) -1.24 (0.841)
## 23 0.929*** (0.244)
## 24 2.80. (1.65) -0.695 (0.583)
## 25 0.007 (0.205)
## 26 3.50. (2.09) -0.870 (0.519)
## 27 0.535*** (0.122)
## 28 0.503 (1.56) 0.008 (0.388)
## 29 0.007 (0.205)
## 30 3.50. (2.09) -0.870 (0.519)
## 31 0.535*** (0.122)
## 32 0.503 (1.56) 0.008 (0.388)
This vignette now details how to perform multiple estimations for multiple: LHSs, RHSs, fixed-effects, or samples. It then comes to describe the various methods to access the results.
To perform an estimation on multiple LHS, simply wrap the different
LHS in c()
:
## feols(c(y1, y2)..1 feols(c(y1, y2) ..2
## Dependent Var.: y1 y2
##
## Constant 4.191*** (0.0970) 3.587*** (0.0937)
## x1 0.5418*** (0.0693) -0.2571*** (0.0669)
## x2 -0.3196* (0.1605) 0.3640* (0.1550)
## _______________ __________________ ___________________
## S.E. type IID IID
## Observations 150 150
## R2 0.76626 0.21310
## Adj. R2 0.76308 0.20240
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To estimate multiple RHS (or fixed-effects), you need to use a
specific set of functions: the stepwise functions. There are
five of them: sw
, sw0
, csw
,
csw0
and mvsw
.
sw
: this function is replaced sequentially by each
of its arguments. For example, y ~ x1 + sw(x2, x3)
leads to
two estimations: y ~ x1 + x2
and
y ~ x1 + x3
.
sw0
: identical to sw
but first adds the
empty element. E.g. y ~ x1 + sw0(x2, x3)
leads to three
estimations: y ~ x1
, y ~ x1 + x2
and
y ~ x1 + x3
.
csw
: it stands for cumulative stepwise. It
adds to the formula each of its arguments sequentially. E.g.
y ~ x1 + csw(x2, x3)
will become y ~ x1 + x2
and y ~ x1 + x2 + x3
.
csw0
: identical to csw
but first adds
the empty element. E.g. y ~ x1 + csw0(x2, x3)
leads to
three estimations: y ~ x1
, y ~ x1 + x2
and
y ~ x1 + x2 + x3
.
mvsw
: it stands for multiverse stepwise. It will
add, in a stepwise fashion, all possible combinations of the variables
in its arguments. For example mvsw(x1, x2, x3)
is
equivalent to
sw0(x1, x2, x3, x1 + x2, x1 + x3, x2 + x3, x1 + x2 + x3)
.
The number of models to estimate grows at a factorial rate: so be
very cautious!
The stepwise functions can be applied both to the linear part and the fixed-effects part of the formula. Note that at most one stepwise function can be applied per part.
Here is an example:
## feols(y1 ~ csw..1 feols(y1 ~ cs..2 feols(y1 ~ csw..3
## Dependent Var.: y1 y1 y1
##
## Constant 4.307** (0.1917) 4.191** (0.2443)
## x1 0.4089** (0.0401) 0.5418* (0.1115) 0.9046** (0.0758)
## x2 -0.3196 (0.1707)
## Fixed-Effects: ----------------- ---------------- -----------------
## species No No Yes
## _______________ _________________ ________________ _________________
## S.E.: Clustered by: species by: species by: species
## Observations 150 150 150
## R2 0.75995 0.76626 0.83672
## Within R2 -- -- 0.57178
##
## feols(y1 ~ csw..4
## Dependent Var.: y1
##
## Constant
## x1 0.9059** (0.0814)
## x2 -0.0060 (0.1260)
## Fixed-Effects: -----------------
## species Yes
## _______________ _________________
## S.E.: Clustered by: species
## Observations 150
## R2 0.83673
## Within R2 0.57179
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As you can see, if the stepwise functions are in both parts, there will be as many estimations as the cardinal product of the two parts.
To perform split sample estimations, use either the argument
split
or fsplit
. The argument
split
accepts a variable that will be treated as a factor,
and an estimation will be performed for each sub-sample defined by each
level of this variable. The argument fsplit
is identical
but first adds the full sample.
## feols(y1 ~ x1 +..1 feols(y1 ~ x1 ..2 feols(y1 ~ x1 +..3
## Sample (species) Full sample setosa versicolor
## Dependent Var.: y1 y1 y1
##
## Constant 4.191*** (0.0970) 4.248*** (0.4114) 2.381*** (0.4493)
## x1 0.5418*** (0.0693) 0.3990 (0.2958) 0.9342*** (0.1693)
## x2 -0.3196* (0.1605) 0.7121 (0.4874) -0.3200 (0.4024)
## ________________ __________________ _________________ __________________
## S.E. type IID IID IID
## Observations 150 50 50
## R2 0.76626 0.11173 0.57432
## Adj. R2 0.76308 0.07393 0.55620
##
## feols(y1 ~ x1 +..4
## Sample (species) virginica
## Dependent Var.: y1
##
## Constant 1.052* (0.5139)
## x1 0.9946*** (0.0893)
## x2 0.0071 (0.1795)
## ________________ __________________
## S.E. type IID
## Observations 50
## R2 0.74689
## Adj. R2 0.73612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can combine multiple LHS to multiple RHS to multiple fixed-effects to multiple samples. The total number of estimations is always equal to the cardinal product of the total number of parts.
We’ve just seen how to perform multiple estimations, now let’s see
how to manipulate them. First a multiple estimation is a
fixest_multi
object with its own set of methods. We can
access its elements by using keys. There are five keys:
sample
, lhs
, rhs
,
fixef
, and iv
.
res_multi[sample = 1]
returns all the results for the
first sample. res_multi[lhs = .N]
returns all the results
for the last dependent variable (the special variable .N
can be used to refer to the last element). etc.
You can combine different keys:
res_multi[sample = -1, lhs = 1]
will select all results for
all samples but the first, and for the first dependent variable.
Note that these arguments accept regular expressions, so
res_multi[fixef = "spe"]
returns all results for which the
character string "spe"
is contained in the fixed-effects
part of the formula.
The results in a fixest_multi
object have a specific
order, organized in a tree. By default the order is \(sample \rightarrow lhs \rightarrow fixef
\rightarrow rhs \rightarrow iv\).
Changing the order of the results is important to organize/export
them. By default, when one accesses fixest_multi
objects
the results are reordered according to the order of the arguments
used.
For instance, res_mutli[rhs = 1:.N, fixef = 1:.N]
will
place the RHS at the root of the tree followed by the fixed-effects.
Then the sample and the LHS will follow.
The arguments accept logical values:
res_multi[fixef = TRUE, sample = FALSE]
will keep
all results but will place fixef
as the root and
sample
as the last leaf.
This subsetting can then be used to easily obtain the appropriate set of results and ordering:
## res_multi[lhs ..1 res_multi[lhs =..2 res_multi[lhs =..3
## Sample (species) setosa versicolor virginica
## Dependent Var.: y1 y1 y1
##
## Constant 4.248*** (0.4114) 2.381*** (0.4493) 1.052* (0.5139)
## x1 0.3990 (0.2958) 0.9342*** (0.1693) 0.9946*** (0.0893)
## x2 0.7121 (0.4874) -0.3200 (0.4024) 0.0071 (0.1795)
## x2 square
## ________________ _________________ __________________ __________________
## S.E. type IID IID IID
## Observations 50 50 50
## R2 0.11173 0.57432 0.74689
## Adj. R2 0.07393 0.55620 0.73612
##
## res_multi[lhs ..4 res_multi[lhs =..5 res_multi[lhs =..6
## Sample (species) setosa versicolor virginica
## Dependent Var.: y1 y1 y1
##
## Constant 4.004*** (0.4892) 0.3234 (1.791) -2.393 (2.173)
## x1 0.4048 (0.2963) 0.9006*** (0.1710) 0.9939*** (0.0878)
## x2 2.511 (2.011) 3.015 (2.839) 3.504 (2.153)
## x2 square -2.913 (3.157) -1.236 (1.042) -0.8703 (0.5340)
## ________________ _________________ __________________ __________________
## S.E. type IID IID IID
## Observations 50 50 50
## R2 0.12786 0.58696 0.76071
## Adj. R2 0.07098 0.56002 0.74510
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Defining the standard-errors at estimation time, by using the
arguments vcov
, can be useful to obtain a coherent set of
standard-errors across results, especially if the fixed-effects are
modified (which will modify the default clustering of standard-errors
across models).
IV estimations return a regular fixest
object. The
summary
applied to it however can return a
fixest_multi
object. This is the case when both the first
and second stage regressions are requested using the argument
stage = 1:2
. You can then cherry-pick the results as before
using, e.g. res[iv = 1]
. Note, importantly, that the index
refers to the order of the results and 1 here does not mean the first
stage.
The objects returned by fixest
estimations are large.
They contain the necessary information to apply various methods without
incurring additional computing costs. This is particularly true for
clustering the standard-errors for instance. Stated differently speed is
privileged over memory usage.
The problem when it comes to multiple estimations is that it is very easy to perform many many estimations leading to a ballooning of object size possibly getting out of control at some point. To circumvent this issue, here’s what to do:
vcov
to get a summary of the results
with the appropriate standard-errors at estimation time,lean = TRUE
.This will perform the estimation with the appropriate standard errors (point 1) and clean any large object from the results (point 2).
The drawback of this is that you won’t be able to apply some methods
to the results (like changing the type of standard-errors,
predict
, resid
, etc). But the amount of memory
saved can be considerable.