The package fixest
provides a family of functions to
perform estimations with multiple fixed-effects. The two main functions
are feols
for linear models and feglm
for
generalized linear models. In addition, the function femlm
performs direct maximum likelihood estimation, and feNmlm
extends the latter to allow the inclusion of non-linear in parameters
right-hand-sides. Finally, the functions fepois
and
fenegbin
are aliases for Poisson and negative binomial
fixed-effect estimations. Each of these functions supports any number of
fixed-effects and is implemented with full fledged multi-threading in
C++. Functions feols
and feglm
further support
variables with varying slopes.
This package is currently (Feb. 2020) the fastest software available to perform fixed-effects estimations. See the project’s homepage for a set of benchmarks.
The standard-errors of the estimates can be easily and intuitively clustered (up to four-way).
The function etable
allows to seamlessly export the
results of multiple estimations into either a data.frame, or into a
Latex table.
The main features of the package are illustrated in this vignette. The theory used to obtain the fixed-effects is based on Berge (2018), “Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm.” CREA Discussion Papers, 13.
This example deals with international trade, which is a setup that usually requires performing estimations with many fixed-effects. We estimate a very simple gravity model in which we are interested in finding out the negative effect of geographic distance on trade. The sample data consists of European trade extracted from Eurostat. Let’s load the data contained in the package:
This data is a sample of bilateral importations between EU15 countries from 2007 and 2016. The data is further broken down according to 20 product categories. Here is a sample of the data:
Destination | Origin | Product | Year | dist_km | Euros |
---|---|---|---|---|---|
LU | BE | 1 | 2007 | 139.5719 | 2966697 |
BE | LU | 1 | 2007 | 139.5719 | 6755030 |
LU | BE | 2 | 2007 | 139.5719 | 57078782 |
BE | LU | 2 | 2007 | 139.5719 | 7117406 |
LU | BE | 3 | 2007 | 139.5719 | 17379821 |
BE | LU | 3 | 2007 | 139.5719 | 2622254 |
The dependent variable of the estimation will be the level of trade between two countries while the independent variable is the geographic distance between the two countries. To obtain the elasticity of geographic distance net of the effects of the four fixed-effects, we estimate the following:
\(E\left(Trade_{i,j,p,t}\right)=\gamma_{i}^{Exporter}\times\gamma_{j}^{Importer}\times\gamma_{p}^{Product}\times\gamma_{t}^{Year}\times Distance_{ij}^{\beta}\),
where the subscripts \(i\), \(j\), \(p\) and \(t\) stand respectively for the exporting country, the importing country, the type of product and the year, and the \(\gamma_{v}^{c}\) are fixed-effects for these groups. Here \(\beta\) is the elasticity of interest.
Note that when you use the Poisson/Negative Binomial families, this relationship is in fact linear because the right hand side is exponentialized to avoid negative values for the Poisson parameter. This leads to the equivalent relation:1
\(E\left(Trade_{i,j,p,t}\right)=\exp\left(\gamma_{i}^{Exporter}+\gamma_{j}^{Importer}+\gamma_{p}^{Product}+\gamma_{t}^{Year}+\beta\times \ln Distance_{ij}\right)\).
The estimation of this model using a Poisson likelihood is as follows:
The function fepois
is actually an alias to the function
feglm
with family = poisson
. The results can
be shown directly with the print
method:
print(gravity_pois)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.52787 0.115678 -13.208 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -7.025e+11 Adj. Pseudo R2: 0.764032
#> BIC: 1.405e+12 Squared Cor.: 0.612021
The print
reports the coefficient estimates and
standard-errors as well as some other information. Among the quality of
fit information, the squared-correlation corresponds to the correlation
between the dependent variable and the expected predictor; it reflects
somehow the idea of R-square in OLS estimations. Note that the
estimation is performed using parallel computing which you can control
using the argument nthreads
(see the “multi-threading”
section for more details).
To cluster the standard-errors, we can simply use the argument
vcov
of the summary
method. Let’s say we want
to cluster the standard-errors according to the first two fixed-effects
(i.e. the Origin and Destination variables). Then we
just have to do:
summary(gravity_pois, vcov = "twoway")
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors: Clustered (Origin & Destination)
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.52787 0.130734 -11.6869 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -7.025e+11 Adj. Pseudo R2: 0.764032
#> BIC: 1.405e+12 Squared Cor.: 0.612021
The clustering can be done on one, two, three or up to four
variables. If the estimation includes fixed-effects, then by default the
clustering will be done using these fixed-effects, in the original
order. This is why the Origin and Destination
variables were used for the two-way clustering in the previous example.
If, instead, you wanted to perform one-way clustering on the
Product variable, you need to provide it in a formula or use
the argument cluster
:
# Three ways to summon clustering on the Product variable
summary(gravity_pois, vcov = ~Product)
summary(gravity_pois, cluster = "Product")
summary(gravity_pois, cluster = ~Product)
Both produce the same results:
summary(gravity_pois, cluster = ~Product)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors: Clustered (Product)
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.52787 0.098294 -15.544 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -7.025e+11 Adj. Pseudo R2: 0.764032
#> BIC: 1.405e+12 Squared Cor.: 0.612021
Note that you can always cluster the standard-errors, even when the estimation contained no fixed-effect:
gravity_simple = fepois(Euros ~ log(dist_km), trade)
# We use a formula to specify the variables used for two way clustering
# (note that the values of the variables are fetched directly in the original database)
summary(gravity_simple, ~Origin + Destination)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Standard-errors: Clustered (Origin & Destination)
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 24.70889 1.124768 21.96798 < 2.2e-16 ***
#> log(dist_km) -1.02896 0.158022 -6.51145 7.4429e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -2.426e+12 Adj. Pseudo R2: 0.185023
#> BIC: 4.852e+12 Squared Cor.: 0.055107
Finally, the standard-errors can also be computed at estimation time,
you simply need to add the vcov
argument:
fepois(Euros ~ log(dist_km), trade, vcov = ~Product)
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Standard-errors: Clustered (Product)
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 24.70889 0.330044 74.8654 < 2.2e-16 ***
#> log(dist_km) -1.02896 0.045954 -22.3909 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -2.426e+12 Adj. Pseudo R2: 0.185023
#> BIC: 4.852e+12 Squared Cor.: 0.055107
Talking about standard-errors… there are more than clustered standard-errors that can be computed… and there are many ways to achieve the same thing… and many shortcuts to know. So before you leave don’t forget to have a look at the section describing how to use the vcov argument!
Now we estimate the same relationship by OLS. We need to put the left hand side in logarithm (since the right-hand-side is not exponentialized):
Of course you can use different families in feglm
,
exactly as in glm
.
To get the estimation for the fixed-effects Negative Binomial:
Now let’s say that we want a compact overview of the results of
several estimations. The best way is to use the function
etable
. This function summarizes the results of several
fixest
estimations into a data.frame. To see the
fixed-effects results with the three different likelihoods, we just have
to type:
etable(gravity_pois, gravity_negbin, gravity_ols,
vcov = "twoway", headers = c("Poisson", "Negative Binomial", "Gaussian"))
gravity_pois | gravity_negbin | gravity_ols | ||
---|---|---|---|---|
1 | Poisson | Negative Binomial | Gaussian | |
3 | ||||
4 | log(dist_km) | -1.528*** (0.1307) | -1.711*** (0.1773) | -2.170*** (0.1714) |
5 | Fixed-Effects: | —————— | —————— | —————— |
6 | Origin | Yes | Yes | Yes |
7 | Destination | Yes | Yes | Yes |
8 | Product | Yes | Yes | Yes |
9 | Year | Yes | Yes | Yes |
10 | _______________ | __________________ | __________________ | __________________ |
11 | Family | Poisson | Neg. Bin. | OLS |
12 | S.E.: Clustered | by: Orig. & Dest. | by: Orig. & Dest. | by: Orig. & Dest. |
13 | Observations | 38,325 | 38,325 | 38,325 |
14 | Squared Cor. | 0.61202 | 0.43760 | 0.70558 |
15 | Pseudo R2 | 0.76403 | 0.03473 | 0.23640 |
16 | BIC | 1.4e+12 | 1,293,786.1 | 151,977.2 |
17 | Over-dispersion | – | 0.54877 | – |
We added the argument vcov="twoway"
to cluster the
standard-errors for all estimations. As can be seen this function gives
an overview of the estimates and standard-errors, as well as some
quality of fit measures. The argument headers
is used to
add information on each estimation column.
In the previous example, we directly added the estimation results as
arguments of the function etable
. But the function also
accepts lists of estimations. Let’s give an example. Say you want to see
the influence of the introduction of fixed-effects on the estimate of
the elasticity of distance. You can do it with the following code where
we use the argument fixef
to include fixed-effects (instead
of inserting them directly in the formula):
gravity_subfe = list()
all_FEs = c("Year", "Destination", "Origin")
for(i in 0:3){
gravity_subfe[[i+1]] = fepois(Euros ~ log(dist_km), trade, fixef = all_FEs[0:i])
}
The previous code performs 4 estimations with an increasing number of
fixed-effects and store their results into the list named
gravity_subfe
. To show the results of all 4 estimations,
it’s easy:
model 1 | model 2 | model 3 | model 4 | |
---|---|---|---|---|
Dependent Var.: | Euros | Euros | Euros | Euros |
Constant | 24.71*** (1.125) | |||
log(dist_km) | -1.029*** (0.1580) | -1.029*** (0.1581) | -1.226*** (0.2045) | -1.518*** (0.1282) |
Fixed-Effects: | —————— | —————— | —————— | —————— |
Year | No | Yes | Yes | Yes |
Destination | No | No | Yes | Yes |
Origin | No | No | No | Yes |
_______________ | __________________ | __________________ | __________________ | __________________ |
S.E.: Clustered | by: Orig. & Dest. | by: Orig. & Dest. | by: Orig. & Dest. | by: Orig. & Dest. |
Observations | 38,325 | 38,325 | 38,325 | 38,325 |
Squared Cor. | 0.05511 | 0.05711 | 0.16420 | 0.38479 |
Pseudo R2 | 0.18502 | 0.18833 | 0.35826 | 0.59312 |
BIC | 4.85e+12 | 4.83e+12 | 3.82e+12 | 2.42e+12 |
We have a view of the 4 estimations, all reporting two-way clustered
standard-errors thanks to the use of the argument
cluster
.
Note that since version 0.8.0, multiple estimations can be performed
at once without requiring loops. Let’s replicate the previous example
using fixest
stepwise functions:
The previous line of code performs 4 estimations. The function
csw0
is the key here, it means: cumulative stepwise
starting with the empty element. Starting with the empty element,
each new estimation adds a new element in the csw0()
function, quite like the previous loop. Then you can consider the
results, here res_multi
, as a list of results, although
with specific methods to easily access each element.
Stepwise functions can be applied to the linear right-hand-side and
to the fixed-effects, you can also have multiple dependent variables and
perform split sample estimations with the argument split
.
All of this is detailed in the dedicated vignette: Multiple
estimations.
So far we have seen how to report the results of multiple estimations
on the R console. Now, using the same function etable
, we
can also export the results to high quality Latex tables. We just need
to provide the argument tex = TRUE
:
# with two-way clustered SEs
etable(res_multi, cluster = ~Origin+Destination, tex = TRUE)
#> \begingroup
#> \centering
#> \begin{tabular}{lcccc}
#> \tabularnewline \midrule \midrule
#> Dependent Variable: & \multicolumn{4}{c}{Euros}\\
#> Model: & (1) & (2) & (3) & (4)\\
#> \midrule
#> \emph{Variables}\\
#> Constant & 24.71$^{***}$ & & & \\
#> & (1.125) & & & \\
#> log(dist\_km) & -1.029$^{***}$ & -1.029$^{***}$ & -1.226$^{***}$ & -1.518$^{***}$\\
#> & (0.1580) & (0.1581) & (0.2045) & (0.1282)\\
#> \midrule
#> \emph{Fixed-effects}\\
#> Year & & Yes & Yes & Yes\\
#> Destination & & & Yes & Yes\\
#> Origin & & & & Yes\\
#> \midrule
#> \emph{Fit statistics}\\
#> Observations & 38,325 & 38,325 & 38,325 & 38,325\\
#> Squared Correlation & 0.05511 & 0.05711 & 0.16420 & 0.38479\\
#> Pseudo R$^2$ & 0.18502 & 0.18833 & 0.35826 & 0.59312\\
#> BIC & $4.85\times 10^{12}$ & $4.83\times 10^{12}$ & $3.82\times 10^{12}$ & $2.42\times 10^{12}$\\
#> \midrule \midrule
#> \multicolumn{5}{l}{\emph{Clustered (Origin \& Destination) standard-errors in parentheses}}\\
#> \multicolumn{5}{l}{\emph{Signif. Codes: ***: 0.01, **: 0.05, *: 0.1}}\\
#> \end{tabular}
#> \par\endgroup
The user can export the Latex table directly into a file (argument
file
), add a title (arg. title
) and a label to
the table (arg. label
). Note that when the argument
file
is present, the Latex format becomes the default
(i.e. tex = TRUE
by default).
The coefficients can be renamed easily (arg. dict
), some
can be dropped (arg. drop
) and they can be easily reordered
with regular expressions (arg. order
).
The significance codes can easily be changed (arg.
signifCode
) and all quality of fit information can be
customized (argument fitstat
). Among others, the number of
fixed-effect per fixed-effect dimension can also be displayed using the
argument fixef_sizes
.
Consider the following example of the exportation of two tables:
# we set the dictionary once and for all
myDict = c("log(dist_km)" = "$\\ln (Distance)$", "(Intercept)" = "Constant")
# 1st export: we change the signif code and drop the intercept
etable(res_multi, signifCode = c("a" = 0.01, "b" = 0.05),
drop = "Const", dict = myDict, file = "Estimation Tables.tex",
replace = TRUE, title = "First export -- normal Standard-errors")
# 2nd export: clustered S-E + distance as the first coefficient
etable(res_multi, cluster = ~Product, order = "Dist",
dict = myDict, file = "Estimation Tables.tex",
title = "Second export -- clustered standard-errors (on Product variable)")
In this example, two tables containing the results of the 4
estimations are directly exported to a Latex table into the file
“Estimation Tables.tex”. First take notice (again) that we do not
need to use the argument tex=TRUE
since when the
argument file
is present, the Latex format becomes the
default. The file is re-created in the first exportation thanks to the
argument replace = TRUE
.
To change the variable names in the Latex table, we use the argument
dict
. The variable myDict
is the dictionary we
use to rename the variables, it is simply a named vector. The original
name of the variables correspond to the names of myDict
while the new names of the variables are the values of this vector. Any
variable that matches the names of myDict
will be replaced
by its value. Thus we do not care of the order of appearance of the
variables in the estimation results.
In the first export, the coefficient of the intercept is dropped by
using drop = "Const"
(could be anything such that
grepl(drop[1], "Constant")
is TRUE). In the second, the
coefficient of the distance is put before the intercept (which is kept)
thanks to the argument order
. Note that the actions
performed by the arguments drop
or order
are
performed after the renaming takes place with the
argument dict
.
Note that you can completely customize the style of the table by
using the style
and postprocessing
arguments,
please have a look at the dedicated vignette: Exporting
estimation tables.
To obtain the fixed-effects of the estimation, the function
fixef
must be performed on the results. This function
returns a list containing the fixed-effects coefficients for each
dimension. The summary
method helps to have a quick
overview:
fixedEffects = fixef(gravity_pois)
summary(fixedEffects)
#> Fixed_effects coefficients
#> Origin Destination Product Year
#> Number of fixed-effects 15 15 20 10
#> Number of references 0 1 1 1
#> Mean 23.3 3.09 0.0129 0.157
#> Standard-deviation 1.28 1.11 1.36 0.113
#>
#> COEFFICIENTS:
#> Origin: AT BE DE DK ES
#> 22.51 23.56 24.71 23.44 24.97 ... 10 remaining
#> -----
#> Destination: AT BE DE DK ES
#> 2.436 2.696 4.323 2.451 4.043 ... 10 remaining
#> -----
#> Product: 1 2 3 4 5
#> 0 1.414 0.6562 1.449 -1.521 ... 15 remaining
#> -----
#> Year: 2007 2008 2009 2010 2011
#> 0 0.06912 0.005225 0.07331 0.163 ... 5 remaining
We can see that the fixed-effects are balanced across the dimensions.
Indeed, apart from the first dimension, only one coefficient per
fixed-effect needs to be set as reference (i.e. fixed to 0) to avoid
collinearity across the different fixed-effects dimensions. This ensures
that the fixed-effects coefficients can be compared within a given
fixed-effect dimension. Had there be strictly more than one reference
per fixed-effect dimension, their interpretation would have not been
possible at all. If this was the case though, a warning message would
have been prompted. Note that the mean values are meaningless per se,
but give a reference points to which compare the fixed-effects within a
dimension. Let’s look specifically at the Year
fixed-effects:
fixedEffects$Year
#> 2007 2008 2009 2010 2011 2012
#> 0.000000000 0.069122284 0.005225473 0.073308208 0.163013386 0.192605170
#> 2013 2014 2015 2016
#> 0.230629376 0.242605404 0.282800683 0.310325692
Finally, the plot
method helps to distinguish the most
notable fixed-effects:
For each dimension, the fixed-effects are first centered, then sorted, and finally the most notable (i.e. highest and lowest) are reported. The exponential of the coefficient is reported in the right hand side to simplify the interpretation for models with log-link (as the Poisson model). As we can see from the country of destination fixed-effects, trade involving France (FR), Italy (IT) and Germany (DE) as destination countries is more than 2.7 times higher than the EU15 average. Further, the highest heterogeneity come from the product category, where trade in product 4 (dairy products) is roughly 2.7 times the average while product 14 (vegetable plaiting materials) represents a negligible fraction of the average.
Note however that the interpretation of the fixed-effects must be taken with extra care. In particular, here the fixed-effects can be interpreted only because they are perfectly balanced.
The version 0.10.0 of fixest
introduced the argument
vcov
. This argument is highly versatile and single-handedly
manages how the standard-errors are computed (note that you can find
many example in summary.fixest
help pages). This argument can be many things, so bear with me, and now,
let’s start trying to describe it!
So far (version 0.10.0), six types of standard-errors can be computed, here are their keywords and a brief description:
iid
: assumes that the error variance is spherical,
i.e. errors are homoskedastic and not correlated (independent and
identically distributed errors have a spherical error
variance).
hetero
: assumes that errors are heteroskedastic
(White correction).
cluster
, twoway
: clustered SEs, assumes
that errors are correlated within the cluster groups.
NW
, newey_west
: Newey and West (1987)
SEs for time series or panel data, assumes heteroskedastic and serially
correlated errors.
DK
, driscoll_kraay
: Driscoll and Kraay
(1998) SEs for panel data, assumes cross-sectionally and serially
correlated errors.
conley
: Conley (1999) SEs for cross-sections,
assumes spatially correlated errors.
The most basic uses of the vcov
argument are:
with a character scalar: vcov = "vcov_type"
(ex:
vcov = "hetero"
).
with a formula of the form:
vcov = vcov_type ~ variables
where the variables are used
to compute the SEs (ex: vcov = DK ~ period
).
Most of the VCOV types need the information on some variables to be computed. For these types, one then needs to use a formula to inform on these variables. Let’s give a first example:
data(base_did)
est = feols(y ~ x1, base_did)
# Note that there is partial matching enabled (newey = newey_west)
summary(est, newey ~ id + period)
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Standard-errors: Newey-West (L=1)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.988753 0.174111 11.4223 1.1709e-06 ***
#> x1 0.983110 0.052699 18.6551 1.6762e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.89686 Adj. R2: 0.262357
Here we queried Newey-West SEs and provided the panel identifiers in the right of the formula. Had we used the keyword only, that would have led to an error:
summary(est, "newey_west")
#> Error in vcov.fixest(object, vcov = vcov, ssc = ssc, forceCovariance = forceCovariance, : To compute the Newey-West VCOV, we need a variable for the time. Since you didn't provide it in the formula, we typically deduce it from the 'panel.id' identifiers. PROBLEM: no 'panel.id' was set in this estimation. Please provide it in the formula.
But, huh what does the error message says? That the time
variable can be deduced? Indeed, that’s the thing: the algorithm tries
hard to provide sensible default values for the user. Let’s now try with
panel identifiers set at estimation time:
est_panel = feols(y ~ x1, base_did, panel.id = ~id + period)
summary(est_panel, "newey_west")
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Standard-errors: Newey-West (L=1)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.988753 0.174111 11.4223 1.1709e-06 ***
#> x1 0.983110 0.052699 18.6551 1.6762e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.89686 Adj. R2: 0.262357
Oh, now it works! However I can see some shade of disappointment in
your eyes: since we need to add panel.id = ~id + period
we
end up with more typing! Well, I’m not done: typically we tend to make
many estimations with the same data set, so it may be useful to set some
characteristics globally. This can be done with the
setFixest_estimation()
function:
setFixest_estimation(panel.id = ~id + period)
est_implicit = feols(y ~ x1, base_did)
summary(est_implicit, "newey_west")
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Standard-errors: Newey-West (L=1)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.988753 0.174111 11.4223 1.1709e-06 ***
#> x1 0.983110 0.052699 18.6551 1.6762e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.89686 Adj. R2: 0.262357
Although the exact same two lines of code led to an error a few paragraphs ago, now it works fine.
Here is another example of implicit deduction:
summary(est_implicit, "cluster")
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Standard-errors: Clustered (id)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.988753 0.194352 10.2327 < 2.2e-16 ***
#> x1 0.983110 0.046789 21.0115 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.89686 Adj. R2: 0.262357
The SEs are clustered at the unit identifier of the panel. Note that even if fixed-effects are present, the panel identifier takes precedence for clustering:
feols(y ~ x1 | period, base_did, "cluster")
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: period: 10
#> Standard-errors: Clustered (id)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.997536 0.045721 21.818 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.5526 Adj. R2: 0.357059
#> Within R2: 0.297883
Of course, if the panel identifier is missing, the automatic clustering falls back to the first fixed-effect present:
# Removing the panel
setFixest_estimation(reset = TRUE)
feols(y ~ x1 | period, base_did, "cluster")
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: period: 10
#> Standard-errors: Clustered (period)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.997536 0.050264 19.8458 9.7207e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.5526 Adj. R2: 0.357059
#> Within R2: 0.297883
Talking about clustered SEs, note that this is such a common
operation that the empty keyword is equivalent to clustered SEs, so that
vcov = ~id + period
would lead to SEs clustered by
id
and period
:
feols(y ~ x1 | period, base_did, ~id + period)
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: period: 10
#> Standard-errors: Clustered (id & period)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.997536 0.047498 21.0015 5.8984e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.5526 Adj. R2: 0.357059
#> Within R2: 0.297883
Finally, a last example using Conley SEs:
data(quakes)
feols(depth ~ mag, quakes, "conley")
#> OLS estimation, Dep. Var.: depth
#> Observations: 1,000
#> Standard-errors: Conley (90km)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 881.625 110.6727 7.96606 4.4465e-15 ***
#> mag -123.421 20.1746 -6.11765 1.3619e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 209.6 Adj. R2: 0.052245
In this case the latitude and longitude are deduced from the
variables names in the data set used for the estimation (here these are
"lat"
and "long"
). Note that, for convenience,
an automatic distance cutoff is deduced from the data but is not likely
to be the most appropriate! But how to pass your own cutoff as a
parameter? That’s what we’ll see in the next section!
Some VCOVs require parameters: for example in the Newey-West VCOV, you need to provide the number of lags to consider. Although it is automatically deduced via a rule of thumb, the user is likely to want to use (or test) other values. To provide these extra parameters, you have two solutions:
use helper functions which have the same name as the VCOV
keywords (ex: NW
or newey_west
are two
functions).
use dedicated VCOV functions (ex: vcov_NW
).
We now cover these two methods in turn. Helper functions can be seen as supercharged keywords: they are used in place of the VCOV keywords but also accept arguments. Here are two examples:
feols(y ~ x1 | period, base_did, NW(2) ~ id + period)
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: period: 10
#> Standard-errors: Newey-West (L=2)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.997536 0.049344 20.2159 8.2589e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.5526 Adj. R2: 0.357059
#> Within R2: 0.297883
feols(depth ~ mag, quakes, conley(200, distance = "spherical"))
#> OLS estimation, Dep. Var.: depth
#> Observations: 1,000
#> Standard-errors: Conley (200km)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 881.625 128.2426 6.87467 1.0937e-11 ***
#> mag -123.421 22.8950 -5.39074 8.7582e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 209.6 Adj. R2: 0.052245
In the first example, we use the helper function NW()
directly in the formula, exactly where the VCOV type would be. The first
(and only) argument of NW()
is the number of lags. On the
right hand side of the formula, we give the variables used to identify
the unit and the time variables, as usual.
In the second example, we use no formula but directly the helper
function conley()
for which we provide two arguments: the
cutoff (first argument) and how the distance should be computed. The
latitude and longitude are still deduced from the data. If it could not
be deduced, we would have to provide a formula giving them, like in the
previous example.
There are also dedicated VCOV functions which work in a more standard way. Let’s redo the same two examples with them:
feols(y ~ x1 | period, base_did, vcov_NW("id", "period", lag = 2))
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: period: 10
#> Standard-errors: Newey-West (L=2)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.997536 0.049344 20.2159 8.2589e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.5526 Adj. R2: 0.357059
#> Within R2: 0.297883
feols(depth ~ mag, quakes, vcov_conley(lat = "lat", lon = "long",
cutoff = 200, distance = "spherical"))
#> OLS estimation, Dep. Var.: depth
#> Observations: 1,000
#> Standard-errors: Conley (200km)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 881.625 128.2426 6.87467 1.0937e-11 ***
#> mag -123.421 22.8950 -5.39074 8.7582e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 209.6 Adj. R2: 0.052245
Contrary to the helper functions: a) these functions cannot be used in a formula, b) they accept the variables to compute the SEs as arguments. They provide an alternative way to specify the VCOV.
The value of the SEs can (sometimes quite surprisingly) be impacted
by the type of small sample correction (SSC) applied. In
fixest
you can monitor the SSC with the function
ssc()
. Details on how the SSCs are computed can be found in
the dedicated vignette: On
standard-errors – here we detail only the implementation.
Most fixest
functions accept an ssc
argument that you can readily use. Here’s an example:
est = feols(y ~ x1 | id, base_did)
est_up = feols(y ~ x1 | id, base_did, ssc = ssc(fixef.K = "full"))
est_down = feols(y ~ x1 | id, base_did, ssc = ssc(adj = FALSE, cluster.adj = FALSE))
etable(est, est_up, est_down)
#> est est_up est_down
#> Dependent Var.: y y y
#>
#> x1 0.9615*** (0.0481) 0.9615*** (0.0507) 0.9615*** (0.0478)
#> Fixed-Effects: ------------------ ------------------ ------------------
#> id Yes Yes Yes
#> _______________ __________________ __________________ __________________
#> S.E.: Clustered by: id by: id by: id
#> Observations 1,080 1,080 1,080
#> R2 0.38715 0.38715 0.38715
#> Within R2 0.26507 0.26507 0.26507
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Instead of providing the argument ssc
, you can instead
add the ssc
call directly within the formula of the
vcov
. In this case you should simply add it as a variable.
We now replicate the previous example using a list of several VCOVs in
etable
(that’s the only function that accepts lists in the
argument vcov
):
etable(est, vcov = list(~id, ~id + ssc(fixef.K = "full"),
~id + ssc(adj = FALSE, cluster.adj = FALSE)))
#> est est.1 est.2
#> Dependent Var.: y y y
#>
#> x1 0.9615*** (0.0481) 0.9615*** (0.0507) 0.9615*** (0.0478)
#> Fixed-Effects: ------------------ ------------------ ------------------
#> id Yes Yes Yes
#> _______________ __________________ __________________ __________________
#> S.E.: Clustered by: id by: id by: id
#> Observations 1,080 1,080 1,080
#> R2 0.38715 0.38715 0.38715
#> Within R2 0.26507 0.26507 0.26507
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can also tweak iid
or hetero
SEs using
a formula: