```python
import numpy as np
from pprint import pprint

from crabbymetrics import FixedEffectsOLS, OLS

np.set_printoptions(precision=4, suppress=True)
```

# Fixed Effects OLS Example
This page shows how to estimate slope coefficients after partialling out one-way or multi-way categorical fixed effects with the within transformation.
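The within transformation itself is just group demeaning. As a rough sketch of what happens for a single factor (illustrative only, not crabbymetrics' actual implementation):

```python
import numpy as np

def within_demean(values, codes):
    """Subtract each group's mean from its members (one-way within transform)."""
    group_means = np.bincount(codes, weights=values) / np.bincount(codes)
    return values - group_means[codes]

rng = np.random.default_rng(0)
codes = rng.integers(0, 5, size=100)
values = rng.normal(size=100)
demeaned = within_demean(values, codes)

# After demeaning, every group's mean is numerically zero.
print(np.allclose(np.bincount(codes, weights=demeaned), 0))  # True
```

Applying this transform to both the regressors and the outcome, then running OLS on the residuals, recovers the same slopes as including the group dummies explicitly.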
## 1 Simulate A Two-Way Fixed Effects Problem
```python
rng = np.random.default_rng(11)
n = 800
beta_true = np.array([1.2, -0.9])

worker = rng.integers(0, 40, size=n, dtype=np.uint32)
firm = rng.integers(0, 18, size=n, dtype=np.uint32)
fe = np.column_stack([worker, firm]).astype(np.uint32)

x = rng.normal(size=(n, 2))
worker_effect = rng.normal(scale=0.8, size=40)
firm_effect = rng.normal(scale=0.5, size=18)
y = (
    x @ beta_true
    + worker_effect[worker]
    + firm_effect[firm]
    + rng.normal(scale=0.1, size=n)
)

model = FixedEffectsOLS()
model.fit(x, fe, y)
print("true coef:", beta_true)
pprint(model.summary())
```

```
true coef: [ 1.2 -0.9]
{'coef': array([ 1.2034, -0.9   ]),
 'coef_se': array([0.0035, 0.0033]),
 'vcov_type': 'hc1'}
```
## 2 Compare To Explicit Dummy OLS
`FixedEffectsOLS` performs the Frisch-Waugh-Lovell version of the same regression. A useful check is to compare it against an `OLS` fit with worker and firm dummies added directly to the design matrix.
```python
def dummy_block(levels):
    # One column per level, dropping level 0 as the reference category.
    n_levels = int(levels.max()) + 1
    block = np.zeros((levels.shape[0], max(n_levels - 1, 0)))
    for level in range(1, n_levels):
        block[:, level - 1] = (levels == level).astype(float)
    return block

dummy_design = np.column_stack([x, dummy_block(worker), dummy_block(firm)])
baseline = OLS()
baseline.fit(dummy_design, y)
baseline_summary = baseline.summary()

print("FixedEffectsOLS slopes:", model.summary()["coef"])
print("Dummy OLS slopes:      ", baseline_summary["coef"][: x.shape[1]])
print("Absolute difference:   ", np.abs(model.summary()["coef"] - baseline_summary["coef"][: x.shape[1]]))
```

```
FixedEffectsOLS slopes: [ 1.2034 -0.9   ]
Dummy OLS slopes:       [ 1.2034 -0.9   ]
Absolute difference:    [0. 0.]
```
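A more vectorized way to build the same dummy block is one-hot encoding with `np.eye` and dropping the reference column. The helper name `dummy_block_eye` is hypothetical, introduced only for this sketch:

```python
import numpy as np

def dummy_block_eye(levels):
    """One-hot encode `levels`, then drop level 0 as the reference column."""
    n_levels = int(levels.max()) + 1
    return np.eye(n_levels)[levels][:, 1:]

levels = np.array([0, 2, 1, 2], dtype=np.uint32)
print(dummy_block_eye(levels))
```

For a factor with many levels this materializes a dense identity matrix, so the loop-based version above can be friendlier to memory.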
## 3 Robust Covariance Options
Once the fixed effects are partialled out, `summary()` exposes the same covariance choices as the other linear estimators.
```python
worker_cluster = worker.astype(np.int64)

hc1 = model.summary()
cluster = model.summary(vcov="cluster", clusters=worker_cluster)
hac = model.summary(vcov="newey_west", lags=3)

print("HC1 SE:", np.round(hc1["coef_se"], 4))
print("cluster SE:", np.round(cluster["coef_se"], 4))
print("Newey-West SE:", np.round(hac["coef_se"], 4))
```

```
HC1 SE: [0.0035 0.0033]
cluster SE: [0.0034 0.004 ]
Newey-West SE: [0.0035 0.0035]
```
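`summary()` computes these internally. For intuition, the textbook cluster-robust sandwich can be sketched by hand on synthetic data; the sketch below omits the finite-sample correction factors a real implementation applies, so it is illustrative rather than a reproduction of crabbymetrics' numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))  # stand-in for an already-demeaned design matrix
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)
clusters = rng.integers(0, 10, size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)

# "Meat": sum over clusters g of (X_g' u_g)(X_g' u_g)'
meat = np.zeros((k, k))
for g in np.unique(clusters):
    score = X[clusters == g].T @ resid[clusters == g]
    meat += np.outer(score, score)

vcov = bread @ meat @ bread  # sandwich, without small-sample correction
cluster_se = np.sqrt(np.diag(vcov))
print(np.round(cluster_se, 4))
```

Errors are allowed to correlate arbitrarily within a cluster but assumed independent across clusters, which is why the scores are summed per cluster before the outer product.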
## 4 Notes

- `fe` must be a 2D `uint32` matrix with one column per factor.
- Category codes should be zero-based and contiguous within each factor.
- The estimator absorbs the constant, so there is no intercept in the summary and no `predict()` method.
- Cluster labels passed to `summary(vcov="cluster", clusters=...)` should be a separate 1D `int64` array with one label per observation.
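If your raw factor labels are not already zero-based and contiguous, `np.unique` with `return_inverse=True` produces codes in the required form. A small sketch, independent of crabbymetrics (the `raw_firm_ids` values are made up):

```python
import numpy as np

raw_firm_ids = np.array([1040, 1007, 1040, 2310, 1007])
labels, codes = np.unique(raw_firm_ids, return_inverse=True)
fe_column = codes.astype(np.uint32)  # zero-based, contiguous, as `fe` expects
print(fe_column)  # [1 0 1 2 0]
```

The returned `labels` array maps each code back to its original identifier, which is useful when inspecting the absorbed groups later.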