Panel Data (DiD, Synth, MC) Causal Inference

Author

Apoorva Lal

Published

November 28, 2022

1 Setup

Causal inference with panel data can be viewed as a tensor completion problem, where the tensor in question is N \times T \times D, with N, T, and D the number of units, periods, and treatment states respectively. WLOG, we focus on the binary treatment case.

In most observational settings, treatment is rare and absorbing, so the \mathbf{Y}^0 matrix is dense and the \mathbf{Y}^1 matrix is sparse. Most researchers therefore focus on the Average Treatment effect on the Treated (ATT), which is the difference \mathbb{E}\left[Y^1 - Y^0 \mid D = 1\right] in the post-treatment periods for the treated units. Y^1 is observed for the treated units \{i : D_i = 1\}, so the learning problem is to impute the untreated potential outcomes Y^0.

Consider an N \times T matrix of untreated potential outcomes \mathbf{Y}^0. A subset of N_1 units is treated from time T_0 + 1 onwards. We can construct a corresponding matrix \mathbf{W}, which serves as a mask for \mathbf{Y}^0: whenever W_{it} = 1, Y^0_{it} is missing, and vice versa. The goal is to impute the missing entries marked ? in the matrix below.
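With the N_0 = N - N_1 never-treated units stacked in the first rows, the data has the following stylized pattern:

\mathbf{Y}^0 = \begin{pmatrix} Y_{1,1} & \cdots & Y_{1,T_0} & Y_{1,T_0+1} & \cdots & Y_{1,T} \\ \vdots & & \vdots & \vdots & & \vdots \\ Y_{N_0,1} & \cdots & Y_{N_0,T_0} & Y_{N_0,T_0+1} & \cdots & Y_{N_0,T} \\ Y_{N_0+1,1} & \cdots & Y_{N_0+1,T_0} & ? & \cdots & ? \\ \vdots & & \vdots & \vdots & & \vdots \\ Y_{N,1} & \cdots & Y_{N,T_0} & ? & \cdots & ? \end{pmatrix}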

To impute missing potential outcomes, we may leverage

  • Vertical Regression: Learn the cross-sectional relationship between the outcomes for units in the pre-treatment period, assert that they are stable over time, and impute accordingly
  • Horizontal Regression: Learn the time-series relationship between pre-treatment and post-treatment outcomes for the never-treated units, assert that they are stable across units, and impute accordingly
  • Mixture: Mix between the two approaches.

Vertical Regression is particularly suitable when the data matrix is ‘fat’ (T \gg N), while Horizontal Regression is particularly suitable when the data matrix is ‘thin’ (N \gg T). Matrix completion (MC) is well suited for square-ish matrices.

1.1 Implementation

Code
# packages used below
library(data.table)   # fread, melt, data.table
library(magrittr)     # %>% pipe
library(ggplot2); library(plotly)
library(MCPanel)      # DID, adh_mp_rows, en_mp_rows, mcnnm_cv
library(filling)      # fill.nuclear, fill.SVT, fill.KNNimpute

root = "~/Desktop/0_computation_notes/metrics/00_CausalInference/Panel/"
df = fread(file.path(root, "california_prop99.csv"))
# custom function to construct wide outcome and treatment matrices from panel data
setup = panelMatrices(df, "State", "Year", "treated", "PacksPerCapita")
attach(setup)  # exposes the outcome matrix Y and the treatment matrix W

mask = 1 - W   # 1 = observed (untreated) cell, 0 = treated / to be imputed
Yr = as.numeric(colnames(Y))

# california series (the treated unit, stored in the last row)
Ycal = Y %>% .[nrow(.), ]

1.1.1 Difference in Differences

The difference in differences estimator can be thought of as a (naive) imputation estimator that fills missing entries in \mathbf{Y}^0 with the average of the control units’ outcomes plus an intercept \mu, where \mu is the average difference between treated and control outcomes in the pre-treatment period. Note that this means all control units get equal weight 1/N_{co}, where N_{co} is the number of control units; the weights aren’t data-driven.

Code
# did
did = DID(Y, mask) %>% .[nrow(.),]

Although the weights aren’t data driven, the offset \mu is learned on the basis of pre-treatment outcomes for treated and control units, so DiD may be considered a Vertical regression.
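As a sanity check on this logic, here is a minimal by-hand version of the DiD imputation (a sketch; like the rest of the code, it assumes the single treated unit sits in the last row of Y):

Code
# by-hand DiD imputation (sketch): equal-weight control average plus a
# pre-period level shift
did_impute = \(Y, T0) {
  ctrl = Y[-nrow(Y), , drop = FALSE]            # control units
  trt  = Y[nrow(Y), ]                           # treated unit
  mu   = mean(trt[1:T0]) - mean(ctrl[, 1:T0])   # pre-period level difference
  colMeans(ctrl) + mu                           # 1/N_co weights + offset
}
did_hand = did_impute(Y, T0 = sum(mask[nrow(mask), ]))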

1.1.2 Synthetic Control

The Synthetic Control estimator imputes missing values in \mathbf{Y}^0 as a weighted average of post-treatment outcomes for control units, where the weights are learned using simplex regression of pre-treatment outcomes for the treatment unit on the pre-treatment outcomes of control units.

Code
# synth
syn = adh_mp_rows(Y, mask) %>% .[nrow(.),]

Synth is a Vertical regression.
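For intuition, the simplex regression can also be written by hand as a quadratic program (a sketch using the quadprog package, not the MCPanel implementation; the tiny ridge term is only there to keep the quadratic form positive definite):

Code
# simplex regression by hand (sketch): min ||y - Xw||^2 s.t. w >= 0, sum(w) = 1
library(quadprog)
simplex_weights = \(X, y) {
  J = ncol(X)
  D = crossprod(X)
  D = D + diag(1e-6 * mean(diag(D)), J)  # tiny ridge so D is positive definite
  A = cbind(rep(1, J), diag(J))          # col 1: sum-to-one; rest: w_j >= 0
  solve.QP(D, drop(crossprod(X, y)), A, c(1, rep(0, J)), meq = 1)$solution
}
T0 = sum(mask[nrow(mask), ])  # pre-treatment periods for the treated unit
w = simplex_weights(t(Y[-nrow(Y), 1:T0]), Y[nrow(Y), 1:T0])
syn_hand = drop(w %*% Y[-nrow(Y), ])  # weighted average of control outcomes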

1.1.3 Elastic Net Synthetic Control

The DIFP (Doudchenko-Imbens-Ferman-Pinto) estimator is similar to the synthetic control estimator but allows for negative weights and an intercept; to this end, it uses elastic net regression of the treated unit’s pre-treatment outcomes on the pre-treatment outcomes of control units to learn the weights.

Code
# enH
enH = en_mp_rows(Y, mask, num_alpha = 40) %>% .[nrow(.),]

EN Synth is a Vertical regression.
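A by-hand analogue with glmnet (a sketch, not the MCPanel implementation; alpha = 0.5 is an arbitrary ridge/lasso mix):

Code
# elastic net vertical regression by hand (sketch): periods are observations,
# control units are regressors; intercept and negative weights are allowed
library(glmnet)
T0 = sum(mask[nrow(mask), ])
fit = cv.glmnet(t(Y[-nrow(Y), 1:T0]), Y[nrow(Y), 1:T0], alpha = 0.5, nfolds = 5)
enH_hand = drop(predict(fit, newx = t(Y[-nrow(Y), ]), s = "lambda.min"))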

1.1.4 Elastic Net Time Series

The time series approach involves regressing the post-treatment outcomes for never-treated units on their pre-treatment outcomes, and using these coefficients to impute the untreated potential outcome for the treated unit.

Code
# enV
enV = en_mp_rows(t(Y), t(mask), num_alpha = 40) %>% .[, ncol(.)]

TS Synth is a Horizontal regression.
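Equivalently, by hand (a sketch): for each post-treatment period, regress control units’ outcomes in that period on their own pre-period outcomes, then predict for the treated unit.

Code
# elastic net horizontal regression by hand (sketch): units are observations,
# pre-treatment periods are regressors
library(glmnet)
T0 = sum(mask[nrow(mask), ])
ctrl = Y[-nrow(Y), ]; trt = Y[nrow(Y), ]
enV_hand = trt  # pre-period entries remain observed
for (t in (T0 + 1):ncol(Y)) {
  fit = cv.glmnet(ctrl[, 1:T0], ctrl[, t], alpha = 0.5)
  enV_hand[t] = drop(predict(fit, newx = matrix(trt[1:T0], 1), s = "lambda.min"))
}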

1.1.5 Matrix Completion with Nuclear Norm Minimization (M)

The core idea of matrix completion in this setting is to model \mathbf{Y}^0 as

\mathbf{Y}^{0} = \mathbf{L}_{N \times T} + \boldsymbol{\varepsilon} = \mathbf{U}_{N \times R} \mathbf{V}^{\top}_{R \times T} + \boldsymbol{\varepsilon}_{N \times T}

Here, the number of factors / rank R isn’t known and must be estimated. In applied econometric practice, researchers frequently use the additive fixed effects specification Y_{it}^0 = \gamma_i + \delta_t + \varepsilon_{it}, which is an instance of matrix completion with R = 2, where \mathbf{U} = [\boldsymbol{\gamma} \; ; \mathbf{1}] and \mathbf{V} = [\mathbf{1} \; ; \boldsymbol{\delta}].
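The rank-2 claim is easy to verify numerically (a sketch with made-up fixed effects):

Code
# an additive two-way fixed effects matrix L_it = gamma_i + delta_t has rank 2
set.seed(42)
gam = rnorm(10); del = rnorm(8)
Lfe = outer(gam, rep(1, 8)) + outer(rep(1, 10), del)
qr(Lfe)$rank  # 2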

This factor structure can be considered in terms of the Singular Value Decomposition (SVD). Using the SVD of the observed data matrix \mathbf{Y}, we can write

\mathbf{Y} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^{\top} = \sum_{k=1}^K \sigma_k \mathbf{u}_k \mathbf{v}_k^{\top}

where K = \min(N, T) and \mathbf{\Sigma} is a diagonal K \times K matrix with sorted entries \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_K \geq 0. By the Eckart-Young theorem, the best rank-R approximation to a fully observed matrix is obtained by truncating this expansion at rank R. With missing entries, however, the rank-constrained problem is nonconvex. We therefore use an alternative matrix norm, the ‘nuclear’ norm \lVert \mathbf{L} \rVert_{*}: the sum of the singular values of \mathbf{L}, which serves as a convex surrogate for its rank. Letting \mathcal{O} denote the set of nonmissing (i, t) pairs, we can write the optimization problem as

\min_{\mathbf{L}} \sum_{(i, t) \in \mathcal{O}} \left( Y_{it} - L_{it} \right)^2 + \lambda_L \lVert \mathbf{L} \rVert_{*}

where \lVert \cdot \rVert_{*} is the nuclear norm and \lambda_L \geq 0 controls the strength of the penalty.
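This penalized problem can be solved by iteratively soft-thresholding the singular values of the current estimate, as in Mazumder, Hastie, and Tibshirani (2010). Below is a minimal sketch of that iteration (not the MCPanel implementation; the lambda value is illustrative and would be chosen by cross-validation in practice).

Code
# soft-impute (sketch): alternate between filling missing entries with the
# current estimate and soft-thresholding the singular values
soft_impute = \(Y, obs, lambda, iters = 500, tol = 1e-6) {
  L = matrix(mean(Y[obs]), nrow(Y), ncol(Y))  # flat initialization
  for (i in seq_len(iters)) {
    Z = ifelse(obs, Y, L)  # observed entries from Y, missing entries from L
    s = svd(Z)
    Lnew = s$u %*% (pmax(s$d - lambda, 0) * t(s$v))  # shrink singular values
    if (max(abs(Lnew - L)) < tol) { L = Lnew; break }
    L = Lnew
  }
  L
}
L0_hand = soft_impute(Y, mask == 1, lambda = 5)

MCPanel’s mcnnm_cv solves the same problem, choosing \lambda_L by cross-validation: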

Code
# mco
mco = mcnnm_cv(Y, mask) %>% predict_mc() %>% .[nrow(.),]

The same approach, implemented via the generic filling package, without fixed effects:

Code
# mask treated observations as NA for the filling package
Y0 = replace(Y, !mask, NA)
# helper: run a matrix completion routine from the filling package and
# keep the imputed series for the treated unit (last row)
fillY0 = \(f, ...) f(Y0, ...)$X %>% .[nrow(.), ]

# nuclear norm minimization - takes a while
nucY0 = fillY0(fill.nuclear)

Matrix Completion is a mixed approach since it can leverage both vertical and horizontal information embedded in the low-rank approximation.

1.1.6 Singular Value Thresholding

Same idea as above, but the matrix rank enters the optimisation problem directly, which makes it nonconvex. It may scale poorly for large datasets, but otherwise inherits the qualities of the matrix completion approach.

Code
# singular value thresholding
svtY0 = fillY0(fill.SVT, lambda = 5)
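For intuition, a fixed-rank variant of the soft-impute iteration above (“hard-impute”) truncates to the top R singular values instead of soft-thresholding them (a sketch; R = 2 mimics the two-way fixed effects structure):

Code
# hard-impute (sketch): same loop as soft_impute, with a rank-R truncation
hard_impute = \(Y, obs, R = 2, iters = 500, tol = 1e-6) {
  L = matrix(mean(Y[obs]), nrow(Y), ncol(Y))
  for (i in seq_len(iters)) {
    Z = ifelse(obs, Y, L)
    s = svd(Z, nu = R, nv = R)
    Lnew = s$u %*% (s$d[1:R] * t(s$v))  # keep only the top R components
    if (max(abs(Lnew - L)) < tol) { L = Lnew; break }
    L = Lnew
  }
  L
}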

1.1.7 Nearest Neighbours Matching

Here, we find donors \mathcal{M} that match the pre-treatment outcomes for the treatment unit closely, and take the inverse-distance-weighted average to impute Y^0 in the post-period.

Code
# KNN
knnY0 = fillY0(fill.KNNimpute, k = 5)

This is a discretized version of Synth, so it is also Vertical.
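By hand, the same idea looks like this (a sketch with k = 5 and Euclidean distance on pre-period outcomes; fill.KNNimpute’s internals may differ):

Code
# nearest-neighbour imputation by hand (sketch)
T0 = sum(mask[nrow(mask), ])
ctrl = Y[-nrow(Y), ]; trt = Y[nrow(Y), ]
d = sqrt(colSums((t(ctrl[, 1:T0]) - trt[1:T0])^2))  # pre-period distances
nn = order(d)[1:5]                                  # five closest donors
w = (1 / d[nn]) / sum(1 / d[nn])                    # inverse-distance weights
knn_hand = drop(w %*% ctrl[nn, ])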

1.2 Comparison

Code
predY = data.table(Yr, Ycal,
  `Nuclear Norm` = nucY0,
  `KNN` = knnY0,
  # `Simple Mean` = simY0,
  `Singular Value Thresholding` = svtY0,
  `Difference in Differences` = did,
  `Elastic Net H` = enH,
  `Elastic Net V` = enV,
  `Synth` = syn,
  `MCNNM` = mco)

predYlong = melt(predY, id.var = "Yr")
fig = ggplot() +
  geom_line(data = df[State != "California"], aes(x = Year,
                                        y = PacksPerCapita, group = State),
    linewidth = 0.5, alpha = 0.1) +
  # california
  geom_line(data = predYlong[variable == "Ycal"], aes(Yr, value), colour = '#060404',
    linewidth = 2) +
  geom_vline(aes(xintercept = 1988.5), col = '#44203c', linetype = 'dotted') +
  # imputations
  geom_line(data = predYlong[variable != "Ycal"],
    aes(Yr, value, colour = variable),
      linewidth = 1.5, alpha = 0.9) +
  scale_colour_brewer(palette = "Paired") +
  lal_plot_theme() +
  ylim(c(NA, 150)) + labs(colour = "")

fig

Pre-treatment fit for DiD is poor, so its imputed counterfactual is likely over-estimated. Most other approaches produce roughly similar results.

Interactive version (occasionally fails to render on the quarto blog for some reason); see here for a full-size interactive visualization.

Code
htmlwidgets::saveWidget(ggplotly(fig), "CAtrajectories.html")

ggplotly(fig)

1.3 References

Abadie, A., A. Diamond, and J. Hainmueller. (2010): “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program,” Journal of the American Statistical Association, 105, 493–505.

———. (2015): “Comparative Politics and the Synthetic Control Method,” American Journal of Political Science, 59, 495–510.

Arkhangelsky, D., S. Athey, D. A. Hirshberg, G. W. Imbens, and S. Wager. (2021): “Synthetic Difference-in-Differences,” American Economic Review.

Athey, S., M. Bayati, N. Doudchenko, G. Imbens, and K. Khosravi. (2017): “Matrix Completion Methods for Causal Panel Data Models,” arXiv preprint [math.ST].

Bai, J. (2009): “Panel Data Models with Interactive Fixed Effects,” Econometrica, 77, 1229–79.

Candès, E. J., and T. Tao. (2010): “The Power of Convex Relaxation: Near-Optimal Matrix Completion,” IEEE Transactions on Information Theory, 56, 2053–80.

Davenport, M. A., and J. Romberg. (2016): “An Overview of Low-Rank Matrix Recovery from Incomplete Observations,” IEEE Journal of Selected Topics in Signal Processing, 10, 608–22.

Doudchenko, N., and G. W. Imbens. (2016): “Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis,” Working Paper Series, National Bureau of Economic Research.

Gobillon, L., and T. Magnac. (2016): “Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls,” The Review of Economics and Statistics, 98, 535–51.

Mazumder, R., T. Hastie, and R. Tibshirani. (2010): “Spectral Regularization Algorithms for Learning Large Incomplete Matrices,” Journal of Machine Learning Research, 11, 2287–2322.

Xu, Y. (2017): “Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models,” Political Analysis, 25, 57–76.