In [1]:
options(digits=2)
# R Imports Boilerplate
rm(list=ls())
In [2]:
library(LalRUtils)
load_or_install(c('tidyverse','data.table', 'stargazer','lfe','boot', 'sandwich','lmtest',
  'magrittr', 'rio', 'cobalt', 'Matching', 'doParallel', 'foreach'))
theme_set(theme_bw())

exp_type = 'text'
In [3]:
root = '/home/alal/Dropbox/0_GradSchool/0_classes/450b/applied-econometrics-notes/'
data = paste0(root, 'data/')
In [4]:
nsw_exper = import(paste0(data, 'nsw_exper.dta'))
nsw_psid  = import(paste0(data,'nsw_psid_withtreated.dta'))

Regression Fundamentals

Main Results

Conditional Expectation Function (CEF)

$$ \mathbb{E}[Y|X=x] = \int y f(y|x) dy $$

Law of Iterated Expectation $$\mathbb{E}[Y_i] = \mathbb{E}[\mathbb{E}[Y_i|X_i]] $$

CEF Decomposition

$$ Y_i = \mathbb{E}(Y_i|X_i) + \epsilon_i $$

CEF Prediction Property

$$ \mathbb{E}[Y_i|X_i] = \arg\min_{m(X_i)} \mathbb{E}[(Y_i - m(X_i))^2] $$

ANOVA Theorem

$$ \mathbb{V}(Y_i) = \mathbb{V}(\mathbb{E}(Y_i|X_i)) + \mathbb{E}(\mathbb{V}(Y_i|X_i)) $$

Variance of Y = Variance of CEF (Explained Variance) + Variance of residual
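
As a quick numerical check of this decomposition (a minimal sketch; conditioning on `educ` in the experimental sample is an arbitrary illustrative choice), the total variance should equal the sum of the two pieces exactly when all variances use the $1/N$ divisor:

# Sketch: verify V(Y) = V(E[Y|X]) + E[V(Y|X)] with X = educ (population 1/N variances)
ybar  = mean(nsw_exper$re78)
v_y   = mean((nsw_exper$re78 - ybar)^2)                    # total variance of Y
parts = nsw_exper %>%
  group_by(educ) %>%
  summarise(n = n(), cef = mean(re78),
            v_within = mean((re78 - mean(re78))^2))        # within-cell variance
v_cef = with(parts, sum(n * (cef - ybar)^2) / sum(n))      # variance of the CEF
e_var = with(parts, sum(n * v_within) / sum(n))            # mean of conditional variances
c(total = v_y, explained_plus_residual = v_cef + e_var)    # the two numbers should coincide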

Regression Anatomy / FWL

$$ \beta_k = \frac{Cov(Y_i, \tilde{x_{ki}})}{V(\tilde{x_{ki}})} $$

where $\tilde{x_{ki}}$ is the residual from a regression of $x_{ki}$ on all other covariates.
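
This is easy to verify directly: residualise one regressor on the others, regress the outcome on that residual, and compare with the multiple-regression coefficient. A minimal sketch on the experimental sample (the covariate set is an illustrative choice):

# Sketch: regression anatomy / FWL check
full  = lm(re78 ~ nsw + age + educ, nsw_exper)       # multiple regression
x_til = resid(lm(nsw ~ age + educ, nsw_exper))       # residualise nsw on the other covariates
anat  = lm(nsw_exper$re78 ~ x_til)                   # bivariate regression on the residual
c(multiple = unname(coef(full)['nsw']), anatomy = unname(coef(anat)[2]))   # identical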

Justifications for Linear Regression:

  • Regression reveals the conditional expectation function if the CEF is linear
  • Regression reveals best linear predictor of $Y_i|X_i$
  • Regression reveals best linear predictor of $E[Y_i|X_i]$

Conditional Independence Assumption (CIA)

$$ Y_{si} \coprod s_i | X_i , \forall s $$

Conditional on $X_i$, $s_i$ can be said to be 'as good as randomly assigned'. This assumption is not testable.

Omitted Variable Bias Formula

$$ \frac{Cov(Y_i,O_i)}{V(O_i)} = \rho + \gamma' \delta_{UO} $$

Here the LHS is the vector of coefficients from the 'short' regression of $Y$ on $O_i$, where $O_i$ is the vector of observed predictors included in the regression; $\rho$ is the corresponding vector of coefficients on $O_i$ from the 'long' regression of $Y$ on both $O_i$ and $U_i$; $U_i$ is the vector of unobserved predictors; $\gamma$ is the vector of coefficients on $U_i$ in the long regression; and $\delta_{UO}$ is the vector of coefficients from the auxiliary regression of $U_i$ on $O_i$. In words, the OVB formula says:

Coefficient in Short Regression = Coefficient in long regression + effect of omitted $\times$ regression of omitted on included
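
The decomposition holds exactly in sample for OLS. A minimal sketch on the experimental data, arbitrarily treating `educ` as the 'omitted' variable:

# Sketch: short = long + (effect of omitted) x (regression of omitted on included)
long  = lm(re78 ~ nsw + educ, nsw_exper)    # 'long' regression including educ
short = lm(re78 ~ nsw, nsw_exper)           # 'short' regression omitting educ
aux   = lm(educ ~ nsw, nsw_exper)           # auxiliary regression of omitted on included
c(short_coef = unname(coef(short)['nsw']),
  ovb_decomp = unname(coef(long)['nsw'] + coef(long)['educ'] * coef(aux)['nsw']))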

Experimental Benchmark

In [5]:
ate_diffmeans(nsw_exper, re78, nsw)
$te
1794.34308292048
$std_err
670.996728049429
$sumstats
nsw  means  sigma_2  nobs
  0   4555  3.0e+07   260
  1   6349  6.2e+07   185
In [6]:
basespec = as.formula(re78 ~ nsw)
fullspec = as.formula(re78 ~ nsw + age + educ + black + hisp + married + re74 + re75 +
  u74 + u75)

m1 = robustify(felm(basespec, nsw_exper))
m2 = robustify(felm(fullspec, nsw_exper))
In [7]:
stargazer(m1, m2,
  column.labels = c('Basic', 'Controls'),
  dep.var.labels.include = FALSE,
  header = F,
  single.row = T,
  type = exp_type
)
==================================================================
                                 Dependent variable:              
                    ----------------------------------------------
                            Basic                 Controls        
                             (1)                     (2)          
------------------------------------------------------------------
nsw                 1,794.000*** (671.000)  1,672.000** (662.000) 
age                                            54.000 (40.000)    
educ                                         403.000** (161.000)  
black                                      -2,039.000* (1,039.000)
hisp                                         425.000 (1,427.000)  
married                                      -147.000 (864.000)   
re74                                            0.120 (0.130)     
re75                                            0.019 (0.140)     
u74                                         1,381.000 (1,555.000) 
u75                                        -1,072.000 (1,408.000) 
Constant            4,555.000*** (340.000)   221.000 (2,824.000)  
------------------------------------------------------------------
Observations                 445                     445          
R2                          0.018                   0.058         
Adjusted R2                 0.016                   0.037         
Residual Std. Error  6,580.000 (df = 443)   6,509.000 (df = 434)  
==================================================================
Note:                                  *p<0.1; **p<0.05; ***p<0.01

Treatment Effects under Unconfoundedness - Identification Result

Assumptions

  • $(Y_0 , Y_1) \coprod D | X$ - unconfoundedness / Selection on Observables / Ignorability

  • $0 < Pr(D=1|X) < 1$ - common support

Weakening assumptions to uncover ATT only

  • $(Y_0) \coprod D | X$ - unconfoundedness for controls

  • $Pr(D=1|X) < 1$ - weak overlap

Given unconfoundedness, we can write : $$ \begin{aligned} \mathbf { E } \left[ Y _ { 1 } - Y _ { 0 } | X \right] & = \mathbf { E } \left[ Y _ { 1 } - Y _ { 0 } | X , D = 1 \right] \\ & = \mathbf { E } [ Y | X , D = 1 ] - \mathbf { E } [ Y | X , D = 0 ] \end{aligned} $$

Estimands:

$$ \begin{aligned} \tau _ { A T E } & = \mathbf { E } \left[ Y _ { 1 } - Y _ { 0 } \right] = \int \mathbf { E } \left[ Y _ { 1 } - Y _ { 0 } | X \right] d P ( X ) \\ & = \int ( \mathbb { E } [ Y | D = 1 , X ] - \mathbb { E } [ Y | D = 0 , X ] ) d P ( X ) \end{aligned} $$$$ \begin{aligned} \tau _ { A T T } & = \mathbf { E } \left[ Y _ { 1 } - Y _ { 0 } | D = 1 \right] \\ & = \int ( \mathbf { E } [ Y | X , D = 1 ] - \mathbf { E } [ Y | X , D = 0 ] ) d P ( X | D = 1 ) \end{aligned} $$

Discrete Case ($x$ takes finitely many values indexed by $1, \dots, m$)

$$ \tau_{ATE} = \sum_{k = 1}^m ( \mathbb { E } [ Y | D = 1 , X = x_k ] - \mathbb { E } [ Y | D = 0 , X = x_k ] ) \Pr(X = x_k) $$$$ \tau_{ATT} = \sum_{k = 1}^m ( \mathbb { E } [ Y | D = 1 , X = x_k ] - \mathbb { E } [ Y | D = 0 , X = x_k ] ) \Pr(X = x_k | D = 1) $$

Estimation

Subclassification

$$ \hat { \tau } _ { A T E } = \sum _ { k = 1 } ^ { K } \left( \overline { Y } _ { 1 } ^ { k } - \overline { Y } _ { 0 } ^ { k } \right) \cdot \left( \frac { N ^ { k } } { N } \right) $$$$ \hat { \tau } _ { A T T } = \sum _ { k = 1 } ^ { K } \left( \overline { Y } _ { 1 } ^ { k } - \overline { Y } _ { 0 } ^ { k } \right) \cdot \left( \frac { N _ { 1 } ^ { k } } { N _ { 1 } } \right) $$
In [8]:
att_subclass = function(df, treat_var, outcome_var, ...){
  treat_var   = enquo(treat_var)
  outcome_var = enquo(outcome_var)
  subclass    = quos(...) # arbitrarily large sets of sublasses
  df %>%
    # count number of treated
    mutate(N_treat = sum(!! treat_var)) %>%
    # select and rename variables
    dplyr::select(N_treat, treatment = !! treat_var,
      outcome = !! outcome_var, !!! subclass) %>%
    # group by subclassification groups
    group_by(!!! subclass) %>%
    # count treatments, build weights by subclass
    mutate(n_treat = sum(treatment), wgt = n_treat/N_treat) %>%
    # group by subclass + treatment
    group_by(!!! subclass, treatment) %>%
    # subclass X treatment means
    mutate(ybar = mean(outcome)) %>%
    # take first row from each subclass X treatment combination
    slice(1) %>%
    # subset to relevant columns
    dplyr::select(n_treat, N_treat, ybar, wgt, treatment, !!! subclass) %>%
    # reshape (long to wide) so each subclass is a row
    spread(treatment, ybar, sep = '_') %>%
    # difference in means for each subclass + weighted mean
    mutate(ydiff = treatment_1 - treatment_0, wtydiff = wgt * ydiff) -> groupsums
  # overall ATT , drop missing subclasses
  att = sum(groupsums$wtydiff, na.rm = T)
  return(list(groupsums, att))
}
In [9]:
att_subclass(nsw_psid, nsw, re78, married)[[2]]
-11124.4016044324
In [10]:
nsw_psid %<>% mutate(terc_75 = ntile(re75, 3))
att_subclass(nsw_psid, nsw, re78, married, hisp, black, u75, terc_75)[[2]]
757.042627271346

Matching

Metrics

  • Euclidean Distance $$ E D \left( X _ { i } , X _ { j } \right) = \sqrt { \left( X _ { i } - X _ { j } \right) ^ { \prime } \left( X _ { i } - X _ { j } \right) } $$
  • Stata diagonal distance $$ \operatorname { StataD } \left( X _ { i } , X _ { j } \right) = \sqrt { \left( X _ { i } - X _ { j } \right) ^ { \prime } \operatorname { diag } \left( \Sigma _ { X } \right) ^ { - 1 } \left( X _ { i } - X _ { j } \right) } $$
  • Mahalanobis distance (scale-invariant) $$ M D \left( X _ { i } , X _ { j } \right) = \sqrt { \left( X _ { i } - X _ { j } \right) ^ { \prime } \Sigma ^ { - 1 } \left( X _ { i } - X _ { j } \right) } $$

Where $\Sigma$ is the variance-covariance matrix.
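
To illustrate the last metric, the Mahalanobis distance between two units can be computed with base R's `mahalanobis()` (which returns the squared distance); the covariates below are an illustrative subset:

# Sketch: Mahalanobis distance between the first two observations
X  = nsw_psid %>% dplyr::select(age, educ, re74, re75) %>% as.matrix()
S  = cov(X)                                               # variance-covariance matrix Sigma
md = sqrt(mahalanobis(X[1, ], center = X[2, ], cov = S))  # mahalanobis() returns MD^2
md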

Estimators

$$ \hat { \tau } _ { A T T } = \frac { 1 } { N _ { 1 } } \sum _ { D = 1 } \left( Y _ { i } - Y _ { j ( i ) } \right) $$
$$ \hat { \tau } _ { A T T } = \frac { 1 } { N _ { 1 } } \sum _ { D = 1 } \left\{ Y _ { i } - \left( \sum _ { m = 1 } ^ { M } \frac { 1 } { M } Y _ { j m ( i ) } \right) \right\} $$

Bias-corrected (Abadie-Imbens) $$ \hat { \tau } _ { A T T } = \frac { 1 } { N _ { 1 } } \sum _ { D = 1 } \left( Y _ { i } - Y _ { j ( i ) } \right) - \left( \hat { \mu } _ { 0 } \left( X _ { i } \right) - \hat { \mu } _ { 0 } \left( X _ { j ( i ) } \right) \right) $$

Where $\mu _ { 0 } ( x ) = E [ Y | X = x , D = 0 ]$ is the population regression function under the control.

Bias-corrected matching

In [11]:
m1 = Match(Y = nsw_psid$re78, Tr = nsw_psid$nsw,
  X = nsw_psid %>% dplyr::select(age, educ, black, hisp,
        married, re74, re75, u74, u75),
  BiasAdjust = T,  estimand = 'ATT'
)

summary(m1)
Estimate...  2415.5 
AI SE......  1684.4 
T-stat.....  1.434 
p.val......  0.15156 

Original number of observations..............  2675 
Original number of treated obs...............  185 
Matched number of observations...............  185 
Matched number of observations  (unweighted).  218 

Propensity Score

Under unconfoundedness,

$$ \mathbf { E } [ Y | D = 1 , \pi ( X ) = \overline { \pi } ] - \mathbf { E } [ Y | D = 0 , \pi ( X ) = \overline { \pi } ] = \mathbf { E } \left[ Y _ { 1 } - Y _ { 0 } | \pi ( X ) = \overline { \pi } \right] $$

Balancing property of propensity score $$ \operatorname { Pr } ( X | D = 1 , \pi ( X ) ) = \operatorname { Pr } ( X | D = 0 , \pi ( X ) ) $$

Weighting on the Propensity Score: Estimands

$$ \tau_{\text{ate}} = \mathbb{E}\left[ Y \cdot \frac{D - \pi(X)}{\pi(X)(1 - \pi(X))} \right] $$
$$ \tau_{\text{att}} = \frac{1}{\Pr(D = 1)} \mathbb{E}\left[ Y \cdot \frac{D - \pi(X)}{1 - \pi(X)} \right] $$
In [12]:
pscore_fml = as.formula(nsw ~ age + educ + black + hisp +
  married + re74 + re75 + u74 + u75 )

pscore_fitter <- function(df=nsw_psid, fmla = pscore_fml){
  pscore1 = glm(fmla, df, family=binomial(link='logit'))
  data.frame(pscore = pscore1$fitted.values,
                      treat = as.factor(df$nsw))
}
In [13]:
pscores2 = pscore_fitter(nsw_exper)
In [14]:
p0 = ggplot(pscores2, aes(x = pscore, colour = treat)) +
  geom_density() +
  scale_colour_viridis_d() +
  labs(title = 'Density of Propensity Score by Treatment Status')

p1 = ggplot(pscores2, aes(x = pscore, fill = treat, colour = treat)) +
  geom_density(position='fill') +
  scale_colour_viridis_d() +
  scale_fill_viridis_d() +
  labs(title = 'Distribution of P(Treatment Status | Pscore = p)',
    caption = 'Experimental Sample'
  )

multiplot(p0, p1)

Matching on the Propensity Score

In [15]:
pscore = pscore_fitter()$pscore
m4 = Match(Y = nsw_psid$re78, Tr = nsw_psid$nsw,
  X = pscore,
  BiasAdjust = T,  estimand = 'ATT'
)
Warning message:
“glm.fit: fitted probabilities numerically 0 or 1 occurred”
In [16]:
summary(m4)
Estimate...  2145.8 
AI SE......  1554 
T-stat.....  1.3808 
p.val......  0.16733 

Original number of observations..............  2675 
Original number of treated obs...............  185 
Matched number of observations...............  185 
Matched number of observations  (unweighted).  1997 

Inverse Probability Weighting

$$ \hat{\tau}_{ATE} = \frac{1}{N} \sum_{i=1}^N \left[ \frac{D_i Y_i}{\hat{\pi}(X_i)} - \frac{(1 - D_i) Y_i}{1 - \hat{\pi}(X_i)} \right] $$

where $\hat{\pi}(X_i)$ is the propensity score.
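
A minimal sketch of this plug-in estimator on the observational sample, reusing `pscore_fitter()` from above; with the poor overlap between NSW treated units and PSID controls, extreme weights can make the estimate very unstable:

# Sketch: plug-in IPW estimate of the ATE with the logit propensity score
ps = pscore_fitter(nsw_psid)$pscore
D  = nsw_psid$nsw
Y  = nsw_psid$re78
mean(D * Y / ps - (1 - D) * Y / (1 - ps))   # Horvitz-Thompson style IPW-ATE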

Regression estimation of ATE

Additional Assumptions

1) Constant treatment effects

2) Outcomes linear in X

  • If (2) fails, $\tau_{OLS}$ is the best linear approximation to the CEF.

  • If (1) fails, $\tau_{OLS}$ is a conditional-variance-weighted average of the underlying $\tau$'s:

$$ \tau_{OLS} = \frac{\mathbb{E}\left( \sigma_{D|X}^2 \, \tau_X \right)}{\mathbb{E}\left( \sigma_{D|X}^2 \right)} $$
$$ \tau_{OLS} = \sum_{k=1}^m ( \mathbb{E}[Y | X = x_k, D = 1] - \mathbb{E}[Y | X = x_k, D = 0]) \; w_k $$

where

$$ w_k = \frac{Var(D|X=x_k) Pr(X=x_k)}{\sum_{r=1}^{m} Var(D|X=x_r) Pr(X = x_r) } $$

$Var(D|X=x_k)$ is maximal when $Pr(D=1|X=x_k) = 1/2$ and decreases as this probability moves towards 0 or 1. In other words, $\tau_{OLS}$ up-weights groups where the treated and untreated populations are roughly equal in size, and down-weights groups where the two are badly imbalanced.

$\tau_{OLS}$ equals the true effect iff constant treatment effects hold.
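
A minimal sketch of the weighting result with a single discrete covariate (`married`, an illustrative choice): the OLS coefficient from the saturated regression should coincide with the variance-weighted average of the cell-specific contrasts.

# Sketch: tau_OLS as a conditional-variance-weighted average of cell-specific effects
cells = nsw_exper %>%
  group_by(married) %>%
  summarise(tau_x = mean(re78[nsw == 1]) - mean(re78[nsw == 0]),
            w_x   = mean(nsw) * (1 - mean(nsw)) * n())   # Var(D|X=x) * N_x
c(weighted = with(cells, sum(tau_x * w_x) / sum(w_x)),
  ols      = unname(coef(lm(re78 ~ nsw + factor(married), nsw_exper))['nsw']))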

In [17]:
regform = as.formula(re78 ~ nsw + age + educ + black + hisp +
  married + re74 + re75 + u74 + u75 )

stargazer(lm(regform, nsw_psid), lm(regform, nsw_exper), type = exp_type)
======================================================================
                                   Dependent variable:                
                    --------------------------------------------------
                                           re78                       
                               (1)                       (2)          
----------------------------------------------------------------------
nsw                          115.000                1,672.000***      
                           (1,007.000)                (634.000)       
                                                                      
age                         -90.000***                 54.000         
                             (22.000)                 (45.000)        
                                                                      
educ                        514.000***                403.000**       
                             (76.000)                 (177.000)       
                                                                      
black                        -454.000                -2,039.000*      
                            (497.000)                (1,164.000)      
                                                                      
hisp                       2,197.000**                 425.000        
                           (1,092.000)               (1,561.000)      
                                                                      
married                    1,205.000**                -147.000        
                            (585.000)                 (881.000)       
                                                                      
re74                         0.310***                   0.120         
                             (0.032)                   (0.087)        
                                                                      
re75                         0.540***                   0.019         
                             (0.031)                   (0.150)        
                                                                      
u74                        2,390.000**                1,381.000       
                           (1,024.000)               (1,186.000)      
                                                                      
u75                         -1,462.000               -1,072.000       
                            (947.000)                (1,023.000)      
                                                                      
Constant                     954.000                   221.000        
                           (1,371.000)               (2,633.000)      
                                                                      
----------------------------------------------------------------------
Observations                  2,675                      445          
R2                            0.590                     0.058         
Adjusted R2                   0.590                     0.037         
Residual Std. Error   10,064.000 (df = 2664)    6,509.000 (df = 434)  
F Statistic         379.000*** (df = 10; 2664) 2.700*** (df = 10; 434)
======================================================================
Note:                                      *p<0.1; **p<0.05; ***p<0.01

Doubly Robust Estimation

Doubly Robust estimator:

$$ \widehat { \tau } _ { \mathrm { ATE } } = \frac { 1 } { n } \sum _ { i = 1 } ^ { n } \left( \frac { W _ { i } \left( Y _ { i } - \widehat { \mu } _ { 1 } \left( X _ { i } \right) \right) } { \widehat { \pi } \left( X _ { i } \right) } + \widehat { \mu } _ { 1 } \left( X _ { i } \right) \right) - \left( \frac { \left( 1 - W _ { i } \right) \left( Y _ { i } - \widehat { \mu } _ { 0 } \left( X _ { i } \right) \right) } { 1 - \widehat { \pi } \left( X _ { i } \right) } + \widehat { \mu } _ { 0 } \left( X _ { i } \right) \right) $$

Consistent estimation of $\pi(.)$ or $\mu_1(.)$ and $\mu_0(.)$ required for validity.

Treatment Effects Under Unconfoundedness - cf Imbens

Change of notation - $w$ is treatment, $\hat{e}(x)$ is the propensity score; standard errors are almost always bootstrapped

In [18]:
data = read_table(paste0(data, 'lalonde_exper.txt'), col_names = c('t', 're78', 'education', 're74', 're75'))
Parsed with column specification:
cols(
  t = col_double(),
  re78 = col_double(),
  education = col_double(),
  re74 = col_double(),
  re75 = col_double()
)
In [19]:
boot.extract <- function(obj){
	point_est = obj[[1]]
	se = sd(obj[[2]])
	return(c(point_est, se))
}

Difference in Means estimator

$$ \begin{aligned} \hat { \tau } _ { D M } & = \frac { 1 } { n _ { 1 } } \sum w _ { i } y _ { i } - \frac { 1 } { n _ { 0 } } \sum \left( 1 - w _ { i } \right) y _ { i } \\ & = \frac { \sum w _ { i } y _ { i } } { \sum w _ { i } } - \frac { \sum \left( 1 - w _ { i } \right) y _ { i } } { \sum \left( 1 - w _ { i } \right) } \end{aligned} $$
In [20]:
# Regression function for boot
lm.boot <- function(formula, data, indices){
	fit = lm(formula, data=data[indices,])
	return(coef(fit))
}

lm.r <- function(data, formula){
	tmp = lm(formula, data)
	out = coeftest(tmp, vcov = vcovHC(tmp, "HC1"))
	return(out)
}
In [21]:
# 1. difference in means
diff_fit = lm.r(data, re78 ~ t)
diff_boot_output = boot(data, lm.boot, 1000, formula=re78~t)
diff = c(diff_fit[2,1], diff_fit[2,2])
diff_boot = c(diff_boot_output[[1]][2], sd(diff_boot_output[[2]][,2]))
In [22]:
diff_boot
point estimate (t)    bootstrap SE
1.79434139293137      0.654544698603601

Covariate Adjustment Estimator

$$ \begin{aligned} \mu _ { 1 } ( X ) & = \mathbb { E } \left( Y _ { i } | X = x , W = 1 \right) \\ \mu _ { 0 } ( X ) & = \mathbb { E } \left( Y _ { i } | X = x , W = 0 \right) \\ \hat { \tau } _ { r e g } & = \mathbb { E } \left( \mu _ { 1 } ( X ) - \mu _ { 0 } ( X ) \right) = \frac { 1 } { N } \sum \left( \mu _ { 1 } ( X ) - \mu _ { 0 } ( X ) \right) \end{aligned} $$
In [23]:
# Covariate adjustment function for boot
cov.adj.boot <- function(data, treat, outcome, cov, indices) {
	d = data[indices,]
	formula = formula(paste0(outcome, "~", cov))
	mu0 = d[d[,treat]==0,] %>%
		lm(formula, .) %>%
		predict(d)
	mu1 = d[d[,treat]==1,] %>%
		lm(formula, .) %>%
		predict(d)
	return(mean(mu1-mu0))
}
In [24]:
# 2 Estimate the effect using covariate adjustment
boot(data, cov.adj.boot, 1000, 
	treat="t", outcome="re78", cov="re75") %>%
	boot.extract()
  1. 1.74905061266194
  2. 0.647774766649798

Inverse Propensity Weighting

$$ \hat{\tau}_{ipw} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{w_i Y_i}{\hat{e}(X_i)} - \frac{(1 - w_i) Y_i}{1 - \hat{e}(X_i)} \right) $$
In [25]:
# Inverse propensity score weighting function for boot
ipw.boot <- function(data, outcome, treat, cov, indices) {
	d = as.data.frame(data[indices,])
	formula = formula(paste0(treat, "~", cov))
	fit_wt = glm(formula, binomial(link="logit"), d)
	prob_wt = predict(fit_wt, type="response")
	w = d[, treat]
	y = d[, outcome]
	out = mean(w*y/prob_wt - (1-w)*y/(1-prob_wt))
	return(out)
}
In [26]:
# 3 Estimate the effect using inverse propensity score weighting
boot(data, ipw.boot, 1000, 
	outcome="re78", treat="t", cov="re75") %>%
	boot.extract()
  1. 1.74731645095702
  2. 0.666391602341532

Doubly Robust Estimator

$$ \widehat{\tau}_{\mathrm{es}} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{W_i \cdot \left( Y_i - \widehat{\mu}_1(X_i) \right)}{\hat{e}(X_i)} - \frac{(1 - W_i) \cdot \left( Y_i - \widehat{\mu}_0(X_i) \right)}{1 - \hat{e}(X_i)} + \widehat{\mu}_1(X_i) - \widehat{\mu}_0(X_i) \right) $$
In [27]:
# Doubly-robust estimator function for boot
dr.boot <- function(data, outcome, treat, cov, indices) {
	d = as.data.frame(data[indices,])
	formula_y = formula(paste0(outcome, "~", cov))
	formula_w = formula(paste0(treat, "~", cov))
	mu0 = d[d[,treat]==0,] %>%
		lm(formula_y, .) %>%
		predict(d)
	mu1 = d[d[,treat]==1,] %>%
		lm(formula_y, .) %>%
		predict(d)
	fit_wt = glm(formula_w, binomial(link="logit"), d)
	prob_wt = predict(fit_wt, type="response")
	w = d[, treat]
	y = d[, outcome]
	out = mean(w*(y-mu1)/prob_wt - (1-w)*(y-mu0)/(1-prob_wt) + 
		mu1 - mu0)
	return(out)
}
In [28]:
# 4 Estimate the effect using a doubly-robust estimator
(dr = boot(data, dr.boot, 1000, 
	outcome="re78", treat="t", cov="re75") %>%
	boot.extract())
  1. 1.74815919288119
  2. 0.649936038461851

Semiparametric Efficiency Bound

Score function:

$$ \psi ( y , w , x , \tau ) = \frac { w y } { e ( x ) } - \frac { ( 1 - w ) y } { 1 - e ( x ) } - \tau - \left( \frac { \mu _ { 1 } ( x ) } { e ( x ) } + \frac { \mu _ { 0 } ( x ) } { 1 - e ( x ) } \right) ( w - e ( x ) ) $$

Efficiency Bound:

$$ \mathbb { V } \geq \mathbb { E } \left[ \psi ( Y , W , X , \tau ) ^ { 2 } \right] $$

Where the RHS equals $$ \mathbb{E}\left( \frac{\sigma_1^2(X_i)}{e(X_i)} + \frac{\sigma_0^2(X_i)}{1 - e(X_i)} + \left( \mu_1(X_i) - \mu_0(X_i) - \tau \right)^2 \right) $$

In [29]:
# A function for estimating the efficiency bound
estimate.eff.bound <- function(data, outcome, treat, cov, tau){
	formula_y = formula(paste0(outcome, "~", cov))
	formula_w = formula(paste0(treat, "~", cov))
	fit0 = data[data[,treat]==0,] %>% lm(formula_y, .)
	fit1 = data[data[,treat]==1,] %>% lm(formula_y, .)
	mu0 = predict(fit0, data)
	mu1 = predict(fit1, data)
	var0 = (summary(fit0)$sigma)^2
	var1 = (summary(fit1)$sigma)^2
	prob_wt = predict(glm(formula_w, binomial(link="logit"), data), type="response")
	est_eff_bound = sqrt(mean(var1/prob_wt + var0/(1-prob_wt) + (mu1 - mu0 - tau)^2)/nrow(data))
	return(est_eff_bound)
}
In [30]:
estimate.eff.bound(data, "re78", "t", "re75", dr[1])
0.671287709115994

Sensitivity Analysis

  • Falsification tests :
    • estimate treatment effect on placebo groups for whom $\tau$ should be 0
      • if placebo effect is nonzero, unconfoundedness is less plausible

Placebo Tests

Assume $D$ is realised at $t = 0$ and is ignorable conditional on a set of $T$ lags of the outcome.

$$ Y _ { 1 } , Y _ { 0 } \perp D | Y _ { t = - 1 } , Y _ { t = - 2 } , \ldots , Y _ { t = - T } , X $$

The analogous condition for the most recent lag, ignorability conditional on all the remaining lags, is testable:

$$ Y _ { t = - 1 } \perp D | Y _ { t = - 2 } , \ldots , Y _ { t = - T } , X $$

Estimate the treatment effect on $Y_{t=-1}$; a nonzero estimated effect would imply that unconfoundedness is implausible.
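
A minimal sketch of such a placebo check on the observational sample, using `re75` as the pseudo-outcome and letting `re74` and `u74` stand in for the earlier lags (an illustrative covariate set); a large, significant 'effect' of `nsw` on pre-treatment earnings would make unconfoundedness less plausible:

# Sketch: placebo regression of a pre-treatment outcome on treatment + remaining lags
placebo_fml = as.formula(re75 ~ nsw + age + educ + black + hisp + married + re74 + u74)
lm.r(nsw_psid, placebo_fml)   # heteroskedasticity-robust coefficients via the helper above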

Parametric Sensitivity Analysis

Imbens (2003)

$U$ is a nuisance parameter.

$$ Y_1, Y_0 \perp D|X, U $$

Where $U \sim \mathcal{B}(\pi = 0.5)$, and $U \perp X$. $P(U=1) = P(U=0) = 0.5$.

Propensity score is Logistic:

$$ P(D = 1|X, U) = \frac{exp(X\theta + \gamma U)}{1 + exp(X\theta + \gamma U)} $$

$\gamma$ indicates strength of relationship between $U$ and $D|X$.

Y is conditionally normal $$ Y | X, U \sim \mathcal{N}(\alpha D + X\beta + \delta U , \sigma^2) $$

$\delta$ indicates strength of relationship between $U$ and $Y|X$.

MLE setup

Construct a grid of $(\gamma, \delta)$ values and, at each grid point, compute the MLE $\hat{\alpha}(\gamma, \delta)$ by maximising $\ell(\alpha, \beta, \theta; \gamma, \delta)$ over $(\alpha, \beta, \theta)$.

Use 2 partial $R^2$s:

  • $R^2_{Y, par} (\delta)$: Residual variation in outcome explained by $U$ (after partialling out $X$).

  • $R^2_{D, par} (\gamma)$: Residual variation in treatment assignment explained by $U$ (after partialling out $X$).

Draw threshold contours in the $(R^2_{D, par}, R^2_{Y, par})$ plane; one should expect most observed covariates to be clustered near the origin.
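
A rough sketch of those benchmark points for the observed covariates, computing each covariate's partial $R^2$ for the outcome and for treatment after partialling out the remaining covariates (for simplicity this uses a linear probability model for the treatment equation instead of the logit in Imbens (2003)):

# Sketch: partial R^2 of each observed covariate in the outcome and treatment equations
covs = c('age', 'educ', 'black', 'hisp', 'married', 're74', 're75', 'u74', 'u75')
partial_r2 = function(dep, v, others, df) {
  r2_full = summary(lm(reformulate(c(others, v), dep), df))$r.squared
  r2_rest = summary(lm(reformulate(others, dep), df))$r.squared
  (r2_full - r2_rest) / (1 - r2_rest)
}
benchmarks = t(sapply(covs, function(v) {
  others = setdiff(covs, v)
  c(R2_Y_par = partial_r2('re78', v, c('nsw', others), nsw_psid),
    R2_D_par = partial_r2('nsw',  v, others,           nsw_psid))
}))
round(benchmarks, 3)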

Rosenbaum (2002)

Tuning parameter $\Gamma \geq 1$ that measures departure from zero hidden bias.

For any two observations $i$ and $j$ with identical covariate values $X_i = X_j$, under unconfoundedness, probability of assignment into treatment should be identical $\pi(X_i) = \pi(X_j)$

Treatment assignment probability may differ due to unobserved binary confounder $U$. We can bound this by the ratio:

$$ \frac{1}{\Gamma} \leq \frac{e_i (1-e_j)}{(1-e_i) e_j} \leq \Gamma $$

$\Gamma = 1 \implies$ no hidden bias. $\Gamma = 2 \implies$ $i$ can be up to twice as likely to be treated as $j$ despite identical $x$.
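
As a numerical illustration of what the bound means, fixing unit $j$'s assignment probability (0.3 below is an arbitrary choice), the bound restricts unit $i$'s odds to lie within a factor $\Gamma$ of $j$'s odds:

# Sketch: implied range for e_i given e_j, for several values of Gamma
e_j    = 0.3
Gammas = c(1, 1.5, 2, 3)
odds_j = e_j / (1 - e_j)
lower  = (odds_j / Gammas) / (1 + odds_j / Gammas)   # odds_i >= odds_j / Gamma
upper  = (odds_j * Gammas) / (1 + odds_j * Gammas)   # odds_i <= odds_j * Gamma
data.frame(Gamma = Gammas, e_i_lower = round(lower, 3), e_i_upper = round(upper, 3))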

Altonji, Elder, Taber (2005) , Oster (2018)

How much does treatment effect move when controls are added?

Estimate model with and without controls:

  • $Y_i = \alpha^F D_i + X \beta + \epsilon$

  • $Y_i = \alpha^R D_i + \epsilon$

AET ratio:

$$ \rho = \frac{\hat{\alpha}^F}{\hat{\alpha}^R - \hat{\alpha}^F} $$

Want $\rho$ to be as big as possible (i.e. $\hat{\alpha}^R - \hat{\alpha}^F \to 0$ under unconfoundedness).
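
A minimal sketch of the ratio on the observational sample, taking the control set used earlier as the added controls (an illustrative choice; Oster (2018) further adjusts for the movement in $R^2$):

# Sketch: AET ratio comparing the controlled and uncontrolled coefficients on nsw
a_F = coef(lm(re78 ~ nsw + age + educ + black + hisp + married + re74 + re75 + u74 + u75,
              nsw_psid))['nsw']
a_R = coef(lm(re78 ~ nsw, nsw_psid))['nsw']
unname(a_F / (a_R - a_F))   # rho: coefficient movement relative to the controlled estimate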