

First Course Ding: Chapter 21

An experimental perspective on the instrumental variable

Chapter 21 treats IV as an experimental design problem: assignment \(Z\) shifts treatment uptake \(D\), and the Wald ratio \(\hat{\tau} = \{\hat{E}(Y \mid Z=1) - \hat{E}(Y \mid Z=0)\} / \{\hat{E}(D \mid Z=1) - \hat{E}(D \mid Z=0)\}\) turns those two reduced-form contrasts into an estimate of the complier average causal effect.

import numpy as np

import crabbymetrics as cm

np.set_printoptions(precision=4, suppress=True)


def simulate_experiment(rng, shares, n=300):
    """Draw one completely randomized experiment with three latent
    compliance classes: compliers, always-takers, and never-takers."""
    complier, always, never = shares
    n_c = int(round(n * complier))
    n_a = int(round(n * always))
    n_n = n - n_c - n_a  # never-takers absorb any rounding slack

    # Potential treatments d(0), d(1): only compliers switch with assignment.
    d0 = np.r_[np.zeros(n_c), np.ones(n_a), np.zeros(n_n)]
    d1 = np.r_[np.ones(n_c), np.ones(n_a), np.zeros(n_n)]

    # Potential outcomes: compliers have a mean treatment effect of +2;
    # the effect for always- and never-takers is never observed.
    y0 = np.r_[
        rng.normal(1.0, 1.0, n_c),
        rng.normal(0.0, 1.0, n_a),
        rng.normal(2.0, 1.0, n_n),
    ]
    y1 = y0.copy()
    y1[:n_c] = rng.normal(3.0, 1.0, n_c)

    # Bernoulli(1/2) assignment; observed D and Y follow by consistency.
    z = rng.binomial(1, 0.5, n).astype(float)
    d = z * d1 + (1.0 - z) * d0
    y = d * y1 + (1.0 - d) * y0
    return z, d, y


def wald(z, d, y):
    # Ratio of the two reduced forms: the ITT effect on the outcome
    # divided by the ITT effect on treatment uptake (the first stage).
    rf_d = d[z == 1.0].mean() - d[z == 0.0].mean()
    rf_y = y[z == 1.0].mean() - y[z == 0.0].mean()
    return rf_y / rf_d


def wald_delta_se(z, d, y):
    # Delta-method SE: variance of the adjusted outcome y - beta * d
    # across arms, scaled by the first-stage contrast.
    beta = wald(z, d, y)
    adj = y - beta * d
    v_adj = adj[z == 1.0].var(ddof=1) / np.sum(z == 1.0) + adj[z == 0.0].var(ddof=1) / np.sum(z == 0.0)
    rf_d = d[z == 1.0].mean() - d[z == 0.0].mean()
    return np.sqrt(v_adj) / abs(rf_d)


def wald_bootstrap_se(z, d, y, rng, n_boot=40):
    # Nonparametric bootstrap of the Wald ratio. When the instrument is
    # weak, resamples with a first stage at or near zero make individual
    # draws explode, which can render the bootstrap SE infinite or nan.
    n = len(z)
    draws = np.zeros(n_boot)
    for b in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)
        draws[b] = wald(z[idx], d[idx], y[idx])
    return draws.std(ddof=1)
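As a quick, self-contained sanity check of the ratio logic (a sketch with hypothetical shares mirroring the strong-IV scenario below, not using the library), the Wald ratio should land near the complier effect as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Assumed shares: 50% compliers (mean effect +2), 25% always-takers,
# 25% never-takers.
cls = rng.choice(["c", "a", "n"], size=n, p=[0.5, 0.25, 0.25])
z = rng.binomial(1, 0.5, n)
d = np.where(cls == "a", 1, np.where(cls == "n", 0, z))
tau = np.where(cls == "c", 2.0, 0.0)  # effect only identified for compliers
y = 1.0 + tau * d + rng.normal(size=n)

wald_check = (y[z == 1].mean() - y[z == 0].mean()) / (
    d[z == 1].mean() - d[z == 0].mean()
)
print(round(wald_check, 3))  # close to the complier effect of 2.0
```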

1 Compliance-Class Simulation

rng = np.random.default_rng(21)
scenarios = {
    "Strong IV": [0.5, 0.25, 0.25],
    "Weak IV": [0.2, 0.4, 0.4],
    "Very weak IV": [0.1, 0.4, 0.5],
}

for label, shares in scenarios.items():
    estimates = []
    delta_ses = []
    boot_ses = []
    for _ in range(60):
        z, d, y = simulate_experiment(rng, shares)
        estimates.append(wald(z, d, y))
        delta_ses.append(wald_delta_se(z, d, y))
        boot_ses.append(wald_bootstrap_se(z, d, y, rng))
    print(
        f"{label:12s} mean Wald = {float(np.mean(estimates)): .4f}, "
        f"empirical sd = {float(np.std(estimates, ddof=1)): .4f}, "
        f"mean delta se = {float(np.mean(delta_ses)): .4f}, "
        f"mean bootstrap se = {float(np.mean(boot_ses)): .4f}"
    )
Strong IV    mean Wald =  1.9865, empirical sd =  0.4097, mean delta se =  0.4183, mean bootstrap se =  0.4393
Weak IV      mean Wald =  1.9794, empirical sd =  1.1323, mean delta se =  1.3096, mean bootstrap se =  4.3935
Very weak IV mean Wald =  1.4279, empirical sd =  11.7076, mean delta se =  18.0295, mean bootstrap se =  nan
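The nan in the very-weak-IV row is not a bug: with a 10% complier share and n = 300, bootstrap resamples can have a first-stage contrast at or below zero, so individual Wald draws blow up and poison the bootstrap standard deviation. A self-contained sketch of that failure mode (parameters assumed to mirror the scenario above):

```python
import numpy as np

rng = np.random.default_rng(21)
n = 300
# One "Very weak IV" sample: 10% compliers, 40% always-, 50% never-takers.
cls = rng.choice(["c", "a", "n"], size=n, p=[0.10, 0.40, 0.50])
z = rng.binomial(1, 0.5, n)
d = np.where(cls == "a", 1, np.where(cls == "n", 0, z))

# First-stage contrast across 2000 bootstrap resamples.
rf = np.empty(2000)
for b in range(rf.size):
    i = rng.choice(n, size=n, replace=True)
    zb, db = z[i], d[i]
    rf[b] = db[zb == 1].mean() - db[zb == 0].mean()

share_nonpos = (rf <= 0).mean()
print(f"resamples with first stage <= 0: {share_nonpos:.1%}")
```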

2 Wald And TwoSLS With Covariates

rng = np.random.default_rng(2101)
n = 1000
x = rng.normal(size=(n, 2))
z = rng.binomial(1, 0.5, size=n).astype(float)
d = (0.2 + 0.7 * z + 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=0.8, size=n) > 0.0).astype(float)
y = 1.5 * d + 0.4 * x[:, 0] - 0.2 * x[:, 1] + rng.normal(scale=1.0, size=n)

wald_raw = wald(z, d, y)

iv = cm.TwoSLS()
iv.fit(d[:, None], x, z[:, None], y)
iv_hc1 = iv.summary(vcov="hc1")

print("Wald ratio:", round(wald_raw, 4))
print("Wald delta-method SE:", round(float(wald_delta_se(z, d, y)), 4))
print("TwoSLS coefficient on endogenous treatment:", round(float(iv_hc1["coef"][0]), 4))
print("TwoSLS HC1 SE:", round(float(iv_hc1["coef_se"][0]), 4))
Wald ratio: 1.5798
Wald delta-method SE: 0.3202
TwoSLS coefficient on endogenous treatment: 1.6785
TwoSLS HC1 SE: 0.2846

The IV chapter is easiest to read from the reduced forms outward:

  • assignment shifts treatment uptake
  • assignment shifts outcomes
  • the ratio of those two shifts estimates the average causal effect for compliers (under randomization, exclusion, and monotonicity)

TwoSLS is the multivariate generalization of that same logic, accommodating covariates and multiple instruments.
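Without covariates and with a single binary instrument, the just-identified 2SLS estimator collapses to the Wald ratio exactly. A self-contained NumPy sketch on simulated data (no library calls assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.binomial(1, 0.5, n).astype(float)
d = (0.4 * z - 0.2 + rng.normal(size=n) > 0.0).astype(float)
y = 1.5 * d + rng.normal(size=n)

# Wald ratio: reduced form on y over reduced form on d.
wald_ratio = (y[z == 1].mean() - y[z == 0].mean()) / (
    d[z == 1].mean() - d[z == 0].mean()
)

# Just-identified 2SLS with an intercept: beta = cov(z, y) / cov(z, d).
beta_2sls = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(wald_ratio, beta_2sls)  # identical up to floating point
```

The equality holds because, with a binary instrument, both the covariance ratio and the Wald ratio reduce to the same difference-in-means contrasts.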