
First Course Ding: Chapter 4

Neyman repeated-sampling inference in randomized experiments

Chapter 4 shifts from Fisher’s sharp-null logic to Neyman’s repeated-sampling logic: the finite population of potential outcomes is held fixed, only the treatment assignment is redrawn, and the target is the finite-population average treatment effect.
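Concretely, with \(n_1\) treated and \(n_0\) control units, the difference-in-means estimator and Neyman’s variance estimator are

\[
\hat\tau = \bar{Y}_1 - \bar{Y}_0,
\qquad
\hat{V} = \frac{s_1^2}{n_1} + \frac{s_0^2}{n_0},
\]

whereas over repeated assignments the exact variance is

\[
\operatorname{var}(\hat\tau) = \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0} - \frac{S_\tau^2}{n},
\]

with \(S_1^2, S_0^2, S_\tau^2\) the finite-population variances of \(Y(1)\), \(Y(0)\), and the unit-level effects \(\tau_i = Y_i(1) - Y_i(0)\). The subtracted \(S_\tau^2/n\) term is never identified from data, so \(\hat V\) is conservative in expectation.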

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.set_printoptions(precision=4, suppress=True)

1 Three Fixed Finite Populations

rng = np.random.default_rng(4)
n = 100
n1 = 60
n0 = n - n1
truth = 1.0

base = rng.exponential(scale=1.0, size=n)
y0_constant = np.sort(base)[::-1]
y1_constant = y0_constant + truth

y0_negative = np.sort(base)
y1_negative = y1_constant.copy()

y0_independent = rng.permutation(y0_constant)
y1_independent = y1_constant.copy()

scenarios = {
    "Constant effect": (y0_constant, y1_constant),
    "Negative correlation": (y0_negative, y1_negative),
    "Independent": (y0_independent, y1_independent),
}


def one_assignment(y0, y1, rng):
    # One completely randomized assignment: n1 treated units drawn without replacement.
    z = np.zeros(n)
    z[rng.choice(n, size=n1, replace=False)] = 1.0
    y = z * y1 + (1.0 - z) * y0
    # Difference-in-means estimator and Neyman's variance estimator.
    tau_hat = y[z == 1.0].mean() - y[z == 0.0].mean()
    v_hat = y[z == 1.0].var(ddof=1) / n1 + y[z == 0.0].var(ddof=1) / n0
    # Does the nominal 95% interval cover the true average effect?
    covered = abs(tau_hat - truth) <= 1.96 * np.sqrt(v_hat)
    return tau_hat, v_hat, covered


rows = []
mc = 2000
draw_store = {}
for label, (y0, y1) in scenarios.items():
    draws = np.array([one_assignment(y0, y1, rng) for _ in range(mc)])
    draw_store[label] = draws
    rows.append(
        {
            "scenario": label,
            "empirical_var_of_tau_hat": draws[:, 0].var(ddof=1),
            "average_neyman_var": draws[:, 1].mean(),
            "coverage": draws[:, 2].mean(),
        }
    )

pd.DataFrame(rows).set_index("scenario")
scenario                 empirical_var_of_tau_hat   average_neyman_var   coverage
Constant effect                          0.051276             0.051232     0.9350
Negative correlation                     0.012884             0.051822     1.0000
Independent                              0.021289             0.051539     0.9965
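The gap in the table can be checked against Neyman’s decomposition directly. The sketch below recreates the negative-correlation potential outcomes (assuming the same seed and draw order as above) and computes the exact variance and the expectation of the Neyman estimator, whose difference is \(S_\tau^2/n\):

```python
import numpy as np

rng = np.random.default_rng(4)  # same seed and draw order as the simulation above
n, n1 = 100, 60
n0 = n - n1

base = rng.exponential(scale=1.0, size=n)
y0 = np.sort(base)               # negative-correlation scenario's Y(0)
y1 = np.sort(base)[::-1] + 1.0   # Y(1): anti-sorted, shifted by the true effect

# Finite-population variances (ddof=1 matches the S^2 convention)
S1_sq = y1.var(ddof=1)
S0_sq = y0.var(ddof=1)
Stau_sq = (y1 - y0).var(ddof=1)

true_var = S1_sq / n1 + S0_sq / n0 - Stau_sq / n  # exact var of tau_hat
neyman_expectation = S1_sq / n1 + S0_sq / n0      # what v_hat estimates on average

print(f"true var          {true_var:.6f}")
print(f"E[Neyman var]     {neyman_expectation:.6f}")
print(f"gap = S_tau^2/n   {Stau_sq / n:.6f}")
```

The two printed variances should track the `empirical_var_of_tau_hat` and `average_neyman_var` columns of the negative-correlation row up to Monte Carlo error.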

2 Geometry Of The Potential Outcomes

The negative-correlation case is where Neyman’s variance is most conservative: when \(Y(1)\) and \(Y(0)\) move against each other, the unit-level treatment effects vary the most, so the unidentified term that the Neyman estimator fails to subtract is at its largest.
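A quick sketch (again assuming the same seed-4 construction and draw order as above) makes this geometry concrete: the correlation between the potential outcomes and the variance of the unit-level effects move in opposite directions across the three scenarios.

```python
import numpy as np

rng = np.random.default_rng(4)  # same seed and draw order as the simulation above
n = 100
base = rng.exponential(scale=1.0, size=n)

y0_constant = np.sort(base)[::-1]
y1 = y0_constant + 1.0                        # Y(1) is shared across all scenarios
y0_negative = np.sort(base)
y0_independent = rng.permutation(y0_constant)

results = {}
for label, y0 in [
    ("Constant effect", y0_constant),
    ("Negative correlation", y0_negative),
    ("Independent", y0_independent),
]:
    tau_i = y1 - y0  # unit-level treatment effects
    results[label] = (np.corrcoef(y0, y1)[0, 1], tau_i.var(ddof=1))
    print(f"{label:22s} corr(Y0, Y1) = {results[label][0]:+.2f}   "
          f"var(tau_i) = {results[label][1]:.3f}")
```

With a constant effect the correlation is exactly one and the effect variance is zero; the anti-sorted case minimizes the correlation and maximizes the effect variance, with the random pairing in between.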

fig, axes = plt.subplots(3, 2, figsize=(10, 12))
grid = np.linspace(-0.8, 0.8, 200)

for row, (label, (y0, y1)) in enumerate(scenarios.items()):
    draws = draw_store[label]
    empirical_sd = np.sqrt(draws[:, 0].var(ddof=1))

    axes[row, 0].scatter(y0, y1, alpha=0.7)
    axes[row, 0].set_xlabel("$Y(0)$")
    axes[row, 0].set_ylabel("$Y(1)$")
    axes[row, 0].set_title(label)

    centered = draws[:, 0] - truth
    axes[row, 1].hist(centered, bins=40, density=True, alpha=0.75)
    normal = np.exp(-0.5 * (grid / empirical_sd) ** 2) / (np.sqrt(2.0 * np.pi) * empirical_sd)
    axes[row, 1].plot(grid, normal, color="black", linewidth=2.0)
    axes[row, 1].set_xlabel("$\\hat\\tau - \\tau$")
    axes[row, 1].set_title(f"{label}: repeated assignments")

fig.tight_layout()

3 Takeaway

Chapter 4 is about repeated randomization with fixed potential outcomes. The Neyman variance estimator is conservative because the variance of the unit-level treatment effects (equivalently, the covariance between \(Y(1)\) and \(Y(0)\)) is never identified: no unit reveals both potential outcomes. How conservative it is depends on the finite-population geometry.