First Course Ding: Chapter 3

Completely randomized experiments and the Fisher randomization test

Chapter 3 is the cleanest design-based chapter in the book. With fixed potential outcomes and a known treatment-assignment mechanism, Fisherian inference is just a permutation problem.
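Under the sharp null Y_i(1) = Y_i(0) for every unit, all missing potential outcomes are imputed, so the exact two-sided p-value is a plain count over the assignment set (the notation here is generic, not the book's):

```latex
p_{\mathrm{FRT}}
  = \frac{1}{\lvert \mathcal{Z} \rvert}
    \sum_{z \in \mathcal{Z}}
    \mathbf{1}\left\{
      \lvert \hat{\tau}(z, y^{\mathrm{obs}}) \rvert
      \ge \lvert \hat{\tau}(z^{\mathrm{obs}}, y^{\mathrm{obs}}) \rvert
    \right\},
\qquad
\lvert \mathcal{Z} \rvert = \binom{n}{n_1}
```

where \(\hat{\tau}\) is the difference in means, \(z^{\mathrm{obs}}\) is the observed assignment, and \(\mathcal{Z}\) is the set of complete-randomization assignments with \(n_1\) treated units.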

import itertools
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import crabbymetrics as cm

np.set_printoptions(precision=4, suppress=True)

1 Exact Randomization Distribution Under A Sharp Null

n = 12
n1 = 6
y0 = np.array([3.0, 3.3, 2.8, 4.0, 3.5, 2.9, 4.2, 3.1, 2.7, 3.8, 3.4, 4.1])
tau = 1.2 + 0.3 * np.linspace(-1.0, 1.0, n)
y1 = y0 + tau

z_obs = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=float)
y_obs = z_obs * y1 + (1.0 - z_obs) * y0

def diff_in_means(y, z):
    """Difference in means between treated (z == 1) and control units."""
    treated = z == 1.0
    control = ~treated
    return y[treated].mean() - y[control].mean()

def normal_pvalue(z_stat):
    """Two-sided p-value against the standard normal reference."""
    return math.erfc(abs(float(z_stat)) / math.sqrt(2.0))

tau_hat = diff_in_means(y_obs, y_obs * 0 + z_obs)

# Regressing y on z reproduces the difference in means; HC1 provides a
# heteroskedasticity-robust standard error for the normal-approximation
# p-value.
model = cm.OLS()
model.fit(z_obs[:, None], y_obs)
asy = model.summary(vcov="hc1")
z_stat = asy["coef"][0] / asy["coef_se"][0]

# Enumerate all C(12, 6) = 924 possible treatment assignments. Under the
# sharp null Y_i(1) = Y_i(0), the observed outcomes stay fixed, so each
# assignment yields one draw from the exact randomization distribution.
assignments = np.array(list(itertools.combinations(range(n), n1)))
sharp_null_draws = np.zeros(assignments.shape[0])
for idx, treated_ids in enumerate(assignments):
    z = np.zeros(n)
    z[list(treated_ids)] = 1.0
    sharp_null_draws[idx] = diff_in_means(y_obs, z)

# Two-sided p-value: the share of assignments whose statistic is at
# least as extreme as the observed one.
frt_pvalue = np.mean(np.abs(sharp_null_draws) >= abs(tau_hat))

pd.DataFrame(
    {
        "estimate": [tau_hat],
        "HC1 normal p-value": [normal_pvalue(z_stat)],
        "Fisher randomization p-value": [frt_pvalue],
    }
)
   estimate  HC1 normal p-value  Fisher randomization p-value
0  0.369697            0.105021                      0.149351
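Full enumeration is cheap here (924 assignments) but grows as C(n, n1). A common alternative is a Monte Carlo FRT that permutes the observed assignment vector; the sketch below uses plain numpy (the `frt_pvalue_mc` helper is mine, not part of crabbymetrics), and its estimate should land close to the exact p-value of about 0.149 computed above:

```python
import numpy as np

def frt_pvalue_mc(y, z_obs, n_draws=10_000, seed=0):
    """Monte Carlo Fisher randomization p-value for the difference in means.

    Each permutation of z_obs is a valid completely randomized
    assignment with the same number of treated units, so the average of
    the indicator over draws is an unbiased estimate of the exact
    enumeration p-value.
    """
    rng = np.random.default_rng(seed)

    def stat(z):
        return y[z == 1.0].mean() - y[z == 0.0].mean()

    t_obs = abs(stat(z_obs))
    draws = np.array([abs(stat(rng.permutation(z_obs))) for _ in range(n_draws)])
    return float(np.mean(draws >= t_obs))

# Rebuild the chapter's observed data.
y0 = np.array([3.0, 3.3, 2.8, 4.0, 3.5, 2.9, 4.2, 3.1, 2.7, 3.8, 3.4, 4.1])
y1 = y0 + 1.2 + 0.3 * np.linspace(-1.0, 1.0, 12)
z_obs = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=float)
y_obs = z_obs * y1 + (1.0 - z_obs) * y0

p_mc = frt_pvalue_mc(y_obs, z_obs)
```

With 10,000 draws the Monte Carlo standard error is under 0.004, so the estimate is a faithful stand-in for the exact count.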
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(sharp_null_draws, bins=30, color="tab:blue", alpha=0.75)
ax.axvline(tau_hat, color="black", linestyle="--", linewidth=2.0)
ax.set_xlabel("Difference in means under sharp null")
ax.set_ylabel("Count")
ax.set_title("Exact Fisher randomization distribution")
fig.tight_layout()

2 The Assignment Mechanism Is The Whole Engine

The randomization distribution above does not need any sampling model for the outcomes. The only ingredients are:

  • the observed outcomes
  • the observed treatment assignment
  • the set of assignments allowed by the experiment

That is why Chapter 3 sits so naturally in plain numpy: the inferential object is the assignment mechanism, not a parametric likelihood.
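To make that point concrete, here is a minimal sketch (plain numpy and itertools; the `exact_frt` and `rank_sum` helpers are mine, not part of crabbymetrics) of an exact FRT that takes the test statistic as an argument. Swapping the difference in means for a rank statistic changes nothing about the engine, only the statistic:

```python
import itertools

import numpy as np

def exact_frt(y_obs, z_obs, statistic):
    """Exact two-sided FRT p-value for any statistic of (y, z).

    The only distributional input is the assignment mechanism:
    complete randomization of n1 treated units out of n.
    """
    n, n1 = len(y_obs), int(z_obs.sum())
    t_obs = abs(statistic(y_obs, z_obs))
    draws = []
    for treated in itertools.combinations(range(n), n1):
        z = np.zeros(n)
        z[list(treated)] = 1.0
        draws.append(abs(statistic(y_obs, z)))
    return float(np.mean(np.array(draws) >= t_obs))

def diff_in_means(y, z):
    """The chapter's statistic: treated mean minus control mean."""
    return y[z == 1.0].mean() - y[z == 0.0].mean()

def rank_sum(y, z):
    # Wilcoxon-style statistic, centered so 0 means "no location shift".
    ranks = np.argsort(np.argsort(y)) + 1.0
    return ranks[z == 1.0].sum() - z.sum() * (len(y) + 1) / 2.0
```

Both statistics are referred to the same enumerated assignment set, so both p-values are exact by construction.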

3 Takeaway

For a completely randomized experiment, crabbymetrics.OLS reproduces the difference in means the experiment was built around, but the Fisher test itself is a pure permutation calculation. That design-first perspective is the point of the chapter.