- Contingency-table evidence can be summarized with simple callback-rate differences.
- Simpson’s paradox is a warning that pooled regressions can point in the wrong direction when group composition shifts.
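A callback-rate difference is a difference in two proportions read off a 2×2 table. The sketch below uses hypothetical counts (not data from the source) and the standard unpooled standard error for two independent proportions. The function name `callback_rate_difference` is illustrative only.

```python
import numpy as np

# Hypothetical counts for illustration only (NOT data from the source).
# Rows: group 0 and group 1; columns: [callbacks, no callbacks].
table = np.array([
    [80, 920],
    [120, 880],
])


def callback_rate_difference(table):
    """Return (rate_1 - rate_0, standard error) from a 2x2 callback table."""
    callbacks = table[:, 0]
    totals = table.sum(axis=1)
    rates = callbacks / totals
    diff = rates[1] - rates[0]
    # Unpooled standard error for a difference in two independent proportions
    se = np.sqrt(np.sum(rates * (1 - rates) / totals))
    return diff, se


diff, se = callback_rate_difference(table)
print(f"difference = {diff:.4f}, se = {se:.4f}")
```

With these made-up counts the rates are 0.12 and 0.08, so the difference is 0.04.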
```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import crabbymetrics as cm

# Print arrays with 4 decimals and without scientific notation
np.set_printoptions(precision=4, suppress=True)


def repo_root():
    """Walk up from the working directory until a folder containing ding_w_source is found."""
    for candidate in [Path.cwd().resolve(), *Path.cwd().resolve().parents]:
        if (candidate / "ding_w_source").exists():
            return candidate
    raise FileNotFoundError("could not locate ding_w_source from the current working directory")


data_dir = repo_root() / "ding_w_source"
```
1 Lalonde Observational Data
The original notebook starts with the Lalonde-style CPS comparison. Here the point is simple: the estimated treatment coefficient can change substantially once we control for observed differences between the treated and comparison units.
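The mechanism can be illustrated on simulated data (not the Lalonde/CPS sample): if treated units have systematically lower values of a covariate that also raises the outcome, the raw difference in means is biased, while regressing on the treatment plus that covariate recovers the effect. This sketch uses plain NumPy least squares rather than `crabbymetrics.OLS`, so it makes no assumption about that class's signature; the helper `ols` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data (NOT the Lalonde/CPS sample): treated units have lower
# pre-treatment covariate values, and the covariate also raises the outcome.
treated = rng.binomial(1, 0.3, size=n)
x = rng.normal(loc=2.0 - 1.5 * treated, scale=1.0)
true_effect = 1.0
y = true_effect * treated + 2.0 * x + rng.normal(scale=1.0, size=n)


def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


raw = ols(y, treated)[1]          # equals the treated-minus-control difference in means
adjusted = ols(y, treated, x)[1]  # controls for the observed covariate
print(f"raw difference: {raw:.3f}, adjusted coefficient: {adjusted:.3f}")
```

By construction the raw contrast is centered near 1.0 + 2.0 × (−1.5) = −2.0, while the adjusted coefficient is centered near the true effect of 1.0.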
Within each group below, the relationship between \(x\) and \(y\) is positive. Pooled together, it becomes negative because the high-\(x\) group also has a much lower intercept.
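The same reversal can be reproduced in a few lines. The sketch below simulates two groups with made-up parameters (slope +1 in both, intercepts 5 and −5, group means of \(x\) at 0 and 4); the numbers are illustrative assumptions, not values from the source. With equal group sizes the pooled slope is centered near −1, and adding a group indicator restores the within-group slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 1_000

# Simulated groups (illustrative values, not from the source): the slope of y
# on x is +1 in both groups, but the high-x group has a much lower intercept.
x0 = rng.normal(0.0, 1.0, n_per_group)
y0 = 5.0 + x0 + rng.normal(0.0, 0.5, n_per_group)
x1 = rng.normal(4.0, 1.0, n_per_group)
y1 = -5.0 + x1 + rng.normal(0.0, 0.5, n_per_group)


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    return np.polyfit(x, y, deg=1)[0]


x_all = np.concatenate([x0, x1])
y_all = np.concatenate([y0, y1])

print(f"group 0 slope: {slope(x0, y0):.2f}")
print(f"group 1 slope: {slope(x1, y1):.2f}")
print(f"pooled slope:  {slope(x_all, y_all):.2f}")

# Adding a group indicator (separate intercepts) recovers the within-group slope.
group = np.concatenate([np.zeros(n_per_group), np.ones(n_per_group)])
X = np.column_stack([np.ones_like(x_all), x_all, group])
beta, *_ = np.linalg.lstsq(X, y_all, rcond=None)
print(f"slope with group indicator: {beta[1]:.2f}")
```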
Chapter 1 is mostly about interpretation discipline. crabbymetrics.OLS is enough to reproduce the main lesson: raw differences, adjusted differences, and grouped summaries answer different questions even when they use the same underlying observations.