This page adds two cached Monte Carlo comparisons for the new semiparametric estimators.
Continuous treatment: a partially linear design where both the treatment regression and the outcome nuisance are strongly nonlinear. EPLM only sees the raw controls, while PartiallyLinearDML gets a richer nonlinear basis and cross-fit ridge nuisances.
Binary treatment: a clean logit assignment model in the raw covariates, but a nonlinear response surface. We compare a plain linear regression on the raw controls, entropy balancing on a richer balance basis, and cross-fit AIPW on that same basis.
As with the other ablation page, Quarto execution caching and document freezing are enabled so later site renders reuse the saved outputs until this file changes.
1 Imports And Helpers
```python
from html import escape

import matplotlib.pyplot as plt
import numpy as np
from IPython.display import HTML, display

import crabbymetrics as cm

np.set_printoptions(precision=4, suppress=True)


def html_table(headers, rows):
    parts = [
        "<table>",
        "<thead>",
        "<tr>",
        *[f"<th>{escape(str(header))}</th>" for header in headers],
        "</tr>",
        "</thead>",
        "<tbody>",
    ]
    for row in rows:
        parts.append("<tr>")
        for cell in row:
            parts.append(f"<td>{cell}</td>")
        parts.append("</tr>")
    parts.extend(["</tbody>", "</table>"])
    return "".join(parts)


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def summarize_interval(estimates, ses, truth):
    estimates = np.asarray(estimates)
    ses = np.asarray(ses)
    covered = np.abs(estimates - truth) <= 1.96 * ses
    return {
        "mean_estimate": float(estimates.mean()),
        "bias": float(estimates.mean() - truth),
        "abs_bias": float(abs(estimates.mean() - truth)),
        "empirical_sd": float(estimates.std(ddof=1)),
        "mean_se": float(ses.mean()),
        "rmse": float(np.sqrt(np.mean((estimates - truth) ** 2))),
        "coverage": float(covered.mean()),
    }


def summarize_point(estimates, truth):
    estimates = np.asarray(estimates)
    return {
        "mean_estimate": float(estimates.mean()),
        "bias": float(estimates.mean() - truth),
        "abs_bias": float(abs(estimates.mean() - truth)),
        "empirical_sd": float(estimates.std(ddof=1)),
        "rmse": float(np.sqrt(np.mean((estimates - truth) ** 2))),
    }


def plot_metric(ax, results, metric, title, ylabel):
    sample_sizes = sorted({entry["n"] for entry in results})
    methods = []
    for entry in results:
        if entry["method"] not in methods:
            methods.append(entry["method"])
    for method in methods:
        values = [
            next(entry[metric] for entry in results if entry["n"] == n and entry["method"] == method)
            for n in sample_sizes
        ]
        ax.plot(sample_sizes, values, marker="o", linewidth=2.0, label=method)
    ax.set_title(title)
    ax.set_xlabel("Sample size")
    ax.set_ylabel(ylabel)
    ax.legend()


def continuous_basis(x):
    x1, x2, x3, x4, x5 = x.T
    return np.column_stack(
        [
            x,
            x**2 - 1.0,
            x1 * x2,
            x1 * x3,
            x2 * x4,
            x3 * x5,
            np.sin(x1),
            np.cos(x2),
            np.sin(x1 * x2),
            np.exp(0.3 * x3),
            np.exp(0.3 * x4),
            np.tanh(x5),
        ]
    )


def balance_basis(x):
    x1, x2, x3, x4, x5 = x.T
    return np.column_stack(
        [
            x,
            np.exp(0.35 * x1),
            x2 * x3,
            x4**2 - 1.0,
            np.sin(x5),
        ]
    )
```
2 Monte Carlo Design
The continuous-treatment section reports bias, RMSE, and 95 percent coverage because both estimators return influence-function standard errors. BalancingWeights is currently a weighting estimator rather than a full inference object, so the binary-treatment section focuses on point-estimate bias and RMSE.
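Coverage here means the share of Monte Carlo replications whose nominal 95 percent interval brackets the truth. A quick self-contained check of that definition, using hypothetical draws rather than the simulation itself:

```python
import numpy as np

# Hypothetical draws: when the standard errors are honest, roughly 95
# percent of intervals of the form estimate +/- 1.96 * se cover the truth.
rng = np.random.default_rng(0)
truth = 1.25
ses = np.full(500, 0.1)
estimates = truth + rng.normal(scale=0.1, size=500)  # well-calibrated SEs
covered = np.abs(estimates - truth) <= 1.96 * ses
print(covered.mean())  # close to 0.95
```

This is exactly the quantity the `coverage` field of `summarize_interval` reports.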
EPLM is intentionally given only the raw five covariates, so its linear nuisance model is misspecified on both the treatment and outcome side. PartiallyLinearDML instead gets the richer basis \(B(X)\) defined above and uses three-fold cross-fitting with a log-spaced ridge grid. Within each fold, the outcome and treatment nuisances choose their own penalties from that common candidate set.
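The cross-fitting logic itself is compact. The sketch below is an illustrative stand-in for the library estimator, not its implementation: it uses a single fixed ridge penalty instead of the per-fold grid search, and the function name is hypothetical.

```python
import numpy as np

def crossfit_pldml(y, d, basis, penalty=1.0, n_folds=3, seed=0):
    """Minimal sketch of cross-fit partially linear DML with one fixed
    ridge penalty (the library version searches a penalty grid per fold)."""
    n, k = basis.shape
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds
    y_res = np.empty(n)
    d_res = np.empty(n)
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        Bt = basis[tr]
        # Ridge nuisance fits on the training folds, residuals on the held-out fold.
        A = Bt.T @ Bt + penalty * np.eye(k)
        y_res[te] = y[te] - basis[te] @ np.linalg.solve(A, Bt.T @ y[tr])
        d_res[te] = d[te] - basis[te] @ np.linalg.solve(A, Bt.T @ d[tr])
    # Residual-on-residual regression gives the treatment coefficient,
    # with a heteroskedasticity-robust standard error.
    tau = (d_res @ y_res) / (d_res @ d_res)
    eps = y_res - tau * d_res
    se = np.sqrt(np.sum((d_res * eps) ** 2)) / (d_res @ d_res)
    return tau, se
```

The residualization step is what makes the estimate insensitive to small errors in either nuisance fit.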
3.1 Representative Dataset
```python
def continuous_treatment_dgp(n, rng, tau=1.25):
    x = rng.normal(size=(n, 5))
    x1, x2, x3, x4, x5 = x.T
    m = (
        0.45 * x1
        - 0.35 * x2
        + 0.25 * x3
        + 0.7 * np.sin(x1 * x2)
        + 0.45 * np.exp(0.35 * x4)
        - 0.45 * (x5**2 - 1.0)
    )
    d = m + rng.normal(scale=0.7, size=n)
    g = (
        0.6 * np.sin(x1)
        + 0.5 * x2 * x3
        + 0.35 * np.exp(0.25 * x4)
        - 0.4 * (x5**2 - 1.0)
        + 0.3 * np.cos(x1 * x2)
    )
    y = 1.0 + tau * d + g + rng.normal(scale=0.9, size=n)
    return y, d, x, tau


rng = np.random.default_rng(20260331)
y, d, x, tau_true = continuous_treatment_dgp(1200, rng)

continuous_penalty_grid = np.logspace(-4, 2, 15)

eplm = cm.EPLM()
eplm.fit(y, d, x)
eplm_summary = eplm.summary(vcov="hc1")

dml = cm.PartiallyLinearDML(penalty=continuous_penalty_grid, cv=3, n_folds=3, seed=11)
dml.fit(y, d, continuous_basis(x))
dml_summary = dml.summary(vcov="hc1")

display(
    HTML(
        html_table(
            ["Estimator", "Estimate", "SE"],
            [
                ["Truth", f"{tau_true:.4f}", "--"],
                ["EPLM on raw X", f"{eplm_summary['coef']:.4f}", f"{eplm_summary['se']:.4f}"],
                ["PartiallyLinearDML on B(X)", f"{dml_summary['coef']:.4f}", f"{dml_summary['se']:.4f}"],
            ],
        )
    )
)
```
The pattern is sharp. Once the nuisance functions move outside the span of the raw controls, EPLM inherits the misspecification and never recovers. Cross-fit DML, fed a richer design basis, is much closer to the truth and its HC1 intervals behave sensibly.
4 Binary Treatment: Regression Versus Balancing Weights Versus AIPW
Vanilla regression fits OLS on [D, X] and therefore relies on a linear outcome regression in the raw covariates.
Entropy balancing reweights the treated and control groups separately to the pooled mean of a richer balance basis \(B(X)\).
AIPW uses that same \(B(X)\) inside its cross-fit ridge nuisances, with a log-spaced penalty grid, n_folds=3, and propensity_clip=0.1 to stabilize the propensity fit. The treated-outcome, control-outcome, and propensity regressions each choose their own penalties within each fold.
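To make the doubly robust combination concrete, here is a hypothetical numpy sketch of the AIPW score where the true nuisance functions stand in for the cross-fit ridge fits; the clipping line mirrors the role of propensity_clip=0.1. This illustrates the formula, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-0.8 * x))  # true propensity score
t = rng.binomial(1, p)
y = 1.0 + 2.0 * t + np.sin(x) + rng.normal(scale=0.5, size=n)

# Plug-in nuisances (here the truth, standing in for cross-fit fits).
mu1 = 1.0 + 2.0 + np.sin(x)       # E[Y | T=1, X]
mu0 = 1.0 + np.sin(x)             # E[Y | T=0, X]
p_hat = np.clip(p, 0.1, 0.9)      # analogue of propensity_clip=0.1

# Doubly robust score: outcome-model contrast plus reweighted residuals.
psi = (
    mu1 - mu0
    + t * (y - mu1) / p_hat
    - (1 - t) * (y - mu0) / (1 - p_hat)
)
ate = psi.mean()  # close to the true effect of 2.0
```

The score stays consistent if either the outcome regressions or the propensity model is correct, which is why AIPW tolerates moderate nuisance error.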
```python
# `balancing_diag` is produced by the balancing-weights fit in the
# preceding cell (not shown here).
display(
    HTML(
        html_table(
            ["Balance diagnostic", "Value"],
            [
                [
                    "Treated weights max abs diff from pooled target",
                    f"{balancing_diag['treated_max_abs_diff']:.3e}",
                ],
                [
                    "Control weights max abs diff from pooled target",
                    f"{balancing_diag['control_max_abs_diff']:.3e}",
                ],
            ],
        )
    )
)
```
This second design is deliberately less one-sided than the continuous-treatment case.
Vanilla regression is hurt by the nonlinear outcome surface and is uniformly less efficient than the calibrated weighting estimator.
Entropy balancing does especially well here because the assignment model is clean and the richer balance basis directly targets the part of the covariate distribution that matters for the response surface.
AIPW improves materially once the ridge propensity is clipped away from the extremes, but in this particular finite-sample design it is still somewhat noisier than direct balancing.
That is a useful result rather than an inconvenience: when treatment assignment is low-dimensional and balance can be imposed directly, a dedicated balancing estimator can be hard to beat.
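For intuition about what the balancing step is doing: entropy balancing solves for an exponential tilt of uniform weights whose weighted basis mean hits the target moments. A generic Newton-iteration sketch of that dual problem (an illustrative helper, not the crabbymetrics implementation):

```python
import numpy as np

def entropy_balance(basis, target, n_iter=200, tol=1e-10):
    """Entropy-balancing weights: exponential tilt of uniform weights so
    the weighted mean of `basis` matches `target` (the dual is a smooth
    convex log-sum-exp problem, solved here by Newton's method)."""
    centered = basis - target              # (n, k)
    theta = np.zeros(basis.shape[1])
    for _ in range(n_iter):
        logits = centered @ theta
        logits -= logits.max()             # numerical stability
        p = np.exp(logits)
        p /= p.sum()                       # normalized weights
        grad = p @ centered                # weighted moment gap
        if np.abs(grad).max() < tol:
            break
        hess = centered.T @ (p[:, None] * centered) - np.outer(grad, grad)
        theta -= np.linalg.solve(hess + 1e-8 * np.eye(len(grad)), grad)
    return p
```

Because the weights are strictly positive and match the moments exactly when the target is feasible, the only remaining bias comes from response-surface variation outside the span of the balance basis.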